feat(ai): Redact base64 data URLs in image_url content blocks #5953
ericapisani merged 1 commit into master
Extend redact_blob_message_parts to detect and redact base64 data URLs inside image_url content blocks, in addition to the existing blob type handling. Move DATA_URL_BASE64_REGEX to sentry_sdk/ai/consts.py since it is now shared across AI monitoring code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Semver Impact of This PR: 🟡 Minor (new features)

bugbot run
Codecov Results 📊
✅ 13 passed | Total: 13 | Pass Rate: 100% | Execution Time: 9.77s
All tests are passing successfully.
❌ Patch coverage is 25.00%. Project has 14842 uncovered lines.
Files with missing lines (3)
Generated by Codecov Action
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Crashes when `image_url` value is a string shorthand
  - Added type checking to handle both string shorthand and dict formats for `image_url` in the `_is_image_type_with_blob_content` and `redact_blob_message_parts` functions, preventing an AttributeError when calling `.get()` on a string.
Or push these changes by commenting:
@cursor push cde852a515
Preview (cde852a515)
diff --git a/sentry_sdk/ai/utils.py b/sentry_sdk/ai/utils.py
--- a/sentry_sdk/ai/utils.py
+++ b/sentry_sdk/ai/utils.py
@@ -597,7 +597,14 @@
if item.get("type") != "image_url":
return False
- image_url = item.get("image_url", {}).get("url", "")
+ image_url_data = item.get("image_url")
+ if isinstance(image_url_data, str):
+ image_url = image_url_data
+ elif isinstance(image_url_data, dict):
+ image_url = image_url_data.get("url", "")
+ else:
+ return False
+
data_url_match = DATA_URL_BASE64_REGEX.match(image_url)
return bool(data_url_match)
@@ -682,7 +689,11 @@
if item.get("type") == "blob":
item["content"] = BLOB_DATA_SUBSTITUTE
elif _is_image_type_with_blob_content(item):
- item["image_url"]["url"] = BLOB_DATA_SUBSTITUTE
+ image_url_data = item.get("image_url")
+ if isinstance(image_url_data, str):
+ item["image_url"] = BLOB_DATA_SUBSTITUTE
+ elif isinstance(image_url_data, dict):
+ item["image_url"]["url"] = BLOB_DATA_SUBSTITUTE
return messages_copy
diff --git a/tests/test_ai_monitoring.py b/tests/test_ai_monitoring.py
--- a/tests/test_ai_monitoring.py
+++ b/tests/test_ai_monitoring.py
@@ -845,6 +845,38 @@
assert result[0]["content"][1]["type"] == "image_url"
assert result[0]["content"][1]["image_url"]["url"] == BLOB_DATA_SUBSTITUTE
+ def test_redacts_image_url_string_shorthand_with_blob(self):
+ """Test redacting image_url using string shorthand format with base64 data"""
+ messages = [
+ {
+ "role": "user",
+ "content": [
+ {
+ "text": "How many ponies do you see in the image?",
+ "type": "text",
+ },
+ {
+ "type": "image_url",
+ "image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRg==",
+ },
+ ],
+ }
+ ]
+
+ original_blob_content = messages[0]["content"][1]
+
+ result = redact_blob_message_parts(messages)
+
+ assert messages[0]["content"][1] == original_blob_content
+
+ assert (
+ result[0]["content"][0]["text"]
+ == "How many ponies do you see in the image?"
+ )
+ assert result[0]["content"][0]["type"] == "text"
+ assert result[0]["content"][1]["type"] == "image_url"
+ assert result[0]["content"][1]["image_url"] == BLOB_DATA_SUBSTITUTE
+
def test_does_not_redact_image_url_content_with_non_blobs(self):
messages = [
{

Reviewed by Cursor Bugbot for commit 51ff087.


Extend `redact_blob_message_parts` to detect and redact base64 data URLs inside `image_url` content blocks (e.g. `data:image/jpeg;base64,...`), in addition to the existing `blob` type handling.

Some AI integrations send image content as `image_url` items with inline base64 data URLs rather than the `blob` content type. Without this change, those base64 payloads are sent as span data, which inflates event size and can leak image content.

Also moves `DATA_URL_BASE64_REGEX` from `sentry_sdk/integrations/pydantic_ai/consts.py` to `sentry_sdk/ai/consts.py` since it's now shared across AI monitoring code beyond pydantic_ai.

Fixes PY-2280 and #5948
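
For readers who want to see the described behavior in isolation, here is a minimal standalone sketch. It is not the SDK's implementation: the regex pattern, the placeholder string, and the names `_image_url_string` and `redact_sketch` are assumptions made for illustration, since the real `DATA_URL_BASE64_REGEX` and `BLOB_DATA_SUBSTITUTE` values are not shown in this PR. It covers both the dict form and the string shorthand of `image_url` that Bugbot flagged.

```python
import copy
import re

# Assumption: illustrative stand-ins only. The SDK's real constants live in
# sentry_sdk/ai/consts.py and may differ from these values.
DATA_URL_BASE64_REGEX = re.compile(
    r"^data:[\w.+-]+/[\w.+-]+(?:;[\w.+-]+=[\w.+-]+)*;base64,", re.IGNORECASE
)
BLOB_DATA_SUBSTITUTE = "[Blob substitute]"


def _image_url_string(item):
    """Return the URL of an image_url item (dict or string form), else None."""
    value = item.get("image_url")
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        url = value.get("url")
        return url if isinstance(url, str) else None
    return None


def redact_sketch(messages):
    """Return a deep copy of messages with blob and base64 image data replaced."""
    redacted = copy.deepcopy(messages)  # never mutate the caller's messages
    for message in redacted:
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if not isinstance(content, list):
            continue
        for item in content:
            if not isinstance(item, dict):
                continue
            if item.get("type") == "blob":
                item["content"] = BLOB_DATA_SUBSTITUTE
            elif item.get("type") == "image_url":
                url = _image_url_string(item)
                if url is None or not DATA_URL_BASE64_REGEX.match(url):
                    continue  # ordinary http(s) URLs are left untouched
                if isinstance(item["image_url"], dict):
                    item["image_url"]["url"] = BLOB_DATA_SUBSTITUTE
                else:
                    item["image_url"] = BLOB_DATA_SUBSTITUTE
    return redacted
```

In this sketch, only URLs that start with a `data:<mime>;base64,` prefix are replaced, so a regular `https://` image link passes through unchanged, and the input list is left unmodified because the work happens on a deep copy.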