feat(ai): Redact base64 data URLs in image_url content blocks #5953

Merged
ericapisani merged 1 commit into master from ep/py-2280-1mo
Apr 8, 2026
Conversation

@ericapisani
Member

@ericapisani ericapisani commented Apr 7, 2026

Extend redact_blob_message_parts to detect and redact base64 data URLs inside image_url content blocks (e.g. data:image/jpeg;base64,...), in addition to the existing blob type handling.

Some AI integrations send image content as image_url items with inline base64 data URLs rather than the blob content type. Without this change, those base64 payloads are sent as span data, which inflates event size and can leak image content.

Also moves DATA_URL_BASE64_REGEX from sentry_sdk/integrations/pydantic_ai/consts.py to sentry_sdk/ai/consts.py since it's now shared across AI monitoring code beyond pydantic_ai.

Fixes PY-2280 and #5948
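
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: the regex pattern and the `BLOB_DATA_SUBSTITUTE` value are assumptions made for the example, `is_base64_image_url` and `redact_sketch` are hypothetical names, and only the dict form of `image_url` is handled.

```python
import copy
import re

# Assumed pattern for illustration; the real DATA_URL_BASE64_REGEX in
# sentry_sdk/ai/consts.py may differ.
DATA_URL_BASE64_REGEX = re.compile(
    r"^data:[\w.+-]+/[\w.+-]+(?:;[\w.+=-]+)*;base64,", re.IGNORECASE
)
# Placeholder value, assumed for this sketch.
BLOB_DATA_SUBSTITUTE = "[Blob substitute]"


def is_base64_image_url(item):
    """Return True for an image_url block whose URL is an inline base64 data URL."""
    if not isinstance(item, dict) or item.get("type") != "image_url":
        return False
    image_url = item.get("image_url")
    if not isinstance(image_url, dict):
        return False
    url = image_url.get("url", "")
    return isinstance(url, str) and bool(DATA_URL_BASE64_REGEX.match(url))


def redact_sketch(messages):
    """Return a deep copy of messages with blob and base64 image payloads replaced."""
    messages_copy = copy.deepcopy(messages)
    for message in messages_copy:
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        for item in content:
            if not isinstance(item, dict):
                continue
            if item.get("type") == "blob":
                # Existing behavior: blob parts lose their content.
                item["content"] = BLOB_DATA_SUBSTITUTE
            elif is_base64_image_url(item):
                # New behavior: inline base64 data URLs are replaced too.
                item["image_url"]["url"] = BLOB_DATA_SUBSTITUTE
    return messages_copy
```

Working on a deep copy leaves the caller's messages untouched, and ordinary `https://` image URLs pass through unchanged because they do not match the data URL pattern.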

Extend redact_blob_message_parts to detect and redact base64 data URLs
inside image_url content blocks, in addition to the existing blob type
handling. Move DATA_URL_BASE64_REGEX to sentry_sdk/ai/consts.py since
it is now shared across AI monitoring code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@linear-code

linear-code bot commented Apr 7, 2026

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Semver Impact of This PR

🟡 Minor (new features)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

  • (ai) Redact base64 data URLs in image_url content blocks by ericapisani in #5953
  • (integrations) Instrument pyreqwest tracing by servusdei2018 in #5682

Internal Changes 🔧

  • (openai) Split token counting by API for easier deprecation by ericapisani in #5930
  • (opentelemetry) Ignore mypy error by alexander-alderman-webb in #5927
  • Fix license metadata in setup.py by sl0thentr0py in #5934
  • Update validate-pr workflow by stephanie-anderson in #5931

🤖 This preview updates automatically when you update the PR.

@ericapisani
Member Author

bugbot run

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Codecov Results 📊

13 passed | Total: 13 | Pass Rate: 100% | Execution Time: 9.77s

All tests are passing successfully.

❌ Patch coverage is 25.00%. Project has 14842 uncovered lines.

Files with missing lines (3)

| File | Patch % | Lines |
|---|---|---|
| utils.py | 15.25% | ⚠️ 239 Missing |
| utils.py | 0.00% | ⚠️ 27 Missing |
| consts.py | 0.00% | ⚠️ 1 Missing |

Generated by Codecov Action

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Crashes when image_url value is a string shorthand
    • Added type checking to handle both string shorthand and dict formats for image_url in _is_image_type_with_blob_content and redact_blob_message_parts functions, preventing AttributeError when calling .get() on a string.

Or push these changes by commenting:

@cursor push cde852a515
Preview (cde852a515)
diff --git a/sentry_sdk/ai/utils.py b/sentry_sdk/ai/utils.py
--- a/sentry_sdk/ai/utils.py
+++ b/sentry_sdk/ai/utils.py
@@ -597,7 +597,14 @@
     if item.get("type") != "image_url":
         return False
 
-    image_url = item.get("image_url", {}).get("url", "")
+    image_url_data = item.get("image_url")
+    if isinstance(image_url_data, str):
+        image_url = image_url_data
+    elif isinstance(image_url_data, dict):
+        image_url = image_url_data.get("url", "")
+    else:
+        return False
+
     data_url_match = DATA_URL_BASE64_REGEX.match(image_url)
 
     return bool(data_url_match)
@@ -682,7 +689,11 @@
                     if item.get("type") == "blob":
                         item["content"] = BLOB_DATA_SUBSTITUTE
                     elif _is_image_type_with_blob_content(item):
-                        item["image_url"]["url"] = BLOB_DATA_SUBSTITUTE
+                        image_url_data = item.get("image_url")
+                        if isinstance(image_url_data, str):
+                            item["image_url"] = BLOB_DATA_SUBSTITUTE
+                        elif isinstance(image_url_data, dict):
+                            item["image_url"]["url"] = BLOB_DATA_SUBSTITUTE
 
     return messages_copy
 

diff --git a/tests/test_ai_monitoring.py b/tests/test_ai_monitoring.py
--- a/tests/test_ai_monitoring.py
+++ b/tests/test_ai_monitoring.py
@@ -845,6 +845,38 @@
         assert result[0]["content"][1]["type"] == "image_url"
         assert result[0]["content"][1]["image_url"]["url"] == BLOB_DATA_SUBSTITUTE
 
+    def test_redacts_image_url_string_shorthand_with_blob(self):
+        """Test redacting image_url using string shorthand format with base64 data"""
+        messages = [
+            {
+                "role": "user",
+                "content": [
+                    {
+                        "text": "How many ponies do you see in the image?",
+                        "type": "text",
+                    },
+                    {
+                        "type": "image_url",
+                        "image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRg==",
+                    },
+                ],
+            }
+        ]
+
+        original_blob_content = messages[0]["content"][1]
+
+        result = redact_blob_message_parts(messages)
+
+        assert messages[0]["content"][1] == original_blob_content
+
+        assert (
+            result[0]["content"][0]["text"]
+            == "How many ponies do you see in the image?"
+        )
+        assert result[0]["content"][0]["type"] == "text"
+        assert result[0]["content"][1]["type"] == "image_url"
+        assert result[0]["content"][1]["image_url"] == BLOB_DATA_SUBSTITUTE
+
     def test_does_not_redact_image_url_content_with_non_blobs(self):
         messages = [
             {

Reviewed by Cursor Bugbot for commit 51ff087.

@ericapisani ericapisani marked this pull request as ready for review April 7, 2026 19:44
@ericapisani ericapisani requested a review from a team as a code owner April 7, 2026 19:44
@ericapisani ericapisani merged commit 9c360eb into master Apr 8, 2026
161 checks passed
@ericapisani ericapisani deleted the ep/py-2280-1mo branch April 8, 2026 11:33