Implemented a way for Google Model to analyze JSON file links using DocumentUrl #3269

Kamal-Moha · 2025-10-27T22:42:10Z

This PR fixes an issue where the Google model fails to analyze JSON files passed as links via DocumentUrl.

import os

from pydantic_ai import Agent, DocumentUrl
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider

provider = GoogleProvider(api_key=os.getenv('GOOGLE_API_KEY'))
model = GoogleModel('gemini-2.5-pro', provider=provider)
agent = Agent(model=model)

result = agent.run_sync(
    [
        'What is the main content of this document?',
        DocumentUrl(url='https://storage.googleapis.com/bhadala-test/transcript_playground-22xj-NR8y_20250925_091155.json'),
    ]
)
print(result.output)
# Expected: a summary of the JSON document. Actual: the request fails with a server error.

For example, the code above results in a 500 Server Error from Google when the URL points to a JSON file.

This PR fixes that. I took inspiration from #2851 when implementing it.

DouweM · 2025-10-28T14:27:03Z

pydantic_ai_slim/pydantic_ai/models/google.py

+            or media_type in ('application/x-yaml', 'application/yaml')
+        )
+    @staticmethod
+    def _inline_text_file_part(text: str, *, media_type: str, identifier: str) -> ChatCompletionContentPartTextParam:


We should use this method

In my latest commit, I removed this method because the Google model doesn't need it.
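For readers following the hunk above, here is a rough sketch of what a text-like media type check could look like. Only the YAML branch is visible in the diff; the `text/`, JSON, and XML branches and the function name `is_text_like_media_type` are assumptions for illustration, not the code in this PR.

```python
def is_text_like_media_type(media_type: str) -> bool:
    """Return True when a document can reasonably be decoded and sent as plain text.

    Hypothetical helper: only the YAML branch below comes from the diff.
    """
    return (
        # Assumed branches, not shown in the hunk.
        media_type.startswith('text/')
        or media_type in ('application/json', 'application/xml')
        or media_type.endswith(('+json', '+xml'))
        # The branch that is visible in the hunk.
        or media_type in ('application/x-yaml', 'application/yaml')
    )


for candidate in ('application/json', 'application/yaml', 'application/pdf'):
    print(candidate, is_text_like_media_type(candidate))
```

With a check like this, a model adapter can decide whether to inline the downloaded text or to pass the file through as binary data.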

DouweM · 2025-10-28T14:27:10Z

pydantic_ai_slim/pydantic_ai/models/google.py

+  def __init__(self, num):
+    self.num = num
+  def multiply(self):
+    return self.num * 3


This seems unrelated

DouweM · 2025-10-28T14:27:29Z

pydantic_ai_slim/pydantic_ai/models/google.py

-                    if item.vendor_metadata:
-                        part_dict['video_metadata'] = cast(VideoMetadataDict, item.vendor_metadata)
-                    content.append(part_dict)
+                    if self._is_text_like_media_type(item.media_type):


We need tests for this behavior like we have in test_openai.py

Check the function test_google_model_json_document_url_input in test_google.py. That should work

DouweM · 2025-10-28T20:53:52Z

pydantic_ai_slim/pydantic_ai/models/google.py

-                    if item.vendor_metadata:
-                        part_dict['video_metadata'] = cast(VideoMetadataDict, item.vendor_metadata)
-                    content.append(part_dict)
+                    if self._is_text_like_media_type(item.media_type):


We need to use the same _inline_text_file_part we use in OpenAI, so that the text is properly formatted as representing a file.

I suggest moving it to a method on BinaryContent that returns the text with the fencing.

_is_text_like_media_type can become a method on BinaryContent and DocumentUrl as well.

When we check isinstance(item, DocumentUrl) and then do downloaded_text = await download_item(item, data_format='text'), we can create a BinaryContent from the result of download_item, and then call the new inline_text_file method on it.
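A minimal sketch of the suggestion above, assuming the fencing logic lives on the content object so that both model adapters can call it. `InlineableContent` is a hypothetical stand-in for BinaryContent, `inline_text_file` is the method name proposed in the review, and the exact fence format is an assumption rather than the library's actual output.

```python
from dataclasses import dataclass


@dataclass
class InlineableContent:
    """Hypothetical stand-in for BinaryContent; not the pydantic_ai class."""

    data: bytes
    media_type: str
    identifier: str

    def inline_text_file(self) -> str:
        """Return the decoded text wrapped in markers that present it as a file.

        The marker format is illustrative only.
        """
        text = self.data.decode('utf-8')
        return '\n'.join(
            [
                f'-----BEGIN FILE id="{self.identifier}" type="{self.media_type}"-----',
                text,
                f'-----END FILE id="{self.identifier}"-----',
            ]
        )


# Text that has already been downloaded from a document URL (the download step is omitted here).
downloaded_text = '{"speaker": "A", "text": "hello"}'
content = InlineableContent(
    data=downloaded_text.encode('utf-8'),
    media_type='application/json',
    identifier='transcript',
)
print(content.inline_text_file())
```

Keeping the fencing in one method means openai.py and google.py would only differ in how they wrap the returned string into their own part types.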

Kamal-Moha · 2025-10-28T21:24:56Z

@DouweM As suggested, I have now used _is_text_like_media_type and _inline_text_file_part in DocumentUrl and BinaryContent. Please take another look.

DouweM · 2025-10-29T20:15:57Z

@Kamal-Moha Can you also please implement the other suggestions in https://github.com/pydantic/pydantic-ai/pull/3269/files#r2471009741, so we reduce duplication between the Google and OpenAI models?

Also don't forget to run make format to satisfy the linter!

Kamal-Moha · 2025-10-30T17:07:06Z

@DouweM I have already implemented the suggestion in https://github.com/pydantic/pydantic-ai/pull/3269/files#r2471009741, which is about having the method _is_text_like_media_type in BinaryContent and DocumentUrl. Please double check. If that is not what you meant, please clarify what you're looking for.

I have also formatted the code to satisfy the linter.

DouweM · 2025-10-30T19:02:43Z

@Kamal-Moha I suggested creating new methods on BinaryContent and DocumentUrl, so that we reduce the duplication between openai.py and google.py and have the "inline text fencing" logic in just one place. Can you please do that?

Also note that tests are failing.

DouweM · 2025-11-03T20:49:07Z

@Kamal-Moha Please have a look at the failing CI jobs

Commit e0adb97: Implemented a way for Google Model to analyze JSON file links when using DocumentUrl

DouweM requested changes Oct 28, 2025

DouweM self-assigned this Oct 28, 2025

DouweM added the awaiting author revision label Oct 28, 2025

Commit c319843: Modified JSON DocumentUrl when using a google model and included test

DouweM requested changes Oct 28, 2025

Commit e252f8c: Modified google.py and started using the method _inline_text_file_part in DocumentUrl and BinaryContent

Commit 76acdc4: Formatted to fix the linter

Kamal-Moha and others added 2 commits November 1, 2025 10:11

Commit 8ec0aaa: Created the static methods in BinaryContent and DocumentUrl to reduce duplication. Create a new cassette in test_google

Commit 24828d7: Merge branch 'main' into implemented_json_file_support_for_google
