- 
                Notifications
    
You must be signed in to change notification settings  - Fork 1.3k
 
Implemented a way for Google Model to analyze JSON file links using DocumentUrl #3269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Implemented a way for Google Model to analyze JSON file links using DocumentUrl #3269
Conversation
| or media_type in ('application/x-yaml', 'application/yaml') | ||
| ) | ||
| @staticmethod | ||
| def _inline_text_file_part(text: str, *, media_type: str, identifier: str) -> ChatCompletionContentPartTextParam: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use this method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my latest commit, I have removed this method because google model doesn't need it.
| def __init__(self, num): | ||
| self.num = num | ||
| def multiply(self): | ||
| return self.num * 3 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unrelated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
| if item.vendor_metadata: | ||
| part_dict['video_metadata'] = cast(VideoMetadataDict, item.vendor_metadata) | ||
| content.append(part_dict) | ||
| if self._is_text_like_media_type(item.media_type): | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need tests for this behavior like we have in test_openai.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the function test_google_model_json_document_url_input in test_google.py. That should work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to use the same _inline_text_file_part we use in OpenAI, so that the text is properly formatted as representing a file.
I suggest moving it to a method on BinaryContent that returns the text with the fencing.
_is_text_like_media_type can become a method on BinaryContent and DocumentUrl as well.
When we check isinstance(item, DocumentUrl) and then do downloaded_text = await download_item(item, data_format='text'), we can create a BinaryContent from the result of download_item, and the call the new inline_text_file method on it.
| if item.vendor_metadata: | ||
| part_dict['video_metadata'] = cast(VideoMetadataDict, item.vendor_metadata) | ||
| content.append(part_dict) | ||
| if self._is_text_like_media_type(item.media_type): | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to use the same _inline_text_file_part we use in OpenAI, so that the text is properly formatted as representing a file.
I suggest moving it to a method on BinaryContent that returns the text with the fencing.
_is_text_like_media_type can become a method on BinaryContent and DocumentUrl as well.
When we check isinstance(item, DocumentUrl) and then do downloaded_text = await download_item(item, data_format='text'), we can create a BinaryContent from the result of download_item, and the call the new inline_text_file method on it.
…t in DocumentUrl and BinaryContent
| 
           @DouweM As suggested, I have now used   | 
    
| 
           @Kamal-Moha Can you also please implement the other suggestions in https://github.com/pydantic/pydantic-ai/pull/3269/files#r2471009741, so we reduce duplication between the Google and OpenAI models? Also don't forget to run   | 
    
| 
           @DouweM I have already implemented the suggestion in https://github.com/pydantic/pydantic-ai/pull/3269/files#r2471009741 which is about having the method  I have also formatted the code to satisfy the linter.  | 
    
| 
           @Kamal-Moha I suggested creating new methods on  Also note that tests are failing.  | 
    
… duplication. Create a new cassette in test_google
| 
           @Kamal-Moha Please have a look at the failing CI jobs  | 
    
This PR solves the issue that google model fails in analyzing/interpreting JSON file links when using
DocumentUrl.For example, the above code leads to a 500 Server Error from google when I enter a JSON link.
This PR solves this issue. I have taken inspiration from #2851 when implementing this.