- 
                Notifications
    
You must be signed in to change notification settings  - Fork 1.7k
 
          [Vertex AI] Add responseModalities to GenerationConfig
          #14658
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
      
        
              This comment was marked as outdated.
        
        
      
    
  This comment was marked as outdated.
          
 Generated by 🚫 Danger  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces the responseModalities parameter to the GenerationConfig, enabling support for multimodal responses (text and images) from the Gemini 2.0 Flash model. The changes include adding the ResponseModality type, updating the GenerationConfig struct and initializer, and adding a test case for image generation. Overall, the code is well-structured and addresses the intended functionality. However, there are a few areas that could be improved for clarity and maintainability.
Summary of Findings
- Documentation for Public Preview Features: The documentation and comments clearly indicate the Public Preview status of the image generation feature. This is important for users to understand the limitations and potential changes associated with experimental features.
 - Error Handling in Example Usage: The example usage includes 
fatalErrorcalls for unexpected scenarios. While this is acceptable for example code, consider using more robust error handling mechanisms in production code. - Naming Consistency: Consider renaming 
rawValueinResponseModalitytoprotoEnumValueto align with the naming convention used in otherDecodableProtoEnumimplementations. 
Merge Readiness
The pull request is almost ready for merging. The added functionality is well-implemented and the code is generally clear. However, addressing the naming consistency issue and considering more robust error handling in the example usage would further improve the quality of the code. I am unable to directly approve this pull request, and recommend that others review and approve this code before merging. I would recommend that the naming consistency issue be addressed before merging.
        
          
                FirebaseVertexAI/Tests/TestApp/Tests/Integration/GenerateContentIntegrationTests.swift
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                FirebaseVertexAI/Tests/TestApp/Tests/Integration/GenerateContentIntegrationTests.swift
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces the responseModalities parameter to the GenerationConfig, enabling support for multimodal responses (text and images) using the Gemini 2.0 Flash model. The changes include updates to the GenerationConfig struct, the addition of a ResponseModality enum, and a new integration test to verify image generation. Overall, the code is well-structured and addresses the intended functionality.
Summary of Findings
- Copyright Year: The copyright year in 
ResponseModality.swiftis set to 2025. Please confirm if this is intentional or if it should be updated to the current year. - Test Coverage: The new integration test 
generateImageincludes checks for image dimensions. Consider adding more comprehensive tests to cover different scenarios and edge cases, such as handling errors or invalid image data. - Documentation Completeness: The documentation for 
ResponseModalityis good, but consider adding a note about the potential cost implications of requesting image generation, as it might consume more resources than text-only responses. 
Merge Readiness
The pull request is well-structured and introduces a valuable feature. However, before merging, please address the copyright year discrepancy and consider enhancing the test coverage as suggested. I am unable to directly approve this pull request, and other reviewers should review and approve this code before merging.
Added support for setting response modalities (text or images) in
GenerationConfigto enable image generation (public experimental) using Gemini 2.0 Flash (gemini-2.0-flash-exponly).Usage Example