feat: enhance gemini models #11497

Merged
merged 11 commits into from
Dec 17, 2024

Contributor

@hjlarry hjlarry commented Dec 9, 2024

Summary

Tip

Close issue syntax: Fixes #<issue number> or Resolves #<issue number>, see documentation for more details.

  1. All Google models now support video files. This can close "LLM video understanding" #10720 and may resolve "Concurrent API requests to Gemini vision model cause non-responsive behavior and block all tasks forever" #9273.
  2. Use Google's file upload API instead of base64 strings. The uploaded file can now be cached instead of sending the base64 string on every turn. Google supports videos up to 1 hour long, so transferring the file each time is terrible.
  3. "gemini-pro-vision" is deprecated, so the logic specific to it is removed.
  4. Unify DocumentPromptMessageContent, ImagePromptMessageContent, VideoPromptMessageContent, and AudioPromptMessageContent: they all include format and contain mime_type in their data.
  5. Please help test PDF input with Claude and audio input with OpenAI.
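
To illustrate points 2 and 4, here is a minimal sketch of the caching idea, not the code in this PR. `MediaContent` is a hypothetical stand-in for the unified content classes (only `format` and `mime_type` come from the description above), and `upload` is a caller-supplied function standing in for whatever call sends the bytes to Google and returns a file reference. The TTL value and the in-memory dictionary are assumptions too; a real deployment would more likely use a shared cache.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class MediaContent:
    """Hypothetical stand-in for the unified *PromptMessageContent classes."""
    kind: str       # "image", "video", "audio" or "document"
    format: str     # e.g. "mp4"
    mime_type: str  # e.g. "video/mp4"
    data: bytes     # raw file bytes


class UploadCache:
    """Remembers the remote reference returned by an upload function."""

    def __init__(
        self,
        upload: Callable[[bytes, str], str],  # assumed signature: (data, mime_type) -> reference
        ttl_seconds: float,                   # assumed lifetime of an uploaded file
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._upload = upload
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get_uri(self, content: MediaContent) -> str:
        # Key on the content itself so the same file is uploaded only once.
        key = hashlib.sha256(
            content.mime_type.encode() + b"\0" + content.data
        ).hexdigest()
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[1] > now:
            return cached[0]  # still valid: skip the upload entirely
        uri = self._upload(content.data, content.mime_type)
        self._entries[key] = (uri, now + self._ttl)
        return uri
```

With this shape, each turn of a conversation that refers to the same video calls `get_uri` and pays the upload cost only on the first call or after the entry expires.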

Screenshots

[Screenshot: image not preserved]

Checklist

Important

Please review the checklist below before submitting your pull request.

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. ⚙️ feat:model-runtime 💪 enhancement New feature or request labels Dec 9, 2024
@hjlarry hjlarry marked this pull request as draft December 9, 2024 10:19
@hjlarry hjlarry marked this pull request as ready for review December 10, 2024 03:33
@crazywoola crazywoola requested review from Yeuoly and laipz8200 and removed request for Yeuoly December 11, 2024 02:09
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 17, 2024
@laipz8200 laipz8200 merged commit 74fdc16 into langgenius:main Dec 17, 2024
5 checks passed
jiangbo721 pushed a commit to jiangbo721/dify that referenced this pull request Dec 20, 2024