feat: enhance gemini models #11497

hjlarry · 2024-12-09T10:03:42Z

Summary

All Google models now support video files. This can close "LLM video understanding" #10720, and may resolve "Concurrent API requests to Gemini vision model cause non-responsive behavior and block all tasks forever" #9273.
Use Google's file upload API instead of a base64 string. The uploaded file can now be cached instead of transferring a base64 string on every turn of the conversation. Google supports videos up to 1 hour long, so transferring the file each time is terrible.
"gemini-pro-vision" was deprecated, so this part of the logic is removed.
Unify DocumentPromptMessageContent, ImagePromptMessageContent, VideoPromptMessageContent and AudioPromptMessageContent: all of them now include format and carry mime_type in their data.
Please help test PDF input for Claude and audio input for OpenAI.
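
To illustrate the upload-and-cache idea and the unified content shape from the summary, here is a minimal sketch. It is not the code in this PR: MediaContent, UploadCache, file_id, the injected upload callable and the TTL are all hypothetical names and assumptions, and the real upload call to Google is deliberately left out.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class MediaContent:
    """Hypothetical stand-in for the unified prompt content types."""

    file_id: str    # assumed stable identifier, used as the cache key
    format: str     # e.g. "mp4"
    mime_type: str  # e.g. "video/mp4"
    data: bytes     # raw file bytes


class UploadCache:
    """Remember the remote URI of each uploaded file for a limited time."""

    def __init__(
        self,
        upload: Callable[[MediaContent], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._upload = upload    # performs the real upload, returns a URI
        self._ttl = ttl_seconds  # how long a cached URI is trusted (assumed)
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get_uri(self, content: MediaContent) -> str:
        now = self._clock()
        entry = self._entries.get(content.file_id)
        if entry is not None and entry[1] > now:
            # Still fresh: reuse the URI instead of sending the bytes again.
            return entry[0]
        uri = self._upload(content)
        self._entries[content.file_id] = (uri, now + self._ttl)
        return uri
```

With something like this, each turn of a conversation would call get_uri and only the first call (or the first call after expiry) would transfer the file. Keying on a file identifier rather than on the bytes avoids hashing a large video on every turn.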

Screenshots

Checklist

Important

Please review the checklist below before submitting your pull request.

This change requires a documentation update, included: Dify Document
I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
I've updated the documentation accordingly.
I ran dev/reformat (backend) and cd web && npx lint-staged (frontend) to appease the lint gods

hjlarry added 2 commits December 9, 2024 17:47

enhance gemini models

93ab6f3

enhance gemini models

f1dce52

dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files), ⚙️ feat:model-runtime and 💪 enhancement (New feature or request) labels Dec 9, 2024

fix CI

447d266

hjlarry marked this pull request as draft December 9, 2024 10:19

hjlarry added 4 commits December 10, 2024 10:00

fix CI

c5a6967

fix test for gemini

e5aebc7

fix test for gemini

c85563b

fix CI

be44964

hjlarry marked this pull request as ready for review December 10, 2024 03:33

crazywoola requested review from Yeuoly and laipz8200 and removed request for Yeuoly December 11, 2024 02:09

hjlarry added 4 commits December 12, 2024 09:20

add gemini-2.0-flash-exp

ea6253d

support audio for gemini models

25945d9

support audio for gemini models

21986ed

Merge remote-tracking branch 'myfork/p155' into p152

dd6b688

hjlarry mentioned this pull request Dec 12, 2024

Unable to Upload and Process Audio Using Multimodal Models (GPT Audio Preview, Gemini 1.5 Pro) #11567

Open

5 tasks

This was referenced Dec 12, 2024

Pending tools and model runtimes #11588

Open

After I upgraded dify to the latest version, the LLM is unable to read video data. #11590

Closed

laipz8200 requested changes Dec 13, 2024

api/core/file/file_manager.py (resolved)

api/core/model_runtime/model_providers/anthropic/llm/llm.py (resolved)

laipz8200 approved these changes Dec 17, 2024

dosubot bot added the lgtm (This PR has been approved by a maintainer) label Dec 17, 2024

laipz8200 merged commit 74fdc16 into langgenius:main Dec 17, 2024
5 checks passed

jiangbo721 pushed a commit to jiangbo721/dify that referenced this pull request Dec 20, 2024

feat: enhance gemini models (langgenius#11497)

641958c
