feat(gemini): Support gemini-embedding-001 and fix models/ prefix in metadata keys #3813
Conversation
```python
# Handle providers that don't return usage statistics (e.g., Gemini)
# Use default values of 0 when usage is not provided
if response.usage:
```
Suggested change:

```diff
-if response.usage:
+if hasattr(response, "usage"):
```
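For what it's worth, the two checks differ when `usage` exists but is `None`. A minimal illustration, using `types.SimpleNamespace` as a stand-in for the provider response object (these are hypothetical stand-ins, not llama-stack's actual response types):

```python
from types import SimpleNamespace

# Hypothetical stand-ins for provider responses:
no_usage_attr = SimpleNamespace(data=[])           # attribute absent (Gemini-like)
none_usage = SimpleNamespace(data=[], usage=None)  # attribute present but None

# hasattr() only asks whether the attribute exists:
print(hasattr(no_usage_attr, "usage"))  # False
print(hasattr(none_usage, "usage"))     # True

# A truthiness check additionally guards against usage being None:
print(bool(getattr(none_usage, "usage", None)))  # False
```

So `hasattr(response, "usage")` alone would still pass a `None` usage object through to later field accesses; whichever check is chosen, the `None` case needs handling as well.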
One small nit, but otherwise this LGTM.
I pulled this down locally and tested the curl and the minimal RAG/Responses demo, and they both worked (aside from quota-limit errors I received).
@jperezdealgaba can you confirm that you were able to run the RAG demo, please? I sent this to you over DM.
+1 on fixing the model names.
-1 on adding a Gemini-specific workaround to common code. Please add Gemini workarounds to the Gemini adapter.
Changed the embedding model key from `"text-embedding-004"` to `"models/text-embedding-004"` in the `GeminiInferenceAdapter` class to align with the new model structure.
Introduced a new embedding model with a dimension of 3072 and a context length of 2048 to the `GeminiInferenceAdapter` class.
This method addresses the absence of usage statistics in Gemini's embedding API by providing default values.
Branch updated from a5a91fa to 50fd998.
@franciscojavierarceo Yes. The result of your test file is:

@mattf Just addressed your comments.
Add support for the Google Gemini `gemini-embedding-001` embedding model and correctly register the model type.

PR message created with the assistance of Claude-4.5-sonnet.

This resolves #3755.
What does this PR do?

This PR adds support for the `gemini-embedding-001` Google embedding model in the llama-stack Gemini provider. This model provides high-dimensional embeddings (3072 dimensions) compared to the existing `text-embedding-004` model (768 dimensions). Older embedding models (such as `text-embedding-004`) will be deprecated soon according to Google (link).

Problem

The Gemini provider only supported the `text-embedding-004` embedding model. The newer `gemini-embedding-001` model, which provides higher-dimensional embeddings for improved semantic representation, was not available through llama-stack.

Solution
This PR consists of three commits that add the model, fix the model registration, and enable embedding generation.

Commit 1: Initial addition of gemini-embedding-001

Added metadata for `gemini-embedding-001` to the `embedding_model_metadata` dictionary.

Issue discovered: the model was not being registered correctly because the dictionary keys didn't match the model IDs returned by Gemini's API.
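The commit-1 addition can be sketched roughly as below. This is a sketch, not llama-stack's actual code: the field names `embedding_dimension` and `context_length` are illustrative, and these are the pre-fix keys without the `models/` prefix.

```python
# Sketch of the Gemini adapter's embedding metadata table after commit 1.
# Field names and the context length of text-embedding-004 are illustrative;
# llama-stack's actual schema may differ.
embedding_model_metadata = {
    "text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
    # Newly added: 3072-dimensional embeddings, 2048-token context length.
    "gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},
}
```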
Commit 2: Fix model ID matching with the `models/` prefix

Updated both dictionary keys to include the `models/` prefix to match Gemini's OpenAI-compatible API response format.

Root cause: Gemini's OpenAI-compatible API returns model IDs with the `models/` prefix (e.g., `models/text-embedding-004`). The `OpenAIMixin.list_models()` method directly matches these IDs against the `embedding_model_metadata` dictionary keys. Without the prefix, the models were being registered as LLMs instead of embedding models.

Commit 3: Fix embedding generation for providers without usage stats
Fixed a bug in `OpenAIMixin.openai_embeddings()` that prevented embedding generation for providers (like Gemini) that don't return usage statistics.

Impact: this fix enables embedding generation for all Gemini embedding models, not just the newly added one.
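A minimal sketch of the usage-defaulting behavior, with a hypothetical `extract_usage` helper and `SimpleNamespace` standing in for the OpenAI-style response object (the real `OpenAIMixin.openai_embeddings()` implementation differs):

```python
from types import SimpleNamespace

def extract_usage(response) -> tuple[int, int]:
    """Return (prompt_tokens, total_tokens), defaulting to 0 when the
    provider (e.g., Gemini) omits usage statistics."""
    usage = getattr(response, "usage", None)
    if usage:
        return usage.prompt_tokens, usage.total_tokens
    return 0, 0

# Gemini-style response with no usage block:
gemini_like = SimpleNamespace(data=[[0.1, 0.2]])
print(extract_usage(gemini_like))  # (0, 0)

# OpenAI-style response with usage statistics:
openai_like = SimpleNamespace(
    data=[[0.1, 0.2]],
    usage=SimpleNamespace(prompt_tokens=5, total_tokens=5),
)
print(extract_usage(openai_like))  # (5, 5)
```

With defaults in place, token accounting stays well-defined for every provider instead of crashing when `usage` is missing.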
Changes

Modified Files

- `llama_stack/providers/remote/inference/gemini/gemini.py`
  - Changed the `text-embedding-004` key to `models/text-embedding-004`
  - Added `models/gemini-embedding-001` with correct metadata
- `llama_stack/providers/utils/inference/openai_mixin.py`
  - Added a check on `response.usage` to handle providers without usage statistics

Key Technical Details
Model ID Matching Flow

1. `list_provider_model_ids()` calls Gemini's `/v1/models` endpoint
2. The API returns model IDs such as `models/text-embedding-004` and `models/gemini-embedding-001`
3. `OpenAIMixin.list_models()` (line 410) checks: `if metadata := self.embedding_model_metadata.get(provider_model_id)`
4. On a match, the model is registered as `model_type: "embedding"` with metadata; otherwise it is registered as `model_type: "llm"`
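The flow above can be sketched as a simplified stand-in for the `OpenAIMixin.list_models()` lookup (the real method does considerably more than classify):

```python
# Post-fix metadata keys, carrying the "models/" prefix that Gemini's
# OpenAI-compatible API uses in its returned model IDs.
embedding_model_metadata = {
    "models/text-embedding-004": {"embedding_dimension": 768},
    "models/gemini-embedding-001": {"embedding_dimension": 3072},
}

def classify(provider_model_id: str) -> str:
    # Mirrors the walrus-operator lookup in OpenAIMixin.list_models():
    # a key match means the model registers as an embedding model.
    if metadata := embedding_model_metadata.get(provider_model_id):
        return "embedding"
    return "llm"

# With the "models/" prefix, the lookup matches:
print(classify("models/gemini-embedding-001"))  # embedding
# Without the prefix (the pre-fix keys), the same ID falls through to "llm":
print(classify("gemini-embedding-001"))  # llm
```

This is why the models showed up as LLMs before commit 2: the unprefixed keys never matched the prefixed IDs the API returned.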
Why Both Keys Needed the Prefix

The `text-embedding-004` model was already working, likely because separate configuration or manual registration handled it. For auto-discovery to work correctly for both models, both keys must match the API's model ID format exactly.

How to test this PR
Verified the changes by:

1. Model auto-discovery: started the llama-stack server and confirmed models are auto-discovered from the Gemini API
2. Model registration: confirmed both embedding models are correctly registered and visible

   Results:
   - `gemini/models/text-embedding-004` - 768 dimensions - `model_type: "embedding"`
   - `gemini/models/gemini-embedding-001` - 3072 dimensions - `model_type: "embedding"`

   Before the fix (Commit 1): models appeared as `model_type: "llm"` without embedding metadata.
   After the fix (Commit 2): models correctly identified as `model_type: "embedding"` with proper metadata.

3. Embedding generation: verified that embedding generation works