Conversation

@jperezdealgaba (Contributor) commented Oct 15, 2025

Add support for the Google Gemini gemini-embedding-001 embedding model and correctly register its model type

MR message created with the assistance of Claude-4.5-sonnet

This resolves #3755

What does this PR do?

This PR adds support for the gemini-embedding-001 Google embedding model in the llama-stack Gemini provider. This model provides high-dimensional embeddings (3072 dimensions), compared to the existing text-embedding-004 model (768 dimensions). Older embedding models such as text-embedding-004 will soon be deprecated, according to Google (Link).

Problem

The Gemini provider only supported the text-embedding-004 embedding model. The newer gemini-embedding-001 model, which provides higher-dimensional embeddings for improved semantic representation, was not available through llama-stack.

Solution

This PR consists of three commits that add the model, fix model registration, and enable embedding generation:

Commit 1: Initial addition of gemini-embedding-001

Added metadata for gemini-embedding-001 to the embedding_model_metadata dictionary:

embedding_model_metadata: dict[str, dict[str, int]] = {
    "text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
    "gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},  # NEW
}

Issue discovered: The model was not being registered correctly because the dictionary keys didn't match the model IDs returned by Gemini's API.
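The mismatch can be reproduced with a plain dictionary lookup (a minimal sketch; the simplified dict stands in for the provider's real state):

```python
# The provider's metadata dict was keyed without the "models/" prefix...
embedding_model_metadata = {
    "text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
    "gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},
}

# ...but Gemini's OpenAI-compatible API reports IDs with the prefix,
# so the lookup misses and the model falls through to LLM registration.
provider_model_id = "models/gemini-embedding-001"
print(embedding_model_metadata.get(provider_model_id))  # None
```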

Commit 2: Fix model ID matching with models/ prefix

Updated both dictionary keys to include the models/ prefix to match Gemini's OpenAI-compatible API response format:

embedding_model_metadata: dict[str, dict[str, int]] = {
    "models/text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},      # UPDATED
    "models/gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},  # UPDATED
}

Root cause: Gemini's OpenAI-compatible API returns model IDs with the models/ prefix (e.g., models/text-embedding-004). The OpenAIMixin.list_models() method directly matches these IDs against the embedding_model_metadata dictionary keys. Without the prefix, the models were being registered as LLMs instead of embedding models.

Commit 3: Fix embedding generation for providers without usage stats

Fixed a bug in OpenAIMixin.openai_embeddings() that prevented embedding generation for providers (like Gemini) that don't return usage statistics:

# Before (Line 351-354):
usage = OpenAIEmbeddingUsage(
    prompt_tokens=response.usage.prompt_tokens,  # ← Crashed with AttributeError
    total_tokens=response.usage.total_tokens,
)

# After (Lines 351-362):
if response.usage:
    usage = OpenAIEmbeddingUsage(
        prompt_tokens=response.usage.prompt_tokens,
        total_tokens=response.usage.total_tokens,
    )
else:
    usage = OpenAIEmbeddingUsage(
        prompt_tokens=0,  # Default when not provided
        total_tokens=0,   # Default when not provided
    )

Impact: This fix enables embedding generation for all Gemini embedding models, not just the newly added one.
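In isolation, the guarded path can be sketched as a standalone function (the response stub is hypothetical, and OpenAIEmbeddingUsage is simplified to a dataclass here; the real llama-stack types differ):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OpenAIEmbeddingUsage:
    prompt_tokens: int
    total_tokens: int

@dataclass
class FakeEmbeddingsResponse:
    usage: Optional[OpenAIEmbeddingUsage] = None  # Gemini omits the usage block

def extract_usage(response) -> OpenAIEmbeddingUsage:
    # Mirror the guarded logic from the PR: fall back to zeros when the
    # provider does not return usage statistics.
    if response.usage:
        return OpenAIEmbeddingUsage(
            prompt_tokens=response.usage.prompt_tokens,
            total_tokens=response.usage.total_tokens,
        )
    return OpenAIEmbeddingUsage(prompt_tokens=0, total_tokens=0)

print(extract_usage(FakeEmbeddingsResponse()))  # prompt_tokens=0, total_tokens=0
```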

Changes

Modified Files

llama_stack/providers/remote/inference/gemini/gemini.py

  • Line 17: Updated text-embedding-004 key to models/text-embedding-004
  • Line 18: Added models/gemini-embedding-001 with correct metadata

llama_stack/providers/utils/inference/openai_mixin.py

  • Lines 351-362: Added null check for response.usage to handle providers without usage statistics

Key Technical Details

Model ID Matching Flow

  1. list_provider_model_ids() calls Gemini's /v1/models endpoint
  2. API returns model IDs like: models/text-embedding-004, models/gemini-embedding-001
  3. OpenAIMixin.list_models() (line 410) checks: if metadata := self.embedding_model_metadata.get(provider_model_id)
  4. If matched, registers as model_type: "embedding" with metadata; otherwise registers as model_type: "llm"
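The flow above can be sketched as a small standalone function (a sketch only; the dict stands in for the mixin's real state and names mirror the PR):

```python
embedding_model_metadata = {
    "models/text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
    "models/gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},
}

def classify(provider_model_id: str) -> dict:
    # Step 3 of the flow: exact-match the API-reported ID against the metadata keys.
    if metadata := embedding_model_metadata.get(provider_model_id):
        return {"model_type": "embedding", "metadata": metadata}
    # Step 4: anything unmatched is registered as an LLM.
    return {"model_type": "llm", "metadata": {}}

print(classify("models/gemini-embedding-001")["model_type"])  # embedding
print(classify("models/gemini-2.5-flash")["model_type"])      # llm
```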

Why Both Keys Needed the Prefix

The text-embedding-004 model was already working because there was likely separate configuration or manual registration handling it. For auto-discovery to work correctly for both models, both keys must match the API's model ID format exactly.

How to test this PR

Verified the changes by:

  1. Model Auto-Discovery: Started llama-stack server and confirmed models are auto-discovered from Gemini API

  2. Model Registration: Confirmed both embedding models are correctly registered and visible

curl http://localhost:8325/v1/models | jq '.data[] | select(.provider_id == "gemini" and .model_type == "embedding")'

Results:

  • gemini/models/text-embedding-004 - 768 dimensions - model_type: "embedding"
  • gemini/models/gemini-embedding-001 - 3072 dimensions - model_type: "embedding"

  Before the fix (Commit 1), both models appeared as model_type: "llm" without embedding metadata; after the fix (Commit 2), both are correctly identified as model_type: "embedding" with the proper metadata.

  3. Generate Embeddings: Verified embedding generation works

curl -X POST http://localhost:8325/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini/models/gemini-embedding-001", "input": "test"}' | \
  jq '.data[0].embedding | length'

Review comment on openai_mixin.py (context):

    # Handle providers that don't return usage statistics (e.g., Gemini)
    # Use default values of 0 when usage is not provided
    if response.usage:

@franciscojavierarceo (Collaborator) commented Oct 15, 2025

Suggested change:

    - if response.usage:
    + if hasattr(response, "usage"):

@franciscojavierarceo (Collaborator) left a comment

One small nit, but otherwise this LGTM.

I pulled this down locally and tested the curl commands and the minimal RAG/Responses demo; both worked (aside from the quota-limit errors I received).

@jperezdealgaba can you confirm that you were able to run the RAG demo please? I sent this to you over DM.

@mattf (Collaborator) left a comment

+1 on fixing the model names.
-1 on adding a Gemini-specific workaround to common code; please add Gemini workarounds to the Gemini adapter.

Follow-up commits:

  • Changed the embedding model key from "text-embedding-004" to "models/text-embedding-004" in the GeminiInferenceAdapter class to align with the new model structure.
  • Introduced a new embedding model with a dimension of 3072 and a context length of 2048 to the GeminiInferenceAdapter class.
  • Added a method that addresses the absence of usage statistics in Gemini's embedding API by providing default values.
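Per the review feedback, a Gemini-specific workaround would live in the adapter rather than the shared mixin. A hedged sketch with stand-in classes (the real llama-stack class hierarchy and method names differ):

```python
class OpenAIMixinStub:
    """Stand-in for the shared mixin: provider-agnostic usage extraction."""
    def usage_tokens(self, response):
        return (response.usage.prompt_tokens, response.usage.total_tokens)

class GeminiAdapterStub(OpenAIMixinStub):
    """Gemini-specific override: its embeddings endpoint may omit usage."""
    def usage_tokens(self, response):
        if getattr(response, "usage", None):
            return super().usage_tokens(response)
        return (0, 0)  # defaults when Gemini returns no usage statistics

class NoUsageResponse:
    usage = None

print(GeminiAdapterStub().usage_tokens(NoUsageResponse()))  # (0, 0)
```

This keeps the shared mixin provider-agnostic while isolating the fallback in the Gemini adapter, which is what the review asked for.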
@jperezdealgaba (Contributor, Author) commented
@franciscojavierarceo Yes. The result of your test file is:

$ source .venv/bin/activate && python llama_stack/test_file.py
Vector store: vs_3a127f7a-01d5-4ff1-97a1-97c04bb379f7
File embedded: file-f2492ba97849483e8ee777fe89708304
Embedding model: gemini-embedding-001 (3072-dim)

Testing gemini/gemini-2.5-flash with file_search...
Vector store ID: vs_3a127f7a-01d5-4ff1-97a1-97c04bb379f7

RAG Response:
To achieve great work, it's essential to embark on projects that align with your natural aptitudes and deep interests, and that also offer the potential for significant impact. This journey typically involves four key steps: selecting a field, immersing yourself in it until you reach the forefront of current knowledge, identifying existing gaps, and then actively exploring those promising areas. While hard work is a prerequisite, it's equally important to be driven by an excited curiosity, which acts as both the engine and rudder of your endeavors.

Great work is rarely the result of rigid planning; instead, it often emerges from a process of "staying upwind"—continually pursuing what is most interesting and preserves future options. It's also crucial to cultivate earnestness, intellectual honesty, and a willingness to admit mistakes. This involves focusing on what truly matters rather than appearances, and embracing an optimistic outlook even if it means risking looking foolish sometimes. Furthermore, be prepared to revise and even discard work that doesn't fit, and strive for elegance and essence in your creations. Ultimately, the most powerful combination of motives is curiosity, delight, and the desire to do something impressive.

@jperezdealgaba (Contributor, Author) commented
@mattf Just addressed your comments

@franciscojavierarceo franciscojavierarceo merged commit add8cd8 into llamastack:main Oct 15, 2025
21 checks passed