Conversation

@jperezdealgaba (Contributor) commented Oct 15, 2025

Add support for the Google Gemini gemini-embedding-001 embedding model and correctly register its model type

MR message created with the assistance of Claude-4.5-sonnet

This resolves #3755

What does this PR do?

This PR adds support for the gemini-embedding-001 Google embedding model in the llama-stack Gemini provider. This model provides high-dimensional embeddings (3072 dimensions), compared to the existing text-embedding-004 model (768 dimensions). Older embedding models such as text-embedding-004 will soon be deprecated, according to Google (Link).

Problem

The Gemini provider only supported the text-embedding-004 embedding model. The newer gemini-embedding-001 model, which provides higher-dimensional embeddings for improved semantic representation, was not available through llama-stack.

Solution

This PR consists of three commits that add the model, fix model registration, and enable embedding generation:

Commit 1: Initial addition of gemini-embedding-001

Added metadata for gemini-embedding-001 to the embedding_model_metadata dictionary:

embedding_model_metadata: dict[str, dict[str, int]] = {
    "text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
    "gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},  # NEW
}

Issue discovered: The model was not being registered correctly because the dictionary keys didn't match the model IDs returned by Gemini's API.
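The mismatch can be reproduced with a plain dictionary lookup (a minimal sketch; the simplified dict stands in for the provider's real state):

```python
# The provider's metadata dict was keyed without the "models/" prefix...
embedding_model_metadata = {
    "text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
    "gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},
}

# ...but Gemini's OpenAI-compatible API reports IDs with the prefix,
# so the lookup misses and the model falls through to LLM registration.
provider_model_id = "models/gemini-embedding-001"
print(embedding_model_metadata.get(provider_model_id))  # None
```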

Commit 2: Fix model ID matching with models/ prefix

Updated both dictionary keys to include the models/ prefix to match Gemini's OpenAI-compatible API response format:

embedding_model_metadata: dict[str, dict[str, int]] = {
    "models/text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},      # UPDATED
    "models/gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},  # UPDATED
}

Root cause: Gemini's OpenAI-compatible API returns model IDs with the models/ prefix (e.g., models/text-embedding-004). The OpenAIMixin.list_models() method directly matches these IDs against the embedding_model_metadata dictionary keys. Without the prefix, the models were being registered as LLMs instead of embedding models.

Commit 3: Fix embedding generation for providers without usage stats

Fixed a bug in OpenAIMixin.openai_embeddings() that prevented embedding generation for providers (like Gemini) that don't return usage statistics:

# Before (Line 351-354):
usage = OpenAIEmbeddingUsage(
    prompt_tokens=response.usage.prompt_tokens,  # ← Crashed with AttributeError
    total_tokens=response.usage.total_tokens,
)

# After (Lines 351-362):
if response.usage:
    usage = OpenAIEmbeddingUsage(
        prompt_tokens=response.usage.prompt_tokens,
        total_tokens=response.usage.total_tokens,
    )
else:
    usage = OpenAIEmbeddingUsage(
        prompt_tokens=0,  # Default when not provided
        total_tokens=0,   # Default when not provided
    )

Impact: This fix enables embedding generation for all Gemini embedding models, not just the newly added one.
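In isolation, the guarded path can be sketched as a standalone function (the response stub is hypothetical, and OpenAIEmbeddingUsage is simplified to a dataclass here; the real llama-stack types differ):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OpenAIEmbeddingUsage:
    prompt_tokens: int
    total_tokens: int

@dataclass
class FakeEmbeddingsResponse:
    usage: Optional[OpenAIEmbeddingUsage] = None  # Gemini omits the usage block

def extract_usage(response) -> OpenAIEmbeddingUsage:
    # Mirror the guarded logic from the PR: fall back to zeros when the
    # provider does not return usage statistics.
    if response.usage:
        return OpenAIEmbeddingUsage(
            prompt_tokens=response.usage.prompt_tokens,
            total_tokens=response.usage.total_tokens,
        )
    return OpenAIEmbeddingUsage(prompt_tokens=0, total_tokens=0)

print(extract_usage(FakeEmbeddingsResponse()))  # prompt_tokens=0, total_tokens=0
```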

Changes

Modified Files

llama_stack/providers/remote/inference/gemini/gemini.py

  • Line 17: Updated text-embedding-004 key to models/text-embedding-004
  • Line 18: Added models/gemini-embedding-001 with correct metadata

llama_stack/providers/utils/inference/openai_mixin.py

  • Lines 351-362: Added null check for response.usage to handle providers without usage statistics

Key Technical Details

Model ID Matching Flow

  1. list_provider_model_ids() calls Gemini's /v1/models endpoint
  2. API returns model IDs like: models/text-embedding-004, models/gemini-embedding-001
  3. OpenAIMixin.list_models() (line 410) checks: if metadata := self.embedding_model_metadata.get(provider_model_id)
  4. If matched, registers as model_type: "embedding" with metadata; otherwise registers as model_type: "llm"
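The flow above can be sketched as a small standalone function (a sketch only; the dict stands in for the mixin's real state and names mirror the PR):

```python
embedding_model_metadata = {
    "models/text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
    "models/gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},
}

def classify(provider_model_id: str) -> dict:
    # Step 3 of the flow: exact-match the API-reported ID against the metadata keys.
    if metadata := embedding_model_metadata.get(provider_model_id):
        return {"model_type": "embedding", "metadata": metadata}
    # Step 4: anything unmatched is registered as an LLM.
    return {"model_type": "llm", "metadata": {}}

print(classify("models/gemini-embedding-001")["model_type"])  # embedding
print(classify("models/gemini-2.5-flash")["model_type"])      # llm
```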

Why Both Keys Needed the Prefix

The text-embedding-004 model was already working because there was likely separate configuration or manual registration handling it. For auto-discovery to work correctly for both models, both keys must match the API's model ID format exactly.

How to test this PR

Verified the changes by:

  1. Model Auto-Discovery: Started llama-stack server and confirmed models are auto-discovered from Gemini API

  2. Model Registration: Confirmed both embedding models are correctly registered and visible

curl http://localhost:8325/v1/models | jq '.data[] | select(.provider_id == "gemini" and .model_type == "embedding")'

Results:

  • gemini/models/text-embedding-004 - 768 dimensions - model_type: "embedding"
  • gemini/models/gemini-embedding-001 - 3072 dimensions - model_type: "embedding"

  Before the fix (Commit 1), both models appeared as model_type: "llm" without embedding metadata; after the fix (Commit 2), both are correctly identified as model_type: "embedding" with the proper metadata.

  3. Generate Embeddings: Verified embedding generation works

curl -X POST http://localhost:8325/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini/models/gemini-embedding-001", "input": "test"}' | \
  jq '.data[0].embedding | length'

Review comment on openai_mixin.py (context):

    # Handle providers that don't return usage statistics (e.g., Gemini)
    # Use default values of 0 when usage is not provided
    if response.usage:

@franciscojavierarceo (Collaborator) commented Oct 15, 2025

Suggested change:

    - if response.usage:
    + if hasattr(response, "usage"):

@franciscojavierarceo (Collaborator) left a comment

One small nit, but otherwise this LGTM.

I pulled this down locally and tested the curl commands and the minimal RAG/Responses demo; both worked (aside from the quota-limit errors I received).

@jperezdealgaba can you confirm that you were able to run the RAG demo please? I sent this to you over DM.

@mattf (Collaborator) left a comment

+1 on fixing the model names.
-1 on adding a Gemini-specific workaround to common code; please add Gemini workarounds to the Gemini adapter.

Follow-up commits:

  • Changed the embedding model key from "text-embedding-004" to "models/text-embedding-004" in the GeminiInferenceAdapter class to align with the new model structure.
  • Introduced a new embedding model with a dimension of 3072 and a context length of 2048 to the GeminiInferenceAdapter class.
  • Added a method that addresses the absence of usage statistics in Gemini's embedding API by providing default values.
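Per the review feedback, a Gemini-specific workaround would live in the adapter rather than the shared mixin. A hedged sketch with stand-in classes (the real llama-stack class hierarchy and method names differ):

```python
class OpenAIMixinStub:
    """Stand-in for the shared mixin: provider-agnostic usage extraction."""
    def usage_tokens(self, response):
        return (response.usage.prompt_tokens, response.usage.total_tokens)

class GeminiAdapterStub(OpenAIMixinStub):
    """Gemini-specific override: its embeddings endpoint may omit usage."""
    def usage_tokens(self, response):
        if getattr(response, "usage", None):
            return super().usage_tokens(response)
        return (0, 0)  # defaults when Gemini returns no usage statistics

class NoUsageResponse:
    usage = None

print(GeminiAdapterStub().usage_tokens(NoUsageResponse()))  # (0, 0)
```

This keeps the shared mixin provider-agnostic while isolating the fallback in the Gemini adapter, which is what the review asked for.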
@jperezdealgaba (Contributor, Author) commented
@franciscojavierarceo Yes. The result of your test file is:

$ source .venv/bin/activate && python llama_stack/test_file.py
Vector store: vs_3a127f7a-01d5-4ff1-97a1-97c04bb379f7
File embedded: file-f2492ba97849483e8ee777fe89708304
Embedding model: gemini-embedding-001 (3072-dim)

Testing gemini/gemini-2.5-flash with file_search...
Vector store ID: vs_3a127f7a-01d5-4ff1-97a1-97c04bb379f7

RAG Response:
To achieve great work, it's essential to embark on projects that align with your natural aptitudes and deep interests, and that also offer the potential for significant impact. This journey typically involves four key steps: selecting a field, immersing yourself in it until you reach the forefront of current knowledge, identifying existing gaps, and then actively exploring those promising areas. While hard work is a prerequisite, it's equally important to be driven by an excited curiosity, which acts as both the engine and rudder of your endeavors.

Great work is rarely the result of rigid planning; instead, it often emerges from a process of "staying upwind"—continually pursuing what is most interesting and preserves future options. It's also crucial to cultivate earnestness, intellectual honesty, and a willingness to admit mistakes. This involves focusing on what truly matters rather than appearances, and embracing an optimistic outlook even if it means risking looking foolish sometimes. Furthermore, be prepared to revise and even discard work that doesn't fit, and strive for elegance and essence in your creations. Ultimately, the most powerful combination of motives is curiosity, delight, and the desire to do something impressive.

@jperezdealgaba (Contributor, Author) commented
@mattf Just addressed your comments

@franciscojavierarceo franciscojavierarceo merged commit add8cd8 into llamastack:main Oct 15, 2025
21 checks passed