
Model Router

TenzinGayche edited this page Dec 15, 2025 · 1 revision


The Model Router is a core component that provides a unified interface to multiple LLM providers, handling model selection, configuration, and caching.


🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Model Router                              │
│                                                                  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    get_model(name)                        │  │
│  │                                                           │  │
│  │  1. Check cache for existing instance                     │  │
│  │  2. Identify provider from model name                     │  │
│  │  3. Validate API key availability                         │  │
│  │  4. Create and configure model instance                   │  │
│  │  5. Cache and return                                      │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                  │
│       ┌──────────────────────────────────────────┐              │
│       │              Provider Layer              │              │
│       ├──────────┬──────────┬──────────┬────────┤              │
│       │Anthropic │  Google  │  OpenAI  │Dharma- │              │
│       │  Claude  │  Gemini  │   GPT    │ mitra  │              │
│       └──────────┴──────────┴──────────┴────────┘              │
└─────────────────────────────────────────────────────────────────┘
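The five resolution steps above can be sketched as follows. This is an illustrative sketch, not the actual implementation: `PROVIDER_PREFIXES`, `_cache`, and `identify_provider` are hypothetical names chosen for the example.

```python
# Hypothetical sketch of the get_model() flow described above.
# All names here are illustrative, not taken from the real router code.
PROVIDER_PREFIXES = {
    "claude": "anthropic",
    "gemini": "google",
    "gpt": "openai",
    "dharamitra": "dharmamitra",
}

_cache: dict = {}  # step 1/5: instance cache, keyed by model name + params

def identify_provider(model_name: str) -> str:
    """Step 2: map a model-name prefix to its provider."""
    for prefix, provider in PROVIDER_PREFIXES.items():
        if model_name.startswith(prefix):
            return provider
    raise ValueError(f"Unsupported model: {model_name}")
```

Steps 3-5 (API-key validation, instance creation, caching) then branch on the provider returned here.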

🤖 Supported Models

Anthropic (Claude)

| Model ID | Description | Context Window |
| --- | --- | --- |
| claude-sonnet-4-20250514 | Claude Sonnet 4.0 | 200,000 tokens |
| claude-sonnet-4-5-20250929 | Claude Sonnet 4.5 | 200,000 tokens |
| claude-haiku-4-5-20251001 | Claude Haiku 4.5 (fast) | 200,000 tokens |
| claude-3-5-haiku-20241022 | Claude 3.5 Haiku | 200,000 tokens |
| claude-3-opus-20240229 | Claude 3 Opus (most capable) | 200,000 tokens |

Environment Variable: ANTHROPIC_API_KEY

Capabilities: text, reasoning, translation, structured output


Google (Gemini)

| Model ID | Description | Thinking | Context Window |
| --- | --- | --- | --- |
| gemini-2.5-pro | Gemini 2.5 Pro | ✅ Enabled (12k budget) | 30,720 tokens |
| gemini-2.5-flash | Gemini 2.5 Flash (fast) | ❌ Disabled | 30,720 tokens |
| gemini-2.5-flash-thinking | Flash with thinking | ✅ Enabled (12k budget) | 30,720 tokens |

Environment Variable: GEMINI_API_KEY

Special Features:

  • Thinking mode for reasoning tasks
  • Default JSON response format
  • Automatic generation config handling
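The three features above can be illustrated with a small sketch of the config the router might build. `response_mime_type` is the real Gemini API field for requesting JSON output; the `thinking_budget` key and the `build_gemini_config` helper are assumptions made for this example, not verified names from the router's code.

```python
# Illustrative sketch of per-model generation config, assuming a helper
# named build_gemini_config and a "thinking_budget" key (both hypothetical).
def build_gemini_config(model_name: str) -> dict:
    config = {"response_mime_type": "application/json"}  # default JSON output
    if model_name in ("gemini-2.5-pro", "gemini-2.5-flash-thinking"):
        config["thinking_budget"] = 12_000  # thinking models get a 12k budget
    return config
```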

OpenAI (GPT)

| Model ID | Description | Context Window |
| --- | --- | --- |
| gpt-4 | GPT-4 | 8,192 tokens |
| gpt-4-turbo | GPT-4 Turbo (faster) | 128,000 tokens |
| gpt-3.5-turbo | GPT-3.5 Turbo (economical) | 16,385 tokens |

Environment Variable: OPENAI_API_KEY


Dharmamitra

| Model ID | Description | Notes |
| --- | --- | --- |
| dharamitra | Specialized Buddhist translation | Translation-only |

Environment Variable: DHARMAMITRA_TOKEN

Limitations:

  • Translation endpoints only
  • No structured output support
  • Not available for UCCA/Gloss/Editor features

⚙️ Configuration

Environment Variables

# .env file

# Anthropic
ANTHROPIC_API_KEY=sk-ant-api03-...

# Google
GEMINI_API_KEY=AIzaSy...

# OpenAI
OPENAI_API_KEY=sk-...

# Dharmamitra
DHARMAMITRA_TOKEN=your-token
DHARMAMITRA_PASSWORD=your-password  # For proxy endpoints

Default Configuration

default_configs = {
    "temperature": 0.3,      # Lower for consistency
    "max_tokens": 4000,      # Default output limit
}
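Per-call keyword arguments take precedence over these defaults. A minimal sketch of that merge, assuming the router combines the two dicts with call-time kwargs winning (the `merge_config` helper name is illustrative):

```python
# Sketch: per-call kwargs override the router-wide defaults shown above.
default_configs = {
    "temperature": 0.3,
    "max_tokens": 4000,
}

def merge_config(**kwargs) -> dict:
    # Later keys win, so caller-supplied values replace the defaults.
    return {**default_configs, **kwargs}
```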

🔧 Usage

Basic Usage

from src.translation_api.models.model_router import get_model_router

# Get the global router instance
router = get_model_router()

# Get a model
model = router.get_model("claude-sonnet-4-20250514")

# Use the model
response = model.invoke("Translate: བྱང་ཆུབ་སེམས")

With Custom Parameters

model = router.get_model(
    "gemini-2.5-pro",
    temperature=0.1,
    max_tokens=8000
)

Checking Availability

# Get all available models (based on configured API keys)
available = router.get_available_models()

# Check if specific model is available
if router.validate_model_availability("claude-sonnet-4-20250514"):
    model = router.get_model("claude-sonnet-4-20250514")

Structured Output

from pydantic import BaseModel

class Translation(BaseModel):
    text: str
    confidence: float

# Get model with structured output
model = router.get_model("claude-sonnet-4-20250514")
structured = model.with_structured_output(Translation)

result = structured.invoke("Translate: བྱང་ཆུབ་སེམས")
# result.text = "bodhicitta"
# result.confidence = 0.95

🧠 Gemini Thinking Mode

Gemini models support "thinking": internal reasoning performed before the final response is produced.

How It Works

┌─────────────────────────────────────────────────────────────────┐
│                      Thinking Mode                               │
│                                                                  │
│  Input: "Translate this complex Buddhist text..."               │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                 Internal Thinking                        │    │
│  │  "Let me analyze the grammatical structure..."           │    │
│  │  "The term བྱང་ཆུབ་སེམས has multiple meanings..."         │    │
│  │  "Given the context, I should use..."                    │    │
│  │                                                          │    │
│  │  (Up to 12,000 tokens of reasoning)                     │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                  │
│  Output: "Bodhicitta, the mind of awakening..."                 │
└─────────────────────────────────────────────────────────────────┘

Configuration

| Model | Thinking Budget |
| --- | --- |
| gemini-2.5-flash | 0 (disabled) |
| gemini-2.5-flash-thinking | 12,000 tokens |
| gemini-2.5-pro | 12,000 tokens |

Code Example

# Flash without thinking (fast)
fast_model = router.get_model("gemini-2.5-flash")

# Flash with thinking (slower, better quality)
thinking_model = router.get_model("gemini-2.5-flash-thinking")

# Pro with thinking (best quality)
pro_model = router.get_model("gemini-2.5-pro")

🔄 Model Caching

The router caches model instances to avoid redundant initialization:

# First call: creates new instance
model1 = router.get_model("claude-sonnet-4-20250514")

# Second call: returns cached instance
model2 = router.get_model("claude-sonnet-4-20250514")

# model1 is model2 → True

# Different params: new instance
model3 = router.get_model("claude-sonnet-4-20250514", temperature=0.5)

# model1 is model3 → False

Cache Key

cache_key = f"{model_name}_{hash(str(sorted(kwargs.items())))}"
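The key formula above can be exercised directly to show why identical parameters hit the cache while different parameters miss it. This is a standalone reproduction of the formula, not the router's own code:

```python
# Reproduces the cache-key formula: same model + same kwargs → same key,
# different kwargs → different key (within a single process; str hash()
# values are seeded per process).
def make_cache_key(model_name: str, **kwargs) -> str:
    return f"{model_name}_{hash(str(sorted(kwargs.items())))}"

key_a = make_cache_key("claude-sonnet-4-20250514")
key_b = make_cache_key("claude-sonnet-4-20250514")
key_c = make_cache_key("claude-sonnet-4-20250514", temperature=0.5)
```

Since `key_a == key_b` but `key_a != key_c`, the second plain call returns the cached instance while the `temperature=0.5` call creates a new one, matching the behavior shown above.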

🔌 Provider-Specific Wrappers

Gemini Wrapper

The _GeminiModelWrapper handles Gemini-specific configuration:

class _GeminiModelWrapper:
    """Injects generation_config into all Gemini calls."""
    
    def __init__(self, base_model, generation_config):
        self._base_model = base_model
        self._generation_config = generation_config
    
    async def ainvoke(self, input, **kwargs):
        # Merge default config with call-specific config
        merged = {**self._generation_config, **kwargs.get("generation_config", {})}
        return await self._base_model.ainvoke(input, generation_config=merged)

Features:

  • Automatic JSON response format
  • Thinking config injection
  • Plain text mode support

Dharmamitra Wrapper

The _DharmamitraModelWrapper integrates with the Dharmamitra API:

class _DharmamitraModelWrapper:
    """Translation-only wrapper for Dharmamitra."""
    
    def invoke(self, input, **kwargs):
        # Extract source text and target language
        # Call Dharmamitra API
        # Return translation
        pass
    
    def with_structured_output(self, schema):
        raise ValueError("'dharamitra' supports translation only")

Limitations:

  • No with_structured_output()
  • No batch operations
  • Translation endpoints only
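Given these limitations, callers that use structured output may want to probe a model before relying on it. A hypothetical guard pattern, with a stand-in class mimicking the wrapper's `ValueError` behavior (neither name is from the real codebase):

```python
# Hypothetical guard: detect models that reject structured output.
class TranslationOnlyModel:
    """Stand-in mimicking the Dharmamitra wrapper's behavior."""
    def with_structured_output(self, schema):
        raise ValueError("'dharamitra' supports translation only")

def supports_structured_output(model) -> bool:
    try:
        model.with_structured_output(dict)  # probe with a throwaway schema
        return True
    except ValueError:
        return False
```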

📊 API Key Validation

Models are only available if their API key is configured:

def get_available_models(self) -> Dict[str, Dict[str, Any]]:
    available = {}
    
    if self.settings.anthropic_api_key:
        available.update({
            "claude-sonnet-4-20250514": {...},
            # ... other Claude models
        })
    
    if self.settings.gemini_api_key:
        available.update({
            "gemini-2.5-pro": {...},
            # ... other Gemini models
        })
    
    # ...
    return available

❌ Error Handling

Missing API Key

try:
    model = router.get_model("claude-sonnet-4-20250514")
except ValueError as e:
    print(e)  # "ANTHROPIC_API_KEY is required for Claude models"

Unsupported Model

try:
    model = router.get_model("invalid-model")
except ValueError as e:
    print(e)  # "Unsupported model: invalid-model"

API Endpoint Validation

if not router.validate_model_availability("claude-sonnet-4-20250514"):
    available = list(router.get_available_models().keys())
    raise HTTPException(
        status_code=400,
        detail=f"Model not available. Available: {available}"
    )

🧪 Testing Models

Demo Script

# examples/thinking_models_demo.py

async def demo_thinking_models():
    router = get_model_router()
    
    # Test different models
    for model_name in ["gemini-2.5-flash", "gemini-2.5-flash-thinking"]:
        if router.validate_model_availability(model_name):
            model = router.get_model(model_name)
            response = await model.ainvoke("Translate: བྱང་ཆུབ་སེམས")
            print(f"{model_name}: {response.content}")

Health Check

curl http://localhost:8001/health
{
  "status": "healthy",
  "version": "1.0.0",
  "available_models": {
    "claude-sonnet-4-20250514": {"provider": "Anthropic", ...},
    "gemini-2.5-pro": {"provider": "Google", ...}
  }
}
