Description
🤔 What is the technical debt you think should be addressed?
Problem
Remote inference providers have inconsistent URL configuration patterns, which makes the API confusing for users and difficult to maintain. A survey of all providers turned up multiple naming conventions and handling approaches.
Current State Analysis
| Provider | Field Name | Type | Default Value | URL Transformation |
|---|---|---|---|---|
| OpenAI | `base_url` | `str` | `https://api.openai.com/v1` | None - Direct pass-through |
| Cerebras | `base_url` | `str` | `https://api.cerebras.ai` | Append `/v1` - `urljoin(base_url, "v1")` |
| Azure | `api_base` | `HttpUrl` | Required field | Append `/openai/v1` - `urljoin(api_base, "/openai/v1")` |
| Llama OpenAI Compat | `openai_compat_api_base` | `str` | `https://api.llama.com/compat/v1/` | None - Direct pass-through |
| NVIDIA | `url` | `str` | `https://integrate.api.nvidia.com` | Conditional `/v1` - Append `/v1` if `append_api_version=True` |
| Fireworks | `url` | `str` | `https://api.fireworks.ai/inference/v1` | Hardcoded - Ignores config, returns `https://api.fireworks.ai/inference/v1` |
| Together | `url` | `str` | `https://api.together.xyz/v1` | Hardcoded - Ignores config, uses Together's `BASE_URL` constant |
| Groq | `url` | `str` | `https://api.groq.com` | Append `/openai/v1` - `{url}/openai/v1` |
| Ollama | `url` | `str` | `http://localhost:11434` | Append `/v1` - `url.rstrip("/") + "/v1"` |
| vLLM | `url` | `str \| None` | None (required) | None - Direct pass-through |
| TGI | `url` | `str` | Required field | Append `/v1` - `{url.rstrip('/')}/v1` (set during initialization) |
| Databricks | `url` | `str \| None` | None | Append `/serving-endpoints` - `{url}/serving-endpoints` |
| SambaNova | `url` | `str` | `https://api.sambanova.ai/v1` | None - Direct pass-through |
| Runpod | `url` | `str \| None` | None | None - Direct pass-through |
| WatsonX | `url` | `str` | `https://us-south.ml.cloud.ibm.com` | None - Direct pass-through |
| Passthrough | `url` | `str` | None (required) | None - Direct pass-through |
| Anthropic | N/A | N/A | N/A | Hardcoded - Ignores config, returns `https://api.anthropic.com/v1` |
| VertexAI | `project` + `location` | `str` | Various | Custom construction - GCP-specific URL from project/location |
| Gemini | N/A | N/A | N/A | Hardcoded - Returns `https://generativelanguage.googleapis.com/v1beta/openai/` |
URL Construction Patterns
Providers handle URL construction with significant inconsistencies:
- Direct pass-through: OpenAI, vLLM, SambaNova, Runpod, WatsonX, Passthrough, Llama OpenAI Compat
- Automatic `/v1` appending:
  - Ollama (always): `url.rstrip("/") + "/v1"`
  - NVIDIA (conditional): `/v1` if `append_api_version=True`
  - Cerebras (always): `urljoin(base_url, "v1")`
  - TGI (during init): `{url.rstrip('/')}/v1`
- Custom path construction:
  - Azure: `urljoin(api_base, "/openai/v1")`
  - Groq: `{url}/openai/v1`
  - Databricks: `{url}/serving-endpoints`
  - VertexAI: Complex GCP URL construction
- Hardcoded endpoints:
  - Anthropic: `https://api.anthropic.com/v1` (ignores config)
  - Fireworks: `https://api.fireworks.ai/inference/v1` (ignores config)
  - Together: Uses Together's `BASE_URL` constant (ignores config)
  - Gemini: `https://generativelanguage.googleapis.com/v1beta/openai/` (ignores config)
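The transformation styles above disagree on edge cases such as trailing slashes and existing path segments. The helpers below reproduce the quoted expressions in isolation; the function names are hypothetical stand-ins, not the adapters' actual code:

```python
from urllib.parse import urljoin


def ollama_style(url: str) -> str:
    # Strip any trailing slash, then append /v1 (keeps existing path segments).
    return url.rstrip("/") + "/v1"


def cerebras_style(base_url: str) -> str:
    # urljoin resolves "v1" relative to the base: the last path segment
    # is replaced unless the base URL ends with "/".
    return urljoin(base_url, "v1")


def azure_style(api_base: str) -> str:
    # An absolute reference ("/openai/v1") discards any path on api_base.
    return urljoin(api_base, "/openai/v1")


def groq_style(url: str) -> str:
    # Plain string formatting: a trailing slash in url produces "//".
    return f"{url}/openai/v1"


if __name__ == "__main__":
    for base in ("https://example.com", "https://example.com/api", "https://example.com/api/"):
        print(base, "->", ollama_style(base), cerebras_style(base))
    print(azure_style("https://example.com/proxy/"))
    print(groq_style("https://api.groq.com/"))
```

If the adapters behave exactly as the expressions describe, a base URL of `https://example.com/api` becomes `https://example.com/api/v1` under the Ollama pattern but `https://example.com/v1` under the Cerebras pattern, and a Groq URL entered with a trailing slash ends up with a double slash.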
Environment Variable Inconsistencies
- `OPENAI_BASE_URL`, `NVIDIA_BASE_URL`, `WATSONX_BASE_URL` (some providers)
- `OLLAMA_URL`, `VLLM_URL`, `TGI_URL` (other providers)
- Mixed patterns that don't always align with provider documentation
Proposed Solution
1. Standardize Field Naming
Recommendation: Use `base_url` consistently across all providers unless the provider has its own standard, e.g. Databricks documents `host` and `DATABRICKS_HOST`.
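With a single field name, shared code can read the endpoint without per-provider branches. A minimal sketch, assuming dataclass configs (the real configs may be pydantic models); `HasBaseURL`, `OpenAIConfig`, `OllamaConfig`, `DatabricksConfig`, and `endpoint_of` are hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class HasBaseURL(Protocol):
    # Structural type: any config exposing a `base_url` attribute satisfies it.
    base_url: str


@dataclass
class OpenAIConfig:
    base_url: str = "https://api.openai.com/v1"


@dataclass
class OllamaConfig:
    # Previously named `url`; renamed under the proposal.
    base_url: str = "http://localhost:11434"


@dataclass
class DatabricksConfig:
    # Documented exception: Databricks calls its workspace URL the "host".
    host: str | None = None


def endpoint_of(config: HasBaseURL) -> str:
    # No per-provider branching needed once the field name is uniform.
    return config.base_url
```

`DatabricksConfig` intentionally does not satisfy `HasBaseURL`, so a static type checker would flag passing it to `endpoint_of`; the exception stays explicit instead of hiding behind a shared name.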
2. Consistent Type Annotations
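- Use `HttpUrl` (or `HttpUrl | None` for optional fields) for URL fields, instead of the current mix of `str`, `str | None`, and `HttpUrl`.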
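The point of a URL type is to reject malformed input at config load time rather than at the first request. The proposal names pydantic's `HttpUrl`; to stay dependency-free, this sketch only approximates that check with `urllib.parse` (pydantic performs stricter validation and normalization). `validate_http_url` and `RemoteInferenceConfig` are hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlparse


def validate_http_url(value: str) -> str:
    # Rough stand-in for HttpUrl: require an http(s) scheme and a host.
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {value!r}")
    return value


@dataclass
class RemoteInferenceConfig:
    # Optional, as for providers like vLLM or Runpod that have no default.
    base_url: str | None = None

    def __post_init__(self) -> None:
        # Fail fast on malformed URLs; None is allowed for optional fields.
        if self.base_url is not None:
            self.base_url = validate_http_url(self.base_url)
```

A value such as `localhost:8000` (missing scheme) now raises `ValueError` when the config is built.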
3. Environment Variable Alignment
Recommendation: Align with each provider's official documentation and conventions.
Examples of provider-native conventions:
- `OPENAI_BASE_URL` (OpenAI standard)
- `OLLAMA_URL` (Ollama standard)
- `VLLM_URL` (vLLM standard)
Approach: Research each provider's official documentation and use their recommended environment variable names, rather than forcing a unified pattern that conflicts with provider conventions.
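Keeping provider-native names still allows a single lookup routine: a per-provider mapping to each variable name, with the config default as fallback. This is a sketch only; `PROVIDER_URL_ENV_VARS` and `resolve_base_url` are hypothetical, and the variable names are just the examples listed above, which the research step would confirm or correct:

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical registry: each provider keeps its own variable name.
PROVIDER_URL_ENV_VARS: dict[str, str] = {
    "openai": "OPENAI_BASE_URL",
    "ollama": "OLLAMA_URL",
    "vllm": "VLLM_URL",
}


def resolve_base_url(
    provider: str,
    default: str | None = None,
    environ: Mapping[str, str] = os.environ,
) -> str | None:
    # Prefer the provider-native variable when it is set and non-empty;
    # otherwise fall back to the configured default.
    var = PROVIDER_URL_ENV_VARS.get(provider)
    if var is not None and environ.get(var):
        return environ[var]
    return default
```

Passing `environ` explicitly keeps the function testable without mutating the process environment.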
4. URL Construction Guidelines
Recommendation: Minimize modifications to user configuration. For instance, have users enter the full URL, including `/v1` or `/openai/v1`, instead of appending the suffix at runtime.
Warning: This will be a breaking change for multiple providers.
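The breaking change is easiest to see side by side. In this hypothetical before/after for an Ollama-style adapter, a URL that worked under runtime appending silently points at the wrong path once it is passed through unchanged. `looks_unversioned` is a hypothetical migration heuristic, not an existing helper:

```python
from urllib.parse import urlparse


def legacy_endpoint(url: str) -> str:
    # Current behavior (Ollama pattern): append /v1 at runtime.
    return url.rstrip("/") + "/v1"


def proposed_endpoint(base_url: str) -> str:
    # Proposed behavior: use the configured URL exactly as given.
    return base_url


def looks_unversioned(base_url: str) -> bool:
    # Heuristic: a bare host with no path was fine before the change,
    # but probably needs a suffix such as /v1 afterwards.
    return urlparse(base_url).path in ("", "/")
```

A user migrating from `http://localhost:11434` would update the config to `http://localhost:11434/v1`; a startup warning driven by a check like `looks_unversioned` is one way to surface configs that were not updated.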
💡 What is the benefit of addressing this technical debt?
(above)