
Standardize URL Configuration Across Remote Inference Providers #3732

@mattf

Description


🤔 What is the technical debt you think should be addressed?

Problem

Remote inference providers have inconsistent URL configuration patterns, which makes the API confusing for users and difficult to maintain. A survey of all providers shows multiple naming conventions and URL-handling approaches.

Current State Analysis

| Provider | Field Name | Type | Default Value | URL Transformation |
|---|---|---|---|---|
| OpenAI | `base_url` | `str` | `https://api.openai.com/v1` | None - direct pass-through |
| Cerebras | `base_url` | `str` | `https://api.cerebras.ai` | Appends `/v1` - `urljoin(base_url, "v1")` |
| Azure | `api_base` | `HttpUrl` | Required field | Appends `/openai/v1` - `urljoin(api_base, "/openai/v1")` |
| Llama OpenAI Compat | `openai_compat_api_base` | `str` | `https://api.llama.com/compat/v1/` | None - direct pass-through |
| NVIDIA | `url` | `str` | `https://integrate.api.nvidia.com` | Conditional - appends `/v1` if `append_api_version=True` |
| Fireworks | `url` | `str` | `https://api.fireworks.ai/inference/v1` | Hardcoded - ignores config, returns `https://api.fireworks.ai/inference/v1` |
| Together | `url` | `str` | `https://api.together.xyz/v1` | Hardcoded - ignores config, uses Together's `BASE_URL` constant |
| Groq | `url` | `str` | `https://api.groq.com` | Appends `/openai/v1` - `{url}/openai/v1` |
| Ollama | `url` | `str` | `http://localhost:11434` | Appends `/v1` - `url.rstrip("/") + "/v1"` |
| vLLM | `url` | `str \| None` | `None` (required) | None - direct pass-through |
| TGI | `url` | `str` | Required field | Appends `/v1` - `{url.rstrip('/')}/v1` (set during initialization) |
| Databricks | `url` | `str \| None` | `None` | Appends `/serving-endpoints` - `{url}/serving-endpoints` |
| SambaNova | `url` | `str` | `https://api.sambanova.ai/v1` | None - direct pass-through |
| Runpod | `url` | `str \| None` | `None` | None - direct pass-through |
| WatsonX | `url` | `str` | `https://us-south.ml.cloud.ibm.com` | None - direct pass-through |
| Passthrough | `url` | `str` | `None` (required) | None - direct pass-through |
| Anthropic | N/A | N/A | N/A | Hardcoded - returns `https://api.anthropic.com/v1` |
| VertexAI | `project` + `location` | `str` | Various | Custom - GCP-specific URL built from project/location |
| Gemini | N/A | N/A | N/A | Hardcoded - returns `https://generativelanguage.googleapis.com/v1beta/openai/` |
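
To make the field-name divergence concrete, here is an illustrative sketch (simplified approximations, not the actual config classes) of a few of these provider configs today, keeping only the URL-related fields from the table:

```python
from pydantic import BaseModel, HttpUrl

# Illustrative only: simplified approximations of today's provider configs,
# reduced to the URL-related fields listed in the table above.

class OpenAIConfig(BaseModel):
    base_url: str = "https://api.openai.com/v1"     # used as-is

class AzureConfig(BaseModel):
    api_base: HttpUrl                                # required; /openai/v1 appended later

class NVIDIAConfig(BaseModel):
    url: str = "https://integrate.api.nvidia.com"    # /v1 appended only if append_api_version=True
    append_api_version: bool = True                  # default assumed here for illustration

class OllamaConfig(BaseModel):
    url: str = "http://localhost:11434"              # /v1 always appended at runtime
```

Three different field names (`base_url`, `api_base`, `url`) and two different types (`str`, `HttpUrl`) for what is conceptually the same setting.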

URL Construction Patterns

Providers handle URL construction with significant inconsistencies (see the sketch after this list):

  1. Direct pass-through: OpenAI, vLLM, SambaNova, Runpod, WatsonX, Passthrough, Llama OpenAI Compat
  2. Automatic `/v1` appending:
    • Ollama (always): `url.rstrip("/") + "/v1"`
    • NVIDIA (conditional): `/v1` if `append_api_version=True`
    • Cerebras (always): `urljoin(base_url, "v1")`
    • TGI (during init): `{url.rstrip('/')}/v1`
  3. Custom path construction:
    • Azure: `urljoin(api_base, "/openai/v1")`
    • Groq: `{url}/openai/v1`
    • Databricks: `{url}/serving-endpoints`
    • VertexAI: complex GCP URL construction
  4. Hardcoded endpoints:
    • Anthropic: `https://api.anthropic.com/v1` (ignores config)
    • Fireworks: `https://api.fireworks.ai/inference/v1` (ignores config)
    • Together: uses Together's `BASE_URL` constant (ignores config)
    • Gemini: `https://generativelanguage.googleapis.com/v1beta/openai/` (ignores config)
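
As a rough sketch (not the actual adapter code), these four patterns reduce to something like:

```python
from urllib.parse import urljoin

def resolved_base_url(provider: str, url: str, append_api_version: bool = True) -> str:
    """Illustrative reduction of the patterns above, not the real adapters."""
    if provider in ("openai", "vllm", "sambanova", "runpod", "watsonx", "passthrough"):
        return url                                          # 1. direct pass-through
    if provider == "ollama":
        return url.rstrip("/") + "/v1"                      # 2. always append /v1
    if provider == "nvidia":
        return f"{url}/v1" if append_api_version else url   # 2. conditional /v1
    if provider == "cerebras":
        return urljoin(url, "v1")                           # 2. urljoin-based append
    if provider == "groq":
        return f"{url}/openai/v1"                           # 3. custom path
    if provider == "databricks":
        return f"{url}/serving-endpoints"                   # 3. custom path
    if provider == "anthropic":
        return "https://api.anthropic.com/v1"               # 4. hardcoded, config ignored
    raise ValueError(f"unhandled provider: {provider}")
```

The same user-supplied value can therefore resolve to different endpoints depending on the provider, which is exactly the inconsistency described above.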

Environment Variable Inconsistencies

  • OPENAI_BASE_URL, NVIDIA_BASE_URL, WATSONX_BASE_URL (some providers)
  • OLLAMA_URL, VLLM_URL, TGI_URL (other providers)
  • Mixed patterns that don't always align with provider documentation

Proposed Solution

1. Standardize Field Naming

Recommendation: Use `base_url` consistently across all providers unless the provider has its own standard, e.g. Databricks documents `host` and `DATABRICKS_HOST`.

2. Consistent Type Annotations

  • Use `HttpUrl` or `HttpUrl | None` (see the sketch below)
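
Taken together, items 1 and 2 might look roughly like this (hypothetical field layout, not an agreed-upon interface):

```python
from pydantic import BaseModel, ConfigDict, Field, HttpUrl

class OpenAIProviderConfig(BaseModel):
    """Hypothetical: every remote provider uses the same field name and type."""
    model_config = ConfigDict(validate_default=True)  # validate the str default into an HttpUrl

    base_url: HttpUrl | None = Field(
        default="https://api.openai.com/v1",
        description="Complete base URL of the endpoint, including any /v1-style suffix.",
    )

class OllamaProviderConfig(BaseModel):
    model_config = ConfigDict(validate_default=True)

    base_url: HttpUrl | None = Field(
        default="http://localhost:11434/v1",
        description="Complete base URL of the local Ollama OpenAI-compatible endpoint.",
    )
```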

3. Environment Variable Alignment

Recommendation: Align with each provider's official documentation and conventions.

Examples of provider-native conventions:

  • OPENAI_BASE_URL (OpenAI standard)
  • OLLAMA_URL (Ollama standard)
  • VLLM_URL (vLLM standard)

Approach: Research each provider's official documentation and use their recommended environment variable names, rather than forcing a unified pattern that conflicts with provider conventions.
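
One hypothetical way to express that alignment is to have each provider's config read its own documented environment variable as the default (names taken from the list above; the exact mechanism here is illustrative):

```python
import os
from pydantic import BaseModel, Field

class OpenAIProviderConfig(BaseModel):
    # OpenAI's own tooling documents OPENAI_BASE_URL, so the default follows that convention.
    base_url: str = Field(
        default_factory=lambda: os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1"),
    )

class OllamaProviderConfig(BaseModel):
    # Ollama deployments are conventionally addressed via OLLAMA_URL.
    base_url: str = Field(
        default_factory=lambda: os.getenv("OLLAMA_URL", "http://localhost:11434"),
    )
```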

4. URL Construction Guidelines

Recommendation: Minimize modifications to user-supplied configuration. For instance, have users input a full URL including /v1 or /openai/v1 rather than appending path segments at runtime.
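
In practice this means the adapter uses the configured value verbatim, for example (hypothetical sketch using the OpenAI client, not the actual adapter code):

```python
from openai import AsyncOpenAI

# Hypothetical: the user configures the complete endpoint themselves, e.g.
#   base_url = "https://api.groq.com/openai/v1"
# and the adapter passes it through unchanged -- no rstrip()/urljoin()/f-string suffixing.
def make_client(base_url: str, api_key: str) -> AsyncOpenAI:
    return AsyncOpenAI(base_url=base_url, api_key=api_key)
```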

Warning: This will be a breaking change for multiple providers.

💡 What is the benefit of addressing this technical debt?

(above)

Other thoughts

No response
