chutesai/claude-proxy
Claude Proxy for Chutes.ai

Lightweight proxy that translates the Claude Messages API into OpenAI Chat Completions format, with SSE streaming.

Routes Claude Code / Claude API requests to any OpenAI-compatible backend (SGLang, vLLM, Ollama, etc.).

Features

  • Full Claude API compatibility (text, images, tool_use, tool_result)
  • SSE streaming with proper event formatting
  • Thinking/reasoning content support - Auto-enables for reasoning models, streams thinking blocks
  • Client key forwarding (forwards client API keys directly to backend)
  • Model discovery with 60s cache refresh
  • Case-insensitive model matching with helpful 404 responses
  • Token counting endpoint (tiktoken-based)
  • Optional circuit breaker (disabled by default, can be enabled via env)
  • Health check endpoint
  • Request validation (max 10,000 messages, 10MB content)

Quick Start

Docker (default port 8180):

docker-compose up -d
export ANTHROPIC_BASE_URL=http://localhost:8180
export ANTHROPIC_API_KEY=cpk_your_api_key
claude

From Source (default port 8080):

cargo build --release
cargo run --release
export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_API_KEY=cpk_your_api_key
claude

Configuration

Environment variables:

  • BACKEND_URL - Backend chat completions endpoint.
    • Default (source): http://127.0.0.1:8000/v1/chat/completions
    • Default (Docker): https://llm.chutes.ai/v1/chat/completions
  • HOST_PORT - Port to listen on (default: 8080)
  • RUST_LOG - Log level: error, warn, info, debug, trace (default: info)
  • BACKEND_TIMEOUT_SECS - Backend request timeout in seconds (default: 600)
  • ENABLE_CIRCUIT_BREAKER - Enable circuit breaker protection (default: false)
    • Opens after 5 consecutive failures, recovers after 30s
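The circuit-breaker behavior described above (open after 5 consecutive failures, allow a retry after 30 s) can be sketched roughly as follows. This is an illustrative Python model, not the proxy's actual Rust implementation:

```python
import time

class CircuitBreaker:
    """Illustrative circuit breaker: opens after a failure threshold,
    then allows a retry once the recovery window has elapsed."""

    def __init__(self, failure_threshold=5, recovery_secs=30):
        self.failure_threshold = failure_threshold
        self.recovery_secs = recovery_secs
        self.consecutive_failures = 0
        self.opened_at = None  # monotonic timestamp when the breaker opened

    def allow_request(self):
        if self.opened_at is None:
            return True
        # Half-open: let a request through once the recovery window passes.
        return time.monotonic() - self.opened_at >= self.recovery_secs

    def record_success(self):
        self.consecutive_failures = 0
        self.opened_at = None

    def record_failure(self):
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

breaker = CircuitBreaker()
for _ in range(5):
    breaker.record_failure()
print(breaker.allow_request())  # False while the breaker is open
```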

Example .env (for running from source):

BACKEND_URL=http://127.0.0.1:8000/v1/chat/completions
HOST_PORT=8080
RUST_LOG=info
ENABLE_CIRCUIT_BREAKER=false

Authentication:

  • Client API key (cpk_* or backend-compatible) → forwarded directly to backend
  • Anthropic OAuth tokens (sk-ant-*) → rejected with 401 (not supported)
  • No client auth → rejected with 401
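These routing decisions amount to a simple check on the incoming `Authorization` header. A minimal sketch (the function name and return shape are ours, not the proxy's):

```python
def classify_auth(authorization):
    """Decide how to handle a client's Authorization header.
    Returns (status, action) where action is 'forward' or a rejection reason."""
    if not authorization or not authorization.startswith("Bearer "):
        return 401, "missing or malformed Authorization header"
    key = authorization.removeprefix("Bearer ")
    if key.startswith("sk-ant-"):
        # Anthropic OAuth tokens cannot be used against an OpenAI backend.
        return 401, "Anthropic OAuth tokens are not supported"
    # cpk_* or any backend-compatible key is forwarded as-is.
    return 200, "forward"

print(classify_auth("Bearer cpk_abc123"))  # (200, 'forward')
print(classify_auth("Bearer sk-ant-xyz"))  # (401, 'Anthropic OAuth tokens are not supported')
print(classify_auth(None))                 # (401, 'missing or malformed Authorization header')
```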

API Endpoints

  • POST /v1/messages - Main Claude Messages API endpoint
  • POST /v1/messages/count_tokens - Token counting (tiktoken-based)
  • GET /health - Health check with circuit breaker status (if enabled)

Example request:

curl -N http://localhost:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer cpk_your_key' \
  -d '{
    "model": "zai-org/GLM-4.5-Air",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 128,
    "stream": true
  }'

Supported Features

  • Text content - String or content blocks
  • Images - Base64 encoded, converted to OpenAI data URI format
  • Tool use/results - Full function calling support with tool_choice parameter
  • System prompts - Converted to system message
  • Multi-turn conversations - Context preservation (up to 10K messages)
  • Thinking/reasoning content - Automatic detection and streaming for reasoning models
  • Advanced sampling - Supports temperature, top_p, top_k
  • Model discovery - Auto-refresh every 60s, case-insensitive matching
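Case-insensitive matching with a helpful miss can be sketched as a lowercased lookup over the discovered model list; the helper below is illustrative:

```python
def resolve_model(requested, available):
    """Match a model name ignoring case; raise with the full list on a miss."""
    by_lower = {name.lower(): name for name in available}
    match = by_lower.get(requested.lower())
    if match is None:
        raise LookupError(
            f"model '{requested}' not found; available: {', '.join(sorted(available))}"
        )
    return match  # canonical casing from the backend's model list

models = ["zai-org/GLM-4.5-Air", "deepseek-ai/DeepSeek-R1"]
print(resolve_model("zai-org/glm-4.5-air", models))  # zai-org/GLM-4.5-Air
```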

Thinking/Reasoning Content

The proxy automatically handles thinking content for reasoning models:

Auto-enablement:

  • Models containing "reasoning", "r1", or "deep" in the name automatically enable thinking with a 10,000 token budget
  • Override by explicitly providing thinking parameter in request
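The heuristic boils down to a substring check on the model name, with an explicit `thinking` parameter taking precedence. A rough sketch (the proxy's exact matching rules and config shape may differ):

```python
DEFAULT_THINKING_BUDGET = 10_000

def thinking_config(model, explicit=None):
    """Auto-enable thinking for reasoning-flavored model names,
    unless the request already carries an explicit `thinking` parameter."""
    if explicit is not None:
        return explicit  # client override always wins
    name = model.lower()
    if any(marker in name for marker in ("reasoning", "r1", "deep")):
        return {"type": "enabled", "budget_tokens": DEFAULT_THINKING_BUDGET}
    return None

print(thinking_config("deepseek-r1"))          # {'type': 'enabled', 'budget_tokens': 10000}
print(thinking_config("zai-org/GLM-4.5-Air"))  # None
```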

Input transformation:

  • Assistant messages with thinking blocks → interleaved format: <think>reasoning</think>\nresponse
  • Preserves historical thinking content for multi-turn conversations
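The interleaved format can be sketched as a small transform over an assistant message's content blocks (block shapes follow the Claude Messages API; the function itself is illustrative):

```python
def interleave_thinking(content_blocks):
    """Fold thinking blocks into <think>...</think> ahead of the visible text."""
    thinking = "".join(
        b.get("thinking", "") for b in content_blocks if b.get("type") == "thinking"
    )
    text = "".join(
        b.get("text", "") for b in content_blocks if b.get("type") == "text"
    )
    if thinking:
        return f"<think>{thinking}</think>\n{text}"
    return text

blocks = [
    {"type": "thinking", "thinking": "2+2 is basic arithmetic."},
    {"type": "text", "text": "The answer is 4."},
]
print(interleave_thinking(blocks))
```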

Output streaming:

  • Backend reasoning_content → proper Claude thinking blocks
  • Thinking blocks streamed before text blocks
  • Event sequence: content_block_start (thinking) → content_block_delta (thinking_delta) → content_block_stop → text blocks

Example request:

curl -N http://localhost:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer cpk_your_key' \
  -d '{
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "max_tokens": 1024,
    "stream": true
  }'

Test:

cd tests
./test_thinking.sh

Usage with Claude Code

Model selection:

/model zai-org/GLM-4.5-Air              # Free
/model deepseek/DeepSeek-R1             # Reasoning  
/model anthropic/claude-3-5-sonnet      # Standard

Other SDKs:

Python:

from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8080",
    api_key="cpk_your_key",
)

TypeScript:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  baseURL: 'http://localhost:8080',
  apiKey: 'cpk_your_key',
});

Deployment

Docker Compose:

docker-compose up -d

Includes Caddy reverse proxy for SSL/TLS. See docs/DOCKER.md for production setup.

Remote Client Connection:

export ANTHROPIC_BASE_URL=https://your-domain.com
export ANTHROPIC_API_KEY=cpk_your_api_key
claude

Testing

./test.sh --all    # Run full test suite

Set CHUTES_TEST_API_KEY=cpk_your_key in .env or export before running tests.

Unit Tests (81 tests ✅)

cargo test              # Run all unit tests
cargo test auth         # Run auth module tests
cargo test streaming    # Run SSE parser tests
cargo test content_extraction  # Run content translation tests

Coverage: 90%+ for critical utilities (auth, streaming, content extraction)

Building

cargo build --release    # Binary: target/release/claude_openai_proxy (~4MB)
cargo test              # Run unit tests (81 tests)
cargo test -- --nocapture  # Show test output

Documentation

API Specification Analysis

We've analyzed the official Anthropic Messages API and OpenAI Chat Completions API specs:

  • ~95% core compatibility for standard use cases (text, images, tools, streaming)
  • Tool choice - Force specific tools or disable tool usage (v0.1.5)
  • Advanced sampling - top_k parameter support (v0.1.5)
  • Long conversations - 10K message limit (v0.1.5)
  • ⚠️ Partial support for advanced features (response_format, PDFs)
  • Unsupported features: server tools, prompt caching, citations, audio

See API_COMPARISON.md and CHANGELOG.md for details.

Troubleshooting

  • 401 Unauthorized - Ensure client sends a valid backend-compatible API key. The proxy forwards the client's Authorization: Bearer <key> directly to the backend. Anthropic OAuth tokens (sk-ant-*) are not supported.
  • 404 Model Not Found - Use /model in Claude Code to see available models
  • Circuit breaker open - Backend failing; check health endpoint: curl http://localhost:8080/health (or port 8180 for Docker)
  • Debug logging - RUST_LOG=debug cargo run --release
