Lightweight proxy that translates Claude Messages API to OpenAI Chat Completions format with SSE streaming.
Routes Claude Code / Claude API requests to any OpenAI-compatible backend (SGLang, vLLM, Ollama, etc.).
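Conceptually, the translation looks like this (a hand-written sketch of the two wire formats for a simple text request, not the proxy's actual Rust code):

```python
# Claude Messages API request (what the client sends):
claude_request = {
    "model": "zai-org/GLM-4.5-Air",
    "max_tokens": 128,
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
}

# Equivalent OpenAI Chat Completions request (what the backend receives):
openai_request = {
    "model": "zai-org/GLM-4.5-Air",
    "max_tokens": 128,
    # Claude's top-level system prompt becomes a leading system message.
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
    "stream": True,
}
```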
Features:

- Full Claude API compatibility (text, images, tool_use, tool_result)
- SSE streaming with proper event formatting
- Thinking/reasoning content support - Auto-enables for reasoning models, streams thinking blocks
- Client key forwarding (forwards client API keys directly to backend)
- Model discovery with 60s cache refresh
- Case-insensitive model matching with helpful 404 responses
- Token counting endpoint (tiktoken-based)
- Optional circuit breaker (disabled by default, can be enabled via env)
- Health check endpoint
- Request validation (max 10,000 messages, 10MB content)
Docker (default port 8180):

```bash
docker-compose up -d
export ANTHROPIC_BASE_URL=http://localhost:8180
export ANTHROPIC_API_KEY=cpk_your_api_key
claude
```

From Source (default port 8080):

```bash
cargo build --release
cargo run --release
export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_API_KEY=cpk_your_api_key
claude
```

Environment variables:
- `BACKEND_URL` - Backend chat completions endpoint.
  - Default (source): `http://127.0.0.1:8000/v1/chat/completions`
  - Default (Docker): `https://llm.chutes.ai/v1/chat/completions`
- `HOST_PORT` - Port to listen on (default: `8080`)
- `RUST_LOG` - Log level: `error`, `warn`, `info`, `debug`, `trace` (default: `info`)
- `BACKEND_TIMEOUT_SECS` - Backend request timeout in seconds (default: `600`)
- `ENABLE_CIRCUIT_BREAKER` - Enable circuit breaker protection (default: `false`). Opens after 5 consecutive failures, recovers after 30s.
Example .env (for running from source):

```env
BACKEND_URL=http://127.0.0.1:8000/v1/chat/completions
HOST_PORT=8080
RUST_LOG=info
ENABLE_CIRCUIT_BREAKER=false
```

Authentication:
- Client API key (`cpk_*` or backend-compatible) → forwarded directly to backend
- Anthropic OAuth tokens (`sk-ant-*`) → rejected with 401 (not supported)
- No client auth → rejected with 401
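The rules above amount to the following dispatch (an illustrative Python sketch; the function and error messages are hypothetical, the real logic lives in the Rust proxy):

```python
def resolve_backend_auth(authorization: str | None) -> str:
    """Illustrative sketch of the key-forwarding rule; not the proxy's actual code."""
    if not authorization:
        raise PermissionError("401: no client credentials supplied")
    token = authorization.removeprefix("Bearer ").strip()
    if token.startswith("sk-ant-"):
        # Anthropic OAuth tokens cannot be used against an OpenAI-compatible backend.
        raise PermissionError("401: Anthropic OAuth tokens are not supported")
    # Any other key (e.g. cpk_*) is forwarded to the backend unchanged.
    return f"Bearer {token}"
```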
Endpoints:

- `POST /v1/messages` - Main Claude Messages API endpoint
- `POST /v1/messages/count_tokens` - Token counting (tiktoken-based)
- `GET /health` - Health check with circuit breaker status (if enabled)
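A sketch of calling the token-counting endpoint from Python with the `requests` library (the response field assumes Anthropic's count_tokens shape, i.e. an `input_tokens` count):

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/messages/count_tokens",
    headers={"Authorization": "Bearer cpk_your_key"},
    json={
        "model": "zai-org/GLM-4.5-Air",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
resp.raise_for_status()
# Assumed response shape, matching Anthropic's endpoint: {"input_tokens": <int>}
print(resp.json()["input_tokens"])
```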
Example request:

```bash
curl -N http://localhost:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer cpk_your_key' \
  -d '{
    "model": "zai-org/GLM-4.5-Air",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 128,
    "stream": true
  }'
```

Supported features:

- Text content - String or content blocks
- Images - Base64 encoded, converted to OpenAI data URI format
- Tool use/results - Full function calling support with the `tool_choice` parameter (see the sketch after this list)
- System prompts - Converted to system message
- Multi-turn conversations - Context preservation (up to 10K messages)
- Thinking/reasoning content - Automatic detection and streaming for reasoning models
- Advanced sampling - Supports `temperature`, `top_p`, `top_k`
- Model discovery - Auto-refresh every 60s, case-insensitive matching
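As referenced above, a sketch of a tool-calling request using Anthropic's `tools`/`tool_choice` format (the `get_weather` tool is a made-up example):

```python
import requests

payload = {
    "model": "zai-org/GLM-4.5-Air",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    # Hypothetical example tool; the schema follows the Claude Messages API format.
    "tools": [{
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    # Force the model to call get_weather; use {"type": "auto"} to let it decide.
    "tool_choice": {"type": "tool", "name": "get_weather"},
}

resp = requests.post(
    "http://localhost:8080/v1/messages",
    headers={"Authorization": "Bearer cpk_your_key"},
    json=payload,
)
# tool_use blocks appear in resp.json()["content"] per the Claude Messages API.
print(resp.json())
```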
The proxy automatically handles thinking content for reasoning models:
Auto-enablement:
- Models containing "reasoning", "r1", or "deep" in the name automatically enable thinking with a 10,000 token budget
- Override by explicitly providing a `thinking` parameter in the request (example below)
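For instance, to set the budget explicitly (assuming the Anthropic-style `thinking` parameter shape, where `budget_tokens` must be below `max_tokens`):

```python
payload = {
    "model": "deepseek-r1",
    "max_tokens": 2048,
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    # Anthropic-style thinking parameter; overrides the proxy's auto-enablement.
    "thinking": {"type": "enabled", "budget_tokens": 1024},
}
```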
Input transformation:
- Assistant messages with thinking blocks → interleaved format: `<think>reasoning</think>\nresponse` (sketch below)
- Preserves historical thinking content for multi-turn conversations
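A minimal sketch of that transformation (illustrative Python; the actual translation happens inside the Rust proxy):

```python
def interleave_thinking(content_blocks: list[dict]) -> str:
    """Flatten Claude assistant content blocks into the <think>...</think> format."""
    thinking = "".join(b["thinking"] for b in content_blocks if b["type"] == "thinking")
    text = "".join(b["text"] for b in content_blocks if b["type"] == "text")
    return f"<think>{thinking}</think>\n{text}" if thinking else text

blocks = [
    {"type": "thinking", "thinking": "2+2 is basic arithmetic."},
    {"type": "text", "text": "2+2 = 4."},
]
print(interleave_thinking(blocks))
# -> <think>2+2 is basic arithmetic.</think>
#    2+2 = 4.
```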
Output streaming:
- Backend `reasoning_content` → proper Claude thinking blocks
- Thinking blocks streamed before text blocks
- Event sequence: `content_block_start` (thinking) → `content_block_delta` (thinking_delta) → `content_block_stop` → text blocks (see the consumer sketch below)
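A sketch of consuming that stream from Python (the event names are Anthropic's standard SSE types; the `requests` usage is illustrative):

```python
import json
import requests

resp = requests.post(
    "http://localhost:8080/v1/messages",
    headers={"Authorization": "Bearer cpk_your_key"},
    json={
        "model": "deepseek-r1",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "What is 2+2?"}],
        "stream": True,
    },
    stream=True,
)

for raw in resp.iter_lines():
    if not raw.startswith(b"data: "):
        continue  # skip "event:" lines, blanks, and keep-alives
    event = json.loads(raw[len(b"data: "):])
    if event["type"] == "content_block_delta":
        delta = event["delta"]
        if delta["type"] == "thinking_delta":
            print(delta["thinking"], end="")  # reasoning arrives first
        elif delta["type"] == "text_delta":
            print(delta["text"], end="")      # then the visible answer
```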
Example request:

```bash
curl -N http://localhost:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer cpk_your_key' \
  -d '{
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "max_tokens": 1024,
    "stream": true
  }'
```

Test:
```bash
cd tests
./test_thinking.sh
```

Model selection:

```
/model zai-org/GLM-4.5-Air          # Free
/model deepseek/DeepSeek-R1         # Reasoning
/model anthropic/claude-3-5-sonnet  # Standard
```

Other SDKs:
```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8080",
    api_key="cpk_your_key",
)
```

```typescript
const client = new Anthropic({
  baseURL: 'http://localhost:8080',
  apiKey: 'cpk_your_key',
});
```
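With the Python client above, a call is then standard `anthropic` SDK usage (the model name is just an example):

```python
message = client.messages.create(
    model="zai-org/GLM-4.5-Air",
    max_tokens=128,
    messages=[{"role": "user", "content": "Hello"}],
)
print(message.content[0].text)
```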
Docker Compose:

```bash
docker-compose up -d
```

Includes Caddy reverse proxy for SSL/TLS. See docs/DOCKER.md for production setup.
Remote Client Connection:

```bash
export ANTHROPIC_BASE_URL=https://your-domain.com
export ANTHROPIC_API_KEY=cpk_your_api_key
claude
```

Integration tests:

```bash
./test.sh --all   # Run full test suite
```

Set `CHUTES_TEST_API_KEY=cpk_your_key` in .env or export it before running tests.
Unit tests:

```bash
cargo test                      # Run all unit tests
cargo test auth                 # Run auth module tests
cargo test streaming            # Run SSE parser tests
cargo test content_extraction   # Run content translation tests
```

Coverage: 90%+ for critical utilities (auth, streaming, content extraction).
Build:

```bash
cargo build --release      # Binary: target/release/claude_openai_proxy (~4MB)
cargo test                 # Run unit tests (81 tests)
cargo test -- --nocapture  # Show test output
```

Documentation:

- API Reference - Complete API specification
- API Comparison - Detailed Anthropic vs OpenAI spec comparison
- Spec Analysis Summary - Executive summary of API compatibility
- Spec Sources - Information about cloned API specifications
- Test Coverage Summary - Unit test coverage report
- Docker Guide - Deployment with SSL/TLS
- Production Guide - Production features and monitoring
- Implementation Details - Architecture and design
We've analyzed the official Anthropic Messages API and OpenAI Chat Completions API specs:
- ✅ ~95% core compatibility for standard use cases (text, images, tools, streaming)
- ✅ Tool choice - Force specific tools or disable tool usage (v0.1.5)
- ✅ Advanced sampling - `top_k` parameter support (v0.1.5)
- ✅ Long conversations - 10K message limit (v0.1.5)
- ⚠️ Partial support for advanced features (response_format, PDFs)
- ❌ Unsupported features: server tools, prompt caching, citations, audio
See API_COMPARISON.md and CHANGELOG.md for details.
Troubleshooting:

- 401 Unauthorized - Ensure the client sends a valid backend-compatible API key. The proxy forwards the client's `Authorization: Bearer <key>` directly to the backend. Anthropic OAuth tokens (`sk-ant-*`) are not supported.
- 404 Model Not Found - Use `/model` in Claude Code to see available models
- Circuit breaker open - Backend failing; check the health endpoint: `curl http://localhost:8080/health` (or port 8180 for Docker)
- Debug logging - `RUST_LOG=debug cargo run --release`
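A quick way to script that health check (treat the JSON body as an assumption; its exact fields depend on whether the circuit breaker is enabled):

```python
import requests

# Port 8080 when running from source; 8180 for the Docker setup.
resp = requests.get("http://localhost:8080/health", timeout=5)
print(resp.status_code)  # 200 when healthy
print(resp.json())       # Assumed JSON body; includes circuit breaker status if enabled
```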