chutesai/claude-proxy
Claude Proxy for Chutes.ai

Lightweight proxy that translates the Claude Messages API into OpenAI Chat Completions format, with SSE streaming.

Routes Claude Code / Claude API requests to any OpenAI-compatible backend (SGLang, vLLM, Ollama, etc.).

Features

  • Full Claude API compatibility (text, images, tool_use, tool_result)
  • SSE streaming with proper event formatting
  • Thinking/reasoning content support - Auto-enables for reasoning models, streams thinking blocks
  • Client key forwarding (forwards client API keys directly to backend)
  • Model discovery with 60s cache refresh
  • Case-insensitive model matching with helpful 404 responses
  • Token counting endpoint (tiktoken-based)
  • Optional circuit breaker (disabled by default, can be enabled via env)
  • Health check endpoint
  • Request validation (max 10,000 messages, 10MB content)

Quick Start

Docker (default port 8180):

docker-compose up -d
export ANTHROPIC_BASE_URL=http://localhost:8180
export ANTHROPIC_API_KEY=cpk_your_api_key
claude

From Source (default port 8080):

cargo build --release
cargo run --release
export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_API_KEY=cpk_your_api_key
claude

Configuration

Environment variables:

  • BACKEND_URL - Backend chat completions endpoint.
    • Default (source): http://127.0.0.1:8000/v1/chat/completions
    • Default (Docker): https://llm.chutes.ai/v1/chat/completions
  • HOST_PORT - Port to listen on (default: 8080)
  • RUST_LOG - Log level: error, warn, info, debug, trace (default: info)
  • BACKEND_TIMEOUT_SECS - Backend request timeout in seconds (default: 600)
  • ENABLE_CIRCUIT_BREAKER - Enable circuit breaker protection (default: false)
    • Opens after 5 consecutive failures, recovers after 30s
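The circuit-breaker behavior described above (open after 5 consecutive failures, allow a retry after 30 s) can be sketched roughly as follows. This is an illustrative Python model, not the proxy's actual Rust implementation:

```python
import time

class CircuitBreaker:
    """Illustrative circuit breaker: opens after a failure threshold,
    then allows a retry once the recovery window has elapsed."""

    def __init__(self, failure_threshold=5, recovery_secs=30):
        self.failure_threshold = failure_threshold
        self.recovery_secs = recovery_secs
        self.consecutive_failures = 0
        self.opened_at = None  # monotonic timestamp when the breaker opened

    def allow_request(self):
        if self.opened_at is None:
            return True
        # Half-open: let a request through once the recovery window passes.
        return time.monotonic() - self.opened_at >= self.recovery_secs

    def record_success(self):
        self.consecutive_failures = 0
        self.opened_at = None

    def record_failure(self):
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

breaker = CircuitBreaker()
for _ in range(5):
    breaker.record_failure()
print(breaker.allow_request())  # False while the breaker is open
```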

Example .env (for running from source):

BACKEND_URL=http://127.0.0.1:8000/v1/chat/completions
HOST_PORT=8080
RUST_LOG=info
ENABLE_CIRCUIT_BREAKER=false

Authentication:

  • Client API key (cpk_* or backend-compatible) → forwarded directly to backend
  • Anthropic OAuth tokens (sk-ant-*) → rejected with 401 (not supported)
  • No client auth → rejected with 401
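These routing decisions amount to a simple check on the incoming `Authorization` header. A minimal sketch (the function name and return shape are ours, not the proxy's):

```python
def classify_auth(authorization):
    """Decide how to handle a client's Authorization header.
    Returns (status, action) where action is 'forward' or a rejection reason."""
    if not authorization or not authorization.startswith("Bearer "):
        return 401, "missing or malformed Authorization header"
    key = authorization.removeprefix("Bearer ")
    if key.startswith("sk-ant-"):
        # Anthropic OAuth tokens cannot be used against an OpenAI backend.
        return 401, "Anthropic OAuth tokens are not supported"
    # cpk_* or any backend-compatible key is forwarded as-is.
    return 200, "forward"

print(classify_auth("Bearer cpk_abc123"))  # (200, 'forward')
print(classify_auth("Bearer sk-ant-xyz"))  # (401, 'Anthropic OAuth tokens are not supported')
print(classify_auth(None))                 # (401, 'missing or malformed Authorization header')
```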

API Endpoints

  • POST /v1/messages - Main Claude Messages API endpoint
  • POST /v1/messages/count_tokens - Token counting (tiktoken-based)
  • GET /health - Health check with circuit breaker status (if enabled)

Example request:

curl -N http://localhost:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer cpk_your_key' \
  -d '{
    "model": "zai-org/GLM-4.5-Air",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 128,
    "stream": true
  }'

Supported Features

  • Text content - String or content blocks
  • Images - Base64 encoded, converted to OpenAI data URI format
  • Tool use/results - Full function calling support with tool_choice parameter
  • System prompts - Converted to system message
  • Multi-turn conversations - Context preservation (up to 10K messages)
  • Thinking/reasoning content - Automatic detection and streaming for reasoning models
  • Advanced sampling - Supports temperature, top_p, top_k
  • Model discovery - Auto-refresh every 60s, case-insensitive matching
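Case-insensitive matching with a helpful miss can be sketched as a lowercased lookup over the discovered model list; the helper below is illustrative:

```python
def resolve_model(requested, available):
    """Match a model name ignoring case; raise with the full list on a miss."""
    by_lower = {name.lower(): name for name in available}
    match = by_lower.get(requested.lower())
    if match is None:
        raise LookupError(
            f"model '{requested}' not found; available: {', '.join(sorted(available))}"
        )
    return match  # canonical casing from the backend's model list

models = ["zai-org/GLM-4.5-Air", "deepseek-ai/DeepSeek-R1"]
print(resolve_model("zai-org/glm-4.5-air", models))  # zai-org/GLM-4.5-Air
```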

Thinking/Reasoning Content

The proxy automatically handles thinking content for reasoning models:

Auto-enablement:

  • Models containing "reasoning", "r1", or "deep" in the name automatically enable thinking with a 10,000 token budget
  • Override by explicitly providing thinking parameter in request
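The heuristic boils down to a substring check on the model name, with an explicit `thinking` parameter taking precedence. A rough sketch (the proxy's exact matching rules and config shape may differ):

```python
DEFAULT_THINKING_BUDGET = 10_000

def thinking_config(model, explicit=None):
    """Auto-enable thinking for reasoning-flavored model names,
    unless the request already carries an explicit `thinking` parameter."""
    if explicit is not None:
        return explicit  # client override always wins
    name = model.lower()
    if any(marker in name for marker in ("reasoning", "r1", "deep")):
        return {"type": "enabled", "budget_tokens": DEFAULT_THINKING_BUDGET}
    return None

print(thinking_config("deepseek-r1"))          # {'type': 'enabled', 'budget_tokens': 10000}
print(thinking_config("zai-org/GLM-4.5-Air"))  # None
```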

Input transformation:

  • Assistant messages with thinking blocks → interleaved format: <think>reasoning</think>\nresponse
  • Preserves historical thinking content for multi-turn conversations
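The interleaved format can be sketched as a small transform over an assistant message's content blocks (block shapes follow the Claude Messages API; the function itself is illustrative):

```python
def interleave_thinking(content_blocks):
    """Fold thinking blocks into <think>...</think> ahead of the visible text."""
    thinking = "".join(
        b.get("thinking", "") for b in content_blocks if b.get("type") == "thinking"
    )
    text = "".join(
        b.get("text", "") for b in content_blocks if b.get("type") == "text"
    )
    if thinking:
        return f"<think>{thinking}</think>\n{text}"
    return text

blocks = [
    {"type": "thinking", "thinking": "2+2 is basic arithmetic."},
    {"type": "text", "text": "The answer is 4."},
]
print(interleave_thinking(blocks))
```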

Output streaming:

  • Backend reasoning_content → proper Claude thinking blocks
  • Thinking blocks streamed before text blocks
  • Event sequence: content_block_start (thinking) → content_block_delta (thinking_delta) → content_block_stop → text blocks

Example request:

curl -N http://localhost:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer cpk_your_key' \
  -d '{
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "max_tokens": 1024,
    "stream": true
  }'

Test:

cd tests
./test_thinking.sh

Usage with Claude Code

Model selection:

/model zai-org/GLM-4.5-Air              # Free
/model deepseek/DeepSeek-R1             # Reasoning  
/model anthropic/claude-3-5-sonnet      # Standard

Other SDKs:

Python:

from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8080",
    api_key="cpk_your_key",
)

TypeScript:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  baseURL: 'http://localhost:8080',
  apiKey: 'cpk_your_key',
});

Deployment

Docker Compose:

docker-compose up -d

Includes Caddy reverse proxy for SSL/TLS. See docs/DOCKER.md for production setup.

Remote Client Connection:

export ANTHROPIC_BASE_URL=https://your-domain.com
export ANTHROPIC_API_KEY=cpk_your_api_key
claude

Testing

./test.sh --all    # Run full test suite

Set CHUTES_TEST_API_KEY=cpk_your_key in .env or export before running tests.

Unit Tests (81 tests ✅)

cargo test              # Run all unit tests
cargo test auth         # Run auth module tests
cargo test streaming    # Run SSE parser tests
cargo test content_extraction  # Run content translation tests

Coverage: 90%+ for critical utilities (auth, streaming, content extraction)

Building

cargo build --release    # Binary: target/release/claude_openai_proxy (~4MB)
cargo test              # Run unit tests (81 tests)
cargo test -- --nocapture  # Show test output

Documentation

API Specification Analysis

We've analyzed the official Anthropic Messages API and OpenAI Chat Completions API specs:

  • ~95% core compatibility for standard use cases (text, images, tools, streaming)
  • Tool choice - Force specific tools or disable tool usage (v0.1.5)
  • Advanced sampling - top_k parameter support (v0.1.5)
  • Long conversations - 10K message limit (v0.1.5)
  • ⚠️ Partial support for advanced features (response_format, PDFs)
  • Unsupported features: server tools, prompt caching, citations, audio

See API_COMPARISON.md and CHANGELOG.md for details.

Troubleshooting

  • 401 Unauthorized - Ensure client sends a valid backend-compatible API key. The proxy forwards the client's Authorization: Bearer <key> directly to the backend. Anthropic OAuth tokens (sk-ant-*) are not supported.
  • 404 Model Not Found - Use /model in Claude Code to see available models
  • Circuit breaker open - Backend failing; check health endpoint: curl http://localhost:8080/health (or port 8180 for Docker)
  • Debug logging - RUST_LOG=debug cargo run --release
