A reverse proxy built with FastAPI that enables context resizing for LLM API calls, preserving accuracy while minimizing the context sent to models from providers like OpenAI, Anthropic, and OpenRouter.
- Auto-resize context before sending to LLMs
- Support for multiple providers
- Streaming response support
- Full compatibility with OpenAI client library
- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up your environment variables in a `.env` file:

  ```
  OPENAI_API_KEY=your_openai_api_key
  ANTHROPIC_API_KEY=your_anthropic_api_key
  OPENROUTER_API_KEY=your_openrouter_api_key
  ```

- Start the proxy server:

  ```bash
  python main.py
  ```

  The server will start on http://localhost:8000.
To resize context before sending to an LLM:

```bash
curl -X POST http://localhost:8000/v1/auto-resize \
  -H "Content-Type: application/json" \
  -d '{
    "context": "{\"role\": \"user\", \"content\": \"Hello\"}\n{\"role\": \"assistant\", \"content\": \"Hi there!\"}",
    "max_tokens": 1000
  }'
```

To proxy requests to AI providers:
```bash
# OpenAI
curl -X POST http://localhost:8000/v1/openai/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Anthropic
curl -X POST http://localh000:8000/v1/anthropic/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-2",
    "max_tokens": 1000,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# OpenRouter
curl -X POST http://localhost:8000/v1/openrouter/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

- Context Resizing: The `auto_resize` function analyzes the conversation context and reduces its size while preserving important information. It uses semantic ordering to determine which messages are most relevant to the latest query and allocates more tokens to those messages.
- Proxy Functionality: The proxy routes requests to the appropriate AI provider, handling authentication and header management automatically.
- Semantic Ordering: Uses ChromaDB to create embeddings of the context messages and orders them by relevance to the latest query.
- Summarization: Uses OpenAI's GPT models to summarize messages when needed to fit within token limits.
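The resizing flow described above can be pictured with a small self-contained sketch. This is an illustration only, not the project's actual implementation: it stands in a plain word-overlap score for the ChromaDB embedding similarity, estimates tokens by whitespace-splitting, and drops low-relevance messages instead of summarizing them with GPT.

```python
def relevance(message: str, query: str) -> float:
    # Stand-in for embedding similarity: fraction of the query's words
    # that also appear in the message (the real proxy uses ChromaDB embeddings).
    q_words = set(query.lower().split())
    m_words = set(message.lower().split())
    return len(q_words & m_words) / len(q_words) if q_words else 0.0

def auto_resize(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most relevant messages within a rough token budget."""
    query = messages[-1]["content"]  # the latest query always stays
    history = messages[:-1]
    # Order history by relevance to the latest query, most relevant first.
    ranked = sorted(history, key=lambda m: relevance(m["content"], query), reverse=True)
    kept, used = [messages[-1]], len(query.split())  # crude token estimate
    for msg in ranked:
        cost = len(msg["content"].split())
        if used + cost > max_tokens:
            continue  # the real proxy would summarize here instead of dropping
        kept.append(msg)
        used += cost
    # Restore chronological order for the messages that survived.
    return [m for m in messages if m in kept]

msgs = [
    {"role": "user", "content": "Tell me about apples"},
    {"role": "assistant", "content": "Apples are fruit"},
    {"role": "user", "content": "What color are apples usually"},
]
print(len(auto_resize(msgs, max_tokens=20)))  # → 3 (everything fits the budget)
```

With a tighter budget (e.g. `max_tokens=5`) only the latest query survives, which is the degenerate case the summarization step exists to avoid.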
- `POST /v1/auto-resize` - Resize context before sending to an LLM
- `POST /v1/{provider}/{path}` - Proxy requests to AI providers
- `GET /health` - Health check endpoint
- OpenAI (`openai`)
- Anthropic (`anthropic`)
- OpenRouter (`openrouter`)
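The `/v1/{provider}/{path}` routing can be thought of as a table lookup plus auth-header rewriting. A minimal sketch of that idea follows; the routing table and header builders here are illustrative assumptions, not taken from the project's source:

```python
import os

# Hypothetical routing table: provider name -> (upstream base URL, auth header builder).
PROVIDERS = {
    "openai": ("https://api.openai.com/v1", lambda k: {"Authorization": f"Bearer {k}"}),
    "anthropic": ("https://api.anthropic.com/v1", lambda k: {"x-api-key": k}),
    "openrouter": ("https://openrouter.ai/api/v1", lambda k: {"Authorization": f"Bearer {k}"}),
}

def build_upstream_request(provider: str, path: str) -> tuple[str, dict]:
    """Resolve the upstream URL and auth headers for a proxied request."""
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")
    base_url, make_auth = PROVIDERS[provider]
    # Look up the matching key from the environment, e.g. OPENAI_API_KEY.
    api_key = os.getenv(f"{provider.upper()}_API_KEY", "")
    return f"{base_url}/{path}", make_auth(api_key)

url, headers = build_upstream_request("openai", "chat/completions")
print(url)  # → https://api.openai.com/v1/chat/completions
```

Note that Anthropic authenticates with an `x-api-key` header while OpenAI and OpenRouter use a `Bearer` token, which is why the proxy handles header management per provider.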
The proxy can be configured using environment variables:
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | API key for OpenAI |
| `ANTHROPIC_API_KEY` | API key for Anthropic |
| `OPENROUTER_API_KEY` | API key for OpenRouter |
ContextResizer is fully compatible with the OpenAI Python client library, just like Helicone and other reverse proxies. You can use it as a drop-in replacement for the OpenAI API by simply changing the `base_url` parameter.
```python
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")

# Configure the OpenAI client to use the ContextResizer proxy
client = OpenAI(
    api_key=openai_api_key,
    base_url="http://localhost:8000/v1/openai"  # Point to the ContextResizer proxy
)

# Use the client exactly as you would with the OpenAI API
response = client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=[
        {"role": "user", "content": "Hello! This is a test message."}
    ],
    max_tokens=50
)

print(response.choices[0].message.content)
```

If you prefer to use HTTP requests directly:
```python
import httpx
import asyncio

async def make_request():
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/v1/openai/chat/completions",
            json={
                "model": "gpt-4.1-nano",
                "messages": [{"role": "user", "content": "Hello!"}],
                "max_tokens": 50
            },
            headers={"Content-Type": "application/json"}
        )
        return response.json()

# Usage
result = asyncio.run(make_request())
print(result["choices"][0]["message"]["content"])
```

The proxy supports all standard OpenAI API endpoints:
- Chat Completions: `POST /v1/openai/chat/completions`
- Completions: `POST /v1/openai/completions`
- Embeddings: `POST /v1/openai/embeddings`
- Models: `GET /v1/openai/models`
The ContextResizer proxy maintains full compatibility with the OpenAI API interface while providing context resizing capabilities to reduce token usage and costs.