feat(docker,auth): Add Docker, Compose, Auth, Parameter envs #6

Merged · 1 commit · Sep 15, 2024
47 changes: 47 additions & 0 deletions Dockerfile
@@ -0,0 +1,47 @@
# Build stage
FROM python:3.12-slim AS builder

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
gcc libc6-dev \
&& rm -rf /var/lib/apt/lists/*

# Copy only the requirements file first to leverage Docker cache
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Final stage
FROM python:3.12-slim

# Install curl for the healthcheck
RUN apt-get update && apt-get install -y --no-install-recommends \
curl && \
apt-get clean && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy installed dependencies from builder stage
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin

# Copy application code
COPY . .

# Create a non-root user and switch to it
RUN useradd -m appuser
USER appuser

# Set environment variables
ENV PYTHONUNBUFFERED=1

# Expose the port the app runs on
EXPOSE 8000

# Run the application
ENTRYPOINT ["python", "optillm.py"]
119 changes: 93 additions & 26 deletions README.md
@@ -6,15 +6,16 @@ optillm is an OpenAI API compatible optimizing inference proxy which implements

### plansearch-gpt-4o-mini on LiveCodeBench (Sep 2024)

| Model | pass@1 | pass@5 | pass@10 |
| ---------------------- | ------ | ------ | ------- |
| plansearch-gpt-4o-mini | 44.03 | 59.31 | 63.5 |
| gpt-4o-mini | 43.9 | 50.61 | 53.25 |
| claude-3.5-sonnet | 51.3 | | |
| gpt-4o-2024-05-13 | 45.2 | | |
| gpt-4-turbo-2024-04-09 | 44.2 | | |

### moa-gpt-4o-mini on Arena-Hard-Auto (Aug 2024)

![Results showing Mixture of Agents approach using gpt-4o-mini on Arena Hard Auto Benchmark](./moa-results.png)

## Installation
@@ -32,7 +33,7 @@ pip install -r requirements.txt
You can then run the optillm proxy as follows.

```bash
python optillm.py
2024-09-06 07:57:14,191 - INFO - Starting server with approach: auto
2024-09-06 07:57:14,191 - INFO - Server configuration: {'approach': 'auto', 'mcts_simulations': 2, 'mcts_exploration': 0.2, 'mcts_depth': 1, 'best_of_n': 3, 'model': 'gpt-4o-mini', 'rstar_max_depth': 3, 'rstar_num_rollouts': 5, 'rstar_c': 1.4, 'base_url': ''}
* Serving Flask app 'optillm'
@@ -44,11 +45,11 @@
2024-09-06 07:57:14,212 - INFO - Press CTRL+C to quit
```

## Usage

Once the proxy is running, you can use it as a drop-in replacement for an OpenAI client by setting the `base_url` to `http://localhost:8000/v1`.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="http://localhost:8000/v1",  # point the client at the optillm proxy
)

response = client.chat.completions.create(
    model="moa-gpt-4o-mini",  # {slug}-model-name: `moa` applied on top of gpt-4o-mini
    messages=[{"role": "user", "content": "Your prompt here"}],  # placeholder prompt
)
print(response)
```

You can control the technique used for optimization by prepending its slug to the model name: `{slug}-model-name`. E.g. in the above code we are using `moa`, or mixture of agents, as the optimization approach. In the proxy logs you will see `moa` being used with the base model `gpt-4o-mini`.

## Implemented techniques

| Technique | Slug | Description |
| ----------------------- | ------------------ | ---------------------------------------------------------------------------------------------- |
| Agent | `agent` | Determines which of the below approaches to take and then combines the results |
| Monte Carlo Tree Search | `mcts` | Uses MCTS for decision-making in chat responses |
| Best of N Sampling | `bon` | Generates multiple responses and selects the best one |
| Mixture of Agents | `moa` | Combines responses from multiple critiques |
| Round Trip Optimization | `rto` | Optimizes responses through a round-trip process |
| Z3 Solver | `z3` | Utilizes the Z3 theorem prover for logical reasoning |
| Self-Consistency | `self_consistency` | Implements an advanced self-consistency method |
| PV Game | `pvg` | Applies a prover-verifier game approach at inference time |
| R* Algorithm | `rstar` | Implements the R* algorithm for problem-solving |
| CoT with Reflection      | `cot_reflection`   | Implements chain-of-thought reasoning with \<thinking\>, \<reflection\> and \<output\> sections |
| PlanSearch | `plansearch` | Implements a search algorithm over candidate plans for solving a problem in natural language |
| LEAP | `leap` | Learns task-specific principles from few shot examples |
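
Any OpenAI-compatible client can pick one of these techniques per request by changing the slug prefix on the model name. A minimal sketch with `curl` (assuming the proxy is running locally on the default port):

```bash
# Swap the slug prefix (`moa`, `bon`, `mcts`, ...) to change the technique per request
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "bon-gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}'
```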

## Available Parameters

optillm supports various command-line arguments and environment variables for configuration.

| Parameter | Description | Default Value |
|--------------------------|-----------------------------------------------------------------|-----------------|
| `--approach` | Inference approach to use | `"auto"` |
| `--simulations` | Number of MCTS simulations | 2 |
| `--exploration` | Exploration weight for MCTS | 0.2 |
| `--depth` | Simulation depth for MCTS | 1 |
| `--best-of-n` | Number of samples for best_of_n approach | 3 |
| `--model` | OpenAI model to use | `"gpt-4o-mini"` |
| `--base-url` | Base URL for OpenAI compatible endpoint | `""` |
| `--rstar-max-depth` | Maximum depth for rStar algorithm | 3 |
| `--rstar-num-rollouts` | Number of rollouts for rStar algorithm | 5 |
| `--rstar-c` | Exploration constant for rStar algorithm | 1.4 |
| `--n` | Number of final responses to be returned | 1 |
| `--return-full-response` | Return the full response including the CoT with `<thinking>` tags | `False` |
| `--port` | Specify the port to run the proxy | 8000 |
| `--api-key` | Optional API key for client authentication to optillm | `""` |

When using Docker, these can be set as environment variables prefixed with `OPTILLM_`.
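
For example, the following sketch shows the same configuration expressed both ways, using only the parameters listed above:

```bash
# Run directly with command-line flags
python optillm.py --approach mcts --simulations 4 --depth 2 --port 8000

# Equivalent Docker environment variables (dashes in flag names become underscores):
#   OPTILLM_APPROACH=mcts  OPTILLM_SIMULATIONS=4  OPTILLM_DEPTH=2  OPTILLM_PORT=8000
```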

## Running with Docker

optillm can optionally be built and run using Docker and the provided [Dockerfile](./Dockerfile).
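
The image can also be built and run directly, without Compose; a minimal sketch (the tag and flags are illustrative, not part of this PR):

```bash
# Build the image from the repository root
docker build -t optillm:latest .

# Run it, publishing the exposed port and passing the OpenAI key through
docker run --rm -p 8000:8000 -e OPENAI_API_KEY="$OPENAI_API_KEY" optillm:latest
```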

### Using Docker Compose

1. Make sure you have Docker and Docker Compose installed on your system.

2. Either update the environment variables in `docker-compose.yaml`, or create a `.env` file in the project root directory and add any environment variables you want to set. For example, to set the OpenAI API key, add the following line to the `.env` file:

```bash
OPENAI_API_KEY=your_openai_api_key_here
```

3. Run the following command to start optillm:

```bash
docker compose up -d
```

This will build the Docker image if it doesn't exist and start the optillm service.

4. optillm will be available at `http://localhost:8000`.
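
To verify the service is up, you can hit the same `/health` endpoint that the compose healthcheck polls:

```bash
curl -f http://localhost:8000/health
```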

When using Docker, you can set these parameters as environment variables. For example, to set the approach and model, you would use:

```bash
OPTILLM_APPROACH=mcts
OPTILLM_MODEL=gpt-4
```

To secure the optillm proxy with an API key, set the `OPTILLM_API_KEY` environment variable:

```bash
OPTILLM_API_KEY=your_secret_api_key
```

When the API key is set, clients must include it in their requests using the `Authorization` header:

```plain
Authorization: Bearer your_secret_api_key
```
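
For example, an authenticated request might look like the following sketch (the chat-completions path follows from the OpenAI-compatible `base_url` shown earlier):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_secret_api_key" \
  -d '{"model": "moa-gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}'
```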

## References

38 changes: 38 additions & 0 deletions docker-compose.yaml
@@ -0,0 +1,38 @@
services:
  &name optillm:
    build:
      context: https://github.com/codelion/optillm.git#main
      # context: .
      dockerfile: Dockerfile
      tags:
        - optillm:latest
    image: optillm:latest
    container_name: *name
    hostname: *name
    ports:
      - "8000:8000"
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY:-""}
      OPTILLM_BASE_URL: ${OPENAI_BASE_URL:-"https://api.openai.com/v1"}
      # OPTILLM_API_KEY: ${OPTILLM_API_KEY:-} # optionally sets an API key for optillm clients
      # Uncomment and set values for other arguments (prefixed with OPTILLM_) as needed, e.g.:
      # OPTILLM_APPROACH: auto
      # OPTILLM_MODEL: gpt-4o-mini
      # OPTILLM_SIMULATIONS: 2
      # OPTILLM_EXPLORATION: 0.2
      # OPTILLM_DEPTH: 1
      # OPTILLM_BEST_OF_N: 3
      # OPTILLM_RSTAR_MAX_DEPTH: 3
      # OPTILLM_RSTAR_NUM_ROLLOUTS: 5
      # OPTILLM_RSTAR_C: 1.4
      # OPTILLM_N: 1
      # OPTILLM_RETURN_FULL_RESPONSE: false
      # OPTILLM_PORT: 8000
    restart: on-failure
    stop_grace_period: 2s
    healthcheck:
      test: ["CMD", "curl", "-f", "http://127.0.0.1:8000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 3s
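
Since the compose file interpolates `OPENAI_API_KEY` (and `OPENAI_BASE_URL`) from the host environment, these can also be supplied inline instead of via a `.env` file; a sketch:

```bash
OPENAI_API_KEY=your_openai_api_key_here docker compose up -d
```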