This repository contains a Docker Compose configuration for running Ollama behind a FastAPI wrapper and a Caddy reverse proxy.
Ollama service:
- Base image: `ollama/ollama:latest`
- Provides the core LLM functionality
- GPU support enabled
- Port: 11434
- Environment variables:
  - `NVIDIA_VISIBLE_DEVICES`: controls GPU visibility (default: `all`)
  - `OLLAMA_CONCURRENT_REQUESTS`: number of concurrent requests (default: `1`)
  - `OLLAMA_QUEUE_ENABLED`: queue system status (default: `true`)
FastAPI wrapper:
- Custom-built service using `Dockerfile.wrapper`
- Provides the API interface for Ollama
- Port: 5000
- Environment variables:
  - `PYTHONUNBUFFERED`: set to `1` for unbuffered output
  - `SESSION_API_KEY`: optional API key for session management
Caddy reverse proxy:
- Custom-built service using `Dockerfile.caddy`
- Serves as the reverse proxy in front of the other services
- Port: 3334 (configurable)
- Environment variables:
  - `PUBLIC_ACCESS_PORT`: external port (default: `3334`)
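For orientation, the sketch below shows roughly how the three services above fit together in a Compose file. It is an illustrative excerpt, not the repository's actual `docker-compose.yml`: the service name `ollama-service` matches the `docker exec` example later in this README, but the other service names, the build contexts, the internal Caddy port, and the `depends_on` ordering are assumptions.

```yaml
# Illustrative sketch only -- see the repository's docker-compose.yml for the real definitions.
services:
  ollama-service:                      # name matches the `docker exec` example below
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    environment:
      - NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES:-all}
      - OLLAMA_CONCURRENT_REQUESTS=${OLLAMA_CONCURRENT_REQUESTS:-1}
      - OLLAMA_QUEUE_ENABLED=${OLLAMA_QUEUE_ENABLED:-true}
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia           # standard Compose syntax for GPU access
              count: all
              capabilities: [gpu]

  fastapi-wrapper:                     # assumed service name
    build:
      context: .
      dockerfile: Dockerfile.wrapper
    ports:
      - "5000:5000"
    environment:
      - PYTHONUNBUFFERED=1
      - SESSION_API_KEY=${SESSION_API_KEY:-}
    depends_on:
      - ollama-service

  caddy:                               # assumed service name
    build:
      context: .
      dockerfile: Dockerfile.caddy
    ports:
      - "${PUBLIC_ACCESS_PORT:-3334}:3334"   # container-side port is an assumption
    depends_on:
      - fastapi-wrapper
```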
- Clone this repository:
git clone https://github.com/ClinicianFOCUS/local-llm-container.git
cd local-llm-container
- Launch the services:
docker-compose up -d
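Once the stack is up, standard Docker Compose commands can be used to confirm that the three containers started and to inspect their output, for example:

```bash
# List the containers defined in this Compose project and their status
docker-compose ps

# Follow the combined logs of all services (Ctrl+C to stop)
docker-compose logs -f
```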
After the containers are deployed, you can launch models using either the CLI or the API.
Using the CLI:
- Connect to the Ollama container:
docker exec -it ollama-service bash
- Pull your desired model:
ollama pull gemma2:2b-instruct-q8_0
# or any other model
- Run the model:
ollama run gemma2:2b-instruct-q8_0
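If you only want to check which models are already present in the container, `ollama list` can be run through `docker exec` without opening an interactive shell:

```bash
# Show the models currently pulled into the Ollama container
docker exec ollama-service ollama list
```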
Using the API:
- Pull a model via API:
curl -X POST http://localhost:3334/api/pull \
-H "Content-Type: application/json" \
-d '{"name": "gemma2:2b-instruct-q8_0"}'
- Generate with the model:
curl -X POST http://localhost:3334/api/generate \
-H "Content-Type: application/json" \
-d '{
"model": "gemma2:2b-instruct-q8_0",
"prompt": "Your prompt here"
}'
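These are the standard Ollama API routes, proxied through Caddy. By default `/api/generate` streams newline-delimited JSON; setting `"stream": false` returns a single JSON object, and `GET /api/tags` lists the models available locally. For example, assuming the default port 3334:

```bash
# Confirm the pulled model shows up in the local model list
curl http://localhost:3334/api/tags

# Request a single, non-streamed completion
curl -X POST http://localhost:3334/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma2:2b-instruct-q8_0",
    "prompt": "Your prompt here",
    "stream": false
  }'
```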
You can find available models in the Ollama model library: https://ollama.com/library
| Variable | Default | Description |
|---|---|---|
| `NVIDIA_VISIBLE_DEVICES` | `all` | GPU devices available to Ollama |
| `OLLAMA_CONCURRENT_REQUESTS` | `1` | Maximum concurrent requests |
| `OLLAMA_QUEUE_ENABLED` | `true` | Enable/disable the request queue |
| `SESSION_API_KEY` | - | API key for the FastAPI wrapper |
| `PUBLIC_ACCESS_PORT` | `3334` | External port for Caddy |
You can set these variables using the CLI:
Windows (PowerShell):
$env:SESSION_API_KEY="MY_API_KEY_TO_USE__FOR_AUTHENTICATION"
Linux:
export SESSION_API_KEY="MY_API_KEY_TO_USE__FOR_AUTHENTICATION"
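Docker Compose also reads variables from a `.env` file in the project root, so the same settings can be kept in one place instead of exporting them per shell (illustrative values):

```bash
# .env -- picked up automatically by docker-compose from the project root
NVIDIA_VISIBLE_DEVICES=all
OLLAMA_CONCURRENT_REQUESTS=1
OLLAMA_QUEUE_ENABLED=true
SESSION_API_KEY=MY_API_KEY_TO_USE__FOR_AUTHENTICATION
PUBLIC_ACCESS_PORT=3334
```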
Access the LLM API through the Caddy reverse proxy:
- API endpoint: https://localhost:3334/api/
- API documentation: https://github.com/ollama/ollama/blob/main/docs/api.md
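When `SESSION_API_KEY` is set, requests going through the FastAPI wrapper need to present that key. The exact header name and scheme are defined by the wrapper's own code and are not documented here; the example below assumes a Bearer token purely for illustration, and `-k` is only needed if Caddy serves a locally issued (self-signed) certificate:

```bash
# Hypothetical authenticated request -- adjust the header to whatever the wrapper actually expects
curl -k -X POST https://localhost:3334/api/generate \
  -H "Authorization: Bearer MY_API_KEY_TO_USE__FOR_AUTHENTICATION" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma2:2b-instruct-q8_0", "prompt": "Hello", "stream": false}'
```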
This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.