feat: Add GPT OSS 20B and 120B #145
Conversation
Force-pushed from 40a1762 to bedab1b, then from bedab1b to a728f91.
```python
limit = MODEL_CONCURRENT_RATE_LIMIT.get(
    chat_request.model, MODEL_CONCURRENT_RATE_LIMIT.get("default", 50)
)
```
This change is the most relevant. If MODEL_CONCURRENT_RATE_LIMIT has no entry for the requested model, the lookup falls back to the "default" entry, which should work for any model, and to 50 if even that entry is missing. This prevents a failure state in most cases.
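As a minimal sketch of why this matters (the model name and per-model limit below are made up for illustration, not taken from the PR): a direct dictionary index raises KeyError for any model missing from the map, while the nested `.get()` calls always resolve to a usable limit.

```python
MODEL_CONCURRENT_RATE_LIMIT = {
    "default": 50,
    "openai/gpt-oss-20b": 10,  # hypothetical per-model limit
}

requested_model = "some/unlisted-model"  # stands in for chat_request.model

# Direct indexing would raise KeyError here:
#   limit = MODEL_CONCURRENT_RATE_LIMIT[requested_model]

# The fallback lookup never raises: it uses the "default" entry,
# or 50 if even "default" is absent from the map.
limit = MODEL_CONCURRENT_RATE_LIMIT.get(
    requested_model, MODEL_CONCURRENT_RATE_LIMIT.get("default", 50)
)
print(limit)  # -> 50
```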
Pull Request Overview
This PR adds support for GPT OSS 20B and 120B models by creating their docker compose configurations, while also implementing defensive programming fixes for model validation and rate limiting.
- Adds docker compose files for GPT OSS 20B and 120B model deployments
- Implements null/empty string validation for model IDs in the state management (see the sketch after this list)
- Replaces exception-based rate limiting with default fallback logic
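The state-management change itself is not shown in this excerpt; the guard presumably looks something like the following sketch, where the function name and error type are assumptions rather than code from the PR:

```python
# Hypothetical sketch of the model_id guard described above; the real
# function in nilai-api/src/nilai_api/state.py may be named differently.
def get_model_state(model_id: str | None):
    if not model_id:  # rejects both None and the empty string ""
        raise ValueError("model_id must be a non-empty string")
    # ... continue with the normal lookup against the application state ...
    return model_id
```

Rejecting None and "" up front keeps a malformed request from propagating deeper into the state lookup.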
Reviewed Changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| nilai-api/src/nilai_api/state.py | Adds null/empty validation for model_id parameter |
| nilai-api/src/nilai_api/routers/private.py | Replaces KeyError exception with default fallback for rate limits |
| nilai-api/src/nilai_api/config/config.yaml | Adds rate limit configuration for the new GPT OSS 20B model and a default entry (see the sketch after this table) |
| docker/vllm.Dockerfile | Updates base image to custom jcabrero/vllm version |
| docker/compose/docker-compose.gpt-20b-gpu.yml | New docker compose configuration for GPT OSS 20B |
| docker/compose/docker-compose.gpt-120b-gpu.yml | New docker compose configuration for GPT OSS 120B |
| .env.sample | Adds BRAVE_SEARCH_API environment variable |
| .env.ci | Adds BRAVE_SEARCH_API environment variable for CI |
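The YAML added to config.yaml is not shown in this excerpt. Assuming a PyYAML-style load and purely illustrative key names, model names, and limits (none of these are taken from the PR), the per-model map might be consumed roughly like this:

```python
# Hedged sketch: loading a per-model concurrency map from YAML.
# The key name, model names, and limits below are assumptions for illustration.
import yaml  # PyYAML

config_text = """
model_concurrent_rate_limit:
  default: 50
  openai/gpt-oss-20b: 10
"""

config = yaml.safe_load(config_text)
MODEL_CONCURRENT_RATE_LIMIT = config["model_concurrent_rate_limit"]

# Unknown models resolve to the default limit rather than raising.
print(MODEL_CONCURRENT_RATE_LIMIT.get(
    "unknown-model", MODEL_CONCURRENT_RATE_LIMIT["default"]))  # -> 50
```

Keeping a "default" key in the same map is what lets the rate-limiting code above avoid special-casing newly added models.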
Force-pushed from fd7c6eb to f7a02cd.
This PR adds docker compose files for GPT OSS 20B and 120B. It also includes two small fixes: null/empty validation of model IDs in the state management, and a default fallback for per-model concurrent rate limits instead of raising an exception.