@jcabrero
Member

This PR adds docker compose files for GPT OSS 20B and 120B. It also includes fixes for two minor issues.

Comment on lines +112 to +114
limit = MODEL_CONCURRENT_RATE_LIMIT.get(
    chat_request.model, MODEL_CONCURRENT_RATE_LIMIT.get("default", 50)
)
Member Author

This is the most important change. If MODEL_CONCURRENT_RATE_LIMIT has no entry for the requested model, the lookup falls back to the "default" entry, which should work for any model, and to 50 if "default" is also missing. This avoids a failure state in most cases.
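The fallback order can be checked in isolation with a minimal sketch. The helper name `concurrent_limit_for` and the numeric limits here are assumptions for illustration only; the actual values come from config.yaml and are not shown in this PR conversation.

```python
# Hypothetical mapping for illustration; the real one is loaded from config.yaml.
MODEL_CONCURRENT_RATE_LIMIT = {
    "openai/gpt-oss-20b": 10,  # assumed per-model limit
    "default": 50,             # assumed catch-all entry
}


def concurrent_limit_for(model, limits=MODEL_CONCURRENT_RATE_LIMIT):
    """Return the per-model limit, else the "default" entry, else 50."""
    # Same nested .get() pattern as the diff: no KeyError for unknown models.
    return limits.get(model, limits.get("default", 50))


print(concurrent_limit_for("openai/gpt-oss-20b"))  # per-model entry
print(concurrent_limit_for("some/unknown-model"))  # falls back to "default"
print(concurrent_limit_for("some/unknown-model", {}))  # falls back to 50
```

Note that the inner `.get("default", 50)` is evaluated on every call, even when the model has its own entry; for a small dict lookup that cost is negligible.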

@jcabrero jcabrero requested review from blefo and Copilot and removed request for Copilot August 27, 2025 07:28
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds support for GPT OSS 20B and 120B models by creating their docker compose configurations, while also implementing defensive programming fixes for model validation and rate limiting.

  • Adds docker compose files for GPT OSS 20B and 120B model deployments
  • Implements null/empty string validation for model IDs in the state management
  • Replaces exception-based rate limiting with default fallback logic
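The model ID validation mentioned in the second bullet might look roughly like the sketch below. The `ModelRegistry` class, its method names, and the whitespace check are all hypothetical stand-ins; the actual code in state.py is not shown in this conversation and may differ (for example, it may be async or raise instead of returning None).

```python
from typing import Dict, Optional


class ModelRegistry:
    """Hypothetical stand-in for the state object that tracks model endpoints."""

    def __init__(self) -> None:
        self._models: Dict[str, dict] = {}

    def register(self, model_id: str, endpoint: dict) -> None:
        self._models[model_id] = endpoint

    def get_model(self, model_id: Optional[str]) -> Optional[dict]:
        # Reject None and empty (or whitespace-only) IDs before any lookup.
        if model_id is None or not model_id.strip():
            return None
        return self._models.get(model_id)


registry = ModelRegistry()
registry.register("openai/gpt-oss-20b", {"url": "http://localhost:8000"})
```

With this guard, a request that omits the model or sends an empty string gets a clean "not found" result instead of reaching the backing store with an invalid key.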

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| nilai-api/src/nilai_api/state.py | Adds null/empty validation for model_id parameter |
| nilai-api/src/nilai_api/routers/private.py | Replaces KeyError exception with default fallback for rate limits |
| nilai-api/src/nilai_api/config/config.yaml | Adds rate limit configuration for new GPT OSS 20B model and default |
| docker/vllm.Dockerfile | Updates base image to custom jcabrero/vllm version |
| docker/compose/docker-compose.gpt-20b-gpu.yml | New docker compose configuration for GPT OSS 20B |
| docker/compose/docker-compose.gpt-120b-gpu.yml | New docker compose configuration for GPT OSS 120B |
| .env.sample | Adds BRAVE_SEARCH_API environment variable |
| .env.ci | Adds BRAVE_SEARCH_API environment variable for CI |

@jcabrero jcabrero merged commit ccebd99 into main Aug 27, 2025
8 checks passed
@jcabrero jcabrero deleted the feat/add_gpt_oss branch August 27, 2025 09:38
@jcabrero jcabrero linked an issue Oct 8, 2025 that may be closed by this pull request
3 tasks

Development

Successfully merging this pull request may close these issues.

Add new models to the catalogue

2 participants