feat: Add trtllm deploy examples for k8s #2133 #2207
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Walkthrough
This update introduces a modular port allocation system for the vLLM backend, adds a new […]
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant vLLM Args Parser
participant Ports Module
participant ETCD
User->>vLLM Args Parser: Launch with CLI args (including --dynamo-port-min/max)
vLLM Args Parser->>Ports Module: Request port/block allocation (with range, metadata)
Ports Module->>Ports Module: Hold candidate port(s) via sockets
Ports Module->>ETCD: Atomically reserve port(s) with metadata
ETCD-->>Ports Module: Confirmation of reservation
Ports Module-->>vLLM Args Parser: Return allocated port(s)
vLLM Args Parser-->>User: Set environment/config with reserved ports
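
The flow in the diagram above (hold a candidate port with a socket, then reserve it atomically in a shared store) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `InMemoryStore`, `hold_port`, and `allocate_port` are hypothetical names, and the in-memory store only stands in for ETCD's atomic create-if-absent behaviour.

```python
import socket
import threading
from contextlib import contextmanager


class InMemoryStore:
    """Hypothetical stand-in for ETCD: create-if-absent is atomic under a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def create_if_absent(self, key, value):
        # Returns True only for the first caller that claims the key.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True


@contextmanager
def hold_port(port):
    """Bind a socket so no other local process can grab the port meanwhile."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))  # raises OSError if the port is taken
        yield sock
    finally:
        sock.close()


def allocate_port(store, port_min, port_max, metadata):
    """Return the first port in [port_min, port_max] that is free locally
    and not yet reserved in the store (assumed allocation policy)."""
    for port in range(port_min, port_max + 1):
        try:
            with hold_port(port):
                # Reserve while the socket is still held, so the local check
                # and the shared reservation cannot be split by a local race.
                if store.create_if_absent(f"ports/{port}", metadata):
                    return port
        except OSError:
            continue  # port is in use locally; try the next candidate
    raise RuntimeError(f"no free port in range {port_min}-{port_max}")


store = InMemoryStore()
first = allocate_port(store, 42000, 42050, {"worker": "a"})
second = allocate_port(store, 42000, 42050, {"worker": "b"})
print(first, second)
```

The second call skips the port reserved by the first even though the socket hold has already been released, because the reservation record in the store outlives the local bind. The range bounds here play the role of the `--dynamo-port-min/max` arguments shown in the diagram.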
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~40 minutes
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com> (cherry picked from commit 65e89b3)
Overview:
Adds trtllm deploy example for K8s.
Cherry-pick of #2133.
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Documentation
Refactor
Style