feat: python bindings for the entire KvPushRouter #2658
Walkthrough
Registers KvPushRouter and KvPushRouterStream in the Python bindings, implements their PyO3 classes to enable async streaming generation, adds a KvRouterConfig::inner accessor for internal use, refactors the mocker engine streaming to use unbounded channels, and introduces an end-to-end Python test (duplicated) that validates the new push-router bindings.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor PyApp as Python App
participant PyBind as KvPushRouter (Py)
participant RustTask as Rust async task
participant Router as Kv Router
participant Engine as Mocker Engine
participant Stream as KvPushRouterStream (Py)
PyApp->>PyBind: generate(token_ids, model, opts)
activate PyBind
PyBind->>RustTask: spawn task with PreprocessedRequest
note right of RustTask: Builds request and starts streaming
RustTask->>Router: route(request)
Router->>Engine: stream tokens
Engine-->>Router: token events
Router-->>RustTask: token events
RustTask-->>Stream: mpsc send PyObject per event
deactivate PyBind
loop until exhausted
PyApp->>Stream: __anext__()
Stream-->>PyApp: next token/result
end
Stream-->>PyApp: StopAsyncIteration on completion
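
The flow in the diagram can be approximated in pure Python. The sketch below is not the actual binding: `FakePushRouter` and `FakePushRouterStream` are hypothetical stand-ins, the `max_tokens` parameter and the event shape (`{"token_ids": [...]}`) are assumptions, and an `asyncio.Queue` plays the role of the Rust mpsc channel. It only illustrates the pattern of a background task pushing events into an unbounded channel while the caller drains them through `__anext__`.

```python
import asyncio
from typing import Any, Dict, List

_DONE = object()  # sentinel the producer sends when the stream is exhausted


class FakePushRouterStream:
    """Hypothetical stand-in for KvPushRouterStream: an async iterator over a queue."""

    def __init__(self, queue: asyncio.Queue, task: asyncio.Task) -> None:
        self._queue = queue
        self._task = task  # keep a reference so the producer task is not garbage collected
        self._finished = False

    def __aiter__(self) -> "FakePushRouterStream":
        return self

    async def __anext__(self) -> Dict[str, Any]:
        if self._finished:
            raise StopAsyncIteration
        item = await self._queue.get()
        if item is _DONE:
            self._finished = True
            raise StopAsyncIteration
        return item


class FakePushRouter:
    """Hypothetical stand-in for KvPushRouter; echoes the prompt tokens back."""

    async def generate(
        self, token_ids: List[int], model: str, max_tokens: int = 4
    ) -> FakePushRouterStream:
        queue: asyncio.Queue = asyncio.Queue()  # no maxsize, so effectively unbounded

        async def produce() -> None:
            # Stands in for the spawned Rust task that routes the request
            # and forwards one event per generated token.
            for i in range(max_tokens):
                await asyncio.sleep(0)  # yield control, as a real engine would
                queue.put_nowait({"token_ids": [token_ids[i % len(token_ids)]]})
            queue.put_nowait(_DONE)

        task = asyncio.create_task(produce())
        return FakePushRouterStream(queue, task)


async def collect(token_ids: List[int], max_tokens: int) -> List[int]:
    """Drain the stream with `async for`, which calls __anext__ until exhaustion."""
    router = FakePushRouter()
    stream = await router.generate(token_ids, model="mock-model", max_tokens=max_tokens)
    out: List[int] = []
    async for event in stream:
        out.extend(event["token_ids"])
    return out
```

Calling `asyncio.run(collect([1, 2, 3], 5))` drains five events from the fake stream. Passing token IDs directly as a list mirrors the intent of avoiding an extra serialization step, although the real binding's argument names and event fields may differ.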
Estimated code review effort: 4 (Complex) | ~60 minutes
…r configs (#2658) Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
…r configs (#2658) Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>
…r configs (#2658) Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
…r configs (#2658) Signed-off-by: nnshah1 <neelays@nvidia.com>
Overview:
Bind the entire KvPushRouter to a Python object that exposes a generate method. The method accepts token IDs as List[int] to avoid an extra (de)serialization cycle. The other request fields, such as sampling and output options, are simply pythonized. Closes #2662. Closes #2697
Others
Fixed the mocker engine always generating one fewer token than expected.
Test
Added an e2e test with mockers served through the Python-bound KvPushRouter.