-
Notifications
You must be signed in to change notification settings - Fork 690
feat: JailedStream #3034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
feat: JailedStream #3034
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com> Signed-off-by: ayushag <ayushag@nvidia.com>
…tion - Implement JailedStream with builder pattern for clean API - Use async-stream's stream! macro for cleaner async implementation - Support configurable jail start/end sequences - Integrate tool call detection and parsing - Add comprehensive tests with shared test utilities - Create documentation for usage patterns The JailedStream provides a standalone solution for accumulating tokens when certain sequences are detected (jailing) and releasing them as a single chunk when the jail ends, enabling proper tool call handling in streaming responses. Signed-off-by: Ryan Olson <rolson@nvidia.com>
- Remove unnecessary annotation code that was never used - Fix efficiency issue by avoiding cloning when not jailing - Only clone response data when actually entering jail state - Simplify jailed behavior to not yield empty chunks - Accumulate content silently and only yield final result - Update tests to match new behavior This makes the stream processing more efficient by: 1. Eliminating unnecessary clones during normal streaming 2. Removing pointless annotation objects 3. Not sending empty chunks while accumulating 4. Only yielding the final aggregated result when jail ends Signed-off-by: Ryan Olson <rolson@nvidia.com>
…rhead - Remove ManyOut type alias and AsyncEngineContextProvider dependency - Change apply() to return impl Stream instead of Pin<Box<ResponseStream>> - Remove unnecessary context extraction and re-wrapping - Use tokio::pin!() for stack pinning instead of Box::pin() (more efficient) - Simplify tests by removing mock context infrastructure - Update documentation with correct usage examples This makes JailedStream a pure stream combinator that: 1. Doesn't handle context (caller's responsibility) 2. Avoids unnecessary boxing in the library 3. Can be composed with other stream transformers 4. Lets callers decide when to add boxing/context Performance improvements: - No heap allocation for pinning (stack pinning instead) - No context passing overhead - Direct stream transformation without wrapper types Signed-off-by: Ryan Olson <rolson@nvidia.com>
- Remove old apply_tool_calling_jail_internal function - Remove PossibleToolCallAnnotation struct and implementations - Update apply_tool_calling_jail_with_parser to conditionally use JailedStream - Only apply jail logic when tool_call_parser is configured - Comment out old jail tests that are no longer applicable - Clean up unused imports and run cargo fmt/clippy
- Change jail entry logic from if/else to evaluate both sequence and tool call conditions - Add early exit detection via should_exit_jail_early() method - Support two paths to enter jail: explicit sequences OR tool call detection - Support two paths to exit jail: end sequences OR complete tool call parsing - Add tests for dual entry paths and early exit behavior - Improve debug logging to show which condition triggered jail start/end
- Separate context passing from stream transformations - Change transform_embedding_postprocessor_stream to return impl Stream - Update transform_postprocessor_stream to accept context as parameter - Defer boxing until the final ResponseStream::new() call - Extract context once at the beginning of generate() methods - Reduce boxing from 5-6 locations to 1 per request - Improves performance by eliminating intermediate heap allocations
- Only apply jail when tools are present AND tool_choice is not None - Error if tool_choice is 'required', 'auto', or named but no parser configured - Skip jail entirely when tool_choice is explicitly None - Add should_apply_tool_jail() to determine jail application logic - Refactor to separate decision logic from stream processing - Avoid lifetime issues by evaluating conditions before stream transformation
- Removed old jail implementation tests that are no longer compatible - Kept test_detect_tool_call_start_different_parsers as it tests parser behavior - Old tests were testing annotation behavior that no longer exists in new JailedStream - New tests in jail.rs provide equivalent coverage
Resolved merge conflicts in .gitignore and preprocessor.rs: - Fixed gitignore format for Ruler generated files - Removed duplicate jail implementation from main - Kept optimized JailedStream with conditional application - Fixed transform_postprocessor_stream signature usage - All tests passing
- Remove unused imports and dead code warnings in test module - Fix test_streaming_usage.rs function signature for transform_postprocessor_stream - Remove test_preprocessor.rs that tested old jail implementation - Add missing context parameter to transform calls - All tests now pass and clippy/fmt are clean
Signed-off-by: Elyas Mehtabuddin <emehtabuddin@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
GuanLuo
approved these changes
Sep 22, 2025
…ice; jail: gate common markers when parser is set and clarify local naming; tests: tighten dual-entry path assertions
Contributor
|
/ok test f58c24a |
Signed-off-by: ayushag <ayushag@nvidia.com>
Contributor
|
/ok to test 824bb3d |
Signed-off-by: ayushag <ayushag@nvidia.com>
Contributor
|
/ok to test e04a1cd |
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: Elyas Mehtabuddin <emehtabuddin@nvidia.com>
Signed-off-by: Elyas Mehtabuddin <emehtabuddin@nvidia.com>
Signed-off-by: Elyas Mehtabuddin <emehtabuddin@nvidia.com>
Contributor
|
/ok to test f59637e |
jasonqinzhou
pushed a commit
that referenced
this pull request
Sep 24, 2025
Signed-off-by: ayushag <ayushag@nvidia.com> Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
athreesh
pushed a commit
that referenced
this pull request
Sep 24, 2025
Signed-off-by: ayushag <ayushag@nvidia.com>
jasonqinzhou
pushed a commit
that referenced
this pull request
Sep 24, 2025
Signed-off-by: ayushag <ayushag@nvidia.com> Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
kylehh
pushed a commit
that referenced
this pull request
Sep 25, 2025
Signed-off-by: ayushag <ayushag@nvidia.com> Signed-off-by: Kyle H <kylhuang@nvidia.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This PR introduces
JailedStream, a stream processing optimization for tool calling scenarios that improves performance and reliability while maintaining full OpenAI compatibility.Key Features
Recent Major Improvements
Async Removal Refactor
tokio::sync::OnceCellinstead ofstd::sync::OnceLockUTF-8 Safety Fixes
Usage
Testing
Breaking Changes
None - this is a pure optimization that maintains full API compatibility.