Skip to content

Conversation

@ryanolson
Copy link
Contributor

@ryanolson ryanolson commented Sep 15, 2025

Summary

This PR introduces JailedStream, a stream processing optimization for tool calling scenarios that improves performance and reliability while maintaining full OpenAI compatibility.

Key Features

  • Lazy tool parsing: Tool calls only parsed when jailing ends or stream completes
  • UTF-8 safe: All string operations respect character boundaries
  • Multiple parser support: Hermes, Mistral, Phi4, Llama3, and custom formats
  • Comprehensive edge case handling: Partial matches, malformed JSON, early exits
  • Better streaming behavior: Accumulates content for complete tool calls rather than partial processing

Recent Major Improvements

Async Removal Refactor

  • Root cause: Harmony parser used tokio::sync::OnceCell instead of std::sync::OnceLock
  • Impact: Simplified tool parsing chain by removing unnecessary async from several core functions
  • Result: Cleaner and more maintainable code with synchronous tool parsing

UTF-8 Safety Fixes

  • Fixed character boundary violations in prefix matching
  • Added comprehensive Unicode test coverage

Usage

let jail = JailedStream::builder()
    .tool_call_parser("hermes")
    .build();

let optimized_stream = jail.apply(chat_completion_stream);

Testing

  • ✅ 26 comprehensive test cases covering all scenarios
  • ✅ UTF-8 safety validated with Unicode edge cases
  • ✅ All existing tests pass, clippy clean
  • ✅ Full integration with existing OpenAI protocols

Breaking Changes

None - this is a pure optimization that maintains full API compatibility.

elyasmnvidian and others added 14 commits September 10, 2025 15:30
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
…tion

- Implement JailedStream with builder pattern for clean API
- Use async-stream's stream! macro for cleaner async implementation
- Support configurable jail start/end sequences
- Integrate tool call detection and parsing
- Add comprehensive tests with shared test utilities
- Create documentation for usage patterns

The JailedStream provides a standalone solution for accumulating tokens
when certain sequences are detected (jailing) and releasing them as a
single chunk when the jail ends, enabling proper tool call handling in
streaming responses.

Signed-off-by: Ryan Olson <rolson@nvidia.com>
@copy-pr-bot
Copy link

copy-pr-bot bot commented Sep 15, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

- Remove unnecessary annotation code that was never used
- Fix efficiency issue by avoiding cloning when not jailing
- Only clone response data when actually entering jail state
- Simplify jailed behavior to not yield empty chunks
- Accumulate content silently and only yield final result
- Update tests to match new behavior

This makes the stream processing more efficient by:
1. Eliminating unnecessary clones during normal streaming
2. Removing pointless annotation objects
3. Not sending empty chunks while accumulating
4. Only yielding the final aggregated result when jail ends

Signed-off-by: Ryan Olson <rolson@nvidia.com>
…rhead

- Remove ManyOut type alias and AsyncEngineContextProvider dependency
- Change apply() to return impl Stream instead of Pin<Box<ResponseStream>>
- Remove unnecessary context extraction and re-wrapping
- Use tokio::pin!() for stack pinning instead of Box::pin() (more efficient)
- Simplify tests by removing mock context infrastructure
- Update documentation with correct usage examples

This makes JailedStream a pure stream combinator that:
1. Doesn't handle context (caller's responsibility)
2. Avoids unnecessary boxing in the library
3. Can be composed with other stream transformers
4. Lets callers decide when to add boxing/context

Performance improvements:
- No heap allocation for pinning (stack pinning instead)
- No context passing overhead
- Direct stream transformation without wrapper types

Signed-off-by: Ryan Olson <rolson@nvidia.com>
Base automatically changed from elyas/streamtool to main September 15, 2025 19:04
- Remove old apply_tool_calling_jail_internal function
- Remove PossibleToolCallAnnotation struct and implementations
- Update apply_tool_calling_jail_with_parser to conditionally use JailedStream
- Only apply jail logic when tool_call_parser is configured
- Comment out old jail tests that are no longer applicable
- Clean up unused imports and run cargo fmt/clippy
- Change jail entry logic from if/else to evaluate both sequence and tool call conditions
- Add early exit detection via should_exit_jail_early() method
- Support two paths to enter jail: explicit sequences OR tool call detection
- Support two paths to exit jail: end sequences OR complete tool call parsing
- Add tests for dual entry paths and early exit behavior
- Improve debug logging to show which condition triggered jail start/end
- Separate context passing from stream transformations
- Change transform_embedding_postprocessor_stream to return impl Stream
- Update transform_postprocessor_stream to accept context as parameter
- Defer boxing until the final ResponseStream::new() call
- Extract context once at the beginning of generate() methods
- Reduce boxing from 5-6 locations to 1 per request
- Improves performance by eliminating intermediate heap allocations
- Only apply jail when tools are present AND tool_choice is not None
- Error if tool_choice is 'required', 'auto', or named but no parser configured
- Skip jail entirely when tool_choice is explicitly None
- Add should_apply_tool_jail() to determine jail application logic
- Refactor to separate decision logic from stream processing
- Avoid lifetime issues by evaluating conditions before stream transformation
- Removed old jail implementation tests that are no longer compatible
- Kept test_detect_tool_call_start_different_parsers as it tests parser behavior
- Old tests were testing annotation behavior that no longer exists in new JailedStream
- New tests in jail.rs provide equivalent coverage
Resolved merge conflicts in .gitignore and preprocessor.rs:
- Fixed gitignore format for Ruler generated files
- Removed duplicate jail implementation from main
- Kept optimized JailedStream with conditional application
- Fixed transform_postprocessor_stream signature usage
- All tests passing
- Remove unused imports and dead code warnings in test module
- Fix test_streaming_usage.rs function signature for transform_postprocessor_stream
- Remove test_preprocessor.rs that tested old jail implementation
- Add missing context parameter to transform calls
- All tests now pass and clippy/fmt are clean
@ryanolson ryanolson marked this pull request as ready for review September 16, 2025 11:41
@ryanolson ryanolson requested a review from a team as a code owner September 16, 2025 11:41
elyasmnvidian and others added 3 commits September 22, 2025 14:57
Signed-off-by: Elyas Mehtabuddin <emehtabuddin@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
@ayushag-nv ayushag-nv enabled auto-merge (squash) September 22, 2025 22:13
elyasmnvidian and others added 2 commits September 22, 2025 15:19
…ice; jail: gate common markers when parser is set and clarify local naming; tests: tighten dual-entry path assertions
@ayushag-nv
Copy link
Contributor

/ok test f58c24a

Signed-off-by: ayushag <ayushag@nvidia.com>
@ayushag-nv
Copy link
Contributor

/ok to test 824bb3d

Signed-off-by: ayushag <ayushag@nvidia.com>
@ayushag-nv
Copy link
Contributor

/ok to test e04a1cd

ayushag-nv and others added 5 commits September 23, 2025 05:29
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: Elyas Mehtabuddin <emehtabuddin@nvidia.com>
Signed-off-by: Elyas Mehtabuddin <emehtabuddin@nvidia.com>
Signed-off-by: Elyas Mehtabuddin <emehtabuddin@nvidia.com>
@elyasmnvidian
Copy link
Contributor

/ok to test f59637e

@ayushag-nv ayushag-nv merged commit c63ccea into main Sep 23, 2025
17 of 18 checks passed
@ayushag-nv ayushag-nv deleted the ryan/streamtool branch September 23, 2025 22:24
jasonqinzhou pushed a commit that referenced this pull request Sep 24, 2025
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
athreesh pushed a commit that referenced this pull request Sep 24, 2025
Signed-off-by: ayushag <ayushag@nvidia.com>
jasonqinzhou pushed a commit that referenced this pull request Sep 24, 2025
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
kylehh pushed a commit that referenced this pull request Sep 25, 2025
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: Kyle H <kylhuang@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants