Merged
56 commits
f81e3a9
feat: always stream for tool calling
elyasmnvidian Sep 8, 2025
03a0c56
chore: moved tool parsing to preprocessor.rs
elyasmnvidian Sep 8, 2025
218f6c5
chore: added unit tests
elyasmnvidian Sep 8, 2025
ea6fd60
chore: rebase and updated function calls
elyasmnvidian Sep 9, 2025
de7439a
fix: curl doesn't break now
ayushag-nv Sep 9, 2025
2a0b72d
chore: enabled stream=true for tool choice
elyasmnvidian Sep 9, 2025
3e00b12
chore: fix aggregator - clippy to be fixed
ayushag-nv Sep 10, 2025
cc713c6
fix: fixed rebased artifacts
ayushag-nv Sep 10, 2025
07b0572
fix: fixed tests and rebase artifacts
ayushag-nv Sep 10, 2025
f3b05d9
fix: dyn namespace scoping for trtllm (#2970)
biswapanda Sep 10, 2025
dd99c53
Merge branch 'main' into elyas/streamtool
ayushag-nv Sep 10, 2025
e9fa34c
Merge branch 'main' into elyas/streamtool
ayushag-nv Sep 15, 2025
8db8fea
Merge branch 'main' into elyas/streamtool
ayushag-nv Sep 15, 2025
290859a
feat: add standalone JailedStream implementation for token jail detec…
ryanolson Sep 15, 2025
1c83f8f
refactor: optimize JailedStream for better performance
ryanolson Sep 15, 2025
f0ed9f8
perf: optimize JailedStream to use impl Stream and remove context ove…
ryanolson Sep 15, 2025
4b83cca
refactor: update preprocessor to use new JailedStream implementation
ryanolson Sep 15, 2025
9a830aa
feat: add dual entry/exit paths for JailedStream
ryanolson Sep 15, 2025
829144e
refactor: optimize stream transformations to reduce boxing
ryanolson Sep 15, 2025
78a55a7
feat: add conditional tool jail application based on tool_choice
ryanolson Sep 16, 2025
8970fb3
chore: clean up commented-out tests and add gitignore changes
ryanolson Sep 16, 2025
1521bf9
Merge branch 'main' into ryan/streamtool
ryanolson Sep 16, 2025
ae23845
fix: resolve clippy warnings and test compilation errors
ryanolson Sep 16, 2025
e156458
test: add comprehensive jail functionality test coverage
ryanolson Sep 16, 2025
e027e61
fix: preserve Annotated metadata through jail processing
ryanolson Sep 16, 2025
3d552d3
fix: preserve trailing content after jail end markers
ryanolson Sep 16, 2025
3428610
feat: refactor jail state to support independent multi-choice processing
ryanolson Sep 16, 2025
c3cf97d
feat: implement independent multi-choice jailing architecture
ryanolson Sep 17, 2025
a3e4512
fix: separate trailing content emission for independent choice jailing
ryanolson Sep 17, 2025
5b9a665
feat: implement partial marker matching for streaming tool calls
ryanolson Sep 17, 2025
7f6c62c
refactor: standardize jail tests with human-readable assertions
ryanolson Sep 17, 2025
d40ad65
refactor: standardize jail test assertions for improved readability
ryanolson Sep 17, 2025
3128bf2
test: add comprehensive edge case tests for prefix matcher
ryanolson Sep 17, 2025
5158aea
fix: implement UTF-8 safe slicing in prefix matcher
ryanolson Sep 18, 2025
c10e14a
refactor: remove unnecessary async from tool parsing chain
ryanolson Sep 18, 2025
081126b
fix: address PR review comments for JailedStream
ryanolson Sep 19, 2025
dcc3769
chore: move tests to test_jail.rs
ayushag-nv Sep 22, 2025
728259e
chore: update cargo lock
elyasmnvidian Sep 22, 2025
a32adf4
Merge branch 'main' into ryan/streamtool
elyasmnvidian Sep 22, 2025
15b4aca
fix: bugs
ayushag-nv Sep 22, 2025
265870c
chore: fix unit test #1
elyasmnvidian Sep 22, 2025
6b26d80
fix: more bugs
ayushag-nv Sep 22, 2025
3b7c8ad
Merge branch 'main' into ryan/streamtool
ayushag-nv Sep 22, 2025
2266c81
fix: clippy
ayushag-nv Sep 22, 2025
8acf29f
chore: fix await unit test for harmony
elyasmnvidian Sep 22, 2025
d87a910
fix: harmony
ayushag-nv Sep 22, 2025
2b4c77e
Merge branch 'main' into ryan/streamtool
ayushag-nv Sep 22, 2025
72bd750
preprocessor: warn and proceed when no parser configured for tool_cho…
elyasmnvidian Sep 22, 2025
f58c24a
Merge branch 'main' into ryan/streamtool
ayushag-nv Sep 22, 2025
824bb3d
fix: cargo fmt
ayushag-nv Sep 22, 2025
e04a1cd
fix: fmt parsers
ayushag-nv Sep 22, 2025
53e6ada
fix: ci bugs
ayushag-nv Sep 23, 2025
29dff4b
Merge branch 'main' into ryan/streamtool
ayushag-nv Sep 23, 2025
f4ca63e
chore: fix await unit test for harmony #2
elyasmnvidian Sep 23, 2025
4d40c33
fix: revert back to Grahams fix and make jail async
elyasmnvidian Sep 23, 2025
f59637e
chore: clippy #3
elyasmnvidian Sep 23, 2025
1 change: 0 additions & 1 deletion .gitignore
@@ -92,7 +92,6 @@ generated-values.yaml
**/.devcontainer/.env
TensorRT-LLM


# Ruler Generated Files
/.cursor/instructions.md
/.cursor/instructions.md.bak
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

128 changes: 128 additions & 0 deletions JAILED_STREAM_README.md
@@ -0,0 +1,128 @@
# JailedStream Implementation

## Overview

The `JailedStream` is a standalone implementation for handling "jail" detection in token streams. It provides a clean, builder-based API for accumulating tokens when certain sequences are detected, then releasing them as a single chunk when the jail ends.

## Key Features

- **Builder Pattern**: Clean configuration API using the builder pattern
- **Configurable Sequences**: Support for multiple start/end jail sequences
- **Tool Call Parsing**: Integrated tool call detection and parsing
- **Stream Macro**: Uses the `async_stream::stream!` macro (from the `async-stream` crate) for clean async implementation
- **Standalone**: Completely independent of existing code
- **Annotations**: Preserves annotations for observability

## Implementation

### Location
- Main implementation: `lib/llm/src/protocols/openai/chat_completions/jail.rs`
- Examples: `lib/llm/src/protocols/openai/chat_completions/jail_example.rs`

### Usage

```rust
use std::pin::Pin;

use crate::protocols::openai::chat_completions::jail::JailedStream;
use dynamo_runtime::engine::{AsyncEngineContextProvider, ResponseStream};

// Get your ResponseStream with context
// (`get_stream_from_engine` is a placeholder for your own stream source)
let response_stream: Pin<Box<ResponseStream<_>>> = get_stream_from_engine();

// Extract context BEFORE passing to apply
let context = response_stream.context();

// Apply jail transformation (ResponseStream implements Stream)
let jail = JailedStream::builder()
    .tool_call_parser("nemotron_deci")
    .build();

let jailed_stream = jail.apply(response_stream);

// Re-wrap with context when needed for engine consumption
let final_stream = ResponseStream::new(Box::pin(jailed_stream), context);
```

### Advanced Configuration

```rust
// With custom jail sequences
let jail = JailedStream::builder()
    .jail_start_sequence("<TOOLCALL>")
    .jail_end_sequence("</TOOLCALL>")
    .tool_call_parser("nemotron_deci")
    .build();

// With multiple sequences
let jail = JailedStream::builder()
    .jail_start_sequences(vec!["<TOOLCALL>", "<FUNCTION>"])
    .jail_end_sequences(vec!["</TOOLCALL>", "</FUNCTION>"])
    .tool_call_parser("harmony")
    .build();
```
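
For readers unfamiliar with the pattern, here is a minimal sketch of how a builder that accepts both single and multiple sequences could be structured. `JailConfig` and `JailConfigBuilder` are hypothetical names invented for this illustration; only the method names mirror the calls shown above, and the real `JailedStream` builder may differ.

```rust
/// Hypothetical configuration holder (illustration only, not the real `JailedStream`).
#[derive(Debug, Default, Clone, PartialEq)]
struct JailConfig {
    start_sequences: Vec<String>,
    end_sequences: Vec<String>,
    tool_call_parser: Option<String>,
}

/// Hypothetical builder; each method consumes and returns `self` so calls chain.
#[derive(Debug, Default)]
struct JailConfigBuilder {
    config: JailConfig,
}

impl JailConfigBuilder {
    /// Add one start marker; may be called repeatedly.
    fn jail_start_sequence(mut self, seq: impl Into<String>) -> Self {
        self.config.start_sequences.push(seq.into());
        self
    }

    /// Add one end marker; may be called repeatedly.
    fn jail_end_sequence(mut self, seq: impl Into<String>) -> Self {
        self.config.end_sequences.push(seq.into());
        self
    }

    /// Add several start markers at once.
    fn jail_start_sequences<I, S>(mut self, seqs: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        for seq in seqs {
            self.config.start_sequences.push(seq.into());
        }
        self
    }

    /// Add several end markers at once.
    fn jail_end_sequences<I, S>(mut self, seqs: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        for seq in seqs {
            self.config.end_sequences.push(seq.into());
        }
        self
    }

    /// Record the parser name; this sketch stores it without validation.
    fn tool_call_parser(mut self, name: impl Into<String>) -> Self {
        self.config.tool_call_parser = Some(name.into());
        self
    }

    fn build(self) -> JailConfig {
        self.config
    }
}

/// Mirrors the "multiple sequences" configuration shown above.
fn example_config() -> JailConfig {
    JailConfigBuilder::default()
        .jail_start_sequences(vec!["<TOOLCALL>", "<FUNCTION>"])
        .jail_end_sequences(vec!["</TOOLCALL>", "</FUNCTION>"])
        .tool_call_parser("harmony")
        .build()
}
```

Accepting `impl Into<String>` lets callers pass either `&str` literals or owned `String` values without extra conversions.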

## How It Works

1. **Detection**: When a jail start sequence (or tool call start) is detected, the stream enters "jail" mode
2. **Accumulation**: While jailed, tokens are accumulated in memory instead of being yielded
3. **Annotations**: Empty chunks with annotations are sent downstream for observability
4. **Release**: When a jail end sequence is detected OR the stream ends:
   - Accumulated content is parsed for tool calls
   - A single chunk with the parsed content is yielded
5. **Pass-through**: Non-jailed content passes through unchanged
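
The steps above can be modeled with a small synchronous sketch. This is not the `JailedStream` code: it works on plain string chunks instead of an async stream, uses one start and one end marker, skips tool call parsing and annotations, and assumes a marker never straddles two chunks. `Emitted` and `jail_chunks` are names made up for this illustration.

```rust
/// What the sketch emits: text passed straight through, or one released jail buffer.
#[derive(Debug, PartialEq)]
enum Emitted {
    PassThrough(String),
    Released(String),
}

/// Simplified model of detection, accumulation, release, and pass-through.
/// Assumption: `start` and `end` are never split across chunk boundaries.
fn jail_chunks(chunks: &[&str], start: &str, end: &str) -> Vec<Emitted> {
    let mut out = Vec::new();
    // `Some(buffer)` while in jail mode, `None` otherwise.
    let mut jailed: Option<String> = None;

    for &chunk in chunks {
        let mut rest = chunk;
        while !rest.is_empty() {
            match jailed.take() {
                // Not jailed: look for a start marker in the remaining text.
                None => match rest.find(start) {
                    Some(pos) => {
                        if pos > 0 {
                            out.push(Emitted::PassThrough(rest[..pos].to_string()));
                        }
                        // Enter jail; the marker itself is accumulated next iteration.
                        jailed = Some(String::new());
                        rest = &rest[pos..];
                    }
                    None => {
                        out.push(Emitted::PassThrough(rest.to_string()));
                        rest = "";
                    }
                },
                // Jailed: accumulate until an end marker shows up.
                Some(mut buf) => match rest.find(end) {
                    Some(pos) => {
                        let cut = pos + end.len();
                        buf.push_str(&rest[..cut]);
                        out.push(Emitted::Released(buf));
                        // Trailing text after the end marker is handled unjailed.
                        rest = &rest[cut..];
                    }
                    None => {
                        buf.push_str(rest);
                        jailed = Some(buf);
                        rest = "";
                    }
                },
            }
        }
    }

    // Stream ended while jailed: release whatever was accumulated.
    if let Some(buf) = jailed {
        out.push(Emitted::Released(buf));
    }
    out
}
```

In this model the released buffer still contains the markers; a real implementation would hand that text to the configured tool call parser before yielding the chunk.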

## Testing

The implementation includes the following tests:

- `test_jailed_stream_with_start_end_sequences`: Tests explicit jail sequences
- `test_jailed_stream_with_tool_calls`: Tests tool call detection and parsing
- `test_jailed_stream_no_jailing`: Tests normal pass-through behavior

Run tests with:
```bash
cargo test -p dynamo-llm jail --lib
```

## Benefits

1. **Standalone**: No modifications to existing code required
2. **Clean API**: Builder pattern makes configuration intuitive
3. **Flexible**: Supports multiple jail detection strategies
4. **Maintainable**: Uses `stream!` macro for cleaner async code
5. **Testable**: Comprehensive test suite with shared utilities
6. **Efficient**: No unnecessary boxing or context handling in the library
7. **Composable**: Can chain multiple stream transformers before re-adding context

## Performance Optimizations

- **No Boxing in Library**: Returns `impl Stream` instead of `Pin<Box<ResponseStream>>`
- **Stack Pinning**: Uses `tokio::pin!()` instead of `Box::pin()`, which pins the input on the stack and avoids a heap allocation
- **No Context Overhead**: JailedStream doesn't manage AsyncEngineContext
- **Lazy Evaluation**: Chunks are processed only as the consumer polls the stream
- **Efficient State Management**: Minimal cloning, only when entering jail state
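
The "no boxing" point can be illustrated by analogy with synchronous iterators, which follow the same `impl Trait` rules as `impl Stream`. The sketch below is not taken from the library; `trim_chunks`, `drop_empty`, and `boxed_pipeline` are hypothetical helpers. Each transformer returns an opaque `impl Iterator`, so chaining them allocates nothing, and the caller boxes once at the boundary where a trait object is required.

```rust
/// Transformer returning an opaque concrete type: static dispatch, no allocation.
fn trim_chunks<I>(input: I) -> impl Iterator<Item = String>
where
    I: Iterator<Item = String>,
{
    input.map(|s| s.trim().to_string())
}

/// A second transformer that composes with the first without boxing.
fn drop_empty<I>(input: I) -> impl Iterator<Item = String>
where
    I: Iterator<Item = String>,
{
    input.filter(|s| !s.is_empty())
}

/// Box exactly once, at the point where a trait object is actually needed.
fn boxed_pipeline(chunks: Vec<String>) -> Box<dyn Iterator<Item = String>> {
    Box::new(drop_empty(trim_chunks(chunks.into_iter())))
}
```

If every transformer returned a boxed trait object instead, a chain of N transformers would cost N allocations and N layers of dynamic dispatch rather than one.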

## Integration Options

To replace the existing `apply_tool_calling_jail_internal` function:

```rust
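// In preprocessor.rs
pub fn apply_tool_calling_jail_with_parser(
    &self,
    stream: ManyOut<Annotated<NvCreateChatCompletionStreamResponse>>,
) -> ManyOut<Annotated<NvCreateChatCompletionStreamResponse>> {
    // Extract the context before `apply` consumes the stream (see Usage)
    let context = stream.context();

    let jail = JailedStream::builder()
        .tool_call_parser(self.tool_call_parser.clone())
        .build();

    // `apply` returns `impl Stream`, so re-wrap it with the context
    let jailed_stream = jail.apply(stream);
    ResponseStream::new(Box::pin(jailed_stream), context)
}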
```

## Future Enhancements

- Add support for regex patterns for jail sequences
- Add metrics/telemetry for jail detection
- Support for partial sequence matching across chunk boundaries
- Configurable accumulation limits
- Support for nested jails
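
As an illustration of partial sequence matching across chunk boundaries, here is a minimal sketch under stated assumptions. The commit list for this PR mentions partial marker matching and UTF-8 safe slicing in a prefix matcher, but the code below is not that implementation; `partial_marker_suffix_len` and `split_safe` are hypothetical names. The idea: if the end of a chunk could be the beginning of a marker, hold that tail back until the next chunk arrives.

```rust
/// Length in bytes of the longest suffix of `text` that is a proper prefix of `marker`.
/// Returns 0 when no suffix could be the start of the marker.
fn partial_marker_suffix_len(text: &str, marker: &str) -> usize {
    // A proper prefix is at most `marker.len() - 1` bytes long.
    let max = text.len().min(marker.len().saturating_sub(1));
    // Try the longest candidate first.
    for len in (1..=max).rev() {
        let start = text.len() - len;
        // Only slice at UTF-8 character boundaries to avoid a panic.
        if text.is_char_boundary(start) && marker.starts_with(&text[start..]) {
            return len;
        }
    }
    0
}

/// Split a chunk into (text safe to emit now, tail to hold back for the next chunk).
fn split_safe<'a>(text: &'a str, marker: &str) -> (&'a str, &'a str) {
    let keep = partial_marker_suffix_len(text, marker);
    text.split_at(text.len() - keep)
}
```

A caller would emit the first half immediately and prepend the held-back tail to the next chunk before searching for the full marker again.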
1 change: 1 addition & 0 deletions lib/bindings/python/Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions lib/llm/Cargo.toml
@@ -52,6 +52,7 @@ required-features = ["block-manager", "testing-cuda"]
dynamo-runtime = { workspace = true }

# workspace
aho-corasick = "1.1"
anyhow = { workspace = true }
dynamo-async-openai = { workspace = true }
dynamo-parsers = { workspace = true}
1 change: 1 addition & 0 deletions lib/llm/src/lib.rs
@@ -37,6 +37,7 @@ pub mod request_template;
pub mod tokenizers;
pub mod tokens;
pub mod types;
pub mod utils;

#[cfg(feature = "block-manager")]
pub mod block_manager;