fix(llm)!: extract reasoning traces to separate field instead of prepending #1468
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Pull Request Overview
This PR refactors the reasoning trace handling in LLM generation responses by introducing a dedicated reasoning_content field in GenerationResponse. Previously, reasoning traces were prepended directly to response content, but they are now exposed separately for cleaner API design and backwards compatibility.
Key Changes:
- Added `reasoning_content` field to `GenerationResponse` for structured access to reasoning traces
- When using `GenerationOptions`, reasoning content is separated from the main response
- When not using `GenerationOptions`, reasoning is wrapped in `<thinking>` tags for backwards compatibility
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| nemoguardrails/rails/llm/options.py | Added reasoning_content field to GenerationResponse model |
| nemoguardrails/rails/llm/llmrails.py | Refactored to populate reasoning_content field and wrap in tags for legacy mode |
| tests/test_llmrails.py | Added comprehensive tests for new reasoning content field behavior |
| tests/test_bot_thinking_events.py | Updated assertion to verify separated reasoning content |
| tests/rails/llm/test_options.py | Added unit tests for GenerationResponse with reasoning content |
Force-pushed from d7da48f to 8e88131.
Some nits + remove pr.md 👍
BREAKING CHANGE: Reasoning traces are no longer prepended directly to response content. When using `GenerationOptions`, reasoning is now available in the separate `reasoning_content` field. Without `GenerationOptions`, reasoning is wrapped in `<thinking>` tags.

Refactor reasoning trace handling to expose reasoning content as a separate field in `GenerationResponse` instead of prepending it to response content.

- Add `reasoning_content` field to `GenerationResponse` for structured access
- Populate `reasoning_content` in `generate_async` when using `GenerationOptions`
- Wrap reasoning traces in `<thinking>` tags for dict/string responses
- Update test to reflect new behavior (no longer prepending reasoning)
- Add comprehensive tests for new field and behavior changes

This improves API usability by separating reasoning content from the actual response, allowing clients to handle thinking traces independently. Resolves the TODO from PR #1432 about not prepending reasoning traces to final generation content.
Force-pushed from 8e88131 to d2ce30f.
Greptile Overview
Greptile Summary
This PR refactors the reasoning trace API by extracting reasoning content to a dedicated reasoning_content field in GenerationResponse instead of prepending it to response content. The change addresses technical debt from PR #1432 and provides cleaner separation between reasoning traces (from models like DeepSeek-R1) and actual bot responses. When GenerationOptions is used, reasoning is exposed via result.reasoning_content while result.response[0]["content"] contains only the clean response. For backward compatibility with dict/string responses (no GenerationOptions), reasoning is wrapped in <think> tags and prepended to content. This is a breaking change for clients expecting reasoning prepended to response content when using GenerationOptions.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/rails/llm/options.py | 5/5 | Added optional reasoning_content field to GenerationResponse model |
| tests/rails/llm/test_options.py | 5/5 | Added comprehensive tests for new reasoning_content field covering defaults, serialization, and integration |
| tests/test_bot_thinking_events.py | 4/5 | Updated assertions to validate reasoning in separate field instead of prepended to content |
| nemoguardrails/rails/llm/llmrails.py | 3/5 | Core logic change: extracts reasoning to separate field or wraps in <think> tags for dict responses |
| tests/test_llmrails.py | 4/5 | Added 5 tests validating reasoning_content exposure in both GenerationResponse and dict modes |
Confidence score: 3/5
- This PR requires careful review due to a breaking API change and an inconsistency between documentation and implementation
- Score lowered because: (1) the PR description claims `<thinking>` tags but the implementation uses `<think>` tags (line 1232 in llmrails.py), (2) `extract_bot_thinking_from_events(events)` on line 1087 may include reasoning from previous cached turns, not just the current response, and (3) the breaking change impacts all clients using `GenerationOptions` that expect reasoning in response content
- Pay close attention to `nemoguardrails/rails/llm/llmrails.py` lines 1087-1097 and 1232-1234: verify that reasoning extraction from the full event history is intentional, and confirm the correct tag format (`<think>` vs `<thinking>`)
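To make the tag-format point concrete, here is a small client-side sketch. `split_reasoning` is a hypothetical helper written for this example, not part of nemoguardrails; it splits a legacy-format response back into reasoning and answer, and deliberately matches `<think>` (the tag the implementation emits), not `<thinking>`:

```python
import re

# Matches a leading <think>...</think> block, including an optional
# trailing newline, so the remainder is the clean response.
THINK_RE = re.compile(r"^<think>(?P<reasoning>.*?)</think>\n?", re.DOTALL)


def split_reasoning(content: str):
    """Split '<think>...</think>\\nanswer' into (reasoning, answer).

    Returns (None, content) when no leading <think> block is present.
    """
    m = THINK_RE.match(content)
    if not m:
        return None, content
    return m.group("reasoning"), content[m.end():]


reasoning, answer = split_reasoning("<think>chain of thought</think>\nFinal answer")
print(reasoning)  # chain of thought
print(answer)     # Final answer
```

A parser like this silently fails if the wrapper tag ever changes, which is why the review insists on confirming `<think>` vs `<thinking>` before merge.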
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant LLMRails
    participant Runtime
    participant LLM
    participant Utils
    participant GenerationResponse
    User->>LLMRails: "generate_async(messages, options)"
    LLMRails->>Runtime: "generate_events(events)"
    Runtime->>LLM: "Generate completion with reasoning"
    LLM-->>Runtime: "Response with reasoning trace"
    Runtime-->>LLMRails: "new_events"
    LLMRails->>Utils: "extract_bot_thinking_from_events(events)"
    Utils-->>LLMRails: "reasoning_content"
    alt With GenerationOptions
        LLMRails->>GenerationResponse: "Create response object"
        LLMRails->>GenerationResponse: "Set reasoning_content field"
        LLMRails->>GenerationResponse: "Set response content (clean, no reasoning)"
        GenerationResponse-->>User: "GenerationResponse with separate reasoning_content"
    else Without GenerationOptions
        LLMRails->>LLMRails: "Wrap reasoning in <think> tags"
        LLMRails->>LLMRails: "Prepend to response content"
        LLMRails-->>User: "Dict with reasoning wrapped in tags"
    end
```
5 files reviewed, 4 comments
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The developer has addressed two key issues from previous feedback:
- Line 1087 (reasoning extraction scope): changed from extracting reasoning from the full `events` history to only `new_events` (the current turn). This prevents reasoning content from prior cached turns from leaking into the current response when state/message caching is active. This was a critical fix identified in the previous review as a potential logic bug.
- Lines 1232-1234 (tag consistency): changed the backwards-compatibility wrapper from `<thinking>` to `<think>` tags to match what most reasoning models use as delimiters. This aligns with the previous review suggestion and ensures consistency with model token conventions.
These changes refine the separation of reasoning traces from response content, ensuring clients receive only the current turn's reasoning in the new reasoning_content field while maintaining backwards compatibility with cleaner tag conventions.
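The extraction-scope fix can be illustrated with a toy example. The event shapes below are hypothetical stand-ins, not the real nemoguardrails internals: scanning the full `events` history picks up reasoning cached from earlier turns, while scanning only `new_events` yields just the current turn's trace.

```python
def extract_bot_thinking(events):
    # Hypothetical helper: concatenate the content of "bot_thinking" events.
    return "\n".join(e["content"] for e in events if e["type"] == "bot_thinking")


# Full history carried across turns when state/message caching is active.
events = [
    {"type": "bot_thinking", "content": "turn-1 reasoning"},  # cached prior turn
    {"type": "bot_message", "content": "turn-1 answer"},
]
new_events = [
    {"type": "bot_thinking", "content": "turn-2 reasoning"},  # current turn only
    {"type": "bot_message", "content": "turn-2 answer"},
]
events += new_events

print(extract_bot_thinking(events))      # leaks turn-1 reasoning as well
print(extract_bot_thinking(new_events))  # only the current turn's reasoning
```

This is why the review treated extracting from `events` instead of `new_events` as a logic bug rather than a style issue.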
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/rails/llm/llmrails.py | 4/5 | Fixed reasoning extraction to use new_events instead of events (line 1087) and updated backwards compatibility tags to <think> (lines 1232-1234) |
Confidence score: 4/5
- This PR is safe to merge with minimal risk
- Score reflects that the critical logic bug (extracting from full event history) has been fixed, and tag consistency has been improved per previous feedback; however, the breaking change nature of this API modification means careful communication to users is essential
- Pay close attention to nemoguardrails/rails/llm/llmrails.py to ensure the `new_events` extraction behavior works correctly with all caching scenarios
1 file reviewed, no comments
Greptile Overview
Greptile Summary
This update addresses previous review feedback by centralizing test mock paths across the reasoning trace test suite. A new constant REASONING_TRACE_MOCK_PATH was added to conftest.py and adopted in test_bot_thinking_events.py, test_runtime_event_logging.py, and test_llmrails.py. The autouse fixture reset_reasoning_trace was removed from conftest as it's no longer needed after the ContextVar-based reasoning trace storage was replaced with LangChain's additional_kwargs approach in PR #1427. An additional assertion was added to test_bot_thinking_events.py to verify that reasoning traces do not leak into response content, directly validating the core behavioral change described in the PR description.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| tests/conftest.py | 5/5 | Removed deprecated reset_reasoning_trace fixture and added centralized mock path constant |
| tests/test_bot_thinking_events.py | 5/5 | Added assertion to verify reasoning content separation and adopted centralized mock path |
| tests/test_runtime_event_logging.py | 5/5 | Refactored to use centralized mock path constant across three test functions |
| tests/test_llmrails.py | 5/5 | Consolidated five inline mock paths into single centralized constant import |
Confidence score: 5/5
- This PR is safe to merge with minimal risk
- All changes are test-only refactoring that improves maintainability by centralizing mock paths and adding one validation assertion; no production code is modified, and the changes directly address previous code review feedback
- No files require special attention
4 files reviewed, no comments
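The mock-path centralization follows a standard `unittest.mock` pattern: define the patch-target string once and reuse it everywhere. A minimal self-contained sketch — the real constant is `REASONING_TRACE_MOCK_PATH` in `tests/conftest.py` and points at a nemoguardrails path; `fake_llm_utils` here is a throwaway module created only for this example:

```python
import sys
import types
from unittest.mock import patch

# Stand-in module so the patch target actually resolves in this sketch.
fake = types.ModuleType("fake_llm_utils")
fake.get_reasoning_trace = lambda: "real trace"
sys.modules["fake_llm_utils"] = fake

# One constant, reused by every test that needs to patch this function;
# renaming the target means editing a single line instead of many tests.
REASONING_TRACE_MOCK_PATH = "fake_llm_utils.get_reasoning_trace"

with patch(REASONING_TRACE_MOCK_PATH, return_value="mocked trace"):
    patched_value = fake.get_reasoning_trace()

restored_value = fake.get_reasoning_trace()
print(patched_value)   # mocked trace
print(restored_value)  # real trace
```

Centralizing the string also means a refactor that moves the patched function breaks one constant loudly instead of five tests quietly.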
PR Description
Summary
This PR refactors how reasoning traces (thinking content) are exposed in the LLM generation response API. Instead of prepending reasoning traces directly to response content, they are now exposed as a separate `reasoning_content` field in `GenerationResponse`.
Background
The reasoning trace feature has evolved significantly:
- Reasoning content is now read from `additional_kwargs['reasoning_content']` instead of token-based parsing
- PR #1432 introduced the `$bot_thinking` variable for output rails, with a TODO noting that "appending reasoning traces to the final generation might not be desired anymore"

This PR addresses that TODO by providing structured access to reasoning content instead of mixing it with the actual response.
Key Changes
- `GenerationResponse.reasoning_content` contains the reasoning trace when available
- Without `GenerationOptions`, reasoning is wrapped in `<thinking>` tags and prepended to content

Behavior Changes
With GenerationOptions (returns GenerationResponse):
- Before: `result.response[0]["content"]` = `"REASONING... Actual response"`
- After: `result.reasoning_content` = `"REASONING..."` and `result.response[0]["content"]` = `"Actual response"` (clean)

Without GenerationOptions (returns dict/string):
- Before: `result["content"]` = `"REASONING... Actual response"` (no wrapper)
- After: `result["content"]` = `"<think>REASONING...</think>\nActual response"` (wrapped in tags)
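The before/after shapes can be mimicked in a few lines. This is an illustrative sketch only: `FakeGenerationResponse` and `build_result` are stand-ins invented for this example, not the actual `GenerationResponse` model or nemoguardrails code.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FakeGenerationResponse:
    """Stand-in for GenerationResponse with the new reasoning_content field."""
    response: list = field(default_factory=list)
    reasoning_content: Optional[str] = None


def build_result(reasoning: Optional[str], content: str, use_options: bool):
    """With options: reasoning in a separate field, clean content.
    Without options: reasoning wrapped in <think> tags and prepended."""
    if use_options:
        return FakeGenerationResponse(
            response=[{"role": "assistant", "content": content}],
            reasoning_content=reasoning,
        )
    if reasoning:
        content = f"<think>{reasoning}</think>\n{content}"
    return {"role": "assistant", "content": content}


structured = build_result("step 1...", "Paris.", use_options=True)
legacy = build_result("step 1...", "Paris.", use_options=False)
print(structured.reasoning_content)       # step 1...
print(structured.response[0]["content"])  # Paris.
print(legacy["content"])
```

Clients that previously stripped reasoning out of `response[0]["content"]` themselves now read `reasoning_content` directly, which is the breaking change called out above.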