fix(llm)!: extract reasoning traces to separate field instead of prepending #1468
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Pull Request Overview
This PR refactors the reasoning trace handling in LLM generation responses by introducing a dedicated reasoning_content field in GenerationResponse. Previously, reasoning traces were prepended directly to response content, but they are now exposed separately for cleaner API design and backwards compatibility.
Key Changes:
- Added `reasoning_content` field to `GenerationResponse` for structured access to reasoning traces
- When using `GenerationOptions`, reasoning content is separated from the main response
- When not using `GenerationOptions`, reasoning is wrapped in `<thinking>` tags for backwards compatibility
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| nemoguardrails/rails/llm/options.py | Added reasoning_content field to GenerationResponse model |
| nemoguardrails/rails/llm/llmrails.py | Refactored to populate reasoning_content field and wrap in tags for legacy mode |
| tests/test_llmrails.py | Added comprehensive tests for new reasoning content field behavior |
| tests/test_bot_thinking_events.py | Updated assertion to verify separated reasoning content |
| tests/rails/llm/test_options.py | Added unit tests for GenerationResponse with reasoning content |
Force-pushed from d7da48f to 8e88131.
Some nits + remove pr.md 👍
BREAKING CHANGE: Reasoning traces are no longer prepended directly to response content. When using `GenerationOptions`, reasoning is now available in the separate `reasoning_content` field. Without `GenerationOptions`, reasoning is wrapped in `<thinking>` tags.

Refactor reasoning trace handling to expose reasoning content as a separate field in `GenerationResponse` instead of prepending it to response content.

- Add `reasoning_content` field to `GenerationResponse` for structured access
- Populate `reasoning_content` in `generate_async` when using `GenerationOptions`
- Wrap reasoning traces in `<thinking>` tags for dict/string responses
- Update test to reflect new behavior (no longer prepending reasoning)
- Add comprehensive tests for new field and behavior changes

This improves API usability by separating reasoning content from the actual response, allowing clients to handle thinking traces independently. Resolves the TODO from PR #1432 about not prepending reasoning traces to final generation content.
Force-pushed from 8e88131 to d2ce30f.
Greptile Overview
Greptile Summary
This PR refactors the reasoning trace API by extracting reasoning content to a dedicated reasoning_content field in GenerationResponse instead of prepending it to response content. The change addresses technical debt from PR #1432 and provides cleaner separation between reasoning traces (from models like DeepSeek-R1) and actual bot responses. When GenerationOptions is used, reasoning is exposed via result.reasoning_content while result.response[0]["content"] contains only the clean response. For backward compatibility with dict/string responses (no GenerationOptions), reasoning is wrapped in <think> tags and prepended to content. This is a breaking change for clients expecting reasoning prepended to response content when using GenerationOptions.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/rails/llm/options.py | 5/5 | Added optional reasoning_content field to GenerationResponse model |
| tests/rails/llm/test_options.py | 5/5 | Added comprehensive tests for new reasoning_content field covering defaults, serialization, and integration |
| tests/test_bot_thinking_events.py | 4/5 | Updated assertions to validate reasoning in separate field instead of prepended to content |
| nemoguardrails/rails/llm/llmrails.py | 3/5 | Core logic change: extracts reasoning to separate field or wraps in <think> tags for dict responses |
| tests/test_llmrails.py | 4/5 | Added 5 tests validating reasoning_content exposure in both GenerationResponse and dict modes |
Confidence score: 3/5
- This PR requires careful review due to a breaking API change and an inconsistency between documentation and implementation
- Score lowered because: (1) the PR description claims `<thinking>` tags but the implementation uses `<think>` tags (line 1232 in llmrails.py), (2) `extract_bot_thinking_from_events(events)` on line 1087 may include reasoning from previous cached turns, not just the current response, and (3) the breaking change impacts all clients using `GenerationOptions` that expect reasoning in response content
- Pay close attention to `nemoguardrails/rails/llm/llmrails.py` lines 1087-1097 and 1232-1234: verify that reasoning extraction from the full event history is intentional, and confirm the correct tag format (`<think>` vs `<thinking>`)
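To make the tag-format point concrete, here is a small client-side sketch. `split_reasoning` is a hypothetical helper written for this example, not part of nemoguardrails; it splits a legacy-format response back into reasoning and answer, and deliberately matches `<think>` (the tag the implementation emits), not `<thinking>`:

```python
import re

# Matches a leading <think>...</think> block, including an optional
# trailing newline, so the remainder is the clean response.
THINK_RE = re.compile(r"^<think>(?P<reasoning>.*?)</think>\n?", re.DOTALL)


def split_reasoning(content: str):
    """Split '<think>...</think>\\nanswer' into (reasoning, answer).

    Returns (None, content) when no leading <think> block is present.
    """
    m = THINK_RE.match(content)
    if not m:
        return None, content
    return m.group("reasoning"), content[m.end():]


reasoning, answer = split_reasoning("<think>chain of thought</think>\nFinal answer")
print(reasoning)  # chain of thought
print(answer)     # Final answer
```

A parser like this silently fails if the wrapper tag ever changes, which is why the review insists on confirming `<think>` vs `<thinking>` before merge.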
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant LLMRails
    participant Runtime
    participant LLM
    participant Utils
    participant GenerationResponse
    User->>LLMRails: "generate_async(messages, options)"
    LLMRails->>Runtime: "generate_events(events)"
    Runtime->>LLM: "Generate completion with reasoning"
    LLM-->>Runtime: "Response with reasoning trace"
    Runtime-->>LLMRails: "new_events"
    LLMRails->>Utils: "extract_bot_thinking_from_events(events)"
    Utils-->>LLMRails: "reasoning_content"
    alt With GenerationOptions
        LLMRails->>GenerationResponse: "Create response object"
        LLMRails->>GenerationResponse: "Set reasoning_content field"
        LLMRails->>GenerationResponse: "Set response content (clean, no reasoning)"
        GenerationResponse-->>User: "GenerationResponse with separate reasoning_content"
    else Without GenerationOptions
        LLMRails->>LLMRails: "Wrap reasoning in <think> tags"
        LLMRails->>LLMRails: "Prepend to response content"
        LLMRails-->>User: "Dict with reasoning wrapped in tags"
    end
```
5 files reviewed, 4 comments
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The developer has addressed two key issues from previous feedback:
- Line 1087 (reasoning extraction scope): changed from extracting reasoning from the full `events` history to only `new_events` (the current turn). This prevents reasoning content from prior cached turns from leaking into the current response when state/message caching is active. This was a critical fix identified in the previous review as a potential logic bug.
- Lines 1232-1234 (tag consistency): changed the backwards-compatibility wrapper from `<thinking>` to `<think>` tags to match what most reasoning models use as delimiters. This aligns with the previous review suggestion and ensures consistency with model token conventions.
These changes refine the separation of reasoning traces from response content, ensuring clients receive only the current turn's reasoning in the new reasoning_content field while maintaining backwards compatibility with cleaner tag conventions.
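The extraction-scope fix can be illustrated with a toy example. The event shapes below are hypothetical stand-ins, not the real nemoguardrails internals: scanning the full `events` history picks up reasoning cached from earlier turns, while scanning only `new_events` yields just the current turn's trace.

```python
def extract_bot_thinking(events):
    # Hypothetical helper: concatenate the content of "bot_thinking" events.
    return "\n".join(e["content"] for e in events if e["type"] == "bot_thinking")


# Full history carried across turns when state/message caching is active.
events = [
    {"type": "bot_thinking", "content": "turn-1 reasoning"},  # cached prior turn
    {"type": "bot_message", "content": "turn-1 answer"},
]
new_events = [
    {"type": "bot_thinking", "content": "turn-2 reasoning"},  # current turn only
    {"type": "bot_message", "content": "turn-2 answer"},
]
events += new_events

print(extract_bot_thinking(events))      # leaks turn-1 reasoning as well
print(extract_bot_thinking(new_events))  # only the current turn's reasoning
```

This is why the review treated extracting from `events` instead of `new_events` as a logic bug rather than a style issue.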
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/rails/llm/llmrails.py | 4/5 | Fixed reasoning extraction to use new_events instead of events (line 1087) and updated backwards compatibility tags to <think> (lines 1232-1234) |
Confidence score: 4/5
- This PR is safe to merge with minimal risk
- Score reflects that the critical logic bug (extracting from full event history) has been fixed, and tag consistency has been improved per previous feedback; however, the breaking change nature of this API modification means careful communication to users is essential
- Pay close attention to nemoguardrails/rails/llm/llmrails.py to ensure the `new_events` extraction behavior works correctly with all caching scenarios
1 file reviewed, no comments
Greptile Overview
Greptile Summary
This update addresses previous review feedback by centralizing test mock paths across the reasoning trace test suite. A new constant REASONING_TRACE_MOCK_PATH was added to conftest.py and adopted in test_bot_thinking_events.py, test_runtime_event_logging.py, and test_llmrails.py. The autouse fixture reset_reasoning_trace was removed from conftest as it's no longer needed after the ContextVar-based reasoning trace storage was replaced with LangChain's additional_kwargs approach in PR #1427. An additional assertion was added to test_bot_thinking_events.py to verify that reasoning traces do not leak into response content, directly validating the core behavioral change described in the PR description.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| tests/conftest.py | 5/5 | Removed deprecated reset_reasoning_trace fixture and added centralized mock path constant |
| tests/test_bot_thinking_events.py | 5/5 | Added assertion to verify reasoning content separation and adopted centralized mock path |
| tests/test_runtime_event_logging.py | 5/5 | Refactored to use centralized mock path constant across three test functions |
| tests/test_llmrails.py | 5/5 | Consolidated five inline mock paths into single centralized constant import |
Confidence score: 5/5
- This PR is safe to merge with minimal risk
- All changes are test-only refactoring that improves maintainability by centralizing mock paths and adding one validation assertion; no production code is modified, and the changes directly address previous code review feedback
- No files require special attention
4 files reviewed, no comments
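The mock-path centralization follows a standard `unittest.mock` pattern: define the patch-target string once and reuse it everywhere. A minimal self-contained sketch — the real constant is `REASONING_TRACE_MOCK_PATH` in `tests/conftest.py` and points at a nemoguardrails path; `fake_llm_utils` here is a throwaway module created only for this example:

```python
import sys
import types
from unittest.mock import patch

# Stand-in module so the patch target actually resolves in this sketch.
fake = types.ModuleType("fake_llm_utils")
fake.get_reasoning_trace = lambda: "real trace"
sys.modules["fake_llm_utils"] = fake

# One constant, reused by every test that needs to patch this function;
# renaming the target means editing a single line instead of many tests.
REASONING_TRACE_MOCK_PATH = "fake_llm_utils.get_reasoning_trace"

with patch(REASONING_TRACE_MOCK_PATH, return_value="mocked trace"):
    patched_value = fake.get_reasoning_trace()

restored_value = fake.get_reasoning_trace()
print(patched_value)   # mocked trace
print(restored_value)  # real trace
```

Centralizing the string also means a refactor that moves the patched function breaks one constant loudly instead of five tests quietly.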
PR Description
Summary
This PR refactors how reasoning traces (thinking content) are exposed in the LLM generation response API. Instead of prepending reasoning traces directly to response content, they are now exposed as a separate `reasoning_content` field in `GenerationResponse`.
Background
The reasoning trace feature has evolved significantly:
- Reasoning content is now read from `additional_kwargs['reasoning_content']` instead of token-based parsing
- PR #1432 introduced the `$bot_thinking` variable for output rails, with a TODO noting that "appending reasoning traces to the final generation might not be desired anymore"

This PR addresses that TODO by providing structured access to reasoning content instead of mixing it with the actual response.
Key Changes
- `GenerationResponse.reasoning_content` contains the reasoning trace when available
- Without `GenerationOptions`, reasoning is wrapped in `<thinking>` tags and prepended to content

Behavior Changes
With GenerationOptions (returns GenerationResponse):
- Before: `result.response[0]["content"]` = `"REASONING... Actual response"`
- After: `result.reasoning_content` = `"REASONING..."` and `result.response[0]["content"]` = `"Actual response"` (clean)

Without GenerationOptions (returns dict/string):
- Before: `result["content"]` = `"REASONING... Actual response"` (no wrapper)
- After: `result["content"]` = `"<think>REASONING...</think>\nActual response"` (wrapped in tags)
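The before/after shapes can be mimicked in a few lines. This is an illustrative sketch only: `FakeGenerationResponse` and `build_result` are stand-ins invented for this example, not the actual `GenerationResponse` model or nemoguardrails code.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FakeGenerationResponse:
    """Stand-in for GenerationResponse with the new reasoning_content field."""
    response: list = field(default_factory=list)
    reasoning_content: Optional[str] = None


def build_result(reasoning: Optional[str], content: str, use_options: bool):
    """With options: reasoning in a separate field, clean content.
    Without options: reasoning wrapped in <think> tags and prepended."""
    if use_options:
        return FakeGenerationResponse(
            response=[{"role": "assistant", "content": content}],
            reasoning_content=reasoning,
        )
    if reasoning:
        content = f"<think>{reasoning}</think>\n{content}"
    return {"role": "assistant", "content": content}


structured = build_result("step 1...", "Paris.", use_options=True)
legacy = build_result("step 1...", "Paris.", use_options=False)
print(structured.reasoning_content)       # step 1...
print(structured.response[0]["content"])  # Paris.
print(legacy["content"])
```

Clients that previously stripped reasoning out of `response[0]["content"]` themselves now read `reasoning_content` directly, which is the breaking change called out above.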