Conversation

Collaborator

@Pouyanpi Pouyanpi commented Oct 23, 2025

PR Description

Summary

This PR refactors how reasoning traces (thinking content) are exposed in the LLM generation response API. Instead of prepending reasoning traces directly to response content, they are now exposed as a separate reasoning_content field in GenerationResponse.

Background

The reasoning trace feature has evolved significantly:

  1. March 2025 (PR #996, "Feat: Support Models with Reasoning Traces"): Initial support for reasoning models like DeepSeek-R1, which output reasoning traces that needed to be filtered or preserved
  2. April 2025: Reasoning traces were first prepended to response content for "traceability and debugging"
  3. May 2025 (PR #1145, "feat!: add support for preserving and optionally applying guardrails to reasoning traces"): Enhanced to support applying guardrails to reasoning traces, with extraction via context variables
  4. October 2025 (PR #1427, "refactor!: drop reasoning trace extraction logic"): Refactored to use LangChain's additional_kwargs['reasoning_content'] instead of token-based parsing
  5. October 2025 (PR #1432, "feat: emit BotThinking events with reasoning traces"): Added BotThinking events and the $bot_thinking variable for output rails, with a TODO noting that "appending reasoning traces to the final generation might not be desired anymore"

This PR addresses that TODO by providing structured access to reasoning content instead of mixing it with the actual response.

Key Changes

  • New API field: GenerationResponse.reasoning_content contains the reasoning trace when available (a minimal model sketch follows this list)
  • Backwards compatibility: When returning dict/string responses (without GenerationOptions), reasoning is wrapped in <thinking> tags and prepended to content
  • Cleaner separation: Clients can now access reasoning content separately from the actual response
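
A minimal sketch of the new field, assuming a Pydantic model as used elsewhere in nemoguardrails/rails/llm/options.py; the real GenerationResponse defines additional fields, and the response type and field descriptions shown here are illustrative:

    from typing import List, Optional, Union

    from pydantic import BaseModel, Field


    class GenerationResponse(BaseModel):
        # The generated bot response: a list of messages in chat mode,
        # or a plain string in completion mode.
        response: Union[str, List[dict]] = Field(
            description="The generated response."
        )
        # New in this PR: the reasoning trace (thinking content) emitted by
        # reasoning models such as DeepSeek-R1, kept separate from `response`.
        reasoning_content: Optional[str] = Field(
            default=None,
            description="The reasoning trace associated with the response, if any.",
        )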

Behavior Changes

With GenerationOptions (returns GenerationResponse):

  • Before: result.response[0]["content"] = "REASONING... Actual response"
  • After: result.reasoning_content = "REASONING..." AND result.response[0]["content"] = "Actual response" (clean)
  • ⚠️ Breaking: Code expecting reasoning in response content will break

Without GenerationOptions (returns dict/string):

  • Before: result["content"] = "REASONING... Actual response" (no wrapper)
  • After: result["content"] = "<think>REASONING...</think>\nActual response" (wrapped in tags; a usage sketch of both modes follows)
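
A minimal usage sketch of the two modes described above, assuming the standard LLMRails.generate_async entry point; the config directory and prompt are placeholders:

    import asyncio

    from nemoguardrails import LLMRails, RailsConfig
    from nemoguardrails.rails.llm.options import GenerationOptions


    async def main():
        config = RailsConfig.from_path("./config")  # placeholder config directory
        rails = LLMRails(config)
        messages = [{"role": "user", "content": "What is 2 + 2?"}]

        # With GenerationOptions: GenerationResponse with a dedicated field.
        result = await rails.generate_async(
            messages=messages, options=GenerationOptions()
        )
        print(result.reasoning_content)        # reasoning trace, or None
        print(result.response[0]["content"])   # clean answer, no reasoning

        # Without GenerationOptions: legacy dict result; reasoning is wrapped
        # in <think> tags and prepended to the content.
        legacy = await rails.generate_async(messages=messages)
        print(legacy["content"])


    asyncio.run(main())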

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

@Pouyanpi Pouyanpi requested review from Copilot, tgasser-nv and trebedea and removed request for tgasser-nv and trebedea October 23, 2025 09:19
@Pouyanpi Pouyanpi changed the title feat(llm): add reasoning_content field to GenerationResponse feat(llm)!: add reasoning_content field to GenerationResponse Oct 23, 2025

Copilot AI left a comment

Pull Request Overview

This PR refactors reasoning trace handling in LLM generation responses by introducing a dedicated reasoning_content field in GenerationResponse. Previously, reasoning traces were prepended directly to response content; they are now exposed separately for a cleaner API, with a backwards-compatible path when GenerationOptions is not used.

Key Changes:

  • Added reasoning_content field to GenerationResponse for structured access to reasoning traces
  • When using GenerationOptions, reasoning content is separated from the main response
  • When not using GenerationOptions, reasoning is wrapped in <thinking> tags for backwards compatibility

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Summary per file:

  • nemoguardrails/rails/llm/options.py: Added reasoning_content field to GenerationResponse model
  • nemoguardrails/rails/llm/llmrails.py: Refactored to populate reasoning_content field and wrap in tags for legacy mode
  • tests/test_llmrails.py: Added comprehensive tests for new reasoning content field behavior
  • tests/test_bot_thinking_events.py: Updated assertion to verify separated reasoning content
  • tests/rails/llm/test_options.py: Added unit tests for GenerationResponse with reasoning content

@Pouyanpi Pouyanpi changed the title feat(llm)!: add reasoning_content field to GenerationResponse fix(llm)!: extract reasoning traces to separate field instead of prepending Oct 23, 2025
@Pouyanpi Pouyanpi force-pushed the feat/reasoning-content-field branch from d7da48f to 8e88131 Compare October 23, 2025 09:35
@github-actions
Contributor

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1468

Member

@trebedea trebedea left a comment

Some nits + remove pr.md 👍

BREAKING CHANGE: Reasoning traces are no longer prepended directly to response content. When using GenerationOptions, reasoning is now available in the separate reasoning_content field. Without GenerationOptions, reasoning is wrapped in <thinking> tags.

Refactor reasoning trace handling to expose reasoning content as a separate field in GenerationResponse instead of prepending it to response content.

- Add reasoning_content field to GenerationResponse for structured access
- Populate reasoning_content in generate_async when using GenerationOptions
- Wrap reasoning traces in <thinking> tags for dict/string responses
- Update test to reflect new behavior (no longer prepending reasoning)
- Add comprehensive tests for new field and behavior changes

This improves API usability by separating reasoning content from the actual response, allowing clients to handle thinking traces independently.

Resolves the TODO from PR #1432 about not prepending reasoning traces to final generation content.
@Pouyanpi Pouyanpi force-pushed the feat/reasoning-content-field branch from 8e88131 to d2ce30f Compare October 24, 2025 11:40

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR refactors the reasoning trace API by extracting reasoning content to a dedicated reasoning_content field in GenerationResponse instead of prepending it to response content. The change addresses technical debt from PR #1432 and provides cleaner separation between reasoning traces (from models like DeepSeek-R1) and actual bot responses. When GenerationOptions is used, reasoning is exposed via result.reasoning_content while result.response[0]["content"] contains only the clean response. For backward compatibility with dict/string responses (no GenerationOptions), reasoning is wrapped in <think> tags and prepended to content. This is a breaking change for clients expecting reasoning prepended to response content when using GenerationOptions.

Important Files Changed

  • nemoguardrails/rails/llm/options.py (score 5/5): Added optional reasoning_content field to GenerationResponse model
  • tests/rails/llm/test_options.py (score 5/5): Added comprehensive tests for the new reasoning_content field covering defaults, serialization, and integration
  • tests/test_bot_thinking_events.py (score 4/5): Updated assertions to validate reasoning in a separate field instead of prepended to content
  • nemoguardrails/rails/llm/llmrails.py (score 3/5): Core logic change: extracts reasoning to a separate field or wraps it in <think> tags for dict responses
  • tests/test_llmrails.py (score 4/5): Added 5 tests validating reasoning_content exposure in both GenerationResponse and dict modes

Confidence score: 3/5

  • This PR requires careful review due to a breaking API change and an inconsistency between documentation and implementation
  • Score lowered because: (1) PR description claims <thinking> tags but implementation uses <think> tags (line 1232 in llmrails.py), (2) extract_bot_thinking_from_events(events) on line 1087 may include reasoning from previous cached turns not just the current response, and (3) breaking change impacts all clients using GenerationOptions that expect reasoning in response content
  • Pay close attention to nemoguardrails/rails/llm/llmrails.py lines 1087-1097 and 1232-1234: verify that reasoning extraction from the full event history is intentional and confirm the correct tag format (<think> vs <thinking>)

Sequence Diagram

sequenceDiagram
    participant User
    participant LLMRails
    participant Runtime
    participant LLM
    participant Utils
    participant GenerationResponse

    User->>LLMRails: "generate_async(messages, options)"
    LLMRails->>Runtime: "generate_events(events)"
    Runtime->>LLM: "Generate completion with reasoning"
    LLM-->>Runtime: "Response with reasoning trace"
    Runtime-->>LLMRails: "new_events"
    LLMRails->>Utils: "extract_bot_thinking_from_events(events)"
    Utils-->>LLMRails: "reasoning_content"
    
    alt With GenerationOptions
        LLMRails->>GenerationResponse: "Create response object"
        LLMRails->>GenerationResponse: "Set reasoning_content field"
        LLMRails->>GenerationResponse: "Set response content (clean, no reasoning)"
        GenerationResponse-->>User: "GenerationResponse with separate reasoning_content"
    else Without GenerationOptions
        LLMRails->>LLMRails: "Wrap reasoning in <think> tags"
        LLMRails->>LLMRails: "Prepend to response content"
        LLMRails-->>User: "Dict with reasoning wrapped in tags"
    end

5 files reviewed, 4 comments

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This review covers only the changes made since the last review, not the entire PR. The developer has addressed two key issues from previous feedback:

  1. Line 1087 - Reasoning extraction scope: Changed from extracting reasoning from the full events history to only new_events (current turn). This prevents reasoning content from prior cached turns from leaking into the current response when state/message caching is active. This was a critical fix identified in the previous review as a potential logic bug.

  2. Lines 1232-1234 - Tag consistency: Changed the backwards compatibility wrapper from <thinking> to <think> tags to match what most reasoning models use as delimiters. This aligns with the previous review suggestion and ensures consistency with model token conventions.

These changes refine the separation of reasoning traces from response content, ensuring clients receive only the current turn's reasoning in the new reasoning_content field while maintaining backwards compatibility with cleaner tag conventions.
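
An illustrative sketch of the two fixes described above, not the actual llmrails.py implementation: the extract_bot_thinking_from_events name comes from the sequence diagram earlier in this thread, while the BotThinking event shape (a "content" key) is an assumption.

    from typing import List, Optional


    def extract_bot_thinking_from_events(events: List[dict]) -> Optional[str]:
        # Assumed event shape: a BotThinking event carries the reasoning text
        # under a "content" key; the caller passes only the current turn's events.
        for event in reversed(events):
            if event.get("type") == "BotThinking":
                return event.get("content")
        return None


    def format_legacy_content(reasoning: Optional[str], content: str) -> str:
        # Backwards-compatible dict/string mode: wrap reasoning in <think> tags
        # (the delimiter most reasoning models emit) and prepend it.
        if reasoning:
            return f"<think>{reasoning}</think>\n{content}"
        return content


    # Per the fix above, extraction runs over new_events (current turn only),
    # not the full event history, so cached prior turns cannot leak reasoning:
    #   reasoning = extract_bot_thinking_from_events(new_events)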

Important Files Changed

  • nemoguardrails/rails/llm/llmrails.py (score 4/5): Fixed reasoning extraction to use new_events instead of events (line 1087) and updated backwards compatibility tags to <think> (lines 1232-1234)

Confidence score: 4/5

  • This PR is safe to merge with minimal risk
  • Score reflects that the critical logic bug (extracting from full event history) has been fixed, and tag consistency has been improved per previous feedback; however, the breaking change nature of this API modification means careful communication to users is essential
  • Pay close attention to nemoguardrails/rails/llm/llmrails.py to ensure the new_events extraction behavior works correctly with all caching scenarios

1 file reviewed, no comments

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This update addresses previous review feedback by centralizing test mock paths across the reasoning trace test suite. A new constant REASONING_TRACE_MOCK_PATH was added to conftest.py and adopted in test_bot_thinking_events.py, test_runtime_event_logging.py, and test_llmrails.py. The autouse fixture reset_reasoning_trace was removed from conftest as it's no longer needed after the ContextVar-based reasoning trace storage was replaced with LangChain's additional_kwargs approach in PR #1427. An additional assertion was added to test_bot_thinking_events.py to verify that reasoning traces do not leak into response content, directly validating the core behavioral change described in the PR description.
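
A sketch of the centralization described above; the dotted patch target and the fixture name are assumptions for illustration, not the repository's actual values.

    from unittest.mock import patch

    # Would live in tests/conftest.py as the single source of truth for the patch
    # target used across reasoning-trace tests; the dotted path is illustrative.
    REASONING_TRACE_MOCK_PATH = (
        "nemoguardrails.rails.llm.llmrails.extract_bot_thinking_from_events"
    )


    def test_reasoning_is_separated_from_content(rails):
        # `rails` would be a fixture providing an LLMRails instance backed by a
        # fake LLM that emits a reasoning trace.
        with patch(REASONING_TRACE_MOCK_PATH, return_value="REASONING..."):
            ...  # drive generate_async, then assert that
                 # result.reasoning_content == "REASONING..." and that the trace
                 # does not appear in result.response[0]["content"]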

Important Files Changed

  • tests/conftest.py (score 5/5): Removed deprecated reset_reasoning_trace fixture and added centralized mock path constant
  • tests/test_bot_thinking_events.py (score 5/5): Added assertion to verify reasoning content separation and adopted centralized mock path
  • tests/test_runtime_event_logging.py (score 5/5): Refactored to use centralized mock path constant across three test functions
  • tests/test_llmrails.py (score 5/5): Consolidated five inline mock paths into a single centralized constant import

Confidence score: 5/5

  • This PR is safe to merge with minimal risk
  • All changes are test-only refactoring that improves maintainability by centralizing mock paths and adding one validation assertion; no production code is modified, and the changes directly address previous code review feedback
  • No files require special attention

4 files reviewed, no comments

@Pouyanpi Pouyanpi merged commit f7b5daf into develop Oct 24, 2025
8 checks passed
@Pouyanpi Pouyanpi deleted the feat/reasoning-content-field branch October 24, 2025 12:46