Skip to content

Conversation

@Pouyanpi
Copy link
Collaborator

@Pouyanpi Pouyanpi commented Apr 25, 2025

Summary Generated by Copilot

This pull request introduces enhancements to the LLM generation pipeline by adding support for reasoning traces and improving the handling of parsed task outputs. Key changes include the introduction of new utility functions, updates to existing methods to process reasoning traces, and modifications to ensure consistent handling of parsed outputs across various LLM tasks.

Enhancements to Reasoning Trace Support:

  • Added a new context variable reasoning_trace_var to store reasoning traces and utility functions such as _record_reasoning_trace, _assemble_response, and _process_parsed_output to manage reasoning traces and integrate them into responses. (nemoguardrails/context.py [1] nemoguardrails/actions/llm/generation.py [2]
  • Introduced _get_apply_to_reasoning_traces to fetch configuration values for including reasoning traces in outputs, and _include_reasoning_traces to encapsulate this logic. (nemoguardrails/actions/llm/generation.py nemoguardrails/actions/llm/generation.pyR1384-R1387)

Consistent Handling of Parsed Outputs:

  • Updated multiple methods (e.g., generate_user_intent, generate_bot_message, generate_value) to extract and use the text property from ParsedTaskOutput objects for consistency. (nemoguardrails/actions/llm/generation.py [1] [2] [3]
  • Added reasoning trace processing in key generation methods like generate_bot_message and generate_intent_steps_message. (nemoguardrails/actions/llm/generation.py [1] [2]

Utility and Structural Improvements:

Updates to Other LLM Modules:

  • Applied similar text extraction logic to other modules, including content safety and self-check actions, ensuring uniformity in handling parsed outputs. (nemoguardrails/library/content_safety/actions.py [1] nemoguardrails/library/self_check/facts/actions.py [2]

These changes collectively enhance the flexibility and traceability of LLM outputs, improving both debugging capabilities and the overall user experience.


Impact

tldr;
this feature ensures that reasoning traces are always returned to the user. Removing it would affect internals of nemoguardrails, but not the output that users receive.

Interaction between apply_to_reasoning_traces and remove_thinking_traces:

  • extraction always happens: reasoning traces are always extracted from LLM responses
  • storage always happens: extracted traces are always stored in reasoning_trace_var

conditional response inclusion:

  • when apply_to_reasoning_traces is TRUE: Traces are prepended to visible output
  • when apply_to_reasoning_traces is FALSE: Traces are not included in visible output

visible vs. stored output:

  • the traces are always part of what's returned to the user interface
  • but they are NOT stored in the $bot_message context when the setting is FALSE

implementation details:

_assemble_response(text, trace, include_reasoning) controls this behavior
Returns trace + text when include_reasoning is TRUE
Returns just text when include_reasoning is FALSE
Flow control: The decision happens in _process_parsed_output() where both storage and assembly occur

  • breaking change?
  • push llm reasoning output rails test
  • do same for colang 2

requires #1161

@Pouyanpi Pouyanpi force-pushed the refactor/reasoning-tokens-filters branch from ab96ab6 to 25a0d2b Compare April 25, 2025 12:33
@Pouyanpi Pouyanpi changed the title Refactor/reasoning tokens filters feat: implement structured reasoning trace extraction and handling Apr 25, 2025
@Pouyanpi Pouyanpi self-assigned this Apr 25, 2025
@Pouyanpi Pouyanpi added the enhancement New feature or request label Apr 25, 2025
@Pouyanpi Pouyanpi added this to the v0.14.0 milestone Apr 25, 2025
@Pouyanpi
Copy link
Collaborator Author

I feel uneasy about the way nemoguardrails.llm.generation does:

result = self.llm_task_manager.parse_task_output(…)
text   = result.text # this is new before `result` was plain str
# …then all kinds of ad-hoc cleaning: .strip(), strip_quotes(), 
# get_first_nonempty_line(), get_multiline_response(), split("\n"), etc.

all over the place. We end up with:

  • Duplication: every branch re-implements the same little clean-up steps (strip whitespace, remove quotes, pick line 1 vs N, etc.).

  • Inconsistency: sometimes we strip before removing quotes, sometimes after; sometimes we drop blank lines early, sometimes late.

  • Leaky abstractions: our action code has to know the gory details of how each prompts output is formatted, instead of just “I asked for a canonical intent, give me one.”

  • Type confusion: it mutates or shadows result (a ParsedTaskOutput) into a plain str, losing the .reasoning_trace and obligation to always deal with the richer type.

We MUST stop inlining dozens of little string hacks in every action.

@Pouyanpi
Copy link
Collaborator Author

This module generation.py is doing too much, with tangled responsibilities, ad-hoc hacks, and barely controlled state. Splitting it up, locking down state flows, adding types/tests, and centralizing ouar LLM invocation + post-processing logic will dramatically improve clarity, safety, and maintainability.

@Pouyanpi
Copy link
Collaborator Author

#1149

@Pouyanpi Pouyanpi force-pushed the refactor/reasoning-tokens-filters branch 3 times, most recently from 5f2416e to 119854a Compare May 1, 2025 09:00
@Pouyanpi Pouyanpi requested a review from trebedea May 1, 2025 09:03
@Pouyanpi Pouyanpi marked this pull request as ready for review May 1, 2025 09:03
@Pouyanpi Pouyanpi added bug Something isn't working refactoring and removed bug Something isn't working labels May 1, 2025
@Pouyanpi Pouyanpi changed the title feat: implement structured reasoning trace extraction and handling feat: add support for preserving and optionally applying guardrails to reasoning traces May 1, 2025
@Pouyanpi Pouyanpi requested a review from cparisien May 1, 2025 14:27
Copy link
Collaborator

@cparisien cparisien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this, just add some more tests for unexpected patterns that might look like reasoning traces but aren't.

@Pouyanpi Pouyanpi force-pushed the refactor/reasoning-tokens-filters branch from e0a4c98 to 63b0fa8 Compare May 2, 2025 07:24
@codecov-commenter
Copy link

codecov-commenter commented May 2, 2025

Codecov Report

Attention: Patch coverage is 92.30769% with 9 lines in your changes missing coverage. Please review.

Project coverage is 68.19%. Comparing base (5b5fea0) to head (0e34af6).
Report is 44 commits behind head on develop.

Files with missing lines Patch % Lines
nemoguardrails/actions/v2_x/generation.py 55.55% 4 Missing ⚠️
nemoguardrails/library/content_safety/actions.py 0.00% 2 Missing ⚠️
nemoguardrails/llm/filters.py 92.59% 2 Missing ⚠️
nemoguardrails/actions/llm/generation.py 95.45% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1145      +/-   ##
===========================================
+ Coverage    68.06%   68.19%   +0.13%     
===========================================
  Files          161      161              
  Lines        15822    15905      +83     
===========================================
+ Hits         10769    10847      +78     
- Misses        5053     5058       +5     
Flag Coverage Δ
python 68.19% <92.30%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
nemoguardrails/actions/llm/utils.py 79.27% <100.00%> (+0.34%) ⬆️
nemoguardrails/context.py 100.00% <100.00%> (ø)
nemoguardrails/library/self_check/facts/actions.py 89.18% <100.00%> (+0.30%) ⬆️
...ardrails/library/self_check/input_check/actions.py 97.14% <100.00%> (+0.08%) ⬆️
...rdrails/library/self_check/output_check/actions.py 96.96% <100.00%> (+0.09%) ⬆️
nemoguardrails/llm/taskmanager.py 88.95% <100.00%> (+1.54%) ⬆️
nemoguardrails/rails/llm/config.py 89.84% <100.00%> (+0.01%) ⬆️
nemoguardrails/rails/llm/llmrails.py 87.19% <100.00%> (+0.13%) ⬆️
nemoguardrails/actions/llm/generation.py 85.96% <95.45%> (+0.35%) ⬆️
nemoguardrails/library/content_safety/actions.py 20.00% <0.00%> (-0.55%) ⬇️
... and 2 more

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Pouyanpi Pouyanpi force-pushed the refactor/reasoning-tokens-filters branch from ce89a19 to a6138f9 Compare May 2, 2025 10:48
Copy link
Member

@trebedea trebedea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The heavy lifting in this MR looks good. There are some minor fixes, but I have not seen anything major that needs to be improved. Nice job!

Pouyanpi added 24 commits May 5, 2025 12:16
- Updated all instances to use `result.text` for accessing parsed
  task output text.
- Replaced direct usage of `result` with `text` where applicable
  for clarity and consistency.
- Adjusted related logic to ensure proper handling of stripped
  and formatted text.
Added logic to prepend `reasoning_trace` to the `content` of the
response if available. This ensures that reasoning traces are included
in the output for better traceability and debugging.

BREAKING CHANGE: Modifies the response structure by including
`reasoning_trace` in the `content` field when applicable.

feat(llm): add support for reasoning traces in output

Implemented `_process_parsed_output` to handle reasoning traces in LLM
outputs. Added `_should_guardrail_reasoning_traces` and
`_get_guardrail_reasoning_traces` for configuration-based trace
inclusion.

refactor(llm): simplify reasoning traces and threshold logic

refactor(llm): streamline output processing and trace handling

- Added `text.strip()` to clean up leading/trailing spaces in outputs.
- Replaced `text` with `result` for consistent variable naming.
- Refactored `_process_parsed_output` to separate trace recording and
  response assembly.
- Introduced `_record_reasoning_trace` and `_assemble_response` for
  modularity and clarity.

add tests for reasoning traces processing

- Added `TestProcessParsedOutput` to validate `_process_parsed_output`
  behavior with and without reasoning traces.
- Added `TestGuardrailReasoningTraces` to test guardrail reasoning
  traces configuration.
- Updated imports to include `_process_parsed_output` and
  `_get_guardrail_reasoning_traces`.

refactor(tests): rename parameter for clarity in tests

- Updated `_process_parsed_output` parameter from
  `guardrail_reasoning_traces` to `include_reasoning_trace`.
- Adjusted test cases to reflect the new parameter name for better
  readability and consistency.

fix: ensure reasoning_trace_var is cleared and test isolation

fix(config): correct dialog rails activation logic

Fixed the `has_dialog_rails` logic in `RailsConfig` to require both
`user_messages` and `bot_messages` for implicit activation. Removed
`flows` from the condition to align with expected behavior.

test(config): adjust tests for updated dialog rails logic

Modified test cases to reflect the corrected dialog rails activation
logic.

add reasoning trace context variable

correct ContextVar type annotation

add default value to reasoning_trace_var
review: improve apply_to_reasoning_traces description
@Pouyanpi Pouyanpi force-pushed the refactor/reasoning-tokens-filters branch from 0042ed3 to 0e34af6 Compare May 5, 2025 10:21
@Pouyanpi Pouyanpi merged commit 8215095 into develop May 5, 2025
17 checks passed
@Pouyanpi Pouyanpi deleted the refactor/reasoning-tokens-filters branch May 5, 2025 10:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request refactoring

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants