Commit f334e70

feat: add Nemotron model support with message-based prompts (#1199)
1 parent: 4add64b

File tree: 8 files changed (+903 −38 lines)

Lines changed: 107 additions & 0 deletions
# Nemotron Message-Based Prompts

This directory contains configurations for using Nemotron models with NeMo Guardrails.

## Message-Based Prompts with Detailed Thinking

NeMo Guardrails implements message-based prompts for Nemotron models with _detailed thinking_ enabled for specific internal tasks:

### Tasks with Detailed Thinking Enabled

The following internal tasks include a `detailed thinking on` system message:

- `generate_bot_message` - When generating the final response
- `generate_value` - When extracting information from user input
- Other complex reasoning tasks, such as flow generation and continuation

### Tasks without Detailed Thinking

The following tasks use standard system messages without detailed thinking:

- `generate_user_intent` - When detecting user intent
- `generate_next_steps` - When determining which bot actions to take

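As an illustrative sketch of what such a message-based prompt definition could look like (a hypothetical fragment: the task and model names follow the lists above, but the actual prompt files in this commit may differ):

```yaml
# Hypothetical sketch of a message-based prompt with detailed thinking enabled;
# not the exact contents of the prompt files shipped in this commit.
prompts:
  - task: generate_bot_message
    models:
      - nvidia/llama-3.1-nemotron
    messages:
      - type: system
        content: detailed thinking on
      - type: system
        content: "{{ general_instructions }}"
```
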
## Usage

To use Nemotron with NeMo Guardrails:

```python
from nemoguardrails import LLMRails, RailsConfig

# Load the configuration
config = RailsConfig.from_path("examples/configs/nemotron")

# Create the LLMRails instance
rails = LLMRails(config)

# Generate a response
response = rails.generate(messages=[
    {"role": "user", "content": "What is NeMo Guardrails?"}
])
print(response)
```

When using a task that has "detailed thinking on" enabled, the model will show its reasoning process:

```
{'role': 'assistant', 'content': '<think>\nOkay, the user is asking about NeMo Guardrails. Let me start by recalling what I know. NeMo is part of NVIDIA\'s tools, right? So, Guardrails must be a component related to that. I remember that NVIDIA has been working on AI frameworks and model development. Maybe Guardrails is part of the NeMo toolkit, which is used for building and training neural networks, especially for speech and language processing.\n\nWait, I think Guardrails are safety features or constraints that prevent models from generating harmful or inappropriate content. So, if NeMo Guardrails exist, they probably integrate these safety mechanisms into the model training or inference process. But I need to be precise here. I should check if NeMo Guardrails are specifically designed for their models like the ones in the NGC catalog.\n\nI remember that NVIDIA has LMOps tools, which might include Guardrails. Oh right, they announced RAPIDS Guardrails earlier, which is a library for adding safety features. Maybe NeMo Guardrails are a similar concept but tailored for the NeMo framework. So, they would allow developers to apply filters, classifiers, or rules to ensure the outputs are safe and comply with policies.\n\nBut wait, I should make sure not to confuse it with other guardrails. For example, some models use RLHF (Reinforcement Learning from Human Feedback), but Guardrails might be more about pre-defined rules or modular safety layers. The key points would be that NeMo Guardrails provide tools to manage model outputs, making it easier to deploy safe and responsible AI applications.\n\nAlso, considering the components of NeMo, which include pre-trained models and a framework for research, adding Guardrails would make sense as a feature to support ethical AI practices. 
They might include things like toxicity detection, fact-checking, or steering models away from sensitive topics.\n\nI should structure the answer to explain what NeMo is, introduce Guardrails as a safety component, mention their purpose, how they work (like modular components, customizable policies), and their benefits. Also, note that it\'s part of NVIDIA\'s efforts to provide tools for responsible AI deployment. Make sure to clarify if they are part of the main NeMo toolkit or an additional module, and maybe mention associated tools like RAPIDS Guardrails if relevant.\n\nWait, but I should verify if NeMo Guardrails are indeed a released feature or part of a specific version. If the information is not up-to-date, I need to state that clearly. The user might be referring to recent updates. Since my knowledge is current up to July 2024, if there were announcements around that time, include them. Otherwise, stick to known information.\n\nSo, putting it all together: NeMo Guardrails are safety features in NVIDIA\'s NeMo toolkit, designed to enforce ethical guidelines and prevent harmful outputs. They offer customizable policies, integration of safety models, and tools for auditing, helping developers deploy responsible AI applications. They might work alongside other NVIDIA tools to streamline the process of adding safety measures without compromising performance.\n</think>\n\nNeMo Guardrails is an open-source toolkit developed by NVIDIA that provides programmable guardrails for Large Language Models (LLMs). These guardrails are designed to make LLM-based applications safer and more reliable by controlling the output of the models in specific ways...'}
```
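The `<think>...</think>` block is plain text in the response content, so it can be removed with a small post-processing helper if you only want the final answer. This is a generic sketch, not a NeMo Guardrails API (`strip_think` is a hypothetical helper name):

```python
import re

def strip_think(text: str) -> str:
    """Remove a leading <think>...</think> reasoning block from a model response."""
    # Non-greedy match across newlines; also consume trailing whitespace after the block.
    return re.sub(r"<think>.*?</think>\s*", "", text, count=1, flags=re.DOTALL)

raw = "<think>\nOkay, the user is asking about NeMo Guardrails...\n</think>\nNeMo Guardrails is an open-source toolkit."
print(strip_think(raw))  # prints "NeMo Guardrails is an open-source toolkit."
```
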
### Controlling Detailed Thinking in Final Responses

Because Nemotron is a hybrid reasoning model, you can toggle the "detailed thinking" feature for final responses (similar to how it works on build.nvidia.com), as long as you are not using dialog rails:

#### Enabling Detailed Thinking via System Message

To enable detailed thinking in the response, include a system message with "detailed thinking on":

```python
response = rails.generate(messages=[
    {"role": "system", "content": "detailed thinking on"},
    {"role": "user", "content": "How is the weather today?"}
])
```

This will include the model's reasoning process in a `<think>...</think>` wrapper:

```
{'role': 'assistant',
 'content': '<think>\n</think>I\'m sorry, but I don\'t know the weather. I\'m a large language model, I don\'t have access to real-time information or your location. However, I can guide you on how to check the weather! You can check the weather forecast for your area by:...'}
```

#### Standard Mode (No Detailed Thinking)

Without the special system message, the model provides direct responses without showing its reasoning:

```python
response = rails.generate(messages=[
    {"role": "user", "content": "How is the weather today?"}
])
```

Response:

```
{'role': 'assistant',
 'content': 'The weather! Unfortunately, I don\'t have real-time access to current weather conditions or your location. I\'m a large language model...'}
```

To remove the reasoning traces from the internal tasks, you can use the `remove_reasoning_traces` option in the model's `reasoning_config`:

```yaml
reasoning_config:
  remove_reasoning_traces: true
```

For more information, see [LLMs with Reasoning Traces](docs/user-guides/configuration-guide.md#using-llms-with-reasoning-traces).

## Configuration Details

The `config.yml` file sets:

```yaml
models:
  - type: main
    engine: nim
    model: nvidia/llama-3.1-nemotron-ultra-253b-v1
```
Lines changed: 6 additions & 0 deletions
```yaml
models:
  - type: main
    engine: nim
    model: nvidia/llama-3.1-nemotron-ultra-253b-v1
    reasoning_config:
      remove_reasoning_traces: False  # Set to True to remove traces from the internal tasks
```

nemoguardrails/llm/prompts.py

Lines changed: 16 additions & 4 deletions
```diff
@@ -14,6 +14,7 @@
 # limitations under the License.

 """Prompts for the various steps in the interaction."""
+
 import os
 from typing import List, Union

@@ -77,24 +78,35 @@ def _get_prompt(
             _score = 0.2
         else:
             for _model in prompt.models:
-                # If we have an exact match, the score is 1.
+                # If we have an exact match for the full task_model string (e.g., "engine/provider/model-variant")
                 if _model == model:
                     _score = 1
                     break

-                # If we match just the provider, the score is 0.5.
+                # Check if _model is a provider/base_model pattern matching the model path component of `model` (task_model string).
+                parts = model.split("/", 1)
+                config_model_path = parts[1] if len(parts) > 1 else parts[0]
+
+                if "/" in _model and config_model_path.startswith(_model):
+                    if _model == config_model_path:
+                        # _model exactly matches the model path component (e.g., "nvidia/llama-3.1-nemotron-ultra-253b-v1")
+                        _score = 0.8
+                    else:
+                        # _model is a proper prefix (e.g., "nvidia/llama-3.1-nemotron" for "...-ultra-253b-v1")
+                        _score = 0.9
+                    break

                 elif model.startswith(_model + "/"):
                     _score = 0.5
                     break

-                # If we match just the model, the score is 0.8.
                 elif model.endswith("/" + _model):
                     _score = 0.8
                     break

-                # If we match a substring, the score is 0.4
                 elif _model in model:
                     _score = 0.4
+                    break

             if prompt.mode != prompting_mode:
                 # Penalize matching score for being in an incorrect mode.
```
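
To make the new matching precedence concrete, here is a standalone sketch of the scoring logic from the diff above. It is a simplified reimplementation for illustration only; `score_model_match` is a hypothetical name, not a function in the library:

```python
def score_model_match(config_model: str, task_model: str) -> float:
    """Score how well a prompt's `models:` entry matches the task model string.

    `task_model` is an "engine/provider/model" string; `config_model` is one
    entry from a prompt definition's `models:` list.
    """
    # Exact match on the full engine/provider/model string.
    if config_model == task_model:
        return 1.0

    # Path component of the task model (everything after the engine prefix).
    parts = task_model.split("/", 1)
    model_path = parts[1] if len(parts) > 1 else parts[0]

    if "/" in config_model and model_path.startswith(config_model):
        if config_model == model_path:
            # Exact provider/model match, e.g. "nvidia/llama-3.1-nemotron-ultra-253b-v1".
            return 0.8
        # Proper prefix match on a model family, e.g. "nvidia/llama-3.1-nemotron".
        return 0.9
    if task_model.startswith(config_model + "/"):
        return 0.5  # engine-only match
    if task_model.endswith("/" + config_model):
        return 0.8  # bare model-name match
    if config_model in task_model:
        return 0.4  # substring match
    return 0.0
```

Under this precedence a family prefix such as `nvidia/llama-3.1-nemotron` (score 0.9) outranks a bare model-name match (0.8), which is what lets the new Nemotron prompts win over generic ones for any Nemotron variant.
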

nemoguardrails/llm/prompts/llama3.yml

Lines changed: 41 additions & 33 deletions
```diff
@@ -2,8 +2,9 @@
 prompts:
   - task: general
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3

     messages:
       - type: system
@@ -18,8 +19,9 @@ prompts:
   # Prompt for detecting the user message canonical form.
   - task: generate_user_intent
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3

     messages:
       - type: system
@@ -43,8 +45,9 @@ prompts:
   # Prompt for generating the next steps.
   - task: generate_next_steps
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3

     messages:
       - type: system
@@ -65,8 +68,9 @@ prompts:
   # Prompt for generating the bot message from a canonical form.
   - task: generate_bot_message
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3

     messages:
       - type: system
@@ -91,8 +95,9 @@ prompts:
   # Prompt for generating the user intent, next steps and bot message in a single call.
   - task: generate_intent_steps_message
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3

     messages:
       - type: system
@@ -120,8 +125,9 @@ prompts:
   # Prompt for generating the value of a context variable.
   - task: generate_value
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3

     messages:
       - type: system
@@ -148,8 +154,9 @@ prompts:
   # Prompt for detecting the user message canonical form.
   - task: generate_user_intent_from_user_action
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3
     messages:
       - type: system
         content: "{{ general_instructions }}"
@@ -175,8 +182,9 @@ prompts:

   - task: generate_user_intent_and_bot_action_from_user_action
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3
     messages:
       - type: system
         content: "{{ general_instructions }}"
@@ -212,8 +220,9 @@ prompts:
   # Prompt for generating the value of a context variable.
   - task: generate_value_from_instruction
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3
     messages:
       - type: system
         content: |
@@ -238,8 +247,9 @@ prompts:
   # Prompt for generating a flow from instructions.
   - task: generate_flow_from_instructions
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3
     content: |-
       # Example flows:
       {{ examples }}
@@ -251,8 +261,9 @@ prompts:
   # Prompt for generating a flow from name.
   - task: generate_flow_from_name
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3
     messages:
       - type: system
         content: |
@@ -282,8 +293,9 @@ prompts:
   # Prompt for generating the continuation for the current conversation.
   - task: generate_flow_continuation
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3
     messages:
       - type: system
         content: "{{ general_instructions }}"
@@ -311,12 +323,8 @@ prompts:

   - task: generate_flow_continuation_from_flow_nld
     models:
-      - llama3
-      - llama-3
-    messages:
-      - type: system
-        content: "Directly response with expected answer. Don't provide any pre- or post-explanations."
-
-      - type: system
-        content: |-
-          {{ flow_nld }}
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3
+    content: |-
+      {{ flow_nld }}
```
