Commit f334e70

feat: add Nemotron model support with message-based prompts (#1199)
1 parent: 4add64b

File tree: 8 files changed (+903 −38 lines)

Lines changed: 107 additions & 0 deletions
# Nemotron Message-Based Prompts

This directory contains configurations for using Nemotron models with NeMo Guardrails.

## Message-Based Prompts with Detailed Thinking

NeMo Guardrails implements message-based prompts for Nemotron models with _detailed thinking_ enabled for specific internal tasks:

### Tasks with Detailed Thinking Enabled

The following internal tasks include a `detailed thinking on` system message:

- `generate_bot_message` - When generating the final response
- `generate_value` - When extracting information from user input
- Other complex reasoning tasks, such as flow generation and continuation

### Tasks without Detailed Thinking

The following tasks use standard system messages without detailed thinking:

- `generate_user_intent` - When detecting user intent
- `generate_next_steps` - When determining which bot actions to take

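As an illustrative sketch of what such a message-based prompt definition could look like (a hypothetical fragment: the task and model names follow the lists above, but the actual prompt files in this commit may differ):

```yaml
# Hypothetical sketch of a message-based prompt with detailed thinking enabled;
# not the exact contents of the prompt files shipped in this commit.
prompts:
  - task: generate_bot_message
    models:
      - nvidia/llama-3.1-nemotron
    messages:
      - type: system
        content: detailed thinking on
      - type: system
        content: "{{ general_instructions }}"
```
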
## Usage

To use Nemotron with NeMo Guardrails:

```python
from nemoguardrails import LLMRails, RailsConfig

# Load the configuration
config = RailsConfig.from_path("examples/configs/nemotron")

# Create the LLMRails instance
rails = LLMRails(config)

# Generate a response
response = rails.generate(messages=[
    {"role": "user", "content": "What is NeMo Guardrails?"}
])
print(response)
```

When using a task that has "detailed thinking on" enabled, the model will show its reasoning process:

```
{'role': 'assistant', 'content': '<think>\nOkay, the user is asking about NeMo Guardrails. Let me start by recalling what I know. NeMo is part of NVIDIA\'s tools, right? So, Guardrails must be a component related to that. I remember that NVIDIA has been working on AI frameworks and model development. Maybe Guardrails is part of the NeMo toolkit, which is used for building and training neural networks, especially for speech and language processing.\n\nWait, I think Guardrails are safety features or constraints that prevent models from generating harmful or inappropriate content. So, if NeMo Guardrails exist, they probably integrate these safety mechanisms into the model training or inference process. But I need to be precise here. I should check if NeMo Guardrails are specifically designed for their models like the ones in the NGC catalog.\n\nI remember that NVIDIA has LMOps tools, which might include Guardrails. Oh right, they announced RAPIDS Guardrails earlier, which is a library for adding safety features. Maybe NeMo Guardrails are a similar concept but tailored for the NeMo framework. So, they would allow developers to apply filters, classifiers, or rules to ensure the outputs are safe and comply with policies.\n\nBut wait, I should make sure not to confuse it with other guardrails. For example, some models use RLHF (Reinforcement Learning from Human Feedback), but Guardrails might be more about pre-defined rules or modular safety layers. The key points would be that NeMo Guardrails provide tools to manage model outputs, making it easier to deploy safe and responsible AI applications.\n\nAlso, considering the components of NeMo, which include pre-trained models and a framework for research, adding Guardrails would make sense as a feature to support ethical AI practices. 
They might include things like toxicity detection, fact-checking, or steering models away from sensitive topics.\n\nI should structure the answer to explain what NeMo is, introduce Guardrails as a safety component, mention their purpose, how they work (like modular components, customizable policies), and their benefits. Also, note that it\'s part of NVIDIA\'s efforts to provide tools for responsible AI deployment. Make sure to clarify if they are part of the main NeMo toolkit or an additional module, and maybe mention associated tools like RAPIDS Guardrails if relevant.\n\nWait, but I should verify if NeMo Guardrails are indeed a released feature or part of a specific version. If the information is not up-to-date, I need to state that clearly. The user might be referring to recent updates. Since my knowledge is current up to July 2024, if there were announcements around that time, include them. Otherwise, stick to known information.\n\nSo, putting it all together: NeMo Guardrails are safety features in NVIDIA\'s NeMo toolkit, designed to enforce ethical guidelines and prevent harmful outputs. They offer customizable policies, integration of safety models, and tools for auditing, helping developers deploy responsible AI applications. They might work alongside other NVIDIA tools to streamline the process of adding safety measures without compromising performance.\n</think>\n\nNeMo Guardrails is an open-source toolkit developed by NVIDIA that provides programmable guardrails for Large Language Models (LLMs). These guardrails are designed to make LLM-based applications safer and more reliable by controlling the output of the models in specific ways...'}
```
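The `<think>...</think>` block is plain text in the response content, so it can be removed with a small post-processing helper if you only want the final answer. This is a generic sketch, not a NeMo Guardrails API (`strip_think` is a hypothetical helper name):

```python
import re

def strip_think(text: str) -> str:
    """Remove a leading <think>...</think> reasoning block from a model response."""
    # Non-greedy match across newlines; also consume trailing whitespace after the block.
    return re.sub(r"<think>.*?</think>\s*", "", text, count=1, flags=re.DOTALL)

raw = "<think>\nOkay, the user is asking about NeMo Guardrails...\n</think>\nNeMo Guardrails is an open-source toolkit."
print(strip_think(raw))  # prints "NeMo Guardrails is an open-source toolkit."
```
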
### Controlling Detailed Thinking in Final Responses

Because Nemotron is a hybrid reasoning model, you can toggle the "detailed thinking" feature for final responses (similar to how it works on build.nvidia.com), as long as you are not using dialog rails:

#### Enabling Detailed Thinking via System Message

To enable detailed thinking in the response, include a system message with "detailed thinking on":

```python
response = rails.generate(messages=[
    {"role": "system", "content": "detailed thinking on"},
    {"role": "user", "content": "How is the weather today?"}
])
```

This will include the model's reasoning process in a `<think>...</think>` wrapper:

```
{'role': 'assistant',
 'content': '<think>\n</think>I\'m sorry, but I don\'t know the weather. I\'m a large language model, I don\'t have access to real-time information or your location. However, I can guide you on how to check the weather! You can check the weather forecast for your area by:...'}
```

#### Standard Mode (No Detailed Thinking)

Without the special system message, the model provides direct responses without showing its reasoning:

```python
response = rails.generate(messages=[
    {"role": "user", "content": "How is the weather today?"}
])
```

Response:

```
{'role': 'assistant',
 'content': 'The weather! Unfortunately, I don\'t have real-time access to current weather conditions or your location. I\'m a large language model...'}
```

To remove the reasoning traces from the internal tasks, you can use the `remove_reasoning_traces` option in the model's `reasoning_config`:

```yaml
reasoning_config:
  remove_reasoning_traces: true
```

For more information, see [LLMs with Reasoning Traces](docs/user-guides/configuration-guide.md#using-llms-with-reasoning-traces).

## Configuration Details

The `config.yml` file sets:

```yaml
models:
  - type: main
    engine: nim
    model: nvidia/llama-3.1-nemotron-ultra-253b-v1
```
Lines changed: 6 additions & 0 deletions
```yaml
models:
  - type: main
    engine: nim
    model: nvidia/llama-3.1-nemotron-ultra-253b-v1
    reasoning_config:
      remove_reasoning_traces: False  # Set to True to remove traces from the internal tasks
```

nemoguardrails/llm/prompts.py

Lines changed: 16 additions & 4 deletions
```diff
@@ -14,6 +14,7 @@
 # limitations under the License.

 """Prompts for the various steps in the interaction."""
+
 import os
 from typing import List, Union

@@ -77,24 +78,35 @@ def _get_prompt(
             _score = 0.2
         else:
             for _model in prompt.models:
-                # If we have an exact match, the score is 1.
+                # If we have an exact match for the full task_model string (e.g., "engine/provider/model-variant")
                 if _model == model:
                     _score = 1
                     break

-                # If we match just the provider, the score is 0.5.
+                # Check if _model is a provider/base_model pattern matching the model path component of `model` (task_model string).
+                parts = model.split("/", 1)
+                config_model_path = parts[1] if len(parts) > 1 else parts[0]
+
+                if "/" in _model and config_model_path.startswith(_model):
+                    if _model == config_model_path:
+                        # _model exactly matches the model path component (e.g., "nvidia/llama-3.1-nemotron-ultra-253b-v1")
+                        _score = 0.8
+                    else:
+                        # _model is a proper prefix (e.g., "nvidia/llama-3.1-nemotron" for "...-ultra-253b-v1")
+                        _score = 0.9
+                    break

                 elif model.startswith(_model + "/"):
                     _score = 0.5
                     break

-                # If we match just the model, the score is 0.8.
                 elif model.endswith("/" + _model):
                     _score = 0.8
                     break

-                # If we match a substring, the score is 0.4
                 elif _model in model:
                     _score = 0.4
+                    break

             if prompt.mode != prompting_mode:
                 # Penalize matching score for being in an incorrect mode.
```
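
To make the new matching precedence concrete, here is a standalone sketch of the scoring logic from the diff above. It is a simplified reimplementation for illustration only; `score_model_match` is a hypothetical name, not a function in the library:

```python
def score_model_match(config_model: str, task_model: str) -> float:
    """Score how well a prompt's `models:` entry matches the task model string.

    `task_model` is an "engine/provider/model" string; `config_model` is one
    entry from a prompt definition's `models:` list.
    """
    # Exact match on the full engine/provider/model string.
    if config_model == task_model:
        return 1.0

    # Path component of the task model (everything after the engine prefix).
    parts = task_model.split("/", 1)
    model_path = parts[1] if len(parts) > 1 else parts[0]

    if "/" in config_model and model_path.startswith(config_model):
        if config_model == model_path:
            # Exact provider/model match, e.g. "nvidia/llama-3.1-nemotron-ultra-253b-v1".
            return 0.8
        # Proper prefix match on a model family, e.g. "nvidia/llama-3.1-nemotron".
        return 0.9
    if task_model.startswith(config_model + "/"):
        return 0.5  # engine-only match
    if task_model.endswith("/" + config_model):
        return 0.8  # bare model-name match
    if config_model in task_model:
        return 0.4  # substring match
    return 0.0
```

Under this precedence a family prefix such as `nvidia/llama-3.1-nemotron` (score 0.9) outranks a bare model-name match (0.8), which is what lets the new Nemotron prompts win over generic ones for any Nemotron variant.
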

nemoguardrails/llm/prompts/llama3.yml

Lines changed: 41 additions & 33 deletions
```diff
@@ -2,8 +2,9 @@
 prompts:
   - task: general
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3

     messages:
       - type: system
@@ -18,8 +19,9 @@ prompts:
   # Prompt for detecting the user message canonical form.
   - task: generate_user_intent
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3

     messages:
       - type: system
@@ -43,8 +45,9 @@ prompts:
   # Prompt for generating the next steps.
   - task: generate_next_steps
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3

     messages:
       - type: system
@@ -65,8 +68,9 @@ prompts:
   # Prompt for generating the bot message from a canonical form.
   - task: generate_bot_message
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3

     messages:
       - type: system
@@ -91,8 +95,9 @@ prompts:
   # Prompt for generating the user intent, next steps and bot message in a single call.
   - task: generate_intent_steps_message
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3

     messages:
       - type: system
@@ -120,8 +125,9 @@ prompts:
   # Prompt for generating the value of a context variable.
   - task: generate_value
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3

     messages:
       - type: system
@@ -148,8 +154,9 @@ prompts:
   # Prompt for detecting the user message canonical form.
   - task: generate_user_intent_from_user_action
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3
     messages:
       - type: system
         content: "{{ general_instructions }}"
@@ -175,8 +182,9 @@ prompts:

   - task: generate_user_intent_and_bot_action_from_user_action
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3
     messages:
       - type: system
         content: "{{ general_instructions }}"
@@ -212,8 +220,9 @@ prompts:
   # Prompt for generating the value of a context variable.
   - task: generate_value_from_instruction
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3
     messages:
       - type: system
         content: |
@@ -238,8 +247,9 @@ prompts:
   # Prompt for generating a flow from instructions.
   - task: generate_flow_from_instructions
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3
     content: |-
       # Example flows:
       {{ examples }}
@@ -251,8 +261,9 @@ prompts:
   # Prompt for generating a flow from name.
   - task: generate_flow_from_name
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3
     messages:
       - type: system
         content: |
@@ -282,8 +293,9 @@ prompts:
   # Prompt for generating the continuation for the current conversation.
   - task: generate_flow_continuation
     models:
-      - llama3
-      - llama-3
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3
     messages:
       - type: system
         content: "{{ general_instructions }}"
@@ -311,12 +323,8 @@ prompts:

   - task: generate_flow_continuation_from_flow_nld
     models:
-      - llama3
-      - llama-3
-    messages:
-      - type: system
-        content: "Directly response with expected answer. Don't provide any pre- or post-explanations."
-
-      - type: system
-        content: |-
-          {{ flow_nld }}
+      - meta/llama-3
+      - meta/llama3
+      - nvidia/usdcode-llama-3
+    content: |-
+      {{ flow_nld }}
```
