docs/user-guides/configuration-guide.md
#### Using LLMs with Reasoning Traces
By default, reasoning models, such as [DeepSeek-R1](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d) and [NVIDIA Llama 3.1 Nemotron Ultra 253B V1](https://build.nvidia.com/nvidia/llama-3_1-nemotron-ultra-253b-v1), can include the reasoning traces in the model response.
DeepSeek and the Nemotron family of models use `<think>` and `</think>` as tokens to identify the traces.
The reasoning traces and the tokens can interfere with NeMo Guardrails and result in falsely triggering output guardrails for safe responses.
To use these reasoning models, you can remove the traces and tokens from the model response with a configuration like the following example.
```{code-block} yaml
:emphasize-lines: 5-8, 13-

models:
  - type: main
    engine: deepseek
    model: deepseek-reasoner
    reasoning_config:
      remove_reasoning_traces: True
      start_token: "<think>"
      end_token: "</think>"
  - type: main
    engine: nim
    model: nvidia/llama-3.1-nemotron-ultra-253b-v1
    reasoning_config:
      remove_reasoning_traces: True
```
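
If you save this configuration as `config/config.yml`, you can load it as usual with the Python API. The following is a minimal sketch; the `./config` directory path is a placeholder for this example:

```{code-block} python
from nemoguardrails import LLMRails, RailsConfig

# Load the guardrails configuration from the directory containing config.yml.
# "./config" is a placeholder path for this example.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# With remove_reasoning_traces enabled, the text between the start and end
# tokens is removed from the response before the output rails evaluate it.
response = rails.generate(messages=[
    {"role": "user", "content": "Hello!"}
])
print(response)
```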
The `reasoning_config` field for a model specifies the required configuration for a reasoning model that returns reasoning traces.
You can specify the following parameters for a reasoning model:

- `remove_reasoning_traces`: whether to remove the reasoning traces and tokens from the model response (default `True`).
- `start_token`: the start token for the reasoning process (default `<think>`).
- `end_token`: the end token for the reasoning process (default `</think>`).
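
For illustration, removing the traces amounts to stripping everything between the start and end tokens. The following is a minimal sketch, not the library's internal implementation:

```{code-block} python
import re

def strip_reasoning_traces(
    text: str,
    start_token: str = "<think>",
    end_token: str = "</think>",
) -> str:
    """Remove reasoning traces delimited by the configured tokens."""
    pattern = re.escape(start_token) + r".*?" + re.escape(end_token)
    return re.sub(pattern, "", text, flags=re.DOTALL).strip()

raw = "<think>The user asked for a greeting, so...</think>Hello! How can I help?"
print(strip_reasoning_traces(raw))  # Hello! How can I help?
```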
Even if `remove_reasoning_traces` is set to `True`, end users can still receive the thinking traces from Nemotron models by requesting detailed thinking with a system prompt, as shown in the following example. The endpoint URL is a placeholder for an OpenAI-compatible chat completions endpoint in your deployment:
{"role":"user","content":"Tell me about NeMo Guardrails in 50 words or less."}
169
+
],
170
+
"temperature": 0.6,
171
+
"top_p": 0.95,
172
+
"max_tokens": 4096,
173
+
"frequency_penalty": 0,
174
+
"presence_penalty": 0,
175
+
"stream": true
176
+
}'
177
+
```
#### NIM for LLMs
[NVIDIA NIM](https://docs.nvidia.com/nim/index.html) is a set of easy-to-use microservices designed to accelerate the deployment of generative AI models across the cloud, data center, and workstations.