
Commit 92ceb41

docs: Add mention of Nemotron
Signed-off-by: Mike McKiernan <mmckiernan@nvidia.com>
1 parent 07422c1

File tree

1 file changed: +33 -4 lines changed

docs/user-guides/configuration-guide.md

Lines changed: 33 additions & 4 deletions
@@ -119,14 +119,14 @@ For more details about the command and its usage, see the [CLI documentation](..

 #### Using LLMs with Reasoning Traces

-By default, reasoning models, such as [DeepSeek-R1](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d), include the reasoning traces in the model response.
-DeepSeek models use `<think>` and `</think>` as tokens to identify the traces.
+By default, reasoning models, such as [DeepSeek-R1](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d) and [NVIDIA Llama 3.1 Nemotron Ultra 253B V1](https://build.nvidia.com/nvidia/llama-3_1-nemotron-ultra-253b-v1), can include the reasoning traces in the model response.
+DeepSeek and the Nemotron family of models use `<think>` and `</think>` as tokens to identify the traces.

-The reasoning traces and the tokens usually interfere with NeMo Guardrails and result in falsely triggering output guardrails for safe responses.
+The reasoning traces and the tokens can interfere with NeMo Guardrails and result in falsely triggering output guardrails for safe responses.
 To use these reasoning models, you can remove the traces and tokens from the model response with a configuration like the following example.

 ```{code-block} yaml
-:emphasize-lines: 5-
+:emphasize-lines: 5-8, 13-

 models:
   - type: main
@@ -136,6 +136,12 @@ models:
       remove_reasoning_traces: True
       start_token: "<think>"
       end_token: "</think>"
+
+  - type: main
+    engine: nim
+    model: nvidia/llama-3.1-nemotron-ultra-253b-v1
+    reasoning_config:
+      remove_reasoning_traces: True
 ```

 The `reasoning_config` field for a model specifies the required configuration for a reasoning model that returns reasoning traces.
@@ -147,6 +153,29 @@ You can specify the following parameters for a reasoning model:
 - `start_token`: the start token for the reasoning process (default `<think>`).
 - `end_token`: the end token for the reasoning process (default `</think>`).

+Even if `remove_reasoning_traces` is set to `True`, end users can still receive the thinking traces from the Nemotron models by requesting detailed thinking, as shown in the following example:
+
+```{code-block} bash
+:emphasize-lines: 7
+
+curl https://integrate.api.nvidia.com/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer ${NGC_API_KEY}" \
+  -d '{
+    "model": "nvidia/llama-3.1-nemotron-ultra-253b-v1",
+    "messages": [
+      {"role":"system","content":"detailed thinking on"},
+      {"role":"user","content":"Tell me about NeMo Guardrails in 50 words or less."}
+    ],
+    "temperature": 0.6,
+    "top_p": 0.95,
+    "max_tokens": 4096,
+    "frequency_penalty": 0,
+    "presence_penalty": 0,
+    "stream": true
+  }'
+```
+
 #### NIM for LLMs

 [NVIDIA NIM](https://docs.nvidia.com/nim/index.html) is a set of easy-to-use microservices designed to accelerate the deployment of generative AI models across the cloud, data center, and workstations.

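As context for the `remove_reasoning_traces: True` setting that this commit documents: the effect is to drop the text between the configured start and end tokens before output rails run. A minimal, hypothetical sketch of that behavior (illustrative only, not the actual NeMo Guardrails implementation; the helper name is invented here) could look like this:

```python
import re

def remove_reasoning_traces(response: str,
                            start_token: str = "<think>",
                            end_token: str = "</think>") -> str:
    """Remove everything between start_token and end_token, inclusive.

    Hypothetical helper for illustration; NeMo Guardrails performs the
    equivalent stripping internally when remove_reasoning_traces is True.
    """
    pattern = re.escape(start_token) + r".*?" + re.escape(end_token)
    # DOTALL lets the trace span multiple lines; non-greedy match keeps
    # multiple separate traces from being merged into one removal.
    return re.sub(pattern, "", response, flags=re.DOTALL).lstrip()

# The trace and its delimiter tokens are stripped; the answer remains.
print(remove_reasoning_traces("<think>Plan the reply.</think>Hello there!"))
```

With this in place, a safe answer is evaluated by output guardrails without the reasoning trace that might otherwise trigger a false positive.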