docs/user-guides/advanced/nemoguard-topiccontrol-deployment.md (7 additions, 7 deletions)
@@ -1,11 +1,13 @@
# Llama 3.1 NemoGuard 8B Topic Control Deployment
-The TopicControl model will be available to download as a LoRA adapter module through HuggingFace, and as an [NVIDIA NIM](https://docs.nvidia.com/nim/#nemoguard) for lowlatency optimized inference with [NVIDIA TensorRT-LLM](https://docs.nvidia.com/tensorrt-llm/index.html).
+The TopicControl model is available to download as a LoRA adapter module through Hugging Face or as an [NVIDIA TopicControl NIM microservice](https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-topiccontrol/latest/index.html) for low-latency optimized inference with [NVIDIA TensorRT-LLM](https://docs.nvidia.com/tensorrt-llm/index.html).
-This guide covers how to deploy the TopicControl model as a NIM, and how to then use the deployed NIM in a NeMo Guardrails configuration.
+This guide covers how to deploy the TopicControl model as a NIM microservice and use it in a NeMo Guardrails configuration.
## NIM Deployment
+Follow the instructions below to deploy the TopicControl NIM microservice and configure it in a NeMo Guardrails application.
+
### Access
The first step is to ensure access to NVIDIA NIM assets through NGC using an NVAIE license.
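For illustration (this is not part of the diff above), authenticating Docker against NGC for this step typically looks like the sketch below. The API-key variable name is an assumption; the key itself is created in the NGC web console under an NVAIE entitlement.

```bash
# Assumed variable name; paste the key generated in the NGC web console.
export NGC_API_KEY="<your-ngc-api-key>"

# Log in to NVIDIA's container registry so docker can pull the NIM image.
# For NGC API-key logins the username is the literal string `$oauthtoken`.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```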
@@ -37,11 +39,9 @@ docker run -it --name=$MODEL_NAME \
$NIM_IMAGE
```
-#### Use the running NIM in your Guardrails App
-
-Any locally running NIM exposes the standard OpenAI interface on the `v1/completions` and `v1/chat/completions` endpoints. NeMo Guardrails provides out of the box support engines that support the standard LLM interfaces. For locally deployed NIMs, you need to use the engine `nim`.
+### Use TopicControl NIM Microservice in NeMo Guardrails App
-Thus, your Guardrails configuration file can look like:
+A locally running TopicControl NIM microservice exposes the standard OpenAI interface on the `v1/chat/completions` endpoint. NeMo Guardrails provides out-of-the-box support for engines that implement the standard LLM interfaces. In the Guardrails configuration, use the `nim` engine for the TopicControl NIM microservice as follows.
```yaml
models:
@@ -67,7 +67,7 @@ A few things to note:
- `parameters.model_name` in the Guardrails configuration needs to match the `$MODEL_NAME` used when running the NIM container (see the configuration sketch after this list).
- The `rails` definitions should list `topic_control` as the model.
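Purely as an illustration of these two notes (not content from the diff), a minimal Guardrails `config.yml` could look like the sketch below. The main-model entry, `base_url`, port, and flow name are assumptions for the example; the only hard requirement the notes state is that `parameters.model_name` matches the container's `$MODEL_NAME` and that the rails reference `topic_control`.

```yaml
models:
  # Main application LLM (placeholder values; use your own provider and model).
  - type: main
    engine: openai
    model: gpt-4o

  # Locally deployed TopicControl NIM microservice, reached through the `nim` engine.
  - type: topic_control
    engine: nim
    parameters:
      base_url: "http://localhost:8000/v1"                 # assumed host/port of the NIM container
      model_name: "llama-3.1-nemoguard-8b-topic-control"   # must match $MODEL_NAME

rails:
  input:
    flows:
      # Assumed flow name; the rail references the topic_control model defined above.
      - topic safety check input $model=topic_control
```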
-#### Bonus: Caching the optimized TRTLLM inference engines
+### Bonus: Caching the optimized TRTLLM inference engines
If you'd prefer not to build the TRTLLM engines from scratch every time you run the NIM container, you can cache them on the first run by adding a flag that mounts a local directory into the Docker container to store the model cache.
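A sketch of what that mount could look like, reusing the `$MODEL_NAME` and `$NIM_IMAGE` variables from the earlier `docker run` command. The local cache path, the in-container path `/opt/nim/.cache`, and the GPU/port/API-key flags are assumptions based on common NIM conventions, not values taken from this diff:

```bash
# Local directory that persists the built TRT-LLM engines across container runs (assumed path).
export LOCAL_NIM_CACHE=~/.cache/topic-control-nim
mkdir -p "$LOCAL_NIM_CACHE"

# Same idea as the earlier `docker run`, with one extra `-v` flag mounting the cache directory.
docker run -it --name=$MODEL_NAME \
  --gpus all \
  -e NGC_API_KEY \
  -p 8000:8000 \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  $NIM_IMAGE
```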