
Commit d82f0b9

feat: Add api_key_env_var to Model, pass in kwargs to langchain initializer (#1142)
1 parent a44566e commit d82f0b9

File tree

10 files changed: +651 -5 lines changed
Lines changed: 17 additions & 0 deletions

@@ -0,0 +1,17 @@

# NemoGuard ContentSafety Usage Example

This example showcases the use of NVIDIA's [NemoGuard ContentSafety model](./../../../docs/user-guides/advanced/nemoguard-contentsafety-deployment.md) for topical and dialogue moderation.

The structure of the config folder is the following:

- `config.yml` - The config file holding all the configuration options for the model.
- `prompts.yml` - The config file holding the topical rules used for topical and dialogue moderation by the current guardrail configuration.

Please see the docs for more details about the [recommended ContentSafety deployment](./../../../docs/user-guides/advanced/nemoguard-contentsafety-deployment.md) methods, either using locally downloaded NIMs or NVIDIA AI Enterprise (NVAIE).

Before running this example, set the environment variables `NG_OPENAI_API_KEY` and `NG_NVIDIA_API_KEY` to your OpenAI API key and your NVIDIA build.nvidia.com API key, as shown below:

```shell
export NG_OPENAI_API_KEY="<OpenAI API Key>"
export NG_NVIDIA_API_KEY="<NVIDIA API Key>"
```
Lines changed: 18 additions & 0 deletions

@@ -0,0 +1,18 @@

models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
    api_key_env_var: NG_OPENAI_API_KEY

  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
    api_key_env_var: NG_NVIDIA_API_KEY

rails:
  input:
    flows:
      - content safety check input $model=content_safety
  output:
    flows:
      - content safety check output $model=content_safety
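
With both keys exported, the configuration above can be loaded through the standard NeMo Guardrails API. A minimal sketch, assuming the two files live in a local `./config` folder:

```python
from nemoguardrails import LLMRails, RailsConfig

# Load config.yml and prompts.yml from the example's folder
# (the "./config" path is an assumption for this sketch).
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Both content-safety flows (input and output) run around this call.
response = rails.generate(messages=[{"role": "user", "content": "Hello!"}])
print(response["content"])
```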
Lines changed: 257 additions & 0 deletions

@@ -0,0 +1,257 @@

# These are the default prompts released by Meta, except for policy O7, which was added to address direct insults.
prompts:

  - task: content_safety_check_input $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      user: {{ user_input }}

      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.
      Output JSON:
    output_parser: nemoguard_parse_prompt_safety
    max_tokens: 50

  - task: content_safety_check_output $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      user: {{ user_input }}

      response: agent: {{ bot_response }}

      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.
      Output JSON:
    output_parser: nemoguard_parse_response_safety
    max_tokens: 50

  - task: content_safety_check_input $model=llama_guard
    content: |
      <s>[INST] Task: Check if there is unsafe content in 'User' messages in conversations according our safety policy with the below categories.
      <BEGIN UNSAFE CONTENT CATEGORIES>
      O1: Violence and Hate.
      Should not
      - Help people plan or engage in violence.
      - Encourage people to engage in violence.
      - Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability.
      - Encourage discrimination against people based on those characteristics and should not use slurs or create insults based on those characteristics.
      Can
      - Provide information on violence and discrimination.
      - Discuss the topics of hate and violence, and can discuss historical events involving violence.
      O2: Sexual Content.
      Should not
      - Engage in sexually explicit (i.e., erotic) conversations and should not encourage people to engage in sexual activities.
      Can
      - Discuss topics related to sexuality and sexual education.
      - Provide general information regarding sexual activities.
      - Provide guidance on topics related to sexual health.
      O3: Criminal Planning.
      Should not
      - Help people plan or engage in criminal activities like theft, kidnapping, or financial crimes, especially when the person indicates that they may act on the information provided to them.
      - Encourage people to engage in criminal activities or condone criminal behavior.
      Can
      - Help people understand how crimes are committed, provided that the response is not so detailed that it could be used to provide a roadmap for future crimes.
      - Help people understand how systems work that are used to prevent crimes, provided those descriptions do not provide detailed information on how to defeat those systems.
      O4: Guns and Illegal Weapons.
      Should not
      - Assist in the planning or execution of firearm or illegal weapons crimes, especially when the person indicates that they may act on the information provided to them.
      - Encourage people to engage in firearm or illegal weapons crimes or condone such behavior.
      Can
      - Discuss firearms and the arguments for and against firearm ownership.
      O5: Regulated or Controlled Substances.
      Should not
      - Encourage or assist people in the trafficking, consumption, or use of regulated or controlled substances, including but not limited to illegal drugs, tobacco, alcohol, and cannabis.
      - Assist or encourage people to create such substances when it is illegal to do so.
      Can
      - Discuss the use of regulated or controlled substances, and can provide general information on those substances (ex: what their chemical composition is and what their effects are).
      - Provide information on how those substances are manufactured, provided that the information is not so detailed as to provide step-by-step instructions for their creation.
      O6: Self-Harm.
      Should not
      - Encourage people to harm themselves.
      - Romanticize or condone self-harm or suicide.
      - Provide information regarding the methods of suicide or self-harm.
      - Provide any information or advice, other than information regarding appropriate health resources, to people who express or imply that they wish to harm themselves.
      Can
      - Provide general information regarding self-harm and suicide (ex: statistics, the mental health challenges faced by those who take these actions).
      Should
      - Provide information regarding appropriate health resources to people who imply or express the desire to harm themselves.
      O7: Offensive Language and Insults.
      Should not
      - Address direct insults
      - Use or encourage the use of swear words, bad language, or vulgar expressions.
      - Insult, offend, or demean individuals or groups based on any personal characteristics or choices.
      - Encourage or support the use of offensive or derogatory language.
      Can
      - Discuss the impact and societal perceptions of offensive language and insults.
      - Provide information on communication strategies that avoid the use of offensive language and promote respectful dialogue.
      - Offer guidance on how to address or respond to offensive language or insults in a constructive manner.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>
      User: {{ user_input }}
      <END CONVERSATION>

      Provide your safety assessment for User in the above conversation:
      - First line must read 'safe' or 'unsafe'.
      - If unsafe, a second line must include a comma-separated list of violated categories. [/INST]
    output_parser: is_content_safe

  - task: content_safety_check_output $model=llama_guard_2
    content: |
      <|begin_of_text|>[INST] Task: Check if there is unsafe content in Agent messages in conversations according our safety policy with the below categories.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violent Crimes.
      S2: Non-Violent Crimes.
      S3: Sex Crimes.
      S4: Child Exploitation.
      S5: Specialized Advice.
      S6: Privacy.
      S7: Intellectual Property.
      S8: Indiscriminate Weapons.
      S9: Hate.
      S10: Self-Harm.
      S11: Sexual Content.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      User: {{ user_input }}

      Agent: {{ bot_response }}

      <END CONVERSATION>

      Provide your safety assessment for ONLY THE LAST Agent in the above conversation:
      - First line must read 'safe' or 'unsafe'.
      - If unsafe, a second line must include a comma-separated list of violated categories. [/INST]
    output_parser: is_content_safe
    max_tokens: 3

  - task: content_safety_check_input $model=shieldgemma
    content: |
      You are a policy expert trying to help determine whether a user prompt is in
      violation of the defined safety policies.

      <start_of_turn>
      Human Question: {{ user_input }}
      <end_of_turn>

      Our safety principle is defined in the below:

      * "No Hate Speech": The prompt shall not contain or seek generation of content
      that expresses, incites, or promotes hate based on race, gender, ethnicity,
      religion, nationality, sexual orientation, disability status, or caste.
      * "No Dangerous Content": The prompt shall not contain or seek generation of content that harming oneself
      and/or others (e.g., accessing or building firearms and explosive devices, promotion of terrorism, instructions for suicide).
      * "No Sexually Explicit Information": The prompt shall not contain or seek generation of content
      that contains references to sexual acts or other lewd content (e.g., sexually graphic descriptions,
      content aimed at causing arousal). Medical or scientific terms related to human anatomy or sex education are permitted.
      * "No Harassment": The prompt shall not contain or seek generation of content that is malicious,
      intimidating, bullying, or abusive content targeting another individual (e.g., physical threats,
      denial of tragic events, disparaging victims of violence).
      ===

      Does the human prompt/question violate the above principle? Your answer must start with
      'Yes' or 'No'. And then walk through step by step to be sure we answer
      correctly.
    output_parser: is_content_safe
    max_tokens: 3

  - task: content_safety_check_output $model=shieldgemma
    content: |
      You are a policy expert trying to help determine whether a chatbot response to the user is in
      violation of the defined safety policies:

      <start_of_turn>
      Human Question: {{ user_input }}
      Chatbot Response: {{ bot_response }}
      <end_of_turn>

      Our safety principle is defined in the below:

      * "No Dangerous Content": The chatbot shall not generate content that harming oneself and/or others
      (e.g., accessing or building firearms and explosive devices, promotion of terrorism, instructions for suicide).
      * "No Harassment": The chatbot shall not generate content that is malicious, intimidating, bullying, or
      abusive content targeting another individual (e.g., physical threats, denial of tragic events, disparaging victims of violence).
      * "No Hate Speech": The chatbot shall not generate content that targeting identity and/or protected
      attributes (e.g., racial slurs, promotion of discrimination, calls to violence against protected groups) or dehumanizing/belittling/vilifying on the
      bases of race, ethnicity, religion, disability, age, nationality, veteran status, sexual orientation, gender, gender identity, caste, or any other protected status.
      We do allow content that includes a reference to a protected individual or group in a positive manner.
      * "No Sexually Explicit Information": The chatbot shall not generate content that contains references to sexual acts or
      other lewd content (e.g., sexually graphic descriptions, content aimed at causing arousal).
      Medical or scientific terms related to human anatomy or sex education are permitted.

      ===

      Does the chatbot response violate the above principle? Your answer must start with
      'Yes' or 'No'. And then walk through step by step to be sure we answer
      correctly.
    output_parser: is_content_safe
    max_tokens: 3
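
For orientation, both NemoGuard tasks above ask the model for a single JSON object, which the `nemoguard_parse_prompt_safety` / `nemoguard_parse_response_safety` parsers then consume. A hypothetical verdict for an unsafe user turn, following the format the prompt requests (with "Response Safety" omitted because there is no agent response), might be:

```python
# Hypothetical NemoGuard verdict, shaped per the prompt's output JSON format;
# the category names come from the S1-S23 taxonomy above.
verdict = {
    "User Safety": "unsafe",
    "Safety Categories": "Violence, Criminal Planning/Confessions",
}
```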

nemoguardrails/llm/models/initializer.py

Lines changed: 5 additions & 1 deletion
@@ -38,6 +38,7 @@ def init_llm_model(
     Args:
         model_name: Name of the model to initialize
         provider_name: Name of the provider to use
+        mode: Literal taking either "chat" or "text" values
         kwargs: Additional arguments to pass to the model initialization

     Returns:
@@ -48,7 +49,10 @@
     """
     # currently we only support LangChain models
     return init_langchain_model(
-        model_name=model_name, provider_name=provider_name, mode=mode, kwargs=kwargs
+        model_name=model_name,
+        provider_name=provider_name,
+        mode=mode,
+        kwargs=kwargs,
     )
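
The net effect of this change is that extra model parameters now travel through `init_llm_model` into the LangChain initializer as a single dict. A hedged sketch of a call site, assuming `kwargs` is accepted as a plain dict as the docstring and the forwarding above suggest (the parameter values are illustrative):

```python
from nemoguardrails.llm.models.initializer import init_llm_model

# "temperature" and "max_tokens" are illustrative extras; they are
# forwarded untouched to the underlying LangChain model constructor.
llm = init_llm_model(
    model_name="gpt-3.5-turbo-instruct",
    provider_name="openai",
    mode="text",  # Literal["chat", "text"], per the new docstring entry
    kwargs={"temperature": 0.0, "max_tokens": 100},
)
```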

nemoguardrails/llm/models/langchain_initializer.py

Lines changed: 5 additions & 3 deletions
@@ -103,7 +103,9 @@ def try_initialization_method(
         f"Trying initializer: {initializer.init_method.__name__} for model: {model_name} and provider: {provider_name}"
     )
     result = initializer.execute(
-        model_name=model_name, provider_name=provider_name, kwargs=kwargs
+        model_name=model_name,
+        provider_name=provider_name,
+        kwargs=kwargs,
     )
     log.debug(f"Initializer {initializer.init_method.__name__} returned: {result}")
     if result is not None:
@@ -213,7 +215,7 @@ def _init_chat_completion_model(

     # just to document the expected behavior
     # we don't support pre-0.2.7 versions of langchain-core it is in
-    # line wiht our pyproject.toml
+    # line with our pyproject.toml
     package_version = version("langchain-core")

     if _parse_version(package_version) < (0, 2, 7):
@@ -225,6 +227,7 @@
         return init_chat_model(
             model=model_name,
             model_provider=provider_name,
+            **kwargs,
         )
     except ValueError:
         raise
@@ -250,7 +253,6 @@ def _init_text_completion_model(
     if provider_cls is None:
         raise ValueError()
     kwargs = _update_model_kwargs(provider_cls, model_name, kwargs)
-
     return provider_cls(**kwargs)
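
The `**kwargs` forwarding means chat models created via LangChain's `init_chat_model` (guarded by the langchain-core version check above) now receive user-supplied parameters directly. For reference, a sketch of the call shape the initializer ends up making, with illustrative values:

```python
from langchain.chat_models import init_chat_model

# Equivalent of _init_chat_completion_model's call once kwargs such as
# {"temperature": 0.0} are expanded into it.
llm = init_chat_model(
    model="gpt-3.5-turbo",
    model_provider="openai",
    temperature=0.0,
)
```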

nemoguardrails/rails/llm/config.py

Lines changed: 16 additions & 1 deletion
@@ -28,6 +28,7 @@
     ValidationError,
     model_validator,
     root_validator,
+    validator,
 )
 from pydantic.fields import Field

@@ -101,7 +102,10 @@ class Model(BaseModel):
         default=None,
         description="The name of the model. If not specified, it should be specified through the parameters attribute.",
     )
-
+    api_key_env_var: Optional[str] = Field(
+        default=None,
+        description='Optional environment variable with model\'s API Key. Do not include "$".',
+    )
     reasoning_config: Optional[ReasoningModelConfig] = Field(
         default_factory=ReasoningModelConfig,
         description="Configuration parameters for reasoning LLMs.",
@@ -1352,6 +1356,17 @@ def fill_in_default_values_for_v2_x(cls, values):

         return values

+    @validator("models")
+    def validate_models_api_key_env_var(cls, models):
+        """Model API Key Env var must be set to make LLM calls"""
+        api_keys = [m.api_key_env_var for m in models]
+        for api_key in api_keys:
+            if api_key and not os.environ.get(api_key):
+                raise ValueError(
+                    f"Model API Key environment variable '{api_key}' not set."
+                )
+        return models
+
     raw_llm_call_action: Optional[str] = Field(
         default="raw llm call",
         description="The name of the action that would execute the original raw LLM call. ",
