fix(nvidia-safety): correct NeMo Guardrails API endpoint #4202
@ashwinb The server API implementation only includes two main endpoints:
/v1/chat/completions - The main endpoint for chat completions with guardrails applied (api.py:369-374)
This contradicts the official documentation I mentioned earlier.

@jiayin-nvidia @rmkraus can you shed some light on this?

Direct API calls were made to the running container to compare the behavior of the endpoint used by the code with that of the documented endpoint.

Documentation vs. Implementation: The changes do not match the online NVIDIA NeMo Microservices documentation (v25.11.0). The documentation specifies endpoints like the check endpoint exercised in Test 2.

Test 1: Code-Usage Endpoint

Request:
POST http://localhost:8000/v1/chat/completions
Content-Type: application/json

{
  "config_id": "demo-self-check-input-output",
  "messages": [
    {
      "role": "user",
      "content": "You are stupid"
    }
  ]
}

Response:

Test 2: Documented Check Endpoint

Request:
POST http://localhost:8000/v1/guardrail/checks
Content-Type: application/json

{
  "guardrails": {
    "config_id": "demo-self-check-input-output"
  },
  "messages": [
    {
      "role": "user",
      "content": "You are stupid"
    }
  ]
}

Response:
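The two direct calls in Test 1 and Test 2 can be scripted for a side-by-side comparison. What follows is a minimal sketch, not the script used for these tests: the helper names `build_probes` and `send_probe` are hypothetical, the base URL and config ID are copied from the requests shown in the tests, and only the Python standard library is used.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # container address from the tests (assumed reachable)
CONFIG_ID = "demo-self-check-input-output"


def build_probes(text, config_id=CONFIG_ID):
    """Return {path: payload} for the two endpoints being compared (hypothetical helper)."""
    messages = [{"role": "user", "content": text}]
    return {
        # Request shape from Test 1
        "/v1/chat/completions": {"config_id": config_id, "messages": messages},
        # Request shape from Test 2
        "/v1/guardrail/checks": {
            "guardrails": {"config_id": config_id},
            "messages": messages,
        },
    }


def send_probe(path, payload, base_url=BASE_URL, timeout=10.0):
    """POST one payload and return (HTTP status, body text) (hypothetical helper)."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status and body worth comparing
        return err.code, err.read().decode("utf-8")
```

Looping over `build_probes("You are stupid").items()` and passing each pair to `send_probe` should show which of the two paths the running container actually serves. A connection failure surfaces as `urllib.error.URLError`, which the caller would need to handle.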


This PR fixes issue #4189, where the NVIDIA safety provider was calling an incorrect API endpoint when communicating with the NeMo Guardrails service.
Problem
The NVIDIA safety provider implementation was calling /v1/guardrail/checks, which does not exist in the NeMo Guardrails API. According to the NeMo Guardrails documentation and the NVIDIA docs, the correct endpoint is /v1/chat/completions.
This caused:
Solution
1. Fixed Endpoint (nvidia.py:144)
Before:
After:
2. Simplified Request Format (nvidia.py:140-143)
Before:
After:
The simplified format matches the NeMo Guardrails API specification and removes unnecessary inference parameters that were meant for LLM completion, not safety checks.
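As an illustration of the simplified format, the sketch below builds the request body and interprets a response. It is not the provider's actual code: `build_guardrails_request` and `is_blocked` are hypothetical names, and the response fields (`status`, `rails_status.reason`) are assumed to follow the response format listed in this PR's validation section.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_guardrails_request(
    config_id: str, messages: List[Dict[str, str]]
) -> Dict[str, Any]:
    """Build the simplified body: only config_id and messages, no inference parameters."""
    return {"config_id": config_id, "messages": messages}


def is_blocked(response: Dict[str, Any]) -> Tuple[bool, Optional[str]]:
    """Return (blocked, reason) from a response dict.

    Assumes the response carries `status` ("allowed" or "blocked") and an
    optional `rails_status.reason`; a missing status is treated as not blocked.
    """
    blocked = response.get("status") == "blocked"
    reason = None
    if blocked:
        reason = (response.get("rails_status") or {}).get("reason")
    return blocked, reason
```

Treating a missing `status` as "not blocked" is a design choice of this sketch. A stricter caller might prefer to raise an error when the field is absent, so that an unexpected response shape is not silently read as safe.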
Testing
Test Results
Manual Verification
The reproduction script demonstrates the fix:
Validation Against NeMo Guardrails API
This fix aligns with the official NeMo Guardrails API specification:
Endpoint:
POST /v1/chat/completions
Request Format:
{
  "config_id": "demo-config",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}
Response Format:
{
  "role": "assistant",
  "content": "Response text",
  "status": "allowed|blocked",
  "rails_status": {
    "reason": "...",
    "triggered_rails": [...]
  }
}
Breaking Changes
None. This is a bug fix that makes the implementation work as originally intended.
References
src/llama_stack/providers/remote/safety/nvidia/nvidia.py