@erickgalinkin
Collaborator

Update jailbreak detection compatibility for NIM to allow providing an API key.

Tagging @tgasser-nv for review

Description

Allows the use of integrate.nvidia.com and other NIM deployments that require an API key.

  • Modified config.JailbreakDetectionConfig object to include nim_auth_token parameter with a default value of None and nim_classification_path with a default value of "/v1/classify".
  • Modified jailbreak_detection.request.jailbreak_nim_request to accept nim_auth_token and nim_classification_path parameters and add the "Authorization: Bearer" header if a key is provided.
  • Modified jailbreak_detection.actions.jailbreak_detection_model to extract the nim_auth_token and nim_classification_path from the config and pass them to jailbreak_nim_request.
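
The header handling described in the bullets above could be sketched roughly as follows. This is an illustration only: the helper name `build_nim_headers` and the `Content-Type`/`Accept` headers are assumptions, and the actual `jailbreak_nim_request` may be structured differently. Only the `nim_auth_token` parameter and the "Authorization: Bearer" header come from this description.

```python
from typing import Dict, Optional


def build_nim_headers(nim_auth_token: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: build headers for a NIM classification request."""
    # Content-Type and Accept are assumed defaults, not taken from the PR.
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    # Only attach credentials when a token was configured (default is None).
    if nim_auth_token:
        headers["Authorization"] = f"Bearer {nim_auth_token}"
    return headers
```

With the default of `None`, no `Authorization` header is sent, so deployments that do not require a key would keep working unchanged.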

Tested with config values:

  • nim_url = "https://ai.api.nvidia.com"
  • nim_port = 443
  • nim_auth_token = my_token
  • nim_classification_path = "/v1/security/nvidia/nemoguard-jailbreak-detect"

Checklist

  • [x] I've read the CONTRIBUTING guidelines.
  • [x] I've updated the documentation if applicable.
  • [x] I've added tests if applicable.
  • [x] @mentions of the person or team responsible for reviewing proposed changes.

@codecov-commenter

codecov-commenter commented May 29, 2025

Codecov Report

Attention: Patch coverage is 38.29787% with 29 lines in your changes missing coverage. Please review.

Project coverage is 68.65%. Comparing base (0570f88) to head (4ed2d85).
Report is 6 commits behind head on develop.

Files with missing lines                                  Patch %   Lines
...oguardrails/library/jailbreak_detection/actions.py     38.09%    13 Missing ⚠️
.../library/jailbreak_detection/model_based/checks.py     0.00%     8 Missing ⚠️
...oguardrails/library/jailbreak_detection/request.py     0.00%     6 Missing ⚠️
...moguardrails/library/jailbreak_detection/server.py     0.00%     2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #1214   +/-   ##
========================================
  Coverage    68.65%   68.65%           
========================================
  Files          161      161           
  Lines        15978    15996   +18     
========================================
+ Hits         10969    10982   +13     
- Misses        5009     5014    +5     
Flag Coverage Δ
python 68.65% <38.29%> (+<0.01%) ⬆️


Files with missing lines                                  Coverage Δ
.../library/jailbreak_detection/model_based/models.py     0.00% <ø> (ø)
nemoguardrails/rails/llm/config.py                        90.12% <100.00%> (+0.16%) ⬆️
...moguardrails/library/jailbreak_detection/server.py     0.00% <0.00%> (ø)
...oguardrails/library/jailbreak_detection/request.py     12.12% <0.00%> (-0.79%) ⬇️
.../library/jailbreak_detection/model_based/checks.py     0.00% <0.00%> (ø)
...oguardrails/library/jailbreak_detection/actions.py     41.93% <38.09%> (-3.90%) ⬇️

@cparisien cparisien self-requested a review June 9, 2025 19:04
Collaborator

@cparisien cparisien left a comment

I'm ok with this and it appears to resolve the Datastax issue around jailbreak server URLs. Will need an additional look from Tim, and I'd like to understand why the Windows python checks are failing.

@erickgalinkin
Collaborator Author

@jeffreyscarpenter let me know if this satisfies your needs and reconciles the discrepancies between this and #1227

@Pouyanpi
Collaborator

Codecov Report

Attention: Patch coverage is 28.20513% with 28 lines in your changes missing coverage. Please review.

Project coverage is 68.64%. Comparing base (0570f88) to head (9a180a6).
Report is 5 commits behind head on develop.

Files with missing lines                                  Patch %   Lines
...oguardrails/library/jailbreak_detection/actions.py     38.09%    13 Missing ⚠️
.../library/jailbreak_detection/model_based/checks.py     0.00%     8 Missing ⚠️
...oguardrails/library/jailbreak_detection/request.py     0.00%     5 Missing ⚠️
...moguardrails/library/jailbreak_detection/server.py     0.00%     2 Missing ⚠️

Thank you @erickgalinkin, would you please make sure that these are sufficiently covered?

@Pouyanpi
Collaborator

Thank you @erickgalinkin. I can see that the move from nim_url/nim_port to nim_base_url/nim_server_endpoint is a significant improvement and much more flexible, and +1 to the authentication support.

I think this PR also introduces a breaking change:

  1. configuration schema changes:

    • nim_url → nim_base_url
    • nim_port → removed (now part of base URL)
    • embedding field → completely removed
  2. No backward compat: existing configuration files will fail to load or behave unexpectedly

Before merging, please consider:

  1. use Pydantic deprecated fields instead of immediately removing old fields:

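    # In config.py
    from typing import Optional
    from pydantic import Field

    # Field declared on the config model; keeps the old name loadable.
    nim_url: Optional[str] = Field(
        default=None,
        deprecated="Use 'nim_base_url' instead. This field will be removed in a future version.",
        description="DEPRECATED: Use nim_base_url instead",
    )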
  2. Provide configuration migration logic using pydantic validators.

(re 1 & 2 see #1214 (comment))
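
A minimal sketch of what the migration logic from point 2 might look like, written as a plain function so it can be read on its own. The function name `migrate_legacy_nim_fields` is hypothetical, and the sketch assumes the legacy `nim_url` already includes a scheme and that an explicit `nim_base_url` should take precedence. The merged implementation may differ.

```python
from typing import Any, Dict


def migrate_legacy_nim_fields(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical mapping of legacy nim_url / nim_port onto nim_base_url."""
    migrated = dict(data)  # copy so the caller's dict is not mutated
    legacy_url = migrated.pop("nim_url", None)
    legacy_port = migrated.pop("nim_port", None)
    # An explicitly configured nim_base_url wins over the legacy fields.
    if legacy_url and not migrated.get("nim_base_url"):
        base = legacy_url.rstrip("/")
        if legacy_port is not None:
            base = f"{base}:{legacy_port}"
        migrated["nim_base_url"] = base
    return migrated
```

In Pydantic v2, a function like this could be called from a `@model_validator(mode="before")` classmethod on the config model, so that old configuration files are rewritten before field validation runs. If the deprecated fields are kept on the model as suggested in point 1, the legacy keys would be copied rather than popped.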

  3. safer URL construction:
    # In request.py we can use proper url joining
    from urllib.parse import urljoin
    endpoint = urljoin(nim_url, nim_classification_path)
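
One detail worth keeping in mind with `urljoin`: a path with a leading slash replaces any path already present on the base URL, which matters if a NIM deployment sits behind a path prefix. The short sketch below illustrates the standard-library behavior; the `example.com` URLs are made up for illustration.

```python
from urllib.parse import urljoin

# Base without a path: the classification path is simply appended.
plain = urljoin("https://ai.api.nvidia.com", "/v1/classify")

# Base with a path prefix: an absolute path discards the "/nim" prefix.
prefixed_abs = urljoin("https://example.com/nim", "/v1/classify")

# A relative path is appended only when the base ends with a slash.
prefixed_rel = urljoin("https://example.com/nim/", "v1/classify")

print(plain)         # https://ai.api.nvidia.com/v1/classify
print(prefixed_abs)  # https://example.com/v1/classify
print(prefixed_rel)  # https://example.com/nim/v1/classify
```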

@Pouyanpi Pouyanpi added this to the v0.14.1 milestone Jun 19, 2025
erickgalinkin and others added 11 commits June 25, 2025 13:53
…t to use base_url and endpoints. Refactor checks to align with base_uri and api_key_env_var approaches. Add additional error handling and logging. Fix tests to reflect changes.

Signed-off-by: Erick Galinkin <egalinkin@nvidia.com>
- Fix TypeError when classifier is None by adding defensive programming
- Replace silent failure with clear RuntimeError and descriptive message
- Simplify calling code by removing redundant null checks from actions.py and server.py
- Update tests to match new function signature and behavior
- Add test coverage for new RuntimeError path

This resolves the critical bug where check_jailbreak(prompt) would crash with
"TypeError: 'NoneType' object is not callable" when EMBEDDING_CLASSIFIER_PATH
is not set. Now it raises a clear RuntimeError with guidance on how to fix it.
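
The behavior change described in this commit message could be sketched as below. The signature, return shape, and error text are assumptions for illustration; only the function name `check_jailbreak`, the `RuntimeError`, and the `EMBEDDING_CLASSIFIER_PATH` variable are taken from the message.

```python
from typing import Callable, Dict, Optional, Tuple, Union

# Assumed classifier shape: prompt -> (is_jailbreak, score).
Classifier = Callable[[str], Tuple[bool, float]]


def check_jailbreak(
    prompt: str, classifier: Optional[Classifier] = None
) -> Dict[str, Union[bool, float]]:
    """Hypothetical sketch: fail loudly instead of calling a None classifier."""
    if classifier is None:
        # Previously this path would surface as
        # "TypeError: 'NoneType' object is not callable".
        raise RuntimeError(
            "No jailbreak classifier is available. "
            "Set the EMBEDDING_CLASSIFIER_PATH environment variable."
        )
    is_jailbreak, score = classifier(prompt)
    return {"jailbreak": is_jailbreak, "score": score}
```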
fix
@Pouyanpi Pouyanpi force-pushed the jailbreakdetect_api_fix branch from 4ed2d85 to f640d29 Compare June 25, 2025 14:40
@Pouyanpi Pouyanpi self-requested a review June 25, 2025 14:48
@Pouyanpi Pouyanpi removed their request for review June 25, 2025 15:06
@Pouyanpi Pouyanpi self-requested a review June 25, 2025 15:06
Collaborator

@Pouyanpi Pouyanpi left a comment

Thank you @erickgalinkin, we are good to merge 👍🏻

@erickgalinkin erickgalinkin force-pushed the jailbreakdetect_api_fix branch from 3002c88 to b23863b Compare June 25, 2025 17:04
@erickgalinkin
Collaborator Author

erickgalinkin commented Jun 25, 2025

Rolled back weird changes that got pushed. Thanks @Pouyanpi!

@erickgalinkin erickgalinkin merged commit 01cd760 into develop Jun 25, 2025
41 checks passed
@erickgalinkin erickgalinkin deleted the jailbreakdetect_api_fix branch June 25, 2025 17:10
6 participants