Docs/community cleanlab #713

Merged 2 commits on Sep 3, 2024
34 changes: 34 additions & 0 deletions docs/user_guides/community/cleanlab.md
@@ -0,0 +1,34 @@
# Cleanlab Integration

The `cleanlab trustworthiness` flow uses the trustworthiness score with a default threshold of 0.6 to determine whether the output should be allowed (i.e., if the trustworthiness score is below the threshold, the response is considered "untrustworthy").

A high trustworthiness score generally correlates with high-quality responses. In a question-answering application, a high score indicates a correct response, while in general open-ended applications it indicates a helpful and informative one. Trustworthiness scores are less useful for creative or entirely open-ended requests, where many different responses can be acceptable.

The mathematical derivation of the score is explained in [Cleanlab's documentation](https://help.cleanlab.ai/tutorials/tlm/#how-does-the-tlm-trustworthiness-score-work), and you can also access [trustworthiness score benchmarks](https://cleanlab.ai/blog/trustworthy-language-model/).

You can change the cutoff value for the trustworthiness score by overriding the default subflow defined in the library's [config](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/nemoguardrails/library/cleanlab/flows.co). For example, to change the threshold to 0.7, add the following flow to your config; a flow defined there takes precedence over the built-in one:

```colang
define subflow cleanlab trustworthiness
  """Guardrail based on trustworthiness score."""
  $result = execute call cleanlab api

  if $result.trustworthiness_score < 0.7
    bot response untrustworthy
    stop

define bot response untrustworthy
  "Don't place much confidence in this response"
```
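
For the subflow to run, it must also be enabled as an output flow in your `config.yml` (the same snippet shown in the guardrails library reference below). A minimal sketch of that configuration:

```yaml
rails:
  output:
    flows:
      - cleanlab trustworthiness
```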

## Setup

Install `cleanlab-studio` to use Cleanlab's trustworthiness score:

```bash
pip install cleanlab-studio
```

Then, you can get an API key for free by [creating a Cleanlab account](https://app.cleanlab.ai/?signup_origin=TLM) or experiment with TLM in the [playground](https://tlm.cleanlab.ai/). You can also [email Cleanlab](mailto:sales@cleanlab.ai) for any special requests or support.

Lastly, set the `CLEANLAB_API_KEY` environment variable to your API key.
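
With the key set, you can sanity-check the setup directly. Below is a minimal sketch that generates and scores a response using the `cleanlab-studio` TLM client (`Studio` and `TLM.prompt`); the example prompt and printed labels are illustrative only:

```python
import os

from cleanlab_studio import Studio

# Assumes CLEANLAB_API_KEY has been exported as described above.
studio = Studio(os.environ["CLEANLAB_API_KEY"])
tlm = studio.TLM()

# TLM returns both a response and its trustworthiness score.
result = tlm.prompt("What is the capital of France?")

# Apply the same cutoff the guardrail flow uses by default (0.6).
if result["trustworthiness_score"] < 0.6:
    print("Untrustworthy response:", result["response"])
else:
    print("Trusted response:", result["response"])
```

This mirrors the comparison that the `cleanlab trustworthiness` flow applies to each bot response at runtime.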
22 changes: 1 addition & 21 deletions docs/user_guides/guardrails-library.md
Original file line number Diff line number Diff line change
@@ -652,27 +652,7 @@ rails:
- cleanlab trustworthiness
```

For more details, check out the [Cleanlab Integration](./community/cleanlab.md) page.

## Other
