Commit

Merge pull request #713
Docs/community cleanlab
drazvan authored Sep 3, 2024
2 parents 69492a9 + 3b524e6 commit c6291c3
Showing 2 changed files with 35 additions and 21 deletions.
34 changes: 34 additions & 0 deletions docs/user_guides/community/cleanlab.md
@@ -0,0 +1,34 @@
# Cleanlab Integration

The `cleanlab trustworthiness` flow uses a trustworthiness score with a default threshold of 0.6 to decide whether the output is allowed: if the score is below the threshold, the response is considered "untrustworthy".
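
The threshold rule can be illustrated with a minimal Python sketch. The helper `is_trustworthy` and the constant `DEFAULT_THRESHOLD` are hypothetical names used only for illustration and are not part of NeMo Guardrails or Cleanlab; the sketch also assumes scores lie in the range 0 to 1.

```python
# Hypothetical illustration of the threshold rule; not library code.
DEFAULT_THRESHOLD = 0.6  # default cutoff used by the flow


def is_trustworthy(score: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Return True when a response with this score would be allowed.

    A score strictly below the threshold is treated as untrustworthy.
    Assumes scores fall in [0, 1].
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return score >= threshold
```

With the default cutoff, a score of exactly 0.6 passes, while the same score would be rejected under a stricter cutoff of 0.7.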

A high trustworthiness score generally correlates with high-quality responses. In a question-answering application, high trustworthiness is indicative of correct responses, while in general open-ended applications, a high score corresponds to the response being helpful and informative. Trustworthiness scores are less useful for creative or open-ended requests.

The mathematical derivation of the score is explained in [Cleanlab's documentation](https://help.cleanlab.ai/tutorials/tlm/#how-does-the-tlm-trustworthiness-score-work), and you can also access [trustworthiness score benchmarks](https://cleanlab.ai/blog/trustworthy-language-model/).

You can easily change the cutoff value for the trustworthiness score by adjusting the threshold in the [config](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/nemoguardrails/library/cleanlab/flows.co). For example, to change the threshold to 0.7, you can add the following flow to your config:

```colang
define subflow cleanlab trustworthiness
  """Guardrail based on trustworthiness score."""
  $result = execute call cleanlab api

  if $result.trustworthiness_score < 0.7
    bot response untrustworthy
    stop

define bot response untrustworthy
  "Don't place much confidence in this response"
```

## Setup

Install `cleanlab-studio` to use Cleanlab's trustworthiness score:

```
pip install cleanlab-studio
```

Then, you can get an API key for free by [creating a Cleanlab account](https://app.cleanlab.ai/?signup_origin=TLM) or experiment with TLM in the [playground](https://tlm.cleanlab.ai/). You can also [email Cleanlab](mailto:sales@cleanlab.ai) for any special requests or support.

Lastly, set the `CLEANLAB_API_KEY` environment variable to your API key.
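
Before starting an application that relies on this rail, it can help to confirm that the variable is actually visible to the process. The following is a minimal sketch; `get_cleanlab_api_key` is a hypothetical helper, not something provided by NeMo Guardrails or `cleanlab-studio`.

```python
import os
from typing import Mapping


def get_cleanlab_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from CLEANLAB_API_KEY, or fail with a clear message.

    Hypothetical helper for illustration; `env` defaults to the process
    environment and can be replaced with a plain dict when testing.
    """
    key = env.get("CLEANLAB_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CLEANLAB_API_KEY is not set; export it before running the app."
        )
    return key
```

Calling `get_cleanlab_api_key()` at startup surfaces a missing key immediately rather than at the first guarded response.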
22 changes: 1 addition & 21 deletions docs/user_guides/guardrails-library.md
@@ -652,27 +652,7 @@ rails:
- cleanlab trustworthiness
```

The `cleanlab trustworthiness` flow uses a trustworthiness score with a default threshold of 0.6 to decide whether the output is allowed: if the score is below the threshold, the response is considered "untrustworthy".


A high trustworthiness score generally correlates with high-quality responses. In a question-answering application, high trustworthiness is indicative of correct responses, while in general open-ended applications, a high score corresponds to the response being helpful and informative. Trustworthiness scores are less useful for creative or open-ended requests.

The mathematical derivation of the score is explained in [Cleanlab's documentation](https://help.cleanlab.ai/tutorials/tlm/#how-does-the-tlm-trustworthiness-score-work), and you can also access [trustworthiness score benchmarks](https://cleanlab.ai/blog/trustworthy-language-model/).

You can easily change the cutoff value for the trustworthiness score by adjusting the threshold in the [config](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/nemoguardrails/library/cleanlab/flows.co). For example, to change the threshold to 0.7, you can add the following flow to your config:

```colang
define subflow cleanlab trustworthiness
  """Guardrail based on trustworthiness score."""
  $result = execute call cleanlab api

  if $result.trustworthiness_score < 0.7
    bot response untrustworthy
    stop

define bot response untrustworthy
  "Don't place much confidence in this response"
```
For more details, check out the [Cleanlab Integration](./community/cleanlab.md) page.

## Other
