The `cleanlab trustworthiness` flow uses the trustworthiness score with a default threshold of 0.6 to decide whether the output should be allowed (i.e., if the trustworthiness score is below the threshold, the response is considered "untrustworthy").
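The allow/block decision is a single strict comparison. As a minimal sketch, the hypothetical helper below (not part of NeMo Guardrails or cleanlab-studio) mirrors that comparison: a score strictly below the threshold is flagged, and a score exactly at the threshold is allowed. The `threshold` parameter is an illustrative assumption that stands in for the value hard-coded in the flow.

```python
# Hypothetical helper that mirrors the comparison made by the
# `cleanlab trustworthiness` flow; it is not a NeMo Guardrails or
# cleanlab-studio API.
DEFAULT_THRESHOLD = 0.6


def is_untrustworthy(trustworthiness_score: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Return True when the response would be treated as untrustworthy.

    The check is strict: a score equal to the threshold is allowed.
    """
    return trustworthiness_score < threshold


if __name__ == "__main__":
    # Compare the default threshold with a stricter one of 0.7.
    for score in (0.45, 0.6, 0.65, 0.9):
        print(score, is_untrustworthy(score), is_untrustworthy(score, threshold=0.7))
```

With the default threshold, 0.6 and 0.65 pass; raising the threshold to 0.7 flags both, while 0.9 passes either way.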
A high trustworthiness score generally correlates with high-quality responses. In question-answering applications, a high score indicates that the response is likely correct; in general open-ended applications, it indicates that the response is likely helpful and informative. Trustworthiness scores are less useful for creative or highly open-ended requests.
The mathematical derivation of the score is explained in Cleanlab's documentation, and benchmarks of the trustworthiness score are also available.
You can change the cutoff value for the trustworthiness score by redefining the flow in your config with a different threshold. For example, to raise the threshold to 0.7, add the following flow to your config:
```colang
define subflow cleanlab trustworthiness
  """Guardrail based on trustworthiness score."""
  $result = execute call cleanlab api

  if $result.trustworthiness_score < 0.7
    bot response untrustworthy
    stop

define bot response untrustworthy
  "Don't place much confidence in this response"
```
Install `cleanlab-studio` to use Cleanlab's trustworthiness score:

```bash
pip install cleanlab-studio
```
Then, get a free API key by creating a Cleanlab account, or experiment with Cleanlab's Trustworthy Language Model (TLM) in the playground. You can also email Cleanlab with special requests or for support.
Lastly, set the `CLEANLAB_API_KEY` environment variable to your API key.
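A quick preflight check can confirm both setup steps before the rail runs. The sketch below is hypothetical (it is not part of NeMo Guardrails or cleanlab-studio); it assumes only that the `cleanlab-studio` package is imported as `cleanlab_studio` and that the key is read from `CLEANLAB_API_KEY`. It does not verify the key against Cleanlab's service.

```python
import importlib.util
import os
from typing import List, Mapping, Optional


def find_setup_problems(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable setup problems; an empty list means none were found.

    `environ` defaults to the process environment and can be overridden
    for testing.
    """
    env = os.environ if environ is None else environ
    problems: List[str] = []
    # The pip package `cleanlab-studio` is imported as `cleanlab_studio`.
    if importlib.util.find_spec("cleanlab_studio") is None:
        problems.append("cleanlab-studio is not installed (run: pip install cleanlab-studio)")
    # Only checks that the variable is present and non-empty.
    if not env.get("CLEANLAB_API_KEY"):
        problems.append("CLEANLAB_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = find_setup_problems()
    for issue in issues:
        print("setup problem:", issue)
    if not issues:
        print("Cleanlab setup looks complete.")
```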