Commit 131f861

Merge pull request #867 from decypher-ai/feature/autoalign-guardrail-updates
feat:AutoAlign guardrail updates
2 parents 5bb4455 + a990418 commit 131f861

19 files changed: +483 -194 lines changed

docs/user-guides/community/auto-align.md

Lines changed: 119 additions & 36 deletions
@@ -7,7 +7,7 @@ AutoAlign comes with a library of built-in guardrails that you can easily use:
 1. [Gender bias Detection](#gender-bias-detection)
 2. [Harm Detection](#harm-detection)
 3. [Jailbreak Detection](#jailbreak-detection)
-4. [Confidential Detection](#confidential-detection)
+4. [Confidential Info Detection](#confidential-info-detection)
 5. [Intellectual property detection](#intellectual-property-detection)
 6. [Racial bias Detection](#racial-bias-detection)
 7. [Tonal Detection](#tonal-detection)
@@ -41,10 +41,11 @@ rails:
 autoalign:
 parameters:
 endpoint: "https://<AUTOALIGN_ENDPOINT>/guardrail"
+multi_language: False
 input:
 guardrails_config:
 {
-"pii_fast": {
+"pii": {
 "enabled_types": [
 "[BANK ACCOUNT NUMBER]",
 "[CREDIT CARD NUMBER]",
@@ -98,7 +99,7 @@ rails:
 "[RELIGION]": 0.5
 }
 },
-"confidential_detection": {
+"confidential_info_detection": {
 "matching_scores": {
 "No Confidential": 0.5,
 "Legal Documents": 0.5,
@@ -117,7 +118,7 @@ rails:
 "score": 0.5
 }
 },
-"text_toxicity_extraction": {
+"toxicity_detection": {
 "matching_scores": {
 "score": 0.5
 }
@@ -153,7 +154,7 @@ rails:
 output:
 guardrails_config:
 {
-"pii_fast": {
+"pii": {
 "enabled_types": [
 "[BANK ACCOUNT NUMBER]",
 "[CREDIT CARD NUMBER]",
@@ -207,7 +208,7 @@ rails:
 "[RELIGION]": 0.5
 }
 },
-"confidential_detection": {
+"confidential_info_detection": {
 "matching_scores": {
 "No Confidential": 0.5,
 "Legal Documents": 0.5,
@@ -226,7 +227,7 @@ rails:
 "score": 0.5
 }
 },
-"text_toxicity_extraction": {
+"toxicity_detection": {
 "matching_scores": {
 "score": 0.5
 }
@@ -268,6 +269,8 @@ rails:
 ```
 We also have to add the AutoAlign's guardrail endpoint in parameters.

+"multi_language" is an optional parameter to enable guardrails for non-English information
+
 One of the advanced configs is matching score (ranging from 0 to 1) which is a threshold that determines whether the guardrail will block the input/output or not.
 If the matching score is higher (i.e. close to 1) then the guardrail will be more strict.
 Some guardrails have very different format of `matching_scores` config,
@@ -299,8 +302,8 @@ define flow autoalign check output
 bot refuse to respond
 stop
 else
-$pii_message_output = $output_result["pii_fast"]["response"]
-if $output_result["pii_fast"]["guarded"]
+$pii_message_output = $output_result["pii"]["response"]
+if $output_result["pii"]["guarded"]
 bot respond pii output
 stop

@@ -317,7 +320,7 @@ The actions `autoalign_input_api` and `autoalign_output_api` takes in two argume
 `show_toxic_phrases`. Both the arguments expect boolean value being passed to them. The default value of
 `show_autoalign_message` is `True` and for `show_toxic_phrases` is False. The `show_autoalign_message` controls whether
 we will show any output from autoalign or not. The response from AutoAlign would be presented as a subtext, when
-`show_autoalign_message` is kept `True`. Details regarding the second argument can be found in `text_toxicity_extraction`
+`show_autoalign_message` is kept `True`. Details regarding the second argument can be found in `toxicity_detection`
 section.


@@ -380,13 +383,17 @@ For intellectual property detection, the matching score has to be following form
 "matching_scores": { "score": 0.5}
 ```

-### Confidential detection
+### Confidential Info detection
+
+```{warning}
+Backward incompatible changes are introduced in v0.12.0 due to AutoAlign API changes
+```

-The goal of the confidential detection rail is to determine if the text has any kind of confidential information. This rail can be applied at both input and output.
-This guardrail can be added by adding `confidential_detection` key in the dictionary under `guardrails_config` section
+The goal of the confidential info detection rail is to determine if the text has any kind of confidential information. This rail can be applied at both input and output.
+This guardrail can be added by adding `confidential_info_detection` key in the dictionary under `guardrails_config` section
 which is under `input` or `output` section which should be in `autoalign` section in `config.yml`.

-For confidential detection, the matching score has to be following format:
+For confidential info detection, the matching score has to be following format:

 ```yaml
 "matching_scores": {
@@ -436,8 +443,12 @@ For tonal detection, the matching score has to be following format:

 ### Toxicity extraction

+```{warning}
+Backward incompatible changes are introduced in v0.12.0 due to AutoAlign API changes
+```
+
 The goal of the toxicity detection rail is to determine if the text has any kind of toxic content. This rail can be applied at both input and output. This guardrail not just detects the toxicity of the text but also extracts toxic phrases from the text.
-This guardrail can be added by adding `text_toxicity_extraction` key in the dictionary under `guardrails_config` section
+This guardrail can be added by adding `toxicity_detection` key in the dictionary under `guardrails_config` section
 which is under `input` or `output` section which should be in `autoalign` section in `config.yml`.

 For text toxicity detection, the matching score has to be following format:
@@ -455,24 +466,24 @@ define subflow autoalign check input
 $autoalign_input_response = $input_result['combined_response']
 bot refuse to respond
 stop
-else if $input_result["pii_fast"] and $input_result["pii_fast"]["guarded"]:
-$user_message = $input_result["pii_fast"]["response"]
+else if $input_result["pii"] and $input_result["pii"]["guarded"]:
+$user_message = $input_result["pii"]["response"]

 define subflow autoalign check output
 $output_result = execute autoalign_output_api(show_autoalign_message=True, show_toxic_phrases=True)
 if $output_result["guardrails_triggered"]
 bot refuse to respond
 stop
 else
-$pii_message_output = $output_result["pii_fast"]["response"]
-if $output_result["pii_fast"]["guarded"]
+$pii_message_output = $output_result["pii"]["response"]
+if $output_result["pii"]["guarded"]
 $bot_message = $pii_message_output

-define subflow autoalign factcheck output
+define subflow autoalign groundedness output
 if $check_facts == True
 $check_facts = False
 $threshold = 0.5
-$output_result = execute autoalign_factcheck_output_api(factcheck_threshold=$threshold, show_autoalign_message=True)
+$output_result = execute autoalign_groundedness_output_api(factcheck_threshold=$threshold, show_autoalign_message=True)
 bot provide response

 define bot refuse to respond
@@ -482,8 +493,12 @@ define bot refuse to respond

 ### PII

+```{warning}
+Backward incompatible changes are introduced in v0.12.0 due to AutoAlign API changes
+```
+
 To use AutoAlign's PII (Personal Identifiable Information) module, you have to list the entities that you wish to redact
-in `enabled_types` in the dictionary of `guardrails_config` under the key of `pii_fast`; if not listed then all PII types will be redacted.
+in `enabled_types` in the dictionary of `guardrails_config` under the key of `pii`; if not listed then all PII types will be redacted.

 The above sample shows all PII entities that is currently being supported by AutoAlign.

@@ -498,7 +513,7 @@ You have to define the config for output and input side separately based on wher
 Example PII config:

 ```yaml
-"pii_fast": {
+"pii": {
 "enabled_types": [
 "[BANK ACCOUNT NUMBER]",
 "[CREDIT CARD NUMBER]",
@@ -554,48 +569,53 @@ Example PII config:
 }
 ```

-### Factcheck or Groundness Check
-The factcheck needs an input statement (represented as ‘prompt’) as a list of evidence documents.
-To use AutoAlign's factcheck module, you have to modify the `config.yml` in the following format:
+### Groundness Check
+
+```{warning}
+Backward incompatible changes are introduced in v0.12.0 due to AutoAlign API changes
+```
+
+The groundness check needs an input statement (represented as ‘prompt’) as a list of evidence documents.
+To use AutoAlign's groundness check module, you have to modify the `config.yml` in the following format:

 ```yaml
 rails:
 config:
 autoalign:
 guardrails_config:
 {
-"factcheck":{
+"groundedness_checker":{
 "verify_response": false
 }
 }
 parameters:
-fact_check_endpoint: "https://<AUTOALIGN_ENDPOINT>/factcheck"
+groundedness_check_endpoint: "https://<AUTOALIGN_ENDPOINT>/groundedness_check"
 output:
 flows:
-- autoalign factcheck output
+- autoalign groundedness output
 ```

-Specify the factcheck endpoint the parameters section of autoalign's config.
-Then, you have to call the corresponding subflows for factcheck guardrails.
+Specify the groundness endpoint the parameters section of autoalign's config.
+Then, you have to call the corresponding subflows for groundness guardrails.

-In the guardrails config for factcheck you can toggle "verify_response" flag
+In the guardrails config for groundness check you can toggle "verify_response" flag
 which will enable(true) / disable (false) additional processing of LLM Response.
 This processing ensures that only relevant LLM responses undergo fact-checking
 and responses like greetings ('Hi', 'Hello' etc.) do not go through fact-checking
 process.

 Note that the verify_response is set to False by default as it requires additional
 computation, and we encourage users to determine which LLM responses should go through
-AutoAlign fact checking whenever possible.
+AutoAlign groundness check whenever possible.


 Following is the format of the colang file, which is present in the library:
 ```colang
-define subflow autoalign factcheck output
+define subflow autoalign groundedness output
 if $check_facts == True
 $check_facts = False
 $threshold = 0.5
-$output_result = execute autoalign_factcheck_output_api(factcheck_threshold=$threshold)
+$output_result = execute autoalign_groundedness_output_api(factcheck_threshold=$threshold)
 ```

 The `threshold` can be changed depending upon the use-case, the `output_result`
@@ -627,6 +647,69 @@ for ideal chit-chat.



-The output of the factcheck endpoint provides you with a factcheck score against which we can add a threshold which determines whether the given output is factually correct or not.
+The output of the groundness check endpoint provides you with a factcheck score against which we can add a threshold which determines whether the given output is factually correct or not.

 The supporting documents or the evidence has to be placed within a `kb` folder within `config` folder.
+
+
+### Fact Check
+
+```{warning}
+Backward incompatible changes are introduced in v0.12.0 due to AutoAlign API changes
+```
+
+The fact check uses the bot response and user input prompt to check the factual correctness of the bot response based on the user prompt. Unlike groundness check, fact check does not use a pre-existing internal knowledge base.
+To use AutoAlign's fact check module, modify the `config.yml` from example autoalign_factcheck_config.
+
+```yaml
+models:
+- type: main
+engine: openai
+model: gpt-3.5-turbo-instruct
+rails:
+config:
+autoalign:
+parameters:
+fact_check_endpoint: "https://<AUTOALIGN_ENDPOINT>/content_moderation"
+multi_language: False
+output:
+guardrails_config:
+{
+"fact_checker": {
+"mode": "DETECT",
+"knowledge_base": [
+{
+"add_block_domains": [],
+"documents": [],
+"knowledgeType": "web",
+"num_urls": 3,
+"search_engine": "Google",
+"static_knowledge_source_type": ""
+}
+],
+"content_processor": {
+"max_tokens_per_chunk": 100,
+"max_chunks_per_source": 3,
+"use_all_chunks": false,
+"name": "Semantic Similarity",
+"filter_method": {
+"name": "Match Threshold",
+"threshold": 0.5
+},
+"content_filtering": true,
+"content_filtering_threshold": 0.6,
+"factcheck_max_text": false,
+"max_input_text": 150
+},
+"mitigation_with_evidence": false
+},
+}
+output:
+flows:
+- autoalign factcheck output
+```
+
+Specify the fact_check_endpoint to the correct AutoAlign environment.
+Then set to the corresponding subflows for fact check guardrail.
+
+The output of the fact check endpoint provides you with a fact check score that combines the factual correctness of various statements made by the bot response. Then provided with a user set threshold, will log a warning if the bot response is determined to be factually incorrect
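
The score-versus-threshold behaviour described in the added Fact Check text can be pictured with a minimal Python sketch. It rests on an assumption the diff does not state outright: that a score below the threshold marks the response as factually incorrect. The helper `flag_low_fact_score` and the logger name are hypothetical and are not part of NeMo Guardrails or the AutoAlign API.

```python
import logging

# Hypothetical logger name, used only for this illustration.
log = logging.getLogger("autoalign_sketch")


def flag_low_fact_score(score: float, threshold: float = 0.5) -> bool:
    """Log a warning and return True when the score is below the threshold.

    Assumption: a lower score means weaker factual support for the response.
    """
    if score < threshold:
        log.warning(
            "Fact check score %.2f is below threshold %.2f", score, threshold
        )
        return True
    return False
```

With the default threshold of 0.5 taken from the examples in the diff, a score of 0.3 would be flagged and a score of 0.8 would not.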

examples/configs/autoalign/README.md

Lines changed: 3 additions & 1 deletion
@@ -5,6 +5,8 @@ This example showcases the use of AutoAlign guardrails.
 The structure of the config folders is the following:
 - `autoalign_config` - example configuration folder for all guardrails (except factcheck)
 - `config.yml` - The config file holding all the configuration options.
-- `autoalign_factcheck_config` - example configuration folder for AutoAlign's factcheck
+- `autoalign_groundness_config` - example configuration folder for AutoAlign's groundness check
 - `kb` - The folder containing documents that form the knowledge base.
 - `config.yml` - The config file holding all the configuration options.
+- `autoalign_factcheck_config` - example configuration folder for AutoAlign's factcheck
+- `config.yml` - The config file holding all the configuration options.

examples/configs/autoalign/autoalign_config/config.yml

Lines changed: 7 additions & 6 deletions
@@ -9,10 +9,11 @@ rails:
 autoalign:
 parameters:
 endpoint: "https://<AUTOALIGN_ENDPOINT>/guardrail"
+multi_language: False
 input:
 guardrails_config:
 {
-"pii_fast": {
+"pii": {
 "enabled_types": [
 "[BANK ACCOUNT NUMBER]",
 "[CREDIT CARD NUMBER]",
@@ -32,15 +33,16 @@ rails:
 },
 "gender_bias_detection": {},
 "harm_detection": {},
-"text_toxicity_extraction": {},
+"toxicity_detection": {},
 "racial_bias_detection": {},
 "jailbreak_detection": {},
-"intellectual_property": {}
+"intellectual_property": {},
+"confidential_info_detection": {}
 }
 output:
 guardrails_config:
 {
-"pii_fast": {
+"pii": {
 "enabled_types": [
 "[BANK ACCOUNT NUMBER]",
 "[CREDIT CARD NUMBER]",
@@ -60,9 +62,8 @@ rails:
 },
 "gender_bias_detection": {},
 "harm_detection": {},
-"text_toxicity_extraction": {},
+"toxicity_detection": {},
 "racial_bias_detection": {},
-"jailbreak_detection": {},
 "intellectual_property": {}
 }
 input:
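
Because the commit renames several `guardrails_config` keys in a backward-incompatible way, an existing config needs the same renames applied. The following is a minimal Python sketch of that migration, assuming the config has already been loaded into a plain dict. The mapping is taken only from the renames visible in this diff; `migrate_guardrails_config` is a hypothetical helper and not part of NeMo Guardrails.

```python
from typing import Any, Dict

# Old key -> new key, as renamed in this commit's diff.
RENAMED_GUARDRAIL_KEYS = {
    "pii_fast": "pii",
    "confidential_detection": "confidential_info_detection",
    "text_toxicity_extraction": "toxicity_detection",
    "factcheck": "groundedness_checker",
}


def migrate_guardrails_config(old: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of a guardrails_config dict with renamed keys.

    Hypothetical helper: keys not listed in the mapping are kept unchanged.
    """
    migrated: Dict[str, Any] = {}
    for key, value in old.items():
        new_key = RENAMED_GUARDRAIL_KEYS.get(key, key)
        if new_key in migrated:
            # Both the old and the new spelling were present; refuse to guess.
            raise ValueError(f"duplicate entry for {new_key!r} after renaming")
        migrated[new_key] = value
    return migrated
```

The input dict is left untouched, so the old and new configs can be compared before the new one is written back to `config.yml`.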
