This repository has been archived by the owner on Nov 3, 2023. It is now read-only.
Cringe project #4871
Merged
@@ -0,0 +1,68 @@
# The CRINGE Loss: Learning what language *not* to model

Leonard Adolphs, Tianyu Gao, Jing Xu, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston

## Abstract
Standard language model training employs gold human documents or human-human interaction data, and
treats all training data as positive examples.
Growing evidence shows that even with very large amounts of positive training data, issues remain
that can be alleviated with relatively small amounts of negative data -- examples of what the model should not do.
In this work, we propose a novel procedure to train with such data called the CRINGE loss
(ContRastive Iterative Negative GEneration).
We show the effectiveness of this approach across three different experiments on the tasks of safe generation,
contradiction avoidance, and open-domain dialogue. Our models outperform multiple strong baselines and are
conceptually simple, easy to train and implement.

## Paper Link

Coming soon

## Train a CRINGE (single iter.) model on the safe generation task
```
# Train a 3B parameter BB1 model
parlai train -t blended_skill_talk:mutators=flatten,projects.director.tasks.safety:SafeBADTeacher:mutators=flatten+safety_relabel_classes+filter_want_to_talk_about_labels+DIRECTOR_LTR_EMPTY,projects.director.tasks.safety:SafeAdvTeacher:mutators=flatten+safety_relabel_classes+DIRECTOR_LTR_EMPTY,projects.director.tasks.safety:SafeStdTeacher:mutators=flatten+safety_relabel_classes+DIRECTOR_LTR_EMPTY,projects.director.tasks.safety:SafeMultiTeacher:mutators=flatten+safety_relabel_classes+DIRECTOR_LTR_EMPTY,projects.director.tasks.safety:SafeWikiToxicTeacher:mutators=flatten+safety_relabel_classes+DIRECTOR_LTR_EMPTY --multitask-weights 5,1,1,1,1,1 --model projects.cringe.cringe_loss:ContrastiveTransformerGeneratorAgent --learn-positional-embeddings True --embedding-size 2560 --ffn-size 10240 --n-decoder-layers 24 --n-encoder-layers 2 --n-heads 32 --n-positions 128 --variant prelayernorm --text-truncate 128 --truncate 128 --dict-tokenizer bytelevelbpe --optimizer adam --update-freq 2 --history-add-global-end-token end --lr-scheduler-patience 3 --warmup-updates 100 --batchsize 8 --gradient-clip 10.0 --fp16 True -lr 5e-05 --load-from-checkpoint True --save-after-valid True --aggregate-micro True --attention-dropout 0.1 --dropout 0.1 --label-truncate 512 --relu-dropout 0.0 --fp16-impl mem_efficient --init-model zoo:blender/blender_3B/model --dict-file zoo:blender/blender_3B/model.dict --model-file ./models/cringe/safe_bb1/model --model-parallel true
```

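As a rough illustration of what `--multitask-weights 5,1,1,1,1,1` implies, the sketch below normalizes the weights into per-task shares. It assumes the weights act as relative sampling probabilities over the tasks in the order they are listed after `-t`. The helper name `sampling_probabilities` and the shortened task names are made up for this example and are not part of ParlAI.

```python
from fractions import Fraction


def sampling_probabilities(weights):
    """Normalize relative task weights so they sum to 1 (hypothetical helper)."""
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


# Shortened task names, in the order they appear after `-t` in the command.
tasks = [
    "blended_skill_talk",
    "SafeBADTeacher",
    "SafeAdvTeacher",
    "SafeStdTeacher",
    "SafeMultiTeacher",
    "SafeWikiToxicTeacher",
]
weights = [5, 1, 1, 1, 1, 1]

probs = sampling_probabilities(weights)
for task, prob in zip(tasks, probs):
    print(f"{task}: {float(prob):.2f}")
```

Under this assumption, half of the training examples come from `blended_skill_talk` and a tenth from each of the five safety teachers.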
## Evaluate the CRINGE (single iter.) model on the safe generation task

### Train the evaluation classifier
To evaluate whether the model generates only safe utterances, we use an independently trained classifier. Here, we use the training
script from the [DIRECTOR](https://parl.ai/projects/director/) project:
```
parlai train --task projects.director.tasks.safety:SafeBADTeacher:mutators=flatten+safety_relabel_classes+pos_only,projects.director.tasks.safety:SafeAdvTeacher:mutators=flatten+safety_relabel_classes+pos_only,projects.director.tasks.safety:SafeStdTeacher:mutators=flatten+safety_relabel_classes+pos_only,projects.director.tasks.safety:SafeMultiTeacher:mutators=flatten+safety_relabel_classes+pos_only,projects.director.tasks.safety:SafeWikiToxicTeacher:mutators=flatten+safety_relabel_classes+pos_only,projects.director.tasks.safety:SafeBADTeacher:mutators=flatten+safety_relabel_classes+neg_only,projects.director.tasks.safety:SafeAdvTeacher:mutators=flatten+safety_relabel_classes+neg_only,projects.director.tasks.safety:SafeStdTeacher:mutators=flatten+safety_relabel_classes+neg_only,projects.director.tasks.safety:SafeMultiTeacher:mutators=flatten+safety_relabel_classes+neg_only,projects.director.tasks.safety:SafeWikiToxicTeacher:mutators=flatten+safety_relabel_classes+neg_only -et projects.director.tasks.safety:SafeBADTeacher:mutators=flatten+safety_relabel_classes+pos_only,projects.director.tasks.safety:SafeAdvTeacher:mutators=flatten+safety_relabel_classes+pos_only,projects.director.tasks.safety:SafeStdTeacher:mutators=flatten+safety_relabel_classes+pos_only,projects.director.tasks.safety:SafeMultiTeacher:mutators=flatten+safety_relabel_classes+pos_only,projects.director.tasks.safety:SafeWikiToxicTeacher:mutators=flatten+safety_relabel_classes+pos_only,projects.director.tasks.safety:SafeBADTeacher:mutators=flatten+safety_relabel_classes+neg_only,projects.director.tasks.safety:SafeAdvTeacher:mutators=flatten+safety_relabel_classes+neg_only,projects.director.tasks.safety:SafeStdTeacher:mutators=flatten+safety_relabel_classes+neg_only,projects.director.tasks.safety:SafeMultiTeacher:mutators=flatten+safety_relabel_classes+neg_only,projects.director.tasks.safety:SafeWikiToxicTeacher:mutators=flatten+safety_relabel_classes+neg_only -vtim 120 --model transformer/classifier --load-from-pretrained-ranker True --init-model zoo:pretrained_transformers/bi_model_huge_reddit/model --dict-file zoo:pretrained_transformers/bi_model_huge_reddit/model.dict --history-size 20 --label-truncate 72 --text-truncate 360 --dict-tokenizer bpe --dict-lower True --optimizer adamax --output-scaling 0.06 --variant xlm --reduction-type mean --share-encoders False --learn-positional-embeddings True --n-layers 12 --n-heads 12 --ffn-size 3072 --attention-dropout 0.1 --relu-dropout 0.0 --dropout 0.1 --n-positions 1024 --embedding-size 768 --activation gelu --embeddings-scale False --n-segments 2 --learn-embeddings True --share-word-embeddings False --dict-endtoken __start__ -vp 30 -stim 60 --lr-scheduler fixed --lr-scheduler-patience 3 --lr-scheduler-decay 0.9 --warmup_updates 1000 --fp16 true -lr 5e-05 --classes pos neg -bs 20 --validation-metric f1 --validation-metric-mode max --validation-max-exs 3000 --validation-patience 200 --log-every-n-secs 10 -ttim 34200 --load-from-checkpoint true --save-after-valid true --tensorboard-log true --aggregate-micro True --model-file ./models/safety/eval_model
```

### Evaluate the model checkpoint
```
parlai em --batchsize 8 --log-every-n-secs 30 --fp16 True --metrics all --inference beam --beam-size 10 --beam-min-length 20 --beam-block-ngram 3 --beam-context-block-ngram 3 --beam-block-full-context True --skip-generation False --task projects.director.tasks.safety:SafeWikiToxicEvalTeacher:mutators=flatten+safety_relabel_classes+neg_only:eval_classifier_model_file=models/safety/eval_model:include_label_cand_only=true -dt valid --num-examples 1000 --model-file ./models/cringe/safe_bb1/model
```

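The classifier-based evaluation amounts to counting how many generations the classifier accepts. The following is a minimal sketch, assuming each generation has already been assigned one of the two classes named in the classifier training command (`pos`, `neg`) and that `pos` denotes a safe utterance. The function name `safe_fraction` and the example labels are illustrative only; this is not ParlAI's metric implementation.

```python
def safe_fraction(predicted_labels, safe_label="pos"):
    """Return the share of generations whose predicted class equals safe_label."""
    if not predicted_labels:
        # No generations to score; report 0.0 rather than dividing by zero.
        return 0.0
    safe_count = sum(1 for label in predicted_labels if label == safe_label)
    return safe_count / len(predicted_labels)


# Made-up classifier predictions for four generations.
example_labels = ["pos", "pos", "neg", "pos"]
print(safe_fraction(example_labels))
```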
## Iterative Training

### Generate unsafe generations on the training examples
We use the model trained in the previous step to generate responses on the WikiToxic training data and log the results to `WikiToxic_world_logs.jsonl`.
```
parlai em --batchsize 16 --log-every-n-secs 30 --fp16 True --metrics all --inference beam --beam-size 10 --beam-min-length 20 --beam-block-ngram 3 --beam-context-block-ngram 3 --beam-block-full-context True --skip-generation False --task projects.director.tasks.safety:SafeWikiToxicEvalTeacher:mutators=flatten+safety_relabel_classes+neg_only:eval_classifier_model_file=models/safety/eval_model:include_label_cand_only=true --num-examples 10 --datatype train:evalmode --model-file ./models/cringe/safe_bb1/model --world-logs ./models/cringe/safe_bb1/WikiToxic_world_logs.jsonl
```

### Filter the world logs
We filter the world logs so that they contain an equal (50/50) split of negative and positive examples. The label of each generation is determined by the previously trained classifier.
```
python projects/cringe/safety_filter_world_logs.py --world-logs-file ./models/cringe/safe_bb1/WikiToxic_world_logs.jsonl --filtered-world-logs-file ./models/cringe/safe_bb1/WikiToxic_world_logs_filtered.jsonl
```

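The balancing step can be pictured as follows. This is only a sketch of one way to obtain a 50/50 split, by downsampling the larger class; it is not the code in `safety_filter_world_logs.py`. The record layout (a dict with a `label` field holding `pos` or `neg`) and the function name `balance_examples` are assumptions made for the example.

```python
import random


def balance_examples(records, seed=0):
    """Downsample the larger class so both labels appear equally often."""
    positives = [r for r in records if r["label"] == "pos"]
    negatives = [r for r in records if r["label"] == "neg"]
    n = min(len(positives), len(negatives))

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    balanced = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(balanced)
    return balanced


# Made-up records: 6 labelled "pos" and 3 labelled "neg".
records = [
    {"text": f"generation {i}", "label": "pos" if i % 3 else "neg"}
    for i in range(9)
]
balanced = balance_examples(records)
print(len(balanced), "examples after balancing")
```

Downsampling discards some examples from the majority class; whether the actual script downsamples or balances in another way is not stated here.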
### Display the filtered iterative training data
We display the new training data generated by the model. Each generation is prefixed with the label predicted by the classifier for easier inspection.
```
parlai dd -t projects.cringe.teachers:IterativeTeacher -jfdp ./models/cringe/safe_bb1/WikiToxic_world_logs_filtered.jsonl --prepend-classifier-label true
```

### Iterative model finetuning
We finetune the model on the multitask dataset augmented with the utterances generated by the bot. The finetuning command is the same as before, except that the filtered generations are added to the dataset and the weights are initialized from the previous model.
```
parlai train -t blended_skill_talk:mutators=flatten,projects.director.tasks.safety:SafeBADTeacher:mutators=flatten+safety_relabel_classes+filter_want_to_talk_about_labels+DIRECTOR_LTR_EMPTY,projects.director.tasks.safety:SafeAdvTeacher:mutators=flatten+safety_relabel_classes+DIRECTOR_LTR_EMPTY,projects.director.tasks.safety:SafeStdTeacher:mutators=flatten+safety_relabel_classes+DIRECTOR_LTR_EMPTY,projects.director.tasks.safety:SafeMultiTeacher:mutators=flatten+safety_relabel_classes+DIRECTOR_LTR_EMPTY,projects.director.tasks.safety:SafeWikiToxicTeacher:mutators=flatten+safety_relabel_classes+DIRECTOR_LTR_EMPTY,projects.cringe.teachers:IterativeTeacher:mutators=flatten:jsonfile_datapath=models/cringe/safe_bb1/WikiToxic_world_logs_filtered.jsonl --multitask-weights 5,1,1,1,1,1,1 --model projects.cringe.cringe_loss:ContrastiveTransformerGeneratorAgent --learn-positional-embeddings True --embedding-size 2560 --ffn-size 10240 --n-decoder-layers 24 --n-encoder-layers 2 --n-heads 32 --n-positions 128 --variant prelayernorm --text-truncate 128 --truncate 128 --dict-tokenizer bytelevelbpe --optimizer adam --update-freq 2 --history-add-global-end-token end --lr-scheduler-patience 3 --warmup-updates 100 -bs 8 --gradient-clip 10.0 --fp16 True -lr 5e-05 --load-from-checkpoint True --save-after-valid True --aggregate-micro True --attention-dropout 0.1 --dropout 0.1 --label-truncate 512 --relu-dropout 0.0 --fp16-impl mem_efficient --init-model ./models/cringe/safe_bb1/model --dict-file ./models/cringe/safe_bb1/model.dict --model-file ./models/cringe/safe_bb1_iterative/model --model-parallel true
```
parlai_internal.projects.scones_director.teachers:
-->projects.cringe.teachers:
good catch!