diff --git a/sample_model_cards/blenderbot_90M/model_card.md b/sample_model_cards/blenderbot_90M/model_card.md new file mode 100644 index 00000000000..25bbd576758 --- /dev/null +++ b/sample_model_cards/blenderbot_90M/model_card.md @@ -0,0 +1,241 @@ +# Blender 90M + + + +90M-parameter generative model finetuned on blended_skill_talk tasks. +- Developed by Facebook AI Research using [ParlAI](https://parl.ai/) +- Model started training on February 10, 2007. +- Type of model: Transformer Generative Model + +### Quick Usage + + +``` +python parlai/scripts/safe_interactive.py -mf zoo:blender/blender_90M/model -t blended_skill_talk +``` + +### Sample Input And Output + +``` +[text]: Science fiction + +[labels]: I think science fiction is an amazing genre for anything. Future science, technology, time travel, FTL travel, they're all such interesting concepts. +--- +[chosen_topic]: Science fiction +[knowledge]: Science fiction Science fiction (often shortened to SF or sci-fi) is a genre of speculative fiction, typically dealing with imaginative concepts such as futuristic science and technology, space travel, time travel, faster than light travel, parallel universes, and extraterrestrial life. +Science fiction Science fiction often explores the potential consequences of scientific and other innovations, and has been called a "literature of ideas". +Science fiction It usually avoids the supernatural, unlike the related genre of fantasy. +Science fiction Historically, science-fiction stories have had a grounding in actual science, but now this is only expected of hard science fiction. +Science fiction Science fiction is difficult to define, as it includes a wide range of subgenres and themes. +Science fiction Hugo Gernsback, who suggested the term "scientifiction" for his "Amazing Stories" magazine, wrote: "By 'scientifiction' I mean the Jules Verne, H. G. Wells and Edgar Allan Poe type of story—a charming romance intermingled with scientific fact and prophetic vision... 
Not only do these amazing tales make tremendously interesting reading—they are always instructive. +Science fiction They supply knowledge... in a very palatable form... New adventures pictured for us in the scientifiction of today are not at all impossible of realization tomorrow... +[title]: Science fiction +[checked_sentence]: Science fiction (often shortened to SF or sci-fi) is a genre of speculative fiction, typically dealing with imaginative concepts such as futuristic science and technology, space travel, time travel, faster than light travel, parallel universes, and extraterrestrial life. +``` + +## Intended Use + +BlenderBot(90M) is a chatbot built for research purposes only. + +## Limitations + +While we've made our model more engaging and humanlike with generative modeling, such models cannot yet fully understand what is [safe or not](https://parl.ai/projects/safety_recipes/). + +## Privacy + +Our work focuses on models with open-domain conversations wherein speakers may divulge personal interests. We remark that, during data collection, crowdworkers were specifically playing roles with given personality traits, not talking about themselves, and hence not identifying any personal information. + +## Datasets Used + +This model was trained on the datasets below (use the `parlai display_data` commands to show data). Visit the [task (dataset) list](https://parl.ai/docs/tasks.html) for more details about the datasets. + + +- [Wizard_of_Wikipedia](https://parl.ai/docs/tasks.html#wizard_of_wikipedia) ([arXiv](https://arxiv.org/abs/1811.01241)): A dataset with conversations directly grounded with knowledge retrieved from Wikipedia. Contains 201k utterances from 22k dialogues spanning over 1300 diverse topics, split into train, test, and valid sets. The test and valid sets are split into two sets each: one with overlapping topics with the train set, and one with unseen topics. 
+- [ConvAI2](https://parl.ai/docs/tasks.html#convai2) ([arXiv](https://arxiv.org/abs/1801.07243) | [website](http://convai.io/)): A chit-chat dataset based on PersonaChat for a NIPS 2018 competition. +- [Empathetic Dialogues](https://parl.ai/docs/tasks.html#empathetic-dialogues) ([arXiv](https://arxiv.org/abs/1811.00207)): A dataset of 25k conversations grounded in emotional situations to facilitate training and evaluating dialogue systems. Dataset has been released under the CC BY-NC license. + +In addition, we have also included some basic stats about the training datasets in the table below: + +|Dataset | avg utterance length | unique tokens | utterances | Display Dataset Command +:---: | :---: | :---: | :---: | :---: +ConvAI2 | 15.170 | 18756 | 199786 | `parlai dd -t convai2` +Empathetic Dialogues | 15.850 | 22415 | 129272 | `parlai dd -t empathetic_dialogues` +Wizard_of_Wikipedia | 18.488 | 39612 | 90412 | `parlai dd -t wizard_of_wikipedia` + +Note: The display dataset commands were auto generated, so please visit [here](https://parl.ai/docs/cli_usage.html#display-data) for more details. + + +## Evaluation Results + +For evaluation, we used the same datasets as for training; check the [Datasets Used](#datasets-used) section for more information. +We used the metric `ppl` (perplexity) as the validation metric; lower values are better. Click [here](https://en.wikipedia.org/wiki/Perplexity) for more info. + +| | All | Wizard_of_Wikipedia | ConvAI2 | Empathetic Dialogues +:---: | :---: | :---: | :---: | :---: +`ppl` | 12.9936 | 15.0214 | 9.8735 | 13.2825 + + + +## Safety Benchmark + +

Unsafe Generation Test

+ +For the Unsafe Generation test, we examine how the model responds to various dialogue inputs, representing 4 different settings. We report the percentage of the model's responses that are flagged as unsafe by each of the provided tools. + + + +

Response to Offensive Language Test

+ +For the Response to Offensive Language test, we examine how the model responds to a previously constructed set of hateful inputs by Sheng et al. (2021). We attempt to ascertain whether the model's response affirms the hateful input by measuring the percentage of responses that (1) do not contain negations, (2) are flagged as offensive by a safety classifier that uses context, and (3) have positive sentiment. + +| | Sentiment analysis (% positive sentiment) | Safety classifier (% flagged offensive) | Negation detection (% without negatives) +:---: | :---: | :---: | :---: +Affirmation Detection | 41.96% | 22.75% | 1.37% + +Further details are provided in Section 5 of , (code details can be found [here](https://github.com/facebookresearch/ParlAI/tree/master/projects/safety_bench)) + +#### DISCLAIMER + + +These unit tests are limited in many ways. The tools are imperfect and known to be biased. The provided settings are not comprehensive. These tests cannot guarantee the safety of your model, and are meant simply as a first pass at understanding its safety limitations. Please see further discussion in Section 5 of about the limitations and next steps. We recommend using human evaluation and adversarial probing to further understand the model's ability to generate or respond inappropriately to unsafe content. + + +## Related Paper(s) + +> :warning: Missing related_paper: Probably need to be grabbed from paper & added by u (the creator):warning: + +## Hyperparameters + + +- `lr_scheduler`: ` reduceonplateau ` +- `batchsize`: ` 16 ` +- `learningrate`: ` 7.5e-06 ` +- `model`: ` transformer/generator ` +- `validation_patience`: ` 15 ` +- `validation_metric`: ` ppl ` +- `multitask_weights`: ` [1.0, 3.0, 3.0, 3.0] ` +- `max_train_steps`: ` Not specified ` +- `num_epochs`: ` -1 ` +
+ model / neural net info +
+ +- `n_layers`: ` 8 ` +- `ffn_size`: ` 2048 ` +- `dropout`: ` 0.1 ` +- `n_heads`: ` 16 ` +- `n_positions`: ` 512 ` +- `variant`: ` xlm ` +- `activation`: ` gelu ` +- `output_scaling`: ` 1.0 ` +
+
+ embedding info +
+ +- `share_word_embeddings`: ` True ` +- `learn_positional_embeddings`: ` True ` +- `embeddings_scale`: ` True ` +- `embedding_projection`: ` random ` +- `embedding_size`: ` 512 ` +- `embedding_type`: ` random ` +
+
+ validation and logging info +
+ +- `validation_every_n_secs`: ` -1 ` +- `save_after_valid`: ` True ` +- `validation_every_n_epochs`: ` 0.25 ` +- `validation_max_exs`: ` 20000 ` +- `validation_metric_mode`: ` min ` +- `validation_cutoff`: ` 1.0 ` +
+
+ dictionary info/pre-processing +
+ +- `dict_class`: ` parlai.core.dict:DictionaryAgent ` +- `dict_lower`: ` True ` +- `dict_unktoken`: ` __unk__ ` +- `dict_endtoken`: ` __end__ ` +- `dict_tokenizer`: ` bpe ` +- `dict_nulltoken`: ` __null__ ` +- `dict_language`: ` english ` +- `dict_starttoken`: ` __start__ ` +- `dict_maxtokens`: ` -1 ` +- `dict_max_ngram_size`: ` -1 ` +- `dict_textfields`: ` text,labels ` +- `dict_maxexs`: ` -1 ` +
+
+ other dataset-related info +
+ +- `truncate`: ` -1 ` +- `text_truncate`: ` 512 ` +- `label_truncate`: ` 128 ` +- `task`: ` blended_skill_talk ` +
+
+ more batch and learning rate info +
+ +- `max_lr_steps`: ` -1 ` +- `invsqrt_lr_decay_gamma`: ` -1 ` +- `lr_scheduler_decay`: ` 0.5 ` +- `lr_scheduler_patience`: ` 3 ` +- `batchindex`: ` 15 ` +
+
+ training info +
+ +- `numthreads`: ` 1 ` +- `metrics`: ` default ` +- `gpu`: ` -1 ` +- `optimizer`: ` adamax ` +- `gradient_clip`: ` 0.1 ` +- `adam_eps`: ` 1e-08 ` +- `nesterov`: ` True ` +- `nus`: ` [0.7] ` +- `betas`: ` [0.9, 0.999] ` +- `warmup_updates`: ` -1 ` +- `warmup_rate`: ` 0.0001 ` +- `update_freq`: ` 1 ` +- `fp16`: ` True ` +- `max_train_time`: ` -1 ` +
+
+ miscellaneous +
+ +- `save_every_n_secs`: ` 60.0 ` +- `beam_min_length`: ` 20 ` +- `image_cropsize`: ` 224 ` +- `image_size`: ` 256 ` +- `label_type`: ` response ` +- `beam_block_ngram`: ` 3 ` +- `include_checked_sentence`: ` True ` +- `skip_generation`: ` True ` +- `beam_context_block_ngram`: ` 3 ` +- `fp16_impl`: ` apex ` +- `inference`: ` beam ` +- `beam_length_penalty`: ` 0.65 ` +- `image_mode`: ` raw ` +- `use_reply`: ` label ` +- `topp`: ` 0.9 ` +- `beam_size`: ` 10 ` +- `adafactor_eps`: ` [1e-30, 0.001] ` +- `include_knowledge`: ` True ` +- `num_topics`: ` 5 ` +- `topk`: ` 10 ` +- `datatype`: ` train ` +
+ +## Feedback + +We would love any feedback about the model (or the model card script)! Feel free to report any issues or unexpected findings using our [GitHub Issues page](https://github.com/facebookresearch/ParlAI/issues) :blush: + + +[back-to-top](#blender-90m) diff --git a/sample_model_cards/blenderbot_90M/offensive_language_generation_metrics_safety_heatmap.png b/sample_model_cards/blenderbot_90M/offensive_language_generation_metrics_safety_heatmap.png new file mode 100644 index 00000000000..5e198cf02cd Binary files /dev/null and b/sample_model_cards/blenderbot_90M/offensive_language_generation_metrics_safety_heatmap.png differ diff --git a/sample_model_cards/safety_multi/model_card.md b/sample_model_cards/safety_multi/model_card.md new file mode 100644 index 00000000000..9ccc2e48308 --- /dev/null +++ b/sample_model_cards/safety_multi/model_card.md @@ -0,0 +1,208 @@ +# BERT Classifier Multi-turn Dialogue Safety Model + + + +Classifier trained on the multi-turn adversarial safety task in addition to both the single-turn standard and adversarial safety tasks and Wikipedia Toxic Comments. +- Developed by Facebook AI Research using [ParlAI](https://parl.ai/) +- Model started training on September 03, 2015. +- Type of model: Bert Classifier + +### Quick Usage + + +``` +parlai eval_model -t dialogue_safety:multiturn -dt test -mf zoo:dialogue_safety/multi_turn/model --split-lines True -bs 40 +``` + +### Sample Input And Output + +``` +[text]: Explanation +Why the edits made under my username Hardcore Metallica Fan were reverted? They weren't vandalisms, just closure on some GAs after I voted at New York Dolls FAC. 
And please don't remove the template from the talk page since I'm retired now.89.205.38.27 + +[labels]: __ok__ +--- +[identity_hate]: 0.0 +[threat]: 0.0 +[is_sensitive]: 0.0 +[toxic]: 0.0 +[sensitive]: 0.0 +[data_type]: train +[label_candidates]: __notok__, __ok__ +[insult]: 0.0 +[severe_toxic]: 0.0 +[obscene]: 0.0 +``` + +## Intended Use + +> :warning: This model is intended for the use of.... :warning: + +## Limitations + +> :warning: This model has these limitations: ... :warning: + +## Privacy + +> :warning: This model has the following privacy concerns.... :warning: + +## Datasets Used + +This model was trained on the datasets below (use the `parlai display_data` commands to show data). Visit the [task (dataset) list](https://parl.ai/docs/tasks.html) for more details about the datasets. + + +- [dialogue safety (wikiToxicComments)](https://parl.ai/docs/tasks.html#dialogue-safety-(wikitoxiccomments)) +- [dialogue safety: adversarial: round-only=False: round=1](https://parl.ai/docs/tasks.html#dialogue-safety:-adversarial:-round-only=false:-round=1) +- [dialogue safety (multiturn)](https://parl.ai/docs/tasks.html#dialogue-safety-(multiturn)) + +In addition, we have also included some basic stats about the training datasets in the table below: + +|Dataset | avg utterance length | unique tokens | utterances | Display Dataset Command +:---: | :---: | :---: | :---: | :---: +dialogue safety: adversarial: round-only=False: round=1 | 11.335 | 6446 | 8000 | `parlai dd -t dialogue_safety:adversarial:round-only=False:round=1` +dialogue safety (multiturn) | 51.891 | 10853 | 24000 | `parlai dd -t dialogue_safety:multiturn` +dialogue safety (wikiToxicComments) | 87.685 | 206110 | 127656 | `parlai dd -t dialogue_safety:wikiToxicComments` + +Note: The display dataset commands were auto generated, so please visit [here](https://parl.ai/docs/cli_usage.html#display-data) for more details. 
+ + +## Evaluation Results + +For evaluation, we used the same datasets as for training; check the [Datasets Used](#datasets-used) section for more information. + + +We used the metric `class notok f1`, the F1 score for the class `__notok__`, as the validation metric. Recall that `class___notok___f1` is the harmonic mean of precision and recall computed for the `__notok__` class; higher values are better. + +| | All | dialogue safety (wikiToxicComments) | dialogue safety: adversarial: round-only=False: round=1 | dialogue safety (multiturn) +:---: | :---: | :---: | :---: | :---: +`class notok f1` | 78.87% | 81.24% | 75.86% | 67.41% + + + +## Related Paper(s) + +[Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack](https://parl.ai/projects/dialogue_safety/) + +## Hyperparameters + +- `lr_scheduler`: ` fixed ` +- `batchsize`: ` 40 ` +- `learningrate`: ` 5e-05 ` +- `model`: ` bert_classifier ` +- `validation_patience`: ` 15 ` +- `validation_metric`: ` class___notok___f1 ` +- `multitask_weights`: ` [0.5, 0.1, 0.1, 0.3] ` +- `max_train_steps`: ` Not specified ` +- `num_epochs`: ` -1 ` +
+ model / neural net info +
+ +- `round`: ` 3 ` +- `threshold`: ` 0.5 ` +
+
+ embedding info +
+ +- `embedding_type`: ` random ` +- `embedding_projection`: ` random ` +
+
+ validation and logging info +
+ +- `validation_metric_mode`: ` max ` +- `validation_max_exs`: ` 10000 ` +- `validation_cutoff`: ` 1.0 ` +- `validation_every_n_secs`: ` 60.0 ` +- `save_after_valid`: ` True ` +- `validation_every_n_epochs`: ` -1 ` +
+
+ dictionary info/pre-processing +
+ +- `dict_unktoken`: ` __unk__ ` +- `dict_starttoken`: ` __start__ ` +- `dict_tokenizer`: ` re ` +- `dict_nulltoken`: ` __null__ ` +- `dict_textfields`: ` text,labels ` +- `dict_language`: ` english ` +- `dict_class`: ` parlai.agents.bert_ranker.bert_dictionary:BertDictionaryAgent ` +- `dict_endtoken`: ` __end__ ` +- `dict_build_first`: ` True ` +- `dict_max_ngram_size`: ` -1 ` +- `dict_maxtokens`: ` -1 ` +
+
+ other dataset-related info +
+ +- `fix_contractions`: ` True ` +- `truncate`: ` 300 ` +- `split_lines`: ` True ` +- `task`: ` dialogue_safety:multiturn ` +- `evaltask`: ` internal:safety:multiturnConvAI2 ` +
+
+ more batch and learning rate info +
+ +- `lr_scheduler_patience`: ` 3 ` +- `batch_sort_cache_type`: ` pop ` +- `batch_sort_field`: ` text ` +- `batchindex`: ` 39 ` +- `batch_length_range`: ` 5 ` +- `lr_scheduler_decay`: ` 0.9 ` +
+
+ training info +
+ +- `numthreads`: ` 1 ` +- `shuffle`: ` True ` +- `numworkers`: ` 4 ` +- `metrics`: ` default ` +- `gpu`: ` -1 ` +- `data_parallel`: ` True ` +- `optimizer`: ` sgd ` +- `gradient_clip`: ` 0.1 ` +- `adam_eps`: ` 1e-08 ` +- `nesterov`: ` True ` +- `nus`: ` [0.7] ` +- `betas`: ` [0.9, 0.999] ` +- `warmup_updates`: ` 2000 ` +- `warmup_rate`: ` 0.0001 ` +- `update_freq`: ` 1 ` +- `max_train_time`: ` -1 ` +
+
+ pytorch info +
+ +- `pytorch_context_length`: ` -1 ` +- `pytorch_include_labels`: ` True ` +
+
+ miscellaneous +
+ +- `image_size`: ` 256 ` +- `save_every_n_secs`: ` 60.0 ` +- `get_all_metrics`: ` True ` +- `sep_last_utt`: ` True ` +- `image_cropsize`: ` 224 ` +- `type_optimization`: ` all_encoder_layers ` +- `use_reply`: ` label ` +- `image_mode`: ` raw ` +- `datatype`: ` train ` +- `add_cls_token`: ` True ` +
+ +## Feedback + +We would love any feedback about the model (or the model card script)! Feel free to report any issues or unexpected findings using our [GitHub Issues page](https://github.com/facebookresearch/ParlAI/issues) :blush: + + +[back-to-top](#bert-classifier-multi-turn-dialogue-safety-model)