[PyTorch Bart] Split Bart into different models #9343
Conversation
Great job splitting the four models! Given that the slow tests pass, this LGTM once the modifications we discussed regarding the small blenderbot model are applied.
Fantastic job 🎉
Exceptional work! I have a few nits, which, from the way they are duplicated for each model, sound like issues in the templates. If you could fix the templates as you fix the nits, that would be truly amazing.
"BlenderbotSmallEncoder", # Building part of bigger (tested) model. | ||
"BlenderbotSmallDecoder", # Building part of bigger (tested) model. | ||
"BlenderbotEncoder", # Building part of bigger (tested) model. | ||
"BlenderbotDecoder", # Building part of bigger (tested) model. | ||
"MBartEncoder", # Building part of bigger (tested) model. | ||
"MBartDecoder", # Building part of bigger (tested) model. | ||
"PegasusEncoder", # Building part of bigger (tested) model. | ||
"PegasusDecoder", # Building part of bigger (tested) model. |
As a follow-up PR for me, all Encoder and Decoder models should be ignored for the checks of tested and auto-configured models.
One important comment I forgot to add to my review: I don't think we should adapt the …
* first try * remove old template * finish bart * finish mbart * delete unnecessary line * init pegasus * save intermediate * correct pegasus * finish pegasus * remove cookie cutter leftover * add marian * finish blenderbot * replace in file * correctly split blenderbot * delete "old" folder * correct "add statement" * adapt config for tf comp * correct configs for tf * remove ipdb * fix more stuff * fix mbart * push pegasus fix * fix mbart * more fixes * fix research projects code * finish docs for bart, mbart, and marian * delete unnecessary file * correct attn typo * correct configs * remove pegasus for seq class * correct peg docs * correct peg docs * finish configs * further improve docs * add copied from statements to mbart * fix copied from in mbart * add copy statements to marian * add copied from to marian * add pegasus copied from * finish pegasus * finish copied from * Apply suggestions from code review * make style * backward comp blenderbot * apply lysandres and sylvains suggestions * apply suggestions * push last fixes * fix docs * fix tok tests * fix imports code style * fix doc
What does this PR do?
This PR splits all Bart-like models into their own respective classes for PyTorch models only. This is more in line with the general philosophy of the library to have self-contained model files.
As discussed with @jplu, the TF models will be separated in a future PR as there are still some issues and improvements (TF serving) blocking the separation - see #9313.
In short, after this PR the following "model-specific" config parameters are removed from all Bart-like configs:
extra_pos_embeddings
normalize_embedding
add_final_layer_norm
normalize_before
do_blenderbot_90_layernorm
static_position_embeddings
add_bias_logits
force_bos_token_to_be_generated (this one has to be kept for Bart, though)

Each "bart" model (Pegasus, Bart, MBart, Marian, Blenderbot, BlenderbotSmall) will get its own modeling_....py file. At the moment the models have the following configurations:
[Table from the original PR: for each model — bart, mbart, marian, pegasus, blenderbot90M (BlenderbotSmall), blenderbot3B + rest (Blenderbot) — the settings of extra_pos_embeddings, normalize_before, add_final_layer_norm, do_blenderbot_90_layernorm, normalize_embedding, static_position_embeddings, add_bias_logits, and force_bos_token_to_be_generated.]
We can see that add_bias_logits is actually never used, so I think the best option is to just delete that functionality. Also, one can see that no two models have exactly the same usage of the above params, so we'll make 6 different modeling_....py files (see the sketch below).
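For illustration, a minimal sketch of what this looks like from a user's perspective after the split (module paths follow the repository's models/ layout; class names are the library's public ones):

```python
# One self-contained modeling file per model, e.g.
#   src/transformers/models/bart/modeling_bart.py
#   src/transformers/models/mbart/modeling_mbart.py
#   src/transformers/models/marian/modeling_marian.py
#   src/transformers/models/pegasus/modeling_pegasus.py
#   src/transformers/models/blenderbot/modeling_blenderbot.py
#   src/transformers/models/blenderbot_small/modeling_blenderbot_small.py
from transformers import (
    BartForConditionalGeneration,
    BlenderbotForConditionalGeneration,
    BlenderbotSmallForConditionalGeneration,
    MarianMTModel,
    MBartForConditionalGeneration,
    PegasusForConditionalGeneration,
)
```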
Resulting Improvements:

The model files are much more readable and should be much easier for the user to navigate. There are no more confusing config parameters, such as normalize_before, where the user doesn't know what to set anyway.

All Bart-like features that are irrelevant for the other models are removed. Those features are a) never mentioned in the respective papers and b) don't make any sense since the models weren't trained with them, so using them leads to nonsensical outputs. E.g. Marian was never supposed to be a "mask-filling" model, yet it has "mask-filling" functionality when doing something like the sketch below:
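A rough sketch of the kind of call that was possible (the checkpoint name is just an example of a Marian translation model; the snippet is illustrative, not taken from the PR):

```python
from transformers import MarianMTModel, MarianTokenizer

tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")

input_ids = tokenizer(["My friends are <mask> but they eat too many carbs."], return_tensors="pt").input_ids

# Before this PR the missing decoder_input_ids were silently created by shifting input_ids,
# so this Bart-style "mask filling" ran on a pure translation model and produced nonsense.
# After this PR, Marian no longer auto-creates decoder_input_ids for this use case.
outputs = model(input_ids=input_ids)
```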
The big gain here is that users are better guided on how to use the model and wonder less about whether the model is used correctly and whether there is a bug in the model.

Docstrings are improved with more model-specific examples and fewer comparisons to Bart. E.g. Pegasus, Marian, and Blenderbot never really mention BART in their papers and have no direct relation to BART IMO, so these models should not be compared to BART in the docs; it's confusing for the user.

Some smaller improvements: memory usage is slightly reduced for beam search, and gradient checkpointing is added (a short sketch follows below).

All previous tests are copied, and some additional tests are added for each model.
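A hedged sketch of enabling the new gradient checkpointing support; gradient_checkpointing_enable() assumes a reasonably recent transformers version, since the way checkpointing is switched on has changed across releases:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
model.gradient_checkpointing_enable()  # recompute activations during backward to save memory
model.train()

batch = tokenizer(["Gradient checkpointing trades compute for memory."], return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
```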
Possible drawback
Breaking changes
🚨🚨 Important: We cannot keep 100% backward compatibility here or the PR won't make much sense 🚨🚨
Since all models were packed into a single model file, a lot of different model designs are possible at the moment. E.g. Pegasus was only ever used with sinusoidal position embeddings (as mentioned in the paper), but since it's merged into modeling_bart.py, one could theoretically use Pegasus with learned position embeddings. This is not done in any config on the model hub, however, and will not be possible anymore after the PR. Also, Marian's model design never normalized the word embeddings, but this is possible with the current design. Again, no config on the model hub does that, so this will also not be possible anymore after the PR. In short: all model designs that were never foreseen in the original model and that are never used on the model hub at the moment won't be allowed anymore after the PR. If we did not make this change, we would have to keep all those normalize_before configs, which in turn would mean that the modeling code of all Bart-like models would be the same again.
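As a small illustration of this (a sketch, not part of the PR): after the split, the position-embedding type is no longer a configurable switch for Pegasus.

```python
from transformers import PegasusConfig

config = PegasusConfig()
# The old Bart-era switches (static_position_embeddings, normalize_before, ...) are gone;
# the Pegasus design (sinusoidal positions, pre-layernorm) is fixed in modeling_pegasus.py.
print("static_position_embeddings" in config.to_dict())  # False
```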
Blenderbot needs to be divided into two models IMO. Blenderbot 90M not only has a very different architecture (see the table above), but also uses a different tokenizer. I created a new BlenderbotSmallModel class. Thus I need to update one Blenderbot config online, changing its class. This means that from this PR onward the following is not supported anymore:
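Sketch of the call in question (the 90M checkpoint name is the one used on the hub at the time):

```python
from transformers import BlenderbotForConditionalGeneration, BlenderbotSmallForConditionalGeneration

# No longer supported after this PR - the 90M checkpoint is not a "big" Blenderbot model:
model = BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-90M")

# Instead, the 90M checkpoint belongs to the new BlenderbotSmall classes:
model = BlenderbotSmallForConditionalGeneration.from_pretrained("facebook/blenderbot-90M")
```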
That's a big breaking change, but I don't see another way. If we keep the small Blenderbot in the "normal" Blenderbot, we have to keep config params such as normalize_before, which I really don't want to do... I think the best option here is to add a warning (or even an error) by overriding from_pretrained(...) in BlenderbotForConditionalGeneration, so that loading the 90M config with the "big" Blenderbot class (as in the sketch above) throws an error or gives a warning. There are no fine-tuned Blenderbot models on the hub and this is the only config. I think it's the right approach to separate the model here.
Barthez has essentially an mbart architecture, but has bart defined as its model_type in the configs. Here I'd also like to change the configs online to make sure the correct model is loaded when using AutoModelForSeq2SeqLM. I should also contact the author here.
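For context, a small sketch of why the model_type matters for the auto classes (the repo name is only an example of a Barthez checkpoint and is not part of this PR):

```python
from transformers import AutoConfig, AutoModelForSeq2SeqLM

# AutoModelForSeq2SeqLM resolves the concrete class from config.model_type, so a Barthez
# config declaring "bart" is loaded with the Bart classes instead of the MBart ones.
config = AutoConfig.from_pretrained("moussaKam/barthez")
print(config.model_type)  # should read "mbart" once the configs are fixed online
model = AutoModelForSeq2SeqLM.from_pretrained("moussaKam/barthez")
print(type(model).__name__)  # MBartForConditionalGeneration with the corrected model_type
```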
Bart allowed decoder_input_ids to be created automatically by shifting the input_ids to the right. Thus, in Bart one can do the following:
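(Sketch with an illustrative checkpoint; the point is that no decoder_input_ids are passed.)

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

text = "My friends are <mask> but they eat too many carbs."
input_ids = tokenizer([text], return_tensors="pt").input_ids

# No decoder_input_ids are given: Bart builds them internally by shifting input_ids to the
# right, which matches the denoising objective Bart and MBart were pre-trained with.
logits = model(input_ids).logits

masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
probs = logits[0, masked_index].softmax(dim=0)
print(tokenizer.decode(probs.topk(5).indices))
```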
This is a very special case and should only be used for Bart-like denoising pre-training or mask-filling. The only models that were trained in this fashion, and thus can do mask-filling, are Bart and MBart. All other models cannot do mask-filling, so decoder_input_ids should never be created by shifting input_ids. => This feature is therefore removed from Pegasus, Marian, Blenderbot, and BlenderbotSmall.

Those are all breaking changes. Blenderbot is the big one; the others should be fine. To be sure, I wrote some scripts that verify that no model on the model hub whose name contains one of the keywords bart, mbart, pegasus, blenderbot, opus-mt, barthez has incorrect/unexpected parameter settings after the PR.

TODO:
Change the Barthez configs online to use model_type mbart instead of bart.

Future TODO: