
[PyTorch Bart] Split Bart into different models #9343

Merged

Conversation

@patrickvonplaten (Contributor) commented Dec 29, 2020

What does this PR do?

This PR splits each Bart-like model into its own self-contained class, for the PyTorch models only. This is more in line with the library's general philosophy of having self-contained model files.

As discussed with @jplu, the TF models will be separated in a future PR as there are still some issues and improvements (TF serving) blocking the separation - see #9313.

In short, after this PR the following "model-specific" config parameters are removed from all Bart-like configs:

  • extra_pos_embeddings
  • normalize_embedding
  • add_final_layer_norm
  • normalize_before
  • do_blenderbot_90_layernorm
  • static_position_embeddings
  • add_bias_logits
  • force_bos_token_to_be_generated (this one has to be kept for Bart though)

and each "bart" model (Pegasus, Bart, MBart, Marian, Blenderbot, BlenderbotSmall) will get its own modeling_....py file.
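For example, a quick sanity check against a post-PR install (a hypothetical snippet, not part of the diff) shows that the flags are gone from the config and hard-coded in each model file instead:

from transformers import BartConfig

config = BartConfig()
# After the split, architecture choices such as pre- vs. post-layernorm are
# fixed per model file instead of being read from the config:
assert not hasattr(config, "normalize_before")
assert not hasattr(config, "add_bias_logits")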

At the moment the models have the following configurations:

  • bart: extra_pos_embeddings = 2, normalize_embedding ✔️, force_bos_token_to_be_generated ✔️
  • mbart: extra_pos_embeddings = 2, normalize_before ✔️, add_final_layer_norm ✔️, normalize_embedding ✔️
  • marian: static_position_embeddings ✔️
  • pegasus: normalize_before ✔️, add_final_layer_norm ✔️, static_position_embeddings ✔️
  • blenderbot90M (BlenderbotSmall): extra_pos_embeddings = 0, do_blenderbot_90_layernorm ✔️, normalize_embedding ✔️
  • blenderbot3B + rest (Blenderbot): extra_pos_embeddings = 0, normalize_before ✔️, add_final_layer_norm ✔️, do_blenderbot_90_layernorm ✔️

(add_bias_logits is set for no model.)

We can see that add_bias_logits is actually never used, so I think the best option is to just delete the functionality. Also, one can see that no two models have the exact same usage of the above params, so we'll make 6 different modeling_....py files.

Resulting Improvements:

  • The model files are much more readable and much easier for users to navigate. No more cryptic config parameters, such as normalize_before, that the user never knows how to set anyway.

  • All Bart-like features that are irrelevant for the other models are removed. Those features are (a) never mentioned in the respective papers and (b) make no sense for models that were never trained with them, so using them leads to nonsensical outputs. E.g. Marian was never supposed to be a "mask-filling" model, yet it currently has "mask-filling" functionality:

marian = MarianMTModel.from_pretrained(...)
marian(input_ids)  # note that no decoder_input_ids are passed, as for mask-filling tasks in Bart
# => output makes no sense
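For contrast, here is a sketch of the intended usage of Marian, translation via generate() (using one of the existing Helsinki-NLP checkpoints):

from transformers import MarianMTModel, MarianTokenizer

# Intended usage: Marian is a translation model driven through generate().
tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
translated = model.generate(**inputs)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))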

The big gain here is that users are better guided on how to use each model and wonder less about whether they are using it correctly or whether there is a bug in the model.

  • Docstrings are improved with more model-specific examples and fewer comparisons to Bart. E.g. Pegasus, Marian, and Blenderbot barely mention BART in their papers and IMO have no direct relation to BART, so comparing these models to BART in the docs only confuses the user.

  • Some small improvements: memory usage during beam search is slightly reduced, and gradient checkpointing is added (see the sketch after this list).

  • All previous tests are copied, and some additional tests are added for each model.
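A minimal sketch of the gradient checkpointing usage, assuming the config-flag mechanism the library uses at the time of this PR:

from transformers import BartConfig, BartForConditionalGeneration

# Gradient checkpointing trades compute for memory by re-computing
# activations during the backward pass instead of storing them.
config = BartConfig(gradient_checkpointing=True)
model = BartForConditionalGeneration(config)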

Possible drawbacks

  • The drawback, as expected, is code duplication. This is remedied to some extent by the # Copied from ... safety feature (see the example after this list).
  • Some breaking changes as explained further below
  • Models might diverge more easily in the future, which could make it harder to keep the same API for training. This is, however, mitigated by function-signature tests that are already implemented.
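For reference, a # Copied from statement marks a class or function as an exact copy that the repository's consistency check keeps in sync with its source, e.g. (illustrative snippet):

from torch import nn

# Copied from transformers.models.bart.modeling_bart.BartEncoderLayer with Bart->Pegasus
class PegasusEncoderLayer(nn.Module):
    ...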

Breaking changes

🚨🚨 Important: We cannot keep 100% backward compatibility here or the PR won't make much sense 🚨🚨

  • Since all models were packed into a single model file, many different model designs are currently possible that were never intended. E.g.
    Pegasus was only ever used with sinusoidal position embeddings (as mentioned in its paper), but since it is merged into modeling_bart.py, one could theoretically use Pegasus with learned position embeddings. No config on the model hub does this, however, and it will no longer be possible after this PR (see the sketch below for what such static position embeddings look like). Likewise, Marian's model design never normalized the word embeddings, but the current code allows it; again, no config on the model hub does that, so this will also no longer be possible after the PR. In short: all model designs that were never foreseen in the original model and that are never used on the model hub won't be allowed anymore after this PR.
    If we did not make this change, we would have to keep all those normalize_before-style config parameters, which in turn would mean that the modeling code of all Bart-like models would end up being the same again.
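A simplified sketch of such static (sinusoidal) position embeddings, along the lines of what Pegasus and Marian use (not the exact library implementation; assumes an even embedding_dim):

import torch
from torch import nn

class SinusoidalPositionalEmbedding(nn.Embedding):
    # Static position embeddings: the table is filled with fixed
    # sine/cosine values and is never updated during training.
    def __init__(self, num_positions: int, embedding_dim: int):
        super().__init__(num_positions, embedding_dim)
        position = torch.arange(num_positions).unsqueeze(1)
        div_term = 10000.0 ** (torch.arange(0, embedding_dim, 2) / embedding_dim)
        with torch.no_grad():
            self.weight[:, 0::2] = torch.sin(position / div_term)
            self.weight[:, 1::2] = torch.cos(position / div_term)
        self.weight.requires_grad = False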

  • Blenderbot needs to be divided into two models IMO. Blenderbot 90M not only has a very different architecture (see table above) but also uses a different tokenizer. I created a new BlenderbotSmallModel class. Thus I need to update one Blenderbot config online, changing its class. This means that from this PR onward the following is no longer supported:

from transformers import BlenderbotForConditionalGeneration, BlenderbotSmallForConditionalGeneration

model = BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-90M")
# => this loads the wrong model. It should be:
model = BlenderbotSmallForConditionalGeneration.from_pretrained("facebook/blenderbot-90M")

That's a big breaking change, but I don't see another way. If we keep the small Blenderbot inside the "normal" Blenderbot, we have to keep config params like normalize_before, which I really don't want to do. I think the best option here is to add a warning (or even an error) by overriding from_pretrained(...) in BlenderbotForConditionalGeneration, so that

model = BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-90M")

will throw an error or give a warning. There are no fine-tuned Blenderbot models on the hub and this is the only such config, so I think separating the model here is the right approach.
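A minimal sketch of what such an override could look like (the exact message and behavior in the PR may differ; the class body is abbreviated):

import warnings

from transformers.models.blenderbot.modeling_blenderbot import BlenderbotPreTrainedModel

class BlenderbotForConditionalGeneration(BlenderbotPreTrainedModel):
    ...  # full model definition omitted

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
        # The 90M checkpoint uses the BlenderbotSmall architecture, so warn
        # users who try to load it with this class.
        if pretrained_model_name_or_path == "facebook/blenderbot-90M":
            warnings.warn(
                "The checkpoint `facebook/blenderbot-90M` is a BlenderbotSmall model; use "
                "`BlenderbotSmallForConditionalGeneration.from_pretrained(...)` instead.",
                FutureWarning,
            )
        return super().from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)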

  • Barthez essentially has an MBart architecture, but has bart defined as its model_type in the configs. Here I'd also like to change the configs online to make sure the correct model is loaded when using AutoModelForSeq2SeqLM. I should also contact the author about this.

  • Bart allows decoder_input_ids to be created automatically by shifting the input_ids to the right. Thus, in Bart one can do the following:

bart = BartForConditionalGeneration(...)
bart(input_ids)  # note that no decoder_input_ids are passed here

This is a very special case and should only be used for Bart-like denoising pre-training or mask-filling. The only models that were trained in this fashion, and can therefore do mask-filling, are Bart and MBart. All other models cannot do mask-filling, so for them decoder_input_ids should never be created by shifting input_ids. => this feature is therefore removed from Pegasus, Marian, Blenderbot, and BlenderbotSmall.
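For reference, a sketch of the right-shift that creates decoder_input_ids from input_ids (along the lines of the library's shift_tokens_right helper):

import torch

def shift_tokens_right(input_ids: torch.Tensor, pad_token_id: int, decoder_start_token_id: int) -> torch.Tensor:
    # Shift input ids one token to the right and prepend the decoder start token.
    shifted = input_ids.new_zeros(input_ids.shape)
    shifted[:, 1:] = input_ids[:, :-1].clone()
    shifted[:, 0] = decoder_start_token_id
    # Replace any -100 label-masking values with the pad token id.
    shifted.masked_fill_(shifted == -100, pad_token_id)
    return shifted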

Those are all breaking changes. Blenderbot is the big one; the others should be fine. To be sure, I wrote some scripts that verify that no model on the model hub whose name contains one of the keywords bart, mbart, pegasus, blenderbot, opus-mt, or barthez has incorrect/unexpected parameter settings after the PR.
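A rough sketch of what such a hub-wide verification could look like (a hypothetical helper using the huggingface_hub API, not the actual script used for this PR):

from huggingface_hub import HfApi
from transformers import AutoConfig

KEYWORDS = ["bart", "mbart", "pegasus", "blenderbot", "opus-mt", "barthez"]
# Flags that no hub config should rely on after the PR.
REMOVED_FLAGS = ["normalize_before", "add_bias_logits", "do_blenderbot_90_layernorm"]

api = HfApi()
for keyword in KEYWORDS:
    for model in api.list_models(search=keyword):
        config = AutoConfig.from_pretrained(model.id)
        for flag in REMOVED_FLAGS:
            if getattr(config, flag, None):
                print(f"{model.id}: unexpected setting for {flag}")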

TODO:

  • Create Bart model file & pass all tests
  • Create MBart model file & pass all tests
  • Create Pegasus model file & pass all tests
  • Create Marian model file & pass all tests
  • Create Blenderbot model file & pass all tests
  • Create BlenderbotSmall model file & pass all tests
  • Clean PR (delete all helper files)
  • Clean docs
  • Add # Copied from statements
  • Do a very detailed review of my own PR to make sure no hidden bugs were introduced.
  • Correct configs of barthez online to be of type mbart instead of bart.
  • Correct config of https://huggingface.co/facebook/blenderbot-90M online.

Future TODO:

  • Communicate about this PR on the forum
  • Add # Copied from statements to the seq2seq Bart model templates
  • Add # Copied from statements to LED

@patrickvonplaten patrickvonplaten changed the title first try [PyTorch Bart] Split Bart into different models Dec 29, 2020
@patrickvonplaten patrickvonplaten changed the title [PyTorch Bart] Split Bart into different models [WIP][PyTorch Bart] Split Bart into different models Dec 29, 2020
@patrickvonplaten patrickvonplaten changed the title [WIP][PyTorch Bart] Split Bart into different models [PyTorch Bart] Split Bart into different models Jan 4, 2021
@LysandreJik (Member) left a comment:

Great job splitting the four models! Given that the slow tests pass, this LGTM once the modifications we discussed regarding the small blenderbot model are applied.

Fantastic job 🎉

@sgugger (Collaborator) left a comment:

Exceptional work! I have a few nits, which, from the way they are duplicated for each model, sound like issues in the templates. If you could fix the templates as you fix the nits, that would be truly amazing.

Comment on lines +36 to +43
"BlenderbotSmallEncoder", # Building part of bigger (tested) model.
"BlenderbotSmallDecoder", # Building part of bigger (tested) model.
"BlenderbotEncoder", # Building part of bigger (tested) model.
"BlenderbotDecoder", # Building part of bigger (tested) model.
"MBartEncoder", # Building part of bigger (tested) model.
"MBartDecoder", # Building part of bigger (tested) model.
"PegasusEncoder", # Building part of bigger (tested) model.
"PegasusDecoder", # Building part of bigger (tested) model.

As a follow-up PR for me: all Encoder and Decoder models should be ignored by the checks for tested and auto-configured models.

@sgugger (Collaborator) commented Jan 4, 2021

One important comment I forgot to add to my review: I don't think we should adapt the research_projects to the new structure, as they are pinned to an earlier version of transformers (3.5.1). So apart from the deleted duplicate file, the other changes there should be reverted IMO.

@patrickvonplaten patrickvonplaten merged commit eef6603 into huggingface:master Jan 5, 2021
@patrickvonplaten patrickvonplaten deleted the split_bart_into_sep branch January 5, 2021 21:00
stancld added a commit to stancld/transformers that referenced this pull request Jan 8, 2021
@patrickvonplaten mentioned this pull request Jan 10, 2021
guyrosin pushed a commit to guyrosin/transformers that referenced this pull request Jan 15, 2021
* first try
* remove old template
* finish bart
* finish mbart
* delete unnecessary line
* init pegasus
* save intermediate
* correct pegasus
* finish pegasus
* remove cookie cutter leftover
* add marian
* finish blenderbot
* replace in file
* correctly split blenderbot
* delete "old" folder
* correct "add statement"
* adapt config for tf comp
* correct configs for tf
* remove ipdb
* fix more stuff
* fix mbart
* push pegasus fix
* fix mbart
* more fixes
* fix research projects code
* finish docs for bart, mbart, and marian
* delete unnecessary file
* correct attn typo
* correct configs
* remove pegasus for seq class
* correct peg docs
* correct peg docs
* finish configs
* further improve docs
* add copied from statements to mbart
* fix copied from in mbart
* add copy statements to marian
* add copied from to marian
* add pegasus copied from
* finish pegasus
* finish copied from
* Apply suggestions from code review
* make style
* backward comp blenderbot
* apply lysandres and sylvains suggestions
* apply suggestions
* push last fixes
* fix docs
* fix tok tests
* fix imports code style
* fix doc
@patil-suraj mentioned this pull request Feb 23, 2022