[PyTorch Bart] Split Bart into different models #9343

Merged
56 commits
129986e  first try (patrickvonplaten, Dec 29, 2020)
c963c8a  remove old template (patrickvonplaten, Dec 29, 2020)
9b10bf1  finish bart (patrickvonplaten, Dec 29, 2020)
ade222b  finish mbart (patrickvonplaten, Dec 30, 2020)
0c229e0  delete unnecessary line (patrickvonplaten, Dec 30, 2020)
69eec0c  init pegasus (patrickvonplaten, Dec 30, 2020)
47a5d9a  save intermediate (patrickvonplaten, Dec 30, 2020)
047f6c9  correct pegasus (patrickvonplaten, Jan 2, 2021)
e4988c3  finish pegasus (patrickvonplaten, Jan 2, 2021)
6269291  remove cookie cutter leftover (patrickvonplaten, Jan 2, 2021)
88792fa  add marian (patrickvonplaten, Jan 2, 2021)
fcdbcd9  finish blenderbot (patrickvonplaten, Jan 2, 2021)
1021dd6  replace in file (patrickvonplaten, Jan 2, 2021)
8fa7211  Merge remote-tracking branch 'main/master' into split_bart_into_sep (patrickvonplaten, Jan 2, 2021)
c8bbfa1  correctly split blenderbot (patrickvonplaten, Jan 2, 2021)
bdcaacf  delete "old" folder (patrickvonplaten, Jan 2, 2021)
8ebab5e  correct "add statement" (patrickvonplaten, Jan 2, 2021)
c206f6d  adapt config for tf comp (patrickvonplaten, Jan 2, 2021)
0b3ba48  correct configs for tf (patrickvonplaten, Jan 2, 2021)
a9757e2  remove ipdb (patrickvonplaten, Jan 2, 2021)
953b110  fix more stuff (patrickvonplaten, Jan 2, 2021)
165f271  fix mbart (patrickvonplaten, Jan 2, 2021)
e8afa3e  push pegasus fix (patrickvonplaten, Jan 2, 2021)
441446d  fix mbart (patrickvonplaten, Jan 2, 2021)
3fd722b  more fixes (patrickvonplaten, Jan 2, 2021)
85f36c4  fix research projects code (patrickvonplaten, Jan 2, 2021)
a7442fb  finish docs for bart, mbart, and marian (patrickvonplaten, Jan 3, 2021)
9332a06  delete unnecessary file (patrickvonplaten, Jan 3, 2021)
3cbfb7d  correct attn typo (patrickvonplaten, Jan 4, 2021)
b08d165  correct configs (patrickvonplaten, Jan 4, 2021)
93b9944  remove pegasus for seq class (patrickvonplaten, Jan 4, 2021)
04172c1  correct peg docs (patrickvonplaten, Jan 4, 2021)
ada1cd2  correct peg docs (patrickvonplaten, Jan 4, 2021)
6844c1b  fix flake8 (patrickvonplaten, Jan 4, 2021)
0171fcd  finish configs (patrickvonplaten, Jan 4, 2021)
b6058ce  further improve docs (patrickvonplaten, Jan 4, 2021)
fcc944c  add copied from statements to mbart (patrickvonplaten, Jan 4, 2021)
6f54cc0  fix copied from in mbart (patrickvonplaten, Jan 4, 2021)
7b11e33  add copy statements to marian (patrickvonplaten, Jan 4, 2021)
36a5c26  add copied from to marian (patrickvonplaten, Jan 4, 2021)
b0762ca  add pegasus copied from (patrickvonplaten, Jan 4, 2021)
7b307b6  finish pegasus (patrickvonplaten, Jan 4, 2021)
d47b5cd  finish copied from (patrickvonplaten, Jan 4, 2021)
9e9b66f  Apply suggestions from code review (patrickvonplaten, Jan 4, 2021)
65fe574  make style (patrickvonplaten, Jan 4, 2021)
d3cbc55  backward comp blenderbot (patrickvonplaten, Jan 5, 2021)
e562d22  Merge branch 'split_bart_into_sep' of https://github.com/patrickvonpl… (patrickvonplaten, Jan 5, 2021)
e69ec94  apply lysandres and sylvains suggestions (patrickvonplaten, Jan 5, 2021)
0077a6e  apply suggestions (patrickvonplaten, Jan 5, 2021)
945ad5f  push last fixes (patrickvonplaten, Jan 5, 2021)
06bb472  merge conflicts (patrickvonplaten, Jan 5, 2021)
7e1b7fe  fix docs (patrickvonplaten, Jan 5, 2021)
c5e1ff3  fix tok tests (patrickvonplaten, Jan 5, 2021)
4f27946  Merge branch 'split_bart_into_sep' of https://github.com/patrickvonpl… (patrickvonplaten, Jan 5, 2021)
dff4c1a  fix imports code style (patrickvonplaten, Jan 5, 2021)
dc3cdef  fix doc (patrickvonplaten, Jan 5, 2021)
3 changes: 3 additions & 0 deletions docs/source/index.rst
@@ -220,6 +220,8 @@ TensorFlow and/or Flax.
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Blenderbot | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| BlenderbotSmall | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| CTRL | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
@@ -361,6 +363,7 @@ TensorFlow and/or Flax.
model_doc/bertweet
model_doc/bertgeneration
model_doc/blenderbot
model_doc/blenderbot_small
model_doc/camembert
model_doc/ctrl
model_doc/deberta
1 change: 0 additions & 1 deletion docs/source/model_doc/bart.rst
@@ -64,7 +64,6 @@ Implementation Notes
summarization, see the example in those docstrings.
- Models that load the `facebook/bart-large-cnn` weights will not have a :obj:`mask_token_id`, or be able to perform
mask-filling tasks.
- For training/forward passes that don't involve beam search, pass :obj:`use_cache=False`.

Mask Filling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
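
As a quick reference, here is a minimal mask-filling sketch, assuming the ``facebook/bart-large`` checkpoint (which, unlike `facebook/bart-large-cnn`, retains a mask token):

.. code-block::

    >>> from transformers import BartForConditionalGeneration, BartTokenizer

    >>> tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
    >>> model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
    >>> TXT = "My friends are <mask> but they eat too many carbs."

    >>> # score every vocabulary token at the masked position
    >>> input_ids = tokenizer([TXT], return_tensors="pt")["input_ids"]
    >>> logits = model(input_ids).logits
    >>> masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    >>> probs = logits[0, masked_index].softmax(dim=0)
    >>> values, predictions = probs.topk(5)
    >>> tokenizer.decode(predictions).split()  # five candidate fill-ins for <mask>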
40 changes: 10 additions & 30 deletions docs/source/model_doc/blenderbot.rst
@@ -43,13 +43,10 @@ Implementation Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Blenderbot uses a standard `seq2seq model transformer <https://arxiv.org/pdf/1706.03762.pdf>`__ based architecture.
- It inherits completely from :class:`~transformers.BartForConditionalGeneration`
- Even though blenderbot is one model, it uses two tokenizers :class:`~transformers.BlenderbotSmallTokenizer` for 90M
checkpoint and :class:`~transformers.BlenderbotTokenizer` for all other checkpoints.
- :class:`~transformers.BlenderbotSmallTokenizer` will always return :class:`~transformers.BlenderbotSmallTokenizer`,
regardless of checkpoint. To use the 3B parameter checkpoint, you must call
:class:`~transformers.BlenderbotTokenizer` directly.
- Available checkpoints can be found in the `model hub <https://huggingface.co/models?search=blenderbot>`__.
- This is the `default` Blenderbot model class. However, some smaller checkpoints, such as
``facebook/blenderbot_small_90M``, have a different architecture and consequently should be used with
`BlenderbotSmall <https://huggingface.co/transformers/master/model_doc/blenderbot_small.html>`__.


Usage
@@ -59,26 +56,15 @@ Here is an example of model usage:

.. code-block::

>>> from transformers import BlenderbotSmallTokenizer, BlenderbotForConditionalGeneration
>>> mname = 'facebook/blenderbot-90M'
>>> from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration
>>> mname = 'facebook/blenderbot-400M-distill'
>>> model = BlenderbotForConditionalGeneration.from_pretrained(mname)
>>> tokenizer = BlenderbotSmallTokenizer.from_pretrained(mname)
>>> tokenizer = BlenderbotTokenizer.from_pretrained(mname)
>>> UTTERANCE = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer([UTTERANCE], return_tensors='pt')
>>> reply_ids = model.generate(**inputs)
>>> print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in reply_ids])


Here is how you can check out config values:

.. code-block::


>>> from transformers import BlenderbotConfig
>>> config_90 = BlenderbotConfig.from_pretrained("facebook/blenderbot-90M")
>>> config_90.to_diff_dict() # show interesting Values.
>>> configuration_3B = BlenderbotConfig("facebook/blenderbot-3B")
>>> configuration_3B.to_diff_dict()
>>> print(tokenizer.batch_decode(reply_ids))
["<s> That's unfortunate. Are they trying to lose weight or are they just trying to be healthier?</s>"]


BlenderbotConfig
@@ -93,20 +79,14 @@ BlenderbotTokenizer
.. autoclass:: transformers.BlenderbotTokenizer
:members: build_inputs_with_special_tokens

BlenderbotSmallTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BlenderbotSmallTokenizer
:members:


BlenderbotModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

See :obj:`transformers.BartModel` for arguments to `forward` and `generate`

.. autoclass:: transformers.BlenderbotModel
:members:
:members: forward


BlenderbotForConditionalGeneration
@@ -115,7 +95,7 @@ BlenderbotForConditionalGeneration
See :obj:`transformers.BartForConditionalGeneration` for arguments to `forward` and `generate`

.. autoclass:: transformers.BlenderbotForConditionalGeneration
:members:
:members: forward


TFBlenderbotForConditionalGeneration
70 changes: 70 additions & 0 deletions docs/source/model_doc/blenderbot_small.rst
@@ -0,0 +1,70 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

Blenderbot Small
-----------------------------------------------------------------------------------------------------------------------

Note that :class:`~transformers.BlenderbotSmallModel` and
:class:`~transformers.BlenderbotSmallForConditionalGeneration` are only used in combination with the checkpoint
`facebook/blenderbot-90M <https://huggingface.co/facebook/blenderbot-90M>`__. Larger Blenderbot checkpoints should
instead be used with :class:`~transformers.BlenderbotModel` and
:class:`~transformers.BlenderbotForConditionalGeneration`.
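
A minimal usage sketch under that pairing (the `facebook/blenderbot-90M` checkpoint with the
``BlenderbotSmall`` classes):

.. code-block::

    >>> from transformers import BlenderbotSmallTokenizer, BlenderbotSmallForConditionalGeneration

    >>> mname = 'facebook/blenderbot-90M'
    >>> model = BlenderbotSmallForConditionalGeneration.from_pretrained(mname)
    >>> tokenizer = BlenderbotSmallTokenizer.from_pretrained(mname)
    >>> UTTERANCE = "My friends are cool but they eat too many carbs."
    >>> inputs = tokenizer([UTTERANCE], return_tensors='pt')
    >>> reply_ids = model.generate(**inputs)  # generate a chatbot reply with the 90M model
    >>> print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True))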

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Blender chatbot model was proposed in `Recipes for building an open-domain chatbot
<https://arxiv.org/pdf/2004.13637.pdf>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan
Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston on 30 Apr 2020.

The abstract of the paper is the following:

*Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that
scaling neural models in the number of parameters and the size of the data they are trained on gives improved results,
we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of
skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to
their partners, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent
persona. We show that large scale models can learn these skills when given appropriate training data and choice of
generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models
and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.*

The authors' code can be found `here <https://github.com/facebookresearch/ParlAI>`__.

BlenderbotSmallConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BlenderbotSmallConfig
:members:


BlenderbotSmallTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BlenderbotSmallTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary


BlenderbotSmallModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BlenderbotSmallModel
:members: forward


BlenderbotSmallForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BlenderbotSmallForConditionalGeneration
:members: forward
16 changes: 12 additions & 4 deletions docs/source/model_doc/marian.rst
@@ -33,7 +33,6 @@ Implementation Notes
- The modeling code is the same as :class:`~transformers.BartForConditionalGeneration` with a few minor modifications:

- static (sinusoid) positional embeddings (:obj:`MarianConfig.static_position_embeddings=True`)
- a new final_logits_bias (:obj:`MarianConfig.add_bias_logits=True`)
- no layernorm_embedding (:obj:`MarianConfig.normalize_embedding=False`)
- the model starts generating with :obj:`pad_token_id` (which has 0 as a token_embedding) as the prefix (Bart uses
:obj:`<s/>`),
@@ -56,9 +55,10 @@ Examples

- Since Marian models are smaller than many other translation models available in the library, they can be useful for
fine-tuning experiments and integration tests.
- :prefix_link:`Fine-tune on TPU <examples/seq2seq/builtin_trainer/train_distil_marian_enro_tpu.sh>`
- :prefix_link:`Fine-tune on GPU <examples/seq2seq/builtin_trainer/train_distil_marian_enro.sh>`
- :prefix_link:`Fine-tune on GPU with pytorch-lightning <examples/seq2seq/distil_marian_no_teacher.sh>`
- `Fine-tune on GPU
<https://github.com/huggingface/transformers/blob/master/examples/research_projects/seq2seq-distillation/train_distil_marian_enro_teacher.sh>`__
- `Fine-tune on GPU with pytorch-lightning
<https://github.com/huggingface/transformers/blob/master/examples/research_projects/seq2seq-distillation/train_distil_marian_no_teacher.sh>`__

Multilingual Models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -179,10 +179,18 @@ MarianTokenizer
:members: prepare_seq2seq_batch


MarianModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.MarianModel
:members: forward


MarianMTModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.MarianMTModel
:members: forward
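
A short translation sketch, assuming the `Helsinki-NLP/opus-mt-en-de` checkpoint:

.. code-block::

    >>> from transformers import MarianMTModel, MarianTokenizer

    >>> mname = 'Helsinki-NLP/opus-mt-en-de'
    >>> tokenizer = MarianTokenizer.from_pretrained(mname)
    >>> model = MarianMTModel.from_pretrained(mname)
    >>> batch = tokenizer(["Studies have shown that owning a dog is good for you."], return_tensors="pt")
    >>> translated_ids = model.generate(**batch)  # decoding starts from pad_token_id, as noted above
    >>> tokenizer.batch_decode(translated_ids, skip_special_tokens=True)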


TFMarianMTModel
13 changes: 13 additions & 0 deletions docs/source/model_doc/mbart.rst
@@ -111,6 +111,19 @@ MBartForConditionalGeneration
:members:


MBartForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.MBartForQuestionAnswering
:members:


MBartForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.MBartForSequenceClassification


TFMBartForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3 changes: 2 additions & 1 deletion docs/source/model_doc/pegasus.rst
@@ -65,7 +65,6 @@ Implementation Notes
- Some key configuration differences:

- static, sinusoidal position embeddings
- no :obj:`layernorm_embedding` (:obj:`PegasusConfig.normalize_embedding=False`)
- the model starts generating with pad_token_id (which has 0 token_embedding) as the prefix.
- more beams are used (:obj:`num_beams=8`)
- All pretrained pegasus checkpoints are the same besides three attributes: :obj:`tokenizer.model_max_length` (maximum
@@ -122,12 +121,14 @@ PegasusModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.PegasusModel
:members: forward


PegasusForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.PegasusForConditionalGeneration
:members: forward
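
A minimal summarization sketch, assuming the `google/pegasus-xsum` checkpoint:

.. code-block::

    >>> from transformers import PegasusTokenizer, PegasusForConditionalGeneration

    >>> mname = "google/pegasus-xsum"
    >>> tokenizer = PegasusTokenizer.from_pretrained(mname)
    >>> model = PegasusForConditionalGeneration.from_pretrained(mname)
    >>> SRC = "PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions."
    >>> batch = tokenizer([SRC], truncation=True, padding='longest', return_tensors="pt")
    >>> summary_ids = model.generate(**batch)  # pegasus configs default to num_beams=8 (see notes above)
    >>> tokenizer.batch_decode(summary_ids, skip_special_tokens=True)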


TFPegasusForConditionalGeneration