Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OPT #17088

Merged
merged 158 commits into from
May 12, 2022
Merged

Add OPT #17088

merged 158 commits into from
May 12, 2022

Conversation

younesbelkada
Copy link
Contributor

@younesbelkada younesbelkada commented May 4, 2022

What does this PR do?

A PR to add OPT-350-m model on transformers library 🤗
We tested the logits and the generation script and the logits on the DGX and we match the results from metaseq

TO DOs

  • code lint + integration tests
  • discuss how to import the logits for the hardcoded tests
  • correct documentation

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

cc @LysandreJik @patrickvonplaten @stas00 @sgugger

younesbelkada and others added 5 commits May 4, 2022 18:45
- remove commented block
- remove unecessary files
- remove a test file
- added the logits test
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
@younesbelkada
Copy link
Contributor Author

younesbelkada commented May 4, 2022

Also I have noticed something here: https://github.com/younesbelkada/transformers/blob/13632212d5bacc83bb9a9b8b6816a70bde865dbc/src/transformers/models/opt/modeling_opt.py#L249
Initially in our implementation the attention mask has a shape batch_size, 1, seq_len, seq_len whereas in the metaseq implementation the masks is identitcal across the whole batch therefore has a shape of seq_len, seq_len. Is it a common practice that the attention mask is identical across the whole batch ?

@patrickvonplaten
Copy link
Contributor

identitcal

I shouldn't but I think for causal LM it's fine to do this since the attention_mask is usually just the causal mask which is the same for every batch. Later tokens can never attend to earlier tokens when there is a causal mask, so it's often good enough to use that even when tokens are padded

- rm mask filling example on docstring
- remove useless args
@younesbelkada
Copy link
Contributor Author

Thank you very much for the explanation regarding the attention mask
Since I took the template from BART I think that it considered this attention strategy by default

@younesbelkada
Copy link
Contributor Author

younesbelkada commented May 4, 2022

I am facing this CI issue when running make repo-consistency and I can't figure out what is happening.. Do you have any idea by change what could cause this issue?
Screenshot from 2022-05-05 00-00-12

@younesbelkada
Copy link
Contributor Author

younesbelkada commented May 5, 2022

Linking this issue facebookresearch/metaseq#31
to understand how to load properly sharded models

@suchenzang
Copy link

Crashing the party here (apologies for not having more context on this codebase): is it normal for model imports to require >8k LOC added? Or is there something in particular about OPT models that requires all this additional glue code?

If this is is OPT's fault - is there anything we can do (on the metaseq side) to bridge this gap?

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for working so fast on this new model integration!

docs/source/en/_toctree.yml Show resolved Hide resolved
docs/source/en/model_doc/opt.mdx Outdated Show resolved Hide resolved
docs/source/en/model_doc/opt.mdx Outdated Show resolved Hide resolved
docs/source/en/model_doc/opt.mdx Outdated Show resolved Hide resolved
src/transformers/models/auto/tokenization_auto.py Outdated Show resolved Hide resolved
src/transformers/models/opt/modeling_opt.py Outdated Show resolved Hide resolved
src/transformers/models/opt/modeling_opt.py Outdated Show resolved Hide resolved
src/transformers/models/opt/modeling_opt.py Outdated Show resolved Hide resolved
src/transformers/models/opt/modeling_opt.py Show resolved Hide resolved
tests/models/opt/test_modeling_opt.py Outdated Show resolved Hide resolved
ArthurZucker and others added 12 commits May 11, 2022 21:07
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for working on it, @younesbelkada, @ArthurZucker, @patrickvonplaten!

src/transformers/models/opt/configuration_opt.py Outdated Show resolved Hide resolved
src/transformers/models/opt/configuration_opt.py Outdated Show resolved Hide resolved
src/transformers/models/opt/modeling_opt.py Outdated Show resolved Hide resolved
src/transformers/models/opt/modeling_opt.py Outdated Show resolved Hide resolved
src/transformers/models/opt/modeling_opt.py Outdated Show resolved Hide resolved
src/transformers/models/opt/modeling_opt.py Show resolved Hide resolved
@ArthurZucker
Copy link
Collaborator

Should we merge? 🤗

@patrickvonplaten
Copy link
Contributor

Actually one sec, I'll update the model docs a bit again

@patrickvonplaten patrickvonplaten merged commit b971c76 into huggingface:main May 12, 2022
@patrickvonplaten patrickvonplaten deleted the opt-350-m branch May 12, 2022 10:25
@@ -137,6 +137,7 @@
("openai-gpt", ("OpenAIGPTTokenizer", "OpenAIGPTTokenizerFast" if is_tokenizers_available() else None)),
("gpt2", ("GPT2Tokenizer", "GPT2TokenizerFast" if is_tokenizers_available() else None)),
("gptj", ("GPT2Tokenizer", "GPT2TokenizerFast" if is_tokenizers_available() else None)),
("opt", ("GPT2Tokenizer", None)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add a fast tokenizer in a follow-up PR that is able to prepend the bos_token at the beginning. For this a new converter needs to be written.

elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
* First version - OPT model

* Final changes

- putting use cache to False

* few changes

- remove commented block

* few changes

- remove unecessary files

* fix style issues

* few changes

- remove a test file
- added the logits test

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* add gen tests

* few changes

- rm mask filling example on docstring

* few changes

- remove useless args

* some changes

- more tests should pass now
- needs to clean more
- documentation still needs to be done

* fix code quality

* major changes

- change attention architecture to BART-like
- modify some tests
- style fix

* rm useless classes

- remove opt for:
- QA
- cond generation
- seq classif

* Removed autodoc calls to non-existant classes

TOkenizers are not implemented

* Update src/transformers/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/auto/modeling_tf_auto.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Replaced OPTTokeniser with GPT2 tokenizer

* added GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokenizer")

* Removed OPTTokenizer

* make style

* Make style replaces

``` ...).unsqueeze(```
by
``` >>>).unsqueeze(```

* make repo consistency

* Removed PretrainedOPTModel

* fix opt.mdx removed other heads

* fix init, removed 3 heads

* removed heads

* finished cleaning head

* removed seauence classif and question answering

* removed unused imports

* removed useless dummy object for QA, SC and CG

* removed tests for removed useless dummy object for QA, SC and CG

* Removed head_mask using encoder layers which don't exist

* fixed test

* fix line

* added OPT to toctree

* Updated model path with pushed weigths

* fix model path

* fixed code quality

* fixed embeddings and generation tests

* update paths

* clean comments

* removed OPTClassificationHead for sentence classification

* renamed hidden layer

* renamed num layers to standard num_hidden_layers

* num_attention_heads fix

* changes for 125m

* add first version for 125m

* add first version - flax

* add new version

* causal LM output

* replace output type with BaseModelOutputWithPastAndCrossAttentions

* revert working config from 150m to 350m

* clean

* removed decoder input ids

* fixed embed dim

* more embed_dim issues

* make style + removed enc_dec test

* update falx model

* removed troublesome copy

* added is_encoder_decoder=False to config

* added set_input emb fuinction to model class

* requires torch on embed test

* use head mask instead of decoder head mask input param solves a test

* 8 test remaining, update

* Updated create_and_check_decoder_model_past_large_inputs

* Make style

* update op tokenizer with condition

* make style

* See if I can push

* some clean up

* remove linear head hack

* save intermediate

* save correct attention

* add copied from from bart

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix part of the reviewss
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* same changes in naming / conversion

* correct mask

* more fixes

* delete FlaxOPT and TfOPT

* clean traces of Flax and Tf

* fix mask

* fixed positionnal embedding length when past key value is provoded

* get 125m, 6.7b to work

* Added do_layer_norm

* solved mismatch in load dictionnary

* clean up preapre opt input dict

* fixed past key value as bool

* fix previus

* fixed return dict False tuple issue

* All tests are passing

* Make style

* Ignore OPTDecoder non tested

* make fix-copies

* make repo consistency

* small fix

* removed uselss @torch.no_grad decorator

* make styl;e

* fix previous opt test

* style

* make style

* added opt documentation

* update OPT_PRETRAINED_MODEL_ARCHIVE_LIST

* up

* more fixes

* model & config work

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* added comment on padding hack (+2)

* cleaup

* review update

* docstring for missing arg

* Update docs/source/en/model_doc/opt.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/en/model_doc/opt.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/en/model_doc/opt.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/opt/__init__.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update pretrained map

* update path and tests

* make style

* styling

* make consistency

* add gpt2 tok new

* more tok fixes

* Update src/transformers/models/auto/tokenization_auto.py

* Update docs/source/en/model_doc/opt.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/model_doc/opt.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/model_doc/opt.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/models/opt/test_modeling_opt.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update based on reviews

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* make style

* make tokenizer auto tests pass

* apply Lysandre suggestion

* finish tests

* add some good tokenizer tests

* improve docs slighly

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
@Narsil Narsil mentioned this pull request Aug 24, 2022
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants