
[EncoderDecoder] Add Cross Attention for GPT2 #6415

Merged

Conversation


@patrickvonplaten patrickvonplaten commented Aug 11, 2020

This PR implements Bert2GPT2 by adding cross-attention layers to GPT2.

Note that decoder generation cannot yet be sped up within the encoder-decoder framework (by using GPT2's past tensors): caching first has to be implemented for all models compatible with the framework (Bert, Roberta) before it can be enabled there.

All GPT2 RUN_SLOW tests are verified to pass.

Future PRs TODO:

  • Verify that Bert2GPT2 works by training on CNN Daily Mail summarization
  • Add smart caching to Bert and add it to the encoder-decoder framework
  • Update encoder-decoder docs
  • Add a notebook explaining how to use encoder-decoder models.
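Until that notebook exists, here is a minimal sketch of what the added cross-attention enables through the EncoderDecoder framework. The checkpoint names, tokenizer pairing, and generate() arguments below are illustrative assumptions rather than code from this PR:

    # Minimal Bert2GPT2 sketch (illustrative; not taken from this PR).
    # BERT encodes the source text; GPT2, now equipped with cross-attention
    # layers, decodes while attending to BERT's encoder hidden states.
    from transformers import BertTokenizer, GPT2Tokenizer, EncoderDecoderModel

    model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "gpt2")

    encoder_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    decoder_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

    article = "A long news article that should be summarized ..."
    input_ids = encoder_tokenizer(article, return_tensors="pt").input_ids

    # Without the smart caching mentioned above, generation works but is not
    # yet sped up by GPT2's past tensors.
    summary_ids = model.generate(
        input_ids,
        decoder_start_token_id=model.config.decoder.bos_token_id,
        max_length=64,
    )
    print(decoder_tokenizer.decode(summary_ids[0], skip_special_tokens=True))

Note that such a Bert2GPT2 checkpoint would still need to be fine-tuned (e.g. on CNN Daily Mail, as listed above) before the generated text is useful.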

@patrickvonplaten force-pushed the add_gpt2_encoder_decoder branch from 7e7c8ad to ad5af2c on August 13, 2020 07:05
@patrickvonplaten changed the title from "[WIP][EncoderDecoder] Add Cross Attention for GPT2" to "[EncoderDecoder] Add Cross Attention for GPT2" on Aug 13, 2020
Collaborator

@sgugger sgugger left a comment


Thanks, this looks great to me!

Member

@LysandreJik LysandreJik left a comment


Great, looks good to me!

patrickvonplaten and others added 2 commits August 14, 2020 09:25
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

codecov bot commented Aug 14, 2020

Codecov Report

Merging #6415 into master will decrease coverage by 0.00%.
The diff coverage is 96.61%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6415      +/-   ##
==========================================
- Coverage   79.98%   79.98%   -0.01%     
==========================================
  Files         153      153              
  Lines       28005    28039      +34     
==========================================
+ Hits        22401    22427      +26     
- Misses       5604     5612       +8     
Impacted Files Coverage Δ
src/transformers/modeling_encoder_decoder.py 91.66% <87.50%> (+0.64%) ⬆️
src/transformers/modeling_gpt2.py 86.68% <97.87%> (+0.71%) ⬆️
src/transformers/generation_utils.py 96.94% <100.00%> (+0.01%) ⬆️
src/transformers/modeling_tf_distilbert.py 64.47% <0.00%> (-32.95%) ⬇️
src/transformers/generation_tf_utils.py 86.71% <0.00%> (+7.51%) ⬆️
src/transformers/modeling_tf_flaubert.py 87.73% <0.00%> (+63.19%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@patrickvonplaten patrickvonplaten merged commit 1d6e71e into huggingface:master Aug 14, 2020
sgugger added a commit that referenced this pull request Aug 14, 2020
* Generation doc
* MBartForConditionalGeneration (#6441)
* add MBartForConditionalGeneration
* style
* rebase and fixes
* add mbart test in TEST_FILES_WITH_NO_COMMON_TESTS
* fix docs
* don't ignore mbart
* doc
* fix mbart fairseq link
* put mbart before bart
* apply doc suggestions
* Use hash to clean the test dirs (#6475)
* Use hash to clean the test dirs
* Use hash to clean the test dirs
* Use hash to clean the test dirs
* fix
* [EncoderDecoder] Add Cross Attention for GPT2 (#6415)
* add cross attention layers for gpt2
* make gpt2 cross attention work
* finish bert2gpt2
* add explicit comments
* remove attention mask since not yet supported
* revert attn mask in pipeline
* Update src/transformers/modeling_gpt2.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_encoder_decoder.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Sort unique_no_split_tokens to make it deterministic (#6461)
* change unique_no_split_tokens's type to set
* use sorted list instead of set
* style
* Import accuracy_score (#6480)
* Apply suggestions from code review
  Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address comments
* Styling
* Generation doc
* Apply suggestions from code review
  Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address comments
* Styling

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: gijswijnholds <gijswijnholds@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
@patrickvonplaten patrickvonplaten mentioned this pull request Oct 25, 2022