[fsmt] rewrite SinusoidalPositionalEmbedding + USE_CUDA test fixes + new TranslationPipeline test #7224
Conversation
This was needed for torchscript to work - it is now part of the state_dict, so these keys will have to be removed during save_pretrained.
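A minimal sketch of that idea, assuming PyTorch and an illustrative `keys_to_never_save` attribute (not necessarily the name used in the actual change):

```python
import torch

def save_filtered_state_dict(model, path):
    # Drop deterministic keys (e.g. sinusoidal position embedding weights)
    # before writing the checkpoint; they can be recomputed at load time.
    # `keys_to_never_save` is an illustrative attribute name.
    state_dict = model.state_dict()
    for key in getattr(model, "keys_to_never_save", []) or []:
        state_dict.pop(key, None)
    torch.save(state_dict, path)
```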
Codecov Report
@@ Coverage Diff @@
## master #7224 +/- ##
==========================================
- Coverage 81.81% 81.57% -0.25%
==========================================
Files 174 174
Lines 33446 33448 +2
==========================================
- Hits 27364 27285 -79
- Misses 6082 6163 +81
Continue to review full report at Codecov.
This PR is wonderful! `parameterized.expand` is a great find, and not saving static embeddings is an obvious win. We should add the latter to bart in a separate PR.
The expansion of embeddings may require a bit more care, but the comment below doesn't prevent merging this PR. You can just delete that logic later if it is bad.
Expanding Positional Embeddings
if max_pos > self.weight.size(0): # recompute/expand embeddings if needed
The reason I haven't auto-expanded the bart positional embeddings so far is that I wanted an error for long sequences that the model would translate poorly, rather than just silently poor performance. But if you can, say, concatenate a few en-ru examples and check that performance doesn't plummet, that would be good. There is also a theoretical O(seq_len^2) cost associated with passing longer documents through transformers, so we may not want to encourage longer docs and instead write tooling that, for example, uses moses SentenceSplitter to chunk documents, pass them through the model, and rejoin the results correctly. If it just works with the auto-expansion hack then I'm all for it.
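For reference, a hedged sketch of the auto-expansion hack quoted above; `build_table` is an assumed helper that rebuilds the sinusoidal table, not the actual fsmt code:

```python
import torch
import torch.nn as nn

def expand_positional_embeddings(emb: nn.Embedding, max_pos: int, build_table):
    """Grow a deterministic positional-embedding table on demand.

    `build_table(n, dim)` is an assumed callable returning a (n, dim) sinusoidal
    table; the real fsmt implementation may differ.
    """
    if max_pos > emb.weight.size(0):  # recompute/expand embeddings if needed
        emb.weight.data = build_table(max_pos, emb.embedding_dim).to(emb.weight.device)
        emb.num_embeddings = max_pos
```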
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
This is how fairseq does it - this PR doesn't change the original behavior, so yes, it can be merged, as it fixes the USE_CUDA=1 + torchscript situation. Further algorithmic changes will require separate care. I moved the issue you raised to its own ticket: #7256 - so let's continue discussing that suggestion there. Thank you for bringing it up, @sshleifer!
Great, I will let the 2nd approving reviewer merge.
Oh, your comment made me discover a bug in my porting - somehow I used the vocab sizes as the number of positional embeddings, so it's not surprising the weights were 250MB - fixed now. I rechecked - fairseq inits them to the [...]. Or should I sync it with bart and just have [...]? For context:
I split off the key naming discussion to #7258 - this is again not a show-stopper for this PR, as it impacts [...]
This is the most elegant solution to the problem we can have, short of PyTorch supporting a new kind of weights. Thanks a lot for your work on this @stas00 !
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Very nice change, would have saved us a lot of pain with the addition of position_ids in several models. We should add these keys (the position IDs buffers) to the keys to never save, in my opinion. Or, as mentioned here: #6700 (comment), change the [...]
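A hedged sketch of that suggestion: register `position_ids` as a buffer and list it among keys that are never saved. The `keys_to_never_save` attribute name and the toy module are assumptions for illustration, not the real transformers API:

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    # Illustrative: a position_ids buffer never needs to be serialized,
    # since it is trivially re-created in __init__ at load time.
    keys_to_never_save = ["position_ids"]  # assumed attribute name

    def __init__(self, max_positions: int = 512, dim: int = 16):
        super().__init__()
        self.register_buffer("position_ids", torch.arange(max_positions).unsqueeze(0))
        self.embed = nn.Embedding(max_positions, dim)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        seq_len = input_ids.size(1)
        return self.embed(self.position_ids[:, :seq_len])
```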
It won't work at the moment, since [...]
Indeed, thanks for clarifying!
These changes are in one PR as they all fix problems for `USE_CUDA=1`:

- `USE_CUDA=1` test fixes: enable tests that got skipped previously (was missing `.to(device)`)
- rewrite `SinusoidalPositionalEmbedding` to be a normal `nn.Embedding` subclass with a normal `self.weight` param, but exclude this param from being saved with the `state_dict`, since it's not trained but deterministic (see the sketch after this list)
- `PreTrainedModel.save_pretrained`: support models that don't want all of their params saved (needed for fsmt's `SinusoidalPositionalEmbedding`)
- new `TranslationPipeline` test (well, this one is just a new test)

@sshleifer
This includes fixing: #7229
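For illustration, here is a minimal sketch of the approach described in the bullets above: a sinusoidal embedding implemented as an `nn.Embedding` subclass whose deterministic `self.weight` is filtered out at save time. The class name, `_build_table`, and `save_without_keys` are made up for this sketch and are not the actual fsmt or `save_pretrained` code.

```python
# Hedged sketch, not the actual fsmt implementation. Assumes an even embedding_dim.
import math
import torch
import torch.nn as nn

class SketchSinusoidalPositionalEmbedding(nn.Embedding):
    """A deterministic sin/cos table kept in a regular self.weight param."""

    def __init__(self, num_positions: int, embedding_dim: int):
        super().__init__(num_positions, embedding_dim)
        self.weight.data = self._build_table(num_positions, embedding_dim)
        self.weight.requires_grad = False  # never trained, recomputed on load

    @staticmethod
    def _build_table(num_positions: int, dim: int) -> torch.Tensor:
        position = torch.arange(num_positions, dtype=torch.float).unsqueeze(1)
        inv_freq = torch.exp(
            torch.arange(0, dim, 2, dtype=torch.float) * (-math.log(10000.0) / dim)
        )
        table = torch.zeros(num_positions, dim)
        table[:, 0::2] = torch.sin(position * inv_freq)
        table[:, 1::2] = torch.cos(position * inv_freq)
        return table

def save_without_keys(model: nn.Module, path: str, skip_keys):
    """Mimics the save_pretrained change: drop deterministic keys before saving."""
    state_dict = {k: v for k, v in model.state_dict().items() if k not in set(skip_keys)}
    torch.save(state_dict, path)
```

Loading such a checkpoint then requires either `load_state_dict(..., strict=False)` or re-materializing the table in `__init__`, since the embedding weight is intentionally absent from the saved file.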