Use random_attention_mask for TF tests #16517

ydshieh · 2022-03-31T14:08:39Z

What does this PR do?

Change TF's random_attention_mask to match its PT/Flax equivalents.
Use random_attention_mask, defined in test_modeling_tf_common.py, to generate the attention mask in TF tests. This:
- gives the TF tests the same attention-mask logic as the PT/Flax tests
- avoids large differences between PT and TF outputs (in particular in TFGPT2EncoderDecoderModelTest)
  - In TFBERTEncoderDecoderModelTest and TFGPT2EncoderDecoderModelTest, the difference is caused by a sequence in the batch whose attention mask (generated by ids_tensor) is all 0s. This may happen on both the encoder and the decoder, especially after combining with the causal mask.
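
To get a feel for how often an all-0s row can show up, here is a back-of-the-envelope sketch. It assumes each mask entry produced with vocab_size=2 is an independent fair coin flip; the helper name and the batch size 13 / sequence length 7 are illustrative choices, not values taken from this PR.

```python
def prob_some_row_all_zero(batch_size: int, seq_length: int) -> float:
    """Probability that at least one sequence in the batch gets an all-0s mask.

    Assumption: every entry is 0 or 1 with probability 1/2, independently.
    """
    p_row_all_zero = 0.5 ** seq_length
    return 1.0 - (1.0 - p_row_all_zero) ** batch_size


# Illustrative sizes only (hypothetical, not read from the test files).
p = prob_some_row_all_zero(batch_size=13, seq_length=7)
print(f"chance of at least one fully masked sequence: {p:.3f}")
```

Under these assumptions the chance is a little under 10% per generated batch, which is enough to make a comparison test flaky.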

More context

Currently, most TF tests still use

input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2)

while in PT/Flax tests, they call

input_mask = random_attention_mask([self.batch_size, self.seq_length])

(defined in the common test file).

In particular, random_attention_mask has

    # make sure that at least one token is attended to for each batch
    attn_mask[:, -1] = 1
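
For readers without the test files at hand, the behaviour can be sketched in plain Python with nested lists instead of tensors. This is only an illustration of the logic quoted above: the function name random_attention_mask_sketch and its rng parameter are hypothetical, and the real helpers operate on framework tensors.

```python
import random


def random_attention_mask_sketch(shape, rng=None):
    """Random 0/1 mask of shape (batch_size, seq_length), last column forced to 1."""
    rng = rng or random.Random()
    batch_size, seq_length = shape
    mask = [[rng.randint(0, 1) for _ in range(seq_length)] for _ in range(batch_size)]
    # make sure that at least one token is attended to for each batch
    for row in mask:
        row[-1] = 1
    return mask


mask = random_attention_mask_sketch((3, 5), rng=random.Random(0))
for row in mask:
    print(row)
```

Every row ends in 1, so no sequence is fully masked before a causal mask is applied.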

HuggingFaceDocBuilderDev · 2022-03-31T14:21:25Z

The documentation is not available anymore as the PR was closed or merged.

ydshieh · 2022-03-31T17:59:14Z

tests/clip/test_modeling_tf_clip.py

+            # make sure the first token has attention mask `1` to ensure that, after combining the causal mask, there
+            # is still at least one token being attended to for each batch.
+            # TODO: Change `random_attention_mask` in PT/TF/Flax common test file, after a discussion with the team.
+            input_mask = tf.concat(


This is added to make the TF CLIP tests pass, since TF's random_attention_mask is also changed in this PR.
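
Why the first token matters once a causal mask is involved can be shown with a small plain-Python sketch. It assumes the usual convention that query position q may only attend to key positions k <= q; the helper names are hypothetical and this is not the code used in the CLIP test.

```python
def combine_with_causal(padding_mask):
    """combined[q][k] is 1 only if key k is unmasked and k <= q (causal)."""
    n = len(padding_mask)
    return [[padding_mask[k] if k <= q else 0 for k in range(n)] for q in range(n)]


def fully_masked_queries(padding_mask):
    """Query positions that have no key left to attend to."""
    combined = combine_with_causal(padding_mask)
    return [q for q, row in enumerate(combined) if sum(row) == 0]


last_token_only = [0, 0, 0, 1]   # worst case when only the last token is guaranteed
first_token_only = [1, 0, 0, 0]  # worst case when the first token is guaranteed

print(fully_masked_queries(last_token_only))   # [0, 1, 2]
print(fully_masked_queries(first_token_only))  # []
```

With the 1 guaranteed at the first position, every query position keeps at least one visible key after the causal mask is applied.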

ydshieh · 2022-03-31T17:59:59Z

tests/test_modeling_tf_common.py

@@ -1440,7 +1440,7 @@ def ids_tensor(shape, vocab_size, rng=None, name=None, dtype=None):
 def random_attention_mask(shape, rng=None, name=None, dtype=None):
    attn_mask = ids_tensor(shape, vocab_size=2, rng=None, name=None, dtype=dtype)
    # make sure that at least one token is attended to for each batch
-    attn_mask = tf.concat([tf.constant(value=1, shape=(shape[0], 1), dtype=dtype), attn_mask[:, 1:]], axis=1)
+    attn_mask = tf.concat([attn_mask[:, :-1], tf.ones_like(attn_mask[:, -1:], dtype=dtype)], axis=-1)


This is changed to match PT/Flax's random_attention_mask.

sgugger

Thanks for making this more consistent with the rest of the library!

gante

Interesting. It moves the column of 1s from the start to the end, so the mask now looks like a left-padded input. It could help with GPT-2, indeed.

ydshieh · 2022-04-01T09:27:41Z

> Interesting. It moves the column of 1s from the start to the end, so the mask now looks like a left-padded input. It could help with GPT-2, indeed.

(In case you would like a bit more detail, @gante:)

Actually, moving the 1 to the end causes a problem when a model uses a causal mask. This is why I needed to update the code in TFCLIPModelTest.

In general, the library currently has an issue when the final attention mask received by the attention layer (after combining with the causal mask, if any) contains a sequence in the batch whose mask is all 0s. One factor involved (though maybe not the only one) is the different mask values used (-1e4, -1e9, -1e30, -inf).

Putting the 1 at the start avoids this situation when combining with the causal mask.
(But I don't want to change the PT/Flax logic in this PR. This should be addressed in a separate PR after discussion.)
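
A rough illustration of how the mask value can change the result for a fully masked row, written in plain Python. It assumes additive masking (the mask value is added to the scores at masked positions before the softmax); masked_softmax is a hypothetical helper, and Python floats are float64, so lower-precision framework dtypes may behave differently again.

```python
import math


def masked_softmax(scores, mask, mask_value):
    """Softmax over scores, with mask_value added wherever mask == 0."""
    logits = [s + (0.0 if m else mask_value) for s, m in zip(scores, mask)]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 2.0, 3.0]
all_masked = [0, 0, 0]

for value in (-1e4, -1e9, -1e30, float("-inf")):
    print(value, masked_softmax(scores, all_masked, value))
```

In this sketch, -1e4 and -1e9 shift every score equally, so the row falls back to the unmasked softmax; -1e30 swamps the scores and gives a uniform distribution; -inf produces NaN. Two implementations that pick different constants can therefore disagree on such rows.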

For tests like TFGPT2EncoderDecoderModelTest, this PR only helps partially (the encoder part). The decoder part needs extra logic for now, to handle the causal-mask situation described above.

Rocketknight1

Writing as I go to make sure I follow:

- TF tests used to call ids_tensor with a vocabulary of 2 to generate a random attention mask.
- They now call random_attention_mask, which also generates a tensor containing 0 and 1 only, but guarantees that at least one token will have a value of 1.
- This matches the behaviour in the rest of the library, guarantees we will never get a fully-masked input, and slightly increases the expected number of unmasked tokens in each input.

Seems like a great change for both test reliability and consistency with the rest of the library!

ydshieh · 2022-04-01T14:47:21Z

Hi, @Rocketknight1,

Yes, all the points are right, except:

- "and slightly increases the expected number of unmasked tokens in each input": I am not sure about this statement. I would say this is not the case, but I might have misunderstood the sentence.
- "but guarantees that at least one token will have a value of 1":
  - Yes, but this does not guarantee the same for the final attention mask used by the attention layers to compute the softmax, because the final mask might be the one obtained after combining with the causal mask (for decoder models).
  - There will be more PRs in the future to improve these kinds of things.

Rocketknight1 · 2022-04-01T14:54:43Z

That makes sense! And my comment about "increases the expected number of unmasked tokens" was just an irrelevant observation - the average number of unmasked tokens is very slightly larger since we guarantee that one of them will have value 1. Ignore me!
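
The size of that effect can be checked by brute force. The sketch below assumes every unforced entry is a fair coin flip and enumerates all masks of a given length; mean_unmasked is a hypothetical helper and the length 7 is only an example.

```python
from itertools import product


def mean_unmasked(seq_length: int, force_last: bool) -> float:
    """Average number of 1s over all equally likely masks of this length."""
    total = 0
    count = 0
    for bits in product((0, 1), repeat=seq_length):
        mask = list(bits)
        if force_last:
            mask[-1] = 1  # same guarantee as random_attention_mask
        total += sum(mask)
        count += 1
    return total / count


print(mean_unmasked(7, force_last=False))  # 3.5
print(mean_unmasked(7, force_last=True))   # 4.0
```

For length n the average moves from n/2 to (n + 1)/2, that is, half a token more per sequence.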

* use random_attention_mask for TF tests
* Fix for TFCLIP test (for now).

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>


use random_attention_mask for TF tests

37fac83

ydshieh added 2 commits March 31, 2022 19:03

Fix for TFCLIP test (for now).

2e04d19

fix style

b6323db

ydshieh changed the title ~~[WIP] use random_attention_mask for TF tests~~ Use random_attention_mask for TF tests Mar 31, 2022

ydshieh marked this pull request as ready for review March 31, 2022 17:24

ydshieh requested review from gante, Rocketknight1 and sgugger March 31, 2022 17:53

sgugger approved these changes Mar 31, 2022

gante approved these changes Apr 1, 2022

Rocketknight1 approved these changes Apr 1, 2022

ydshieh merged commit 2199382 into huggingface:main Apr 1, 2022

ydshieh deleted the use_random_attention_mask_for_tf_tests branch April 1, 2022 14:53
