Fix TF Longformer #9348

Merged 7 commits into huggingface:master on Jan 5, 2021
Conversation

@jplu (Contributor) commented Dec 29, 2020

What does this PR do?

This PR fixes the TF version of Longformer to make it graph compliant. As discussed offline with @patrickvonplaten, all_global_attentions is now added to the output when output_attentions=True. The global attentions are filled with zeros when is_global_attn is False (see line 897 in TFLongformerSelfAttention).

Fix issue

#9333
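
To illustrate the graph-compliance constraint behind this change, here is a minimal sketch, not the actual TFLongformerSelfAttention code (the function and argument names are invented): inside a tf.function, the set and shapes of the returned tensors cannot depend on a tensor-valued condition, so a global-attention tensor is always returned and simply zero-filled when there is no global attention.

import tensorflow as tf

# Minimal sketch only: the output structure must be static in graph mode,
# so the global attention probabilities are always returned and zero-filled
# when is_global_attn is False, instead of being dropped from the output.
@tf.function
def attention_outputs(attn_probs, global_attn_probs, is_global_attn):
    global_attn_probs = tf.cond(
        is_global_attn,
        lambda: global_attn_probs,
        lambda: tf.zeros_like(global_attn_probs),
    )
    return attn_probs, global_attn_probs

# Both calls return two tensors with identical shapes, which keeps the traced
# graph signature constant.
local_probs = tf.random.uniform((1, 8, 8))
global_probs = tf.random.uniform((1, 8, 8))
attention_outputs(local_probs, global_probs, tf.constant(True))
attention_outputs(local_probs, global_probs, tf.constant(False))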

if input_ids is not None:
    input_ids = tf.pad(input_ids, paddings, constant_values=pad_token_id)
Contributor

When padding_len == 0, this won't change input_ids, correct?

@jplu (Contributor Author)

Correct!
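
As a quick check of the point above (the token values are made up and pad_token_id=1 is only an assumption here), zero-width paddings make tf.pad a no-op:

import tensorflow as tf

# With padding_len == 0 the paddings tensor is all zeros,
# so tf.pad returns input_ids unchanged.
input_ids = tf.constant([[0, 9064, 16, 372, 2]])
padding_len = 0
paddings = tf.convert_to_tensor([[0, 0], [0, padding_len]])
padded_ids = tf.pad(input_ids, paddings, constant_values=1)  # pad_token_id assumed to be 1
print(bool(tf.reduce_all(padded_ids == input_ids)))  # True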

@@ -2171,16 +2171,14 @@ def call(

# set global attention on question tokens
if inputs["global_attention_mask"] is None and inputs["input_ids"] is not None:
    if inputs["input_ids"] is None:
        logger.warning(
            "It is not possible to automatically generate the `global_attention_mask`. Please make sure that it is correctly set."
Contributor

Can we leave this warning?

@jplu (Contributor Author)

We can, but the problem here is that we would be testing whether inputs["input_ids"] is None inside an if that already checks that inputs["input_ids"] is not None, which seems strange.

@sgugger (Collaborator)

Agreed with @jplu here ;-)

):
    logger.warning(
        f"There should be exactly three separator tokens: {self.config.sep_token_id} in every sample for questions answering. You might also consider to set `global_attention_mask` manually in the forward function to avoid this. This is most likely an error."
    )
    inputs["global_attention_mask"] = tf.fill(shape_list(inputs["input_ids"]), value=1)
Contributor

This doesn't look correct to me. The "default" global_attention_mask is all 0s, so I think it should be:

inputs["global_attention_mask"] = tf.fill(shape_list(inputs["input_ids"]), value=0)

Also, we could improve the warning a bit by appending a sentence like "Disabling global attention for this forward pass...".
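
As a side note, here is a small sketch of the distinction being made (the token values and the pad token id are invented for illustration): attention_mask defaults to 1 on every real token, whereas global_attention_mask defaults to 0 everywhere and only the tokens that should attend globally are set to 1.

import tensorflow as tf

# Illustration only, not code from the PR; pad token id 1 is an assumption.
input_ids = tf.constant([[0, 2264, 16, 5, 812, 2],
                         [0, 713, 16, 2, 1, 1]])

attention_mask = tf.cast(input_ids != 1, tf.int32)       # 1 on real tokens, 0 on padding
global_attention_mask = tf.fill(tf.shape(input_ids), 0)  # default: no token attends globally

# For question answering, only the question tokens would then be flipped to 1
# in global_attention_mask, while attention_mask stays as computed above.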

@@ -1523,8 +1523,7 @@ def call(
         training=False,
     ):
         all_hidden_states = () if output_hidden_states else None
-        all_attentions = () if output_attentions else None
-        all_global_attentions = () if (output_attentions and is_global_attn) else None
+        all_attentions = all_global_attentions = () if output_attentions else None
Contributor

nice!
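
A small aside on the chained assignment above (illustration, not code from the PR): both names start out bound to the same empty tuple, which is safe because tuples are immutable and += rebinds each name independently instead of mutating shared state.

# Both names are initialised from one expression, but later updates
# to one do not leak into the other.
all_attentions = all_global_attentions = ()

all_attentions += ("layer_0_local_attn",)

print(all_attentions)         # ('layer_0_local_attn',)
print(all_global_attentions)  # ()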

@patrickvonplaten (Contributor) left a comment

I like the general direction of this PR! Here we should also run the slow tests to be sure nothing is broken.

IMO, the only thing left to do is to correct the "default" global attention to all 0's instead of 1's (global_attention_mask is different from attention_mask)

@jplu (Contributor Author) commented Dec 30, 2020

I have already run the slow tests as well and they all pass!

@sgugger (Collaborator) left a comment

LGTM, thanks for fixing!

@LysandreJik (Member) left a comment

LGTM, thanks @jplu!

@LysandreJik LysandreJik merged commit 83eec97 into huggingface:master Jan 5, 2021
@jplu jplu deleted the fix-tf-longformer branch January 5, 2021 09:37
guyrosin pushed a commit to guyrosin/transformers that referenced this pull request Jan 15, 2021
* Fix longformer

* Apply style

* Remove serving content

* Forgot a condition

* Apply style

* Address Patrick's comments

* Fix dtype