Fix TF Funnel #9300
Conversation
if block_index == 0:
    position_embeds_pooling = None
else:
    position_embeds_pooling = None
position_embeds_pooling seems to be only used at line 288. IMO it makes more sense to have a single if-else statement further below (at line 287):

if block_index != 0:
    position_embeds_pooling = tf.gather(pos_embed, rel_pos, axis=0)
else:
    position_embeds_pooling = tf.fill(shape_list(position_embeds_no_pooling), value=-1.0)
@@ -652,10 +661,11 @@ def call(
for block_index, block in enumerate(self.blocks):
    pooling_flag = shape_list(hidden)[1] > (2 if self.separate_cls else 1)
    pooling_flag = pooling_flag and block_index > 0
    pooled_hidden = self.attention_structure.pool_tensor(hidden, mode=self.attention_structure.pooling_type)
this seems to slightly change the logic here - is this OK?
maybe a comment here would be great
The problem was coming from shift = 2 if q_head.shape[1] != context_len else 1. Here, shift in graph mode becomes an undefined tensor because it can be either 1 or 2, which makes the line r = position_embeds[self.block_index][shift - 1] impossible to compile: in a graph, TF cannot mix tensors and Python numbers (here, shift - 1).
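For illustration, here is a hedged toy sketch of the graph-friendly pattern (made-up tensor names and shapes, not the actual Funnel code): the positional-encoding tensor is selected inside each branch, so the Python list is always indexed with a static integer and autograph only has to build a tf.cond that chooses between two tensors.

```python
import tensorflow as tf

# Stand-in for position_embeds[self.block_index]: a plain Python list of tensors.
position_embeds_block = [tf.zeros((6, 4)), tf.ones((6, 4))]
context_len = 6

@tf.function(input_signature=[tf.TensorSpec(shape=[None, None], dtype=tf.float32)])
def pick_positional_encoding(q_head):
    seq_len = tf.shape(q_head)[1]  # dynamic tensor in graph mode
    # autograph turns this `if` into a tf.cond; each branch indexes the Python list
    # with a literal 0 or 1, so no tensor ever reaches the list index
    if seq_len != context_len:
        r = position_embeds_block[1]
    else:
        r = position_embeds_block[0]
    return r

print(pick_positional_encoding(tf.random.uniform((2, 6))).shape)  # (6, 4)
```

The fix shown further down in this diff follows the same shape: shift is still set, but the indexing into position_embeds happens with the literal 1 or 0 inside each branch.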
If the method self.attention_structure.pre_attention_pooling is not changed as suggested above, this line should be removed (not sure how it's linked to the code you mention, which is in self.attention_structure.post_attention_pooling). Otherwise, the line should be inside the if pooling_flag test, or it breaks the current behavior.
I think this looks ok! We should wait for @sgugger's feedback here though.
Thanks for fixing! There is one change of behavior we should revert (the last of my comments); I left a few other comments as well.
attention_inputs = (position_embeds, token_type_mat, attention_mask, cls_mask)
return output, attention_inputs
return attention_inputs
Why change this function's input and return? The tuple always has the same length, so there is no reason for it to be incompatible with graph mode, no? The output can be pooled outside of the test to avoid repeating the same line of code, but otherwise I'd prefer to keep the same logic as the PT implementation.
I understand, but this is not graph compliant. The reason is that pooled_hidden is not defined outside the if pooling_flag at line 655. So to fix this I had to move the pool_tensor call outside the if.
If pooled_hidden cannot be set outside the if, it has to be deleted. Basically, either it has a value in both branches (same shape + same dtype) or it should not be set anywhere. Any idea what could be a "dummy" value?
It's not used when pooling_flag is False, so a tensor of zeros can be a good dummy value (if it needs the same shape, a tensor of the right shape).
Perfect! Doing the update.
I can't manage to guess the shape of pooled_hidden without calling pool_tensor. Any idea how to do this?
pooled_hidden = tf.zeros(shape_list(hidden)) seems to work perfectly fine 👍
Yes, it should have the same shape as the hidden state if no pooling is done (so on the else side).
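A minimal sketch of the dummy-value pattern settled on in this thread (toy code with assumed names; tf.shape stands in for the library's shape_list helper, and the slice stands in for pool_tensor): pooled_hidden gets a value of the same shape and dtype before the if, so both branches of the conditional that autograph traces define the variable.

```python
import tensorflow as tf

@tf.function(input_signature=[tf.TensorSpec(shape=[None, None, 8], dtype=tf.float32)])
def block_step(hidden):
    # Tensor-valued condition, as in the encoder loop once the sequence length is dynamic.
    pooling_flag = tf.shape(hidden)[1] > 2
    # Dummy value with the same shape/dtype as hidden; it is only there so that the
    # variable is defined in both branches of the traced conditional.
    pooled_hidden = tf.zeros(tf.shape(hidden))
    if pooling_flag:
        # Stand-in for self.attention_structure.pool_tensor(hidden, mode=...).
        pooled_hidden = hidden[:, ::2, :]
    return pooled_hidden

print(block_step(tf.random.uniform((1, 4, 8))).shape)  # (1, 2, 8)
```

In the PR itself the dummy is pooled_hidden = tf.zeros(shape_list(hidden)), as suggested just above.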
# Notations from the paper, appending A.2.1, final formula (https://arxiv.org/abs/2006.03236)
# Grab the proper positional encoding, shape max_rel_len x d_model
r = position_embeds[self.block_index][shift - 1]
# shift = 2 if shape_list(q_head)[1] != context_len else 1
Let's clean the comment if it's not needed.
if shape_list(q_head)[1] != context_len:
    shift = 2
    r = position_embeds[self.block_index][1]
else:
    shift = 1
    r = position_embeds[self.block_index][0]
Looks like the shift variable isn't used afterwards? In that case, we can just remove it.
It is used afterwards, see line 508.
@LysandreJik feel free to merge if it looks ok for you and if @sgugger approves the last fix.
Looks perfect now, thanks!
LGTM!
What does this PR do?
This PR fixes Funnel to make it fully graph compliant. Even though all the slow/quick tests are passing and I got similar results in a few experiments, @sgugger I would appreciate it if you looked thoroughly at the changes to be sure no bugs have been introduced.