
Pruning function in T5Attention doesn't affect _relative_position_bucket #17886

Closed
1 of 4 tasks
hadaev8 opened this issue Jun 27, 2022 · 9 comments · Fixed by #17968

hadaev8 (Contributor) commented Jun 27, 2022

Who can help?

@patrickvonplaten

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run the pruning function on a T5 model, then run inference.
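
A minimal sketch of such a reproduction (hypothetical code, assuming t5-small; before a fix, the forward pass should fail with a shape mismatch on the head dimension):

    import torch
    from transformers import T5Model, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5Model.from_pretrained("t5-small")

    # Prune head 0 of the first encoder self-attention layer (7 of 8 heads remain)
    model.prune_heads({0: [0]})

    inputs = tokenizer("translate English to German: Hello", return_tensors="pt")
    with torch.no_grad():
        # The pruned layer computes 7-head scores, but the shared relative
        # position bias still has 8 heads, so adding them raises an error.
        model(input_ids=inputs.input_ids, decoder_input_ids=inputs.input_ids)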

Expected behavior

The corresponding head of the relative position bias should be pruned too.

Here it is
https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py#L355

hadaev8 added the bug label Jun 27, 2022
ydshieh (Collaborator) commented Jun 27, 2022

@hadaev8

It is not clear to me what needs to change here.

_relative_position_bucket is a static method that doesn't use any model weights, so IMO there is nothing to do for it when pruning a model.

cc @patrickvonplaten

hadaev8 (Contributor, Author) commented Jun 27, 2022

@ydshieh

The relative position bias has shape (dim, heads).
For example, if I have 6 heads and prune one, there would be a mismatch: (dim, 5) + (dim, 6).

Here this line

I realized all layers use the same positional bias, so it should be masked in the forward pass, not pruned.

ydshieh (Collaborator) commented Jun 27, 2022

After looking at the two code blocks below, I think there is indeed a shape issue when we prune the heads.

Would you like to try to make a minimal code snippet that could confirm the issue, @hadaev8?

values = self.relative_attention_bias(relative_position_bucket) # shape (query_length, key_length, num_heads)

self.relative_attention_bias = nn.Embedding(self.relative_attention_num_buckets, self.n_heads)
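
For reference, a small standalone illustration (dummy sizes, not the actual model code) of how these two pieces clash once a head is pruned:

    import torch
    import torch.nn as nn

    relative_attention_num_buckets, n_heads = 32, 8
    relative_attention_bias = nn.Embedding(relative_attention_num_buckets, n_heads)

    # compute_bias still produces one bias slice per *original* head ...
    relative_position_bucket = torch.zeros(4, 4, dtype=torch.long)
    values = relative_attention_bias(relative_position_bucket)   # (4, 4, 8)
    position_bias = values.permute(2, 0, 1).unsqueeze(0)         # (1, 8, 4, 4)

    # ... while a layer pruned down to 7 heads produces 7-head scores.
    scores = torch.zeros(1, 7, 4, 4)
    # scores + position_bias  # -> RuntimeError: size mismatch on the head dim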

hadaev8 (Contributor, Author) commented Jun 27, 2022

@ydshieh
Here it is
https://colab.research.google.com/drive/1HYu-yzmmbumbskGZExXlOP0WFmDYdgAp?usp=sharing

I fixed the relative position bias, but there is some other error.

patrickvonplaten (Contributor) commented

Hey @hadaev8,

This is quite an edge case, and I don't think there is an easy fix here: usually one only prunes some heads of some layers (not of all layers), whereas the same position_bias is applied to all layers, so pruning heads of only some layers will necessarily lead to problems.

The solution I see is to dynamically discard the superfluous dimensions of relative_attention_bias at every attention layer whose corresponding heads have been discarded. @hadaev8 would you be interested in opening a PR for this? Sadly I won't have the time to dive deeper into this in the near future, but I'm more than happy to review!

hadaev8 (Contributor, Author) commented Jun 28, 2022

@patrickvonplaten
My fix looks like this and seems to work, but I'm not satisfied with it; I don't know if it's worth adding to the codebase.

        # Mask out the slices of the shared position_bias that correspond to
        # heads pruned from this layer, so its shape matches scores.
        if self.pruned_heads:
            mask = torch.ones(position_bias.shape[1])
            mask[list(self.pruned_heads)] = 0
            position_bias_masked = position_bias[:, mask.bool()]
        else:
            position_bias_masked = position_bias

        scores += position_bias_masked
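
For reference, a standalone shape check of this masking idea with dummy tensors (hypothetical sizes: 8 heads, head 0 pruned):

    import torch

    pruned_heads = {0}
    scores = torch.zeros(2, 7, 5, 5)          # this layer was pruned to 7 heads
    position_bias = torch.randn(1, 8, 5, 5)   # shared bias still covers all 8 heads

    # Drop the bias slices that belong to this layer's pruned heads
    mask = torch.ones(position_bias.shape[1])
    mask[list(pruned_heads)] = 0
    position_bias_masked = position_bias[:, mask.bool()]   # (1, 7, 5, 5)

    scores += position_bias_masked
    print(scores.shape)                       # torch.Size([2, 7, 5, 5])

The shared position_bias keeps all of its original head slots; each layer just skips the slots for its own pruned heads at addition time.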

patrickvonplaten (Contributor) commented

Hey @hadaev8,

That's actually quite a smart fix :-) I think I'd be OK with adding this! Do you want to open a PR for it? :-)

hadaev8 (Contributor, Author) commented Jun 29, 2022

@patrickvonplaten
Okay, if you think it's OK, I will open a PR tomorrow.

github-actions (bot) commented

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this as completed Aug 4, 2022