
Pruning function in T5Attention doesn't affect _relative_position_bucket #17886

Closed
1 of 4 tasks
hadaev8 opened this issue Jun 27, 2022 · 9 comments · Fixed by #17968

hadaev8 (Contributor) commented Jun 27, 2022

Who can help?

@patrickvonplaten

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run the pruning function on a T5 model, then run inference.
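
A minimal sketch of such a reproduction (hypothetical code, assuming t5-small; before a fix, the forward pass should fail with a shape mismatch on the head dimension):

    import torch
    from transformers import T5Model, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5Model.from_pretrained("t5-small")

    # Prune head 0 of the first encoder self-attention layer (7 of 8 heads remain)
    model.prune_heads({0: [0]})

    inputs = tokenizer("translate English to German: Hello", return_tensors="pt")
    with torch.no_grad():
        # The pruned layer computes 7-head scores, but the shared relative
        # position bias still has 8 heads, so adding them raises an error.
        model(input_ids=inputs.input_ids, decoder_input_ids=inputs.input_ids)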

Expected behavior

The corresponding head of the relative position bias should be pruned too.

Here it is
https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py#L355

hadaev8 added the bug label Jun 27, 2022
ydshieh (Collaborator) commented Jun 27, 2022

@hadaev8

It is not clear to me what needs to change here.

_relative_position_bucket is a static method that doesn't use any model weights, so IMO there is nothing to do for it when pruning a model.

cc @patrickvonplaten

hadaev8 (Contributor, Author) commented Jun 27, 2022

@ydshieh

The relative position bias has shape (dim, heads).
For example, if I have 6 heads and prune one, there would be a mismatch: (dim, 5) + (dim, 6).

Here this line

I realized all layers use the same positional bias, so it should be masked in the forward pass, not pruned.

ydshieh (Collaborator) commented Jun 27, 2022

After looking at the two code blocks below, I think there is indeed a shape issue when we prune the heads.

Would you like to try to make a minimal code snippet that could confirm the issue, @hadaev8?

values = self.relative_attention_bias(relative_position_bucket) # shape (query_length, key_length, num_heads)

self.relative_attention_bias = nn.Embedding(self.relative_attention_num_buckets, self.n_heads)
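
For reference, a small standalone illustration (dummy sizes, not the actual model code) of how these two pieces clash once a head is pruned:

    import torch
    import torch.nn as nn

    relative_attention_num_buckets, n_heads = 32, 8
    relative_attention_bias = nn.Embedding(relative_attention_num_buckets, n_heads)

    # compute_bias still produces one bias slice per *original* head ...
    relative_position_bucket = torch.zeros(4, 4, dtype=torch.long)
    values = relative_attention_bias(relative_position_bucket)   # (4, 4, 8)
    position_bias = values.permute(2, 0, 1).unsqueeze(0)         # (1, 8, 4, 4)

    # ... while a layer pruned down to 7 heads produces 7-head scores.
    scores = torch.zeros(1, 7, 4, 4)
    # scores + position_bias  # -> RuntimeError: size mismatch on the head dim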

hadaev8 (Contributor, Author) commented Jun 27, 2022

@ydshieh
Here it is
https://colab.research.google.com/drive/1HYu-yzmmbumbskGZExXlOP0WFmDYdgAp?usp=sharing

I fixed the relative position bias, but there is some other error.

patrickvonplaten (Contributor) commented

Hey @hadaev8,

This is quite an edge case, and I don't think there is an easy fix here: usually one only prunes some heads of some layers (not of all layers), whereas the same position_bias is applied to all layers, so pruning heads of only some layers will necessarily lead to problems.

The solution I see is to dynamically discard the superfluous dimensions of relative_attention_bias at every attention layer whose corresponding heads have been discarded. @hadaev8 would you be interested in opening a PR for this? Sadly I won't have the time to dive deeper into this in the near future, but I'm more than happy to review!

hadaev8 (Contributor, Author) commented Jun 28, 2022

@patrickvonplaten
My fix looks like this and seems to work, but I'm not satisfied with it; I don't know if it's worth adding to the codebase.

        # Mask out the slices of the shared position_bias that correspond to
        # heads pruned from this layer, so its shape matches scores.
        if self.pruned_heads:
            mask = torch.ones(position_bias.shape[1])
            mask[list(self.pruned_heads)] = 0
            position_bias_masked = position_bias[:, mask.bool()]
        else:
            position_bias_masked = position_bias

        scores += position_bias_masked
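
For reference, a standalone shape check of this masking idea with dummy tensors (hypothetical sizes: 8 heads, head 0 pruned):

    import torch

    pruned_heads = {0}
    scores = torch.zeros(2, 7, 5, 5)          # this layer was pruned to 7 heads
    position_bias = torch.randn(1, 8, 5, 5)   # shared bias still covers all 8 heads

    # Drop the bias slices that belong to this layer's pruned heads
    mask = torch.ones(position_bias.shape[1])
    mask[list(pruned_heads)] = 0
    position_bias_masked = position_bias[:, mask.bool()]   # (1, 7, 5, 5)

    scores += position_bias_masked
    print(scores.shape)                       # torch.Size([2, 7, 5, 5])

The shared position_bias keeps all of its original head slots; each layer just skips the slots for its own pruned heads at addition time.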

patrickvonplaten (Contributor) commented

Hey @hadaev8,

That's actually quite a smart fix :-) I think I'd be OK with adding this! Do you want to open a PR for it? :-)

hadaev8 (Contributor, Author) commented Jun 29, 2022

@patrickvonplaten
Okay, if you think it's OK, I will open a PR tomorrow.

github-actions (bot) commented

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this as completed Aug 4, 2022