Pruning function in T5Attention doesn't affect _relative_position_bucket #17886
This is not clear to me.
The relative position bias has shape (dim, heads). Here is the relevant line.
I realized all layers use the same positional bias, so it should be masked in the forward pass, not pruned.
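A minimal sketch of that masking idea (shapes and names here are illustrative, not the actual T5 code): the shared position bias keeps all of its original heads, and a layer that pruned some heads simply selects the columns it still uses before adding the bias to its attention scores.

```python
import torch

# Illustrative shapes only: 8 original heads, head 3 pruned in this layer.
n_heads, q_len, k_len = 8, 4, 4
position_bias = torch.randn(1, n_heads, q_len, k_len)   # shared across all layers
kept_heads = [h for h in range(n_heads) if h != 3]      # heads surviving in this layer

scores = torch.randn(1, len(kept_heads), q_len, k_len)  # attention scores after pruning
scores = scores + position_bias[:, kept_heads]          # mask/select the bias instead of pruning it
```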
After looking at the two blocks below, I think there is indeed a shape issue when we prune the heads. Would you like to try making a minimal code snippet that could confirm the issue, @hadaev8?
@ydshieh I fixed the rel pos bias, but there is some other error.
Hey @hadaev8, This is quite an edge case and I don't think it'll be easy to find a fix here, because usually one only prunes some heads of some layers (not of all layers), whereas the same relative position bias is shared across all layers.
@patrickvonplaten
Hey @hadaev8, That's actually quite a smart fix :-) I think I'd be ok with adding this! Do you want to open a PR for it? :-)
@patrickvonplaten
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Who can help?
@patrickvonplaten
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Run the pruning function on a T5 model, then run inference.
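A minimal sketch of this reproduction, assuming `t5-small` and a transformers version without a fix for this issue (this is not the exact snippet from the thread):

```python
import torch
from transformers import AutoTokenizer, T5Model

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5Model.from_pretrained("t5-small")

# Prune head 0 in every layer: {layer_index: [head indices to remove]}.
model.prune_heads({i: [0] for i in range(model.config.num_layers)})

enc = tokenizer("Studies have shown that owning a dog is good for you.", return_tensors="pt")
dec = tokenizer("Studies show that", return_tensors="pt")

with torch.no_grad():
    # Without a fix, this fails with a shape mismatch: the shared relative position
    # bias still has the original number of heads, while the pruned attention scores do not.
    model(input_ids=enc.input_ids, decoder_input_ids=dec.input_ids)
```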
Expected behavior
The corresponding heads of the relative position bias should be pruned too.
Here it is:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py#L355
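For illustration, a hedged sketch of what pruning the bias itself could look like: `relative_attention_bias` is an embedding of shape `(num_buckets, n_heads)`, so pruning heads would mean dropping the matching columns of its weight. The helper below is hypothetical and is not code from the library or from any eventual patch.

```python
import torch
from torch import nn

def prune_bias_heads(bias: nn.Embedding, kept_heads) -> nn.Embedding:
    """Return a new (num_buckets, len(kept_heads)) embedding keeping only the surviving heads."""
    new_bias = nn.Embedding(bias.num_embeddings, len(kept_heads))
    new_bias.weight.data = bias.weight.data[:, kept_heads].clone()
    return new_bias

# Example: 32 relative position buckets, 8 heads, head 3 pruned.
bias = nn.Embedding(32, 8)
bias = prune_bias_heads(bias, [h for h in range(8) if h != 3])
print(bias.weight.shape)  # torch.Size([32, 7])
```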