About key_padding_mask in multihead self attention #36
Comments
Thx for the question. I can hardly recall the exact details on short notice, but generally: since the padding item is 0 and corresponds to an all-zero embedding as initialized in https://github.com/pmixer/SASRec.pytorch/blob/master/model.py#L36, attending to those zero embeddings should not affect the output. There is also a relevant statement about key_padding_mask in the PyTorch API docs. Please feel free to uncomment the line to check how the key padding mask affects model training and inference.
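For context, here is a minimal, self-contained sketch of the experiment suggested above, i.e. comparing torch.nn.MultiheadAttention with and without key_padding_mask. This is not the repository's actual code: item_num, hidden_units, and log_seqs are made-up example values, and the variable names only echo the repo's style.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: padding item id 0 maps to an all-zero embedding row,
# similar to the zero-initialization referenced in model.py#L36.
item_num, hidden_units = 1000, 50
item_emb = nn.Embedding(item_num + 1, hidden_units, padding_idx=0)  # row 0 stays all zeros

log_seqs = torch.tensor([[0, 0, 12, 7, 3]])   # a left-padded item-id sequence (batch=1, len=5)
seqs = item_emb(log_seqs)                     # (batch, len, hidden)

attn = nn.MultiheadAttention(hidden_units, num_heads=1, batch_first=True)

# key_padding_mask: True marks key positions that attention should ignore.
key_padding_mask = (log_seqs == 0)

out_masked, _ = attn(seqs, seqs, seqs, key_padding_mask=key_padding_mask)
out_unmasked, _ = attn(seqs, seqs, seqs)      # roughly what the commented-out line leaves you with

# The two outputs generally differ, so the mask is not a no-op
# even when padding positions have zero embeddings.
print((out_masked - out_unmasked).abs().max())
```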
Thanks for your prompt reply! Oh, I see, I will try! After careful consideration, though, I think that conceptually, even setting all padding embeddings to 0 (which is what the code does now) will still influence the attention mechanism: before the attention softmax, a raw attention score can be negative, zero, or positive, so a score of zero means nothing special to the softmax function (padding positions should be masked to -inf instead). Thanks again! Yuchen
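To put numbers on that point, here is a tiny check (the scores are made up for illustration) showing that a pre-softmax score of 0 still receives positive weight, while -inf removes the key from the distribution entirely:

```python
import torch

scores = torch.tensor([1.2, -0.7, 0.0])  # made-up raw scores; the last key is a padding position

# A raw score of 0 still gets positive probability mass (~21% here)...
print(torch.softmax(scores, dim=-1))

# ...whereas masking it to -inf removes the padding key from the distribution entirely.
masked = scores.clone()
masked[-1] = float('-inf')
print(torch.softmax(masked, dim=-1))      # last entry is exactly 0
```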
Thx. On the attention score, yes. BTW, some of the lectures by Prof. Lee may help further clarify these details about multi-head attention; please consider checking https://www.youtube.com/@HungyiLeeNTU/search?query=attention if you are interested.
Hi!
Thank you for your implementation!
I would like to know whether there is a particular reason why this line for key_padding_mask is commented out: https://github.com/pmixer/SASRec.pytorch/blob/master/model.py#L83. It seems that this mask is necessary to prevent attending to padding positions?
Thanks again,
Sincerely
Yuchen