About the Duplex attention #10
Hi! :) So sorry for the large delay in my response. I hope to get back to you in about 2 days at most, and will then go over all open issues!
Hi, so sorry for the delay in my response! Several people indeed indicated that the notation regarding the key-value description in the paper is a bit confusing, and I plan to upload a new version of the paper with that aspect fixed by tomorrow.
Hope it helps, and let me know if you have any further questions! :)
Thanks a lot for your detailed reply! Now I understand the core idea of the duplex attention part. Thank you! :)
So the explicit form of duplex attention is `K = Attention(K, X, X)`, or `LayerNorm(K + Attention(K, X, X))`. Am I right?
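For concreteness, here is a minimal sketch of that reading in code, assuming scaled dot-product attention and a residual + LayerNorm update. The function names, tensor shapes, and the plain residual form are assumptions made for illustration only, not the repository's actual implementation:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

def duplex_update(X, K, V, ln_k, ln_x):
    # Step 1: update the keys -- K attends over X (queries K, keys/values X),
    # i.e. K = LayerNorm(K + Attention(K, X, X)) as asked above.
    K = ln_k(K + attention(K, X, X))
    # Step 2: update X using Y = (K, V): queries X, keys K, values V.
    # (The paper's update u^d may use a gated/modulated form rather than
    # this plain residual; that detail is an assumption here.)
    X = ln_x(X + attention(X, K, V))
    return X, K

# Toy usage with made-up sizes: N image features of width d, P latent slots.
N, P, d = 64, 16, 32
X, K, V = torch.randn(N, d), torch.randn(P, d), torch.randn(P, d)
ln_k, ln_x = torch.nn.LayerNorm(d), torch.nn.LayerNorm(d)
X_new, K_new = duplex_update(X, K, V, ln_k, ln_x)
```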
Hey, I wanted to ask: in the paper you say that you compute `K = a(Y, X)` and then `X = u_d(X, Y)`, so if I get this right, V is never updated? Thanks
Thanks for this issue and the answers, they helped me understand how duplex attention works. However, according to your answer, there still seems to be a typo in the new paper version (v3):
Hi all! |
Hi, thanks for sharing the code!

I have a few questions about Section 3.1.2, Duplex attention. I am confused by the notation in this section:

1. The section defines `Y = (K^{P\times d}, V^{P\times d})`, "where the values store the content of the Y variables (e.g. the randomly sampled latents for the case of GAN)". Does this mean that `V^{P\times d}` is sampled from the original variable Y? And how is the number P set in your code?
2. It also says the "keys track the centroids of the attention-based assignments from X to Y, which can be computed as `K = a_b(Y, X)`". Does this mean K is calculated using the self-attention module but with (Y, X) as input? If so, how should I understand "the keys track the centroids of the attention-based assignments from X to Y"? By the way, how are the centroids obtained?
3. For the update rule in duplex attention, what does the `a()` function mean? Does it denote a self-attention module like `a_b()` in Section 3.1.1, with X as queries, K as keys, and V as values? If so, since K is calculated from another self-attention module as mentioned in question 2, the output of `a_b(Y, X)` is treated as the keys, so the update rule contains two self-attention operations. Is that right? Is that what 'Duplex' attention means?
4. But finally I think I may be wrong, because the last paragraph of the section says: "to support bidirectional interaction between elements, we can chain two reciprocal simplex attentions from X to Y and from Y to X, obtaining the duplex attention". So does this mean we first compute Y using a simplex attention module `u^a(Y, X)`, and then use this Y as the input of `u^d(X, Y)` to update X? Does the duplex attention module then contain three self-attention operations? (One possible reading is sketched below.)

Thanks a lot! :)
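To summarize the reading discussed in this thread, the two steps can be written out roughly as below. This is a sketch of the thread's interpretation of Section 3.1.2, not a verbatim excerpt from the paper, and the exact form of the update function `u^d` is left abstract:

```latex
% Notation as read in this thread: Y = (K, V), with K, V \in R^{P \times d},
% and X \in R^{N \times d} the image features.
\begin{align*}
  K &\leftarrow a(Y, X) && \text{attention with the keys of } Y \text{ as queries and } X \text{ as keys/values} \\
  X &\leftarrow u^{d}(X, Y) && \text{attention with } X \text{ as queries, the updated } K \text{ as keys, } V \text{ as values}
\end{align*}
```

Here V is taken directly from Y (e.g. the randomly sampled latents), which is what the question above about V never being updated is pointing at; chaining the reciprocal simplex update from X to Y in front of these steps gives the bidirectional (duplex) form quoted from the paper.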