question about attention patterns #28
Hi!
In Figure 5 of the blog, it seems like tokens from the current iteration attend to tokens from previous iterations. For example, the red token at position 6 attends to the green token at position 5.
But in Jacobi decoding, is it supposed to attend to tokens from the current iteration? That is, the red token at position 6 would attend to the red token at position 5.
Comments
Hi, thanks for your interest!
Thank you for your reply! I'm still a little bit confused.
Hi,
Hi @FrankCast1e, my idea is that the red 6 is generated by the sequence: some 3, orange 4, green 5. This makes a strong local relation if these 3, 4, 5 tokens can form an n-gram phrase. In the next turn, we can use orange 4, green 5, and red 6 to generate the next token and form another meaningful n-gram. If you use red 5 as the previous token of red 6, I think it does not make much sense, as the red 6 has no relationship with red 5, and it may not generate a meaningful n-gram.
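To make the pattern above concrete, here is a toy Python sketch. It is not the repository's implementation; the window layout and the `trajectory` / `next_ngram` helpers are assumptions introduced purely to restate the comment in code: the red 6 is conditioned on a diagonal of tokens from earlier iterations (3, orange 4, green 5), and the candidate n-gram for the next step is (orange 4, green 5, red 6), not anything involving the red 5 from the same iteration.

```python
# Toy sketch of the diagonal pattern described above (illustrative only).
# Each row of `window` is one past Jacobi iteration, oldest first;
# window[level][pos] is the token guessed at position `pos` in that iteration.

def trajectory(window, pos):
    """Tokens that the newest token at `pos` is conditioned on: one token per
    past level, along a diagonal (older levels contribute earlier positions).
    For pos=6 this yields the "3, orange 4, green 5" sequence from the thread;
    the red 6 never looks at the red 5 produced in the same iteration."""
    n_levels = len(window)
    return [window[level][pos - (n_levels - level)] for level in range(n_levels)]

def next_ngram(window, pos, new_token):
    """Once the new (red) token at `pos` is generated, the candidate n-gram for
    the next step drops the oldest ancestor and appends the new token,
    e.g. (orange 4, green 5, red 6)."""
    return trajectory(window, pos)[1:] + [new_token]

# Positions 0..6; "." marks positions not used in this example.
window = [
    [".", ".", ".", "3", ".", ".", "."],        # two iterations ago
    [".", ".", ".", ".", "orange4", ".", "."],  # previous iteration (orange)
    [".", ".", ".", ".", ".", "green5", "."],   # most recent iteration (green)
]

print(trajectory(window, 6))          # ['3', 'orange4', 'green5']
print(next_ngram(window, 6, "red6"))  # ['orange4', 'green5', 'red6']
```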
Thank you very much for your explanation! I totally get the idea now.