Questions on the attention mask, and whether to accept the last element of guess_results when all guess_tokens are accepted #32
It was mentioned in #14 that yellow 7 can see orange 1-4, green 5, and red 6. However, as I have understood it, it is orange 4, green 5, red 6, and yellow 7 that form a 4-gram, and orange 1-3 are irrelevant here, so they should be masked, or am I misunderstanding something?

On a different question: if all guess_tokens match guess_results[0:-1], should the Lookahead step also accept the last element, guess_results[-1]? (Since this would then be a complete sentence.)

Many thanks for the help.
Yes, but orange 1-3 form another 3-gram right before this 4-gram, so they should also be taken in by yellow 7, so that yellow 7 attends to a complete sentence.
I think this could possibly work as well. But conceptually, the guess_results are just used to verify the guess_tokens. The tokens to be accepted should be chosen from the "verified guess tokens", not from the "tokens used for verification".
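As a concrete reading of this distinction, here is a minimal sketch of the verification step under discussion. The function name and the assumed layout (guess_results[i] is the model's greedy prediction for guess_tokens[i], with one extra trailing prediction) are mine, not the repo's actual code:

```python
# Hypothetical sketch of the verification semantics discussed here.
# Assumption: guess_results[i] is the model's greedy prediction for
# guess_tokens[i], and guess_results[-1] is the prediction for the
# token *after* the last guess token.
def accept_tokens(guess_tokens: list[int], guess_results: list[int]) -> list[int]:
    accepted = []
    for token, result in zip(guess_tokens, guess_results):
        if token != result:       # first mismatch: stop verifying
            break
        accepted.append(token)    # a verified guess token, safe to accept
    if len(accepted) == len(guess_tokens):
        # Every guess was verified. guess_results[-1] is a model output
        # used for verification, not a verified guess token -- whether to
        # also accept it is exactly the question raised in this issue.
        accepted.append(guess_results[-1])
    return accepted
```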
But I do not see how orange 1-4 should have any connections at all. They are part of different 4-grams, and orange 1-3 is not a 3-gram (the n-grams are the ones formed across the different colours), if I am not mistaken. When I inspect an example, I see that the tokens in orange 1-4 do not form a coherent phrase.
Thank you very much for the response. I still have trouble understanding why orange 1-4 should have connections. I guess this is because we use a causal mask in the first context-decoding step (where a conventional triangular mask is used, so that orange tokens can see their preceding tokens); is this the reason? If so, why not do the same for green 1-5 and red 1-5, so that they can also see their preceding tokens (i.e., six lower triangles in the mask, as opposed to the current three under the orange tokens)?
Orange 1-4 are "guessed" to be parts of 4-grams, but the "collected 4-grams" are indeed generated in an autoregressive pattern. And if you broke those connections, there would be no autoregressive pattern within the "guess decoding" process, and the probability that an n-gram guess is right might decrease. (P.S. just a personal guess here :-))
I think I can slowly grasp what is happening here now. We need blue 0 and orange 1-4 to build connections so that their corresponding 4-grams (five collected 4-grams in this case) are all relevant to the prompt context. Otherwise, some of the 4-grams are useless, as they have almost no connection to the prompt context (even though they are coherent 4-grams). Thus, connecting blue 0 and orange 1-4 in an autoregressive manner can lead to a better acceptance rate, since they are the first tokens of the collected 4-grams. I guess this is the reason why we want blue 0 and orange 1-4 to form a sentence :)
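On this reading, the mask block under blue 0 and orange 1-4 is just an ordinary causal (lower-triangular) block. A tiny illustration, with the position indices and layout assumed for the sake of the example rather than taken from the repo's actual mask code:

```python
import torch

# Positions 0..4 stand for blue 0 and orange 1-4, the first tokens of the
# lookahead branch (layout assumed for illustration only).
n = 5
# Lower-triangular block: each orange token attends to blue 0 and to the
# orange tokens before it, so this part of the branch is decoded
# autoregressively and the collected 4-grams stay anchored to the context.
lookahead_block = torch.tril(torch.ones(n, n, dtype=torch.bool))
print(lookahead_block.long())
# tensor([[1, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0],
#         [1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 0],
#         [1, 1, 1, 1, 1]])
```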