The instantiation of Multi-head PA and the design choice of MAM adapter. #18
Comments
Thanks for your interest! For your questions: …
Thanks for your reply! But I am still a bit confused about question 1. For PA, we have parameters of size … Correct me if I am wrong in the calculation. Thanks!
Hi, sorry for getting back so late! (I have kinda been in post-graduation vacation mode recently…) Back to your questions: in our implementation, the input …
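A rough parameter-count sketch of the two readings of multi-head PA discussed above. The per-head-slice reading is an assumption on my part, motivated by the paper's prefix-tuning analogy (each head gets its own l × d_head prefix keys and values), and may differ from the repo's actual implementation; the sizes below (roughly BART-large) are only illustrative.

```python
# Illustrative sizes only (roughly BART-large): hidden size d, 16 heads, r = 30.
d, n_heads, r = 1024, 16, 30
d_head = d // n_heads  # 64

# PA (attn, r): a single bottleneck adapter on the full hidden state,
#   W_down in R^{d x r}, W_up in R^{r x d}  ->  2 * d * r parameters.
pa_params = 2 * d * r                    # 61,440

# The "N_h times" intuition: a full d -> r -> d adapter per head.
naive_mh_params = n_heads * 2 * d * r    # 983,040

# Per-head-slice reading: each head's adapter maps its own d_head-dim slice,
#   d_head -> r -> d_head, mirroring the l x d_head prefix keys/values per head.
mh_pa_params = n_heads * 2 * d_head * r  # 61,440, same budget as PA (attn, r)
```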
Thanks for your great work!
I have read your paper, but I am a bit confused about two things.
(1) The instantiation of Multi-head PA. How can we instantiate Multi-head PA (r=30) so that it has the same number of tuned parameters as PA (attn, r=30), as reported in Table 4 of the main paper? My initial thought was that Multi-head PA's tuned parameters would be N_h times those of PA (a parameter-counting sketch appears after this post).
(2) The design choice of the MAM adapter. As I understand it, MH PA (attn, r = 30) is slightly better than prefix tuning (l = 30) in Table 4 (35.3 > 35.2), and according to previous papers such as LoRA, prefix tuning is hard to optimize stably. However, the MAM adapter still adopts prefix tuning. Is there a specific reason for this?
Would you mind giving me any clues about these two questions?
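On question (1), one way a Multi-head PA (r = 30) can end up with exactly the same number of tuned parameters as PA (attn, r = 30) is to give each head its own d/N_h → r → d/N_h bottleneck. The sketch below only illustrates that counting argument under this assumption (class name, activation, and sizes are mine); it is not the repository's actual code.

```python
import torch
import torch.nn as nn

class MultiHeadParallelAdapterSketch(nn.Module):
    """Hypothetical multi-head parallel adapter: one small bottleneck per
    attention head, applied to that head's slice of the hidden state."""

    def __init__(self, d_model: int, n_heads: int, r: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_head = d_model // n_heads
        # Per-head adapters: d_head -> r -> d_head (no biases, to keep counting simple).
        self.down = nn.ModuleList([nn.Linear(self.d_head, r, bias=False) for _ in range(n_heads)])
        self.up = nn.ModuleList([nn.Linear(r, self.d_head, bias=False) for _ in range(n_heads)])
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); process each head's slice independently,
        # then concatenate; the result is added to the attention output in parallel.
        slices = x.split(self.d_head, dim=-1)
        return torch.cat(
            [up(self.act(down(s))) for s, down, up in zip(slices, self.down, self.up)],
            dim=-1,
        )

adapter = MultiHeadParallelAdapterSketch(d_model=1024, n_heads=16, r=30)
n_params = sum(p.numel() for p in adapter.parameters())
assert n_params == 2 * 1024 * 30  # 61,440 — matches single PA (attn, r = 30), not 16x it
```

Under this reading the total stays at 2·d·r: each head only sees a d/N_h-dimensional slice, which cancels the factor of N_h from having one adapter per head.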