[WIP] Add LoRA multihead attention module #1324
Reviewer: Shouldn't this throw an exception? AFAICS we're assigning a tensor to a parameter value here. What am I missing?
Author: It's true that we change the type here; I guess you could consider this part of the hack to make this work. At the end, through `_restore_weights`, the correct type is restored.
Reviewer: Ah, yes. I missed the `del` statement which unregisters the parameter and thus removes the `setattr` constraint. WDYT about something along the lines of …
Author: Done.
Reviewer: Do we also need to overwrite the `modules()` method?
Author: Not needed, as `modules` calls `named_modules` under the hood. I added a comment to that effect.
Reviewer: Why has this been removed?
Author: Sorry, I forgot to put this into the description of the PR.

These lines have been obsolete for some time now. They only apply when we unload the model (otherwise, the `if` does not match). Remember when we made the `base_layer` switch: we ensured that when unloading, we simply return the `base_layer`, so there is no longer any need to create a new layer (say, a new `nn.Linear` when using `lora.Linear`) and replace the new layer's `weight` with the parent layer's `weight`. The `base_layer` already has the original `weight`, so these lines are unnecessary.

I removed them now because they were annoying with `MultiheadAttention`: that layer has no `weight` attribute, so this line would fail.
Reviewer: In the case we support, this is never not `None`, right?
Author: You mean `getattr(child, "q_proj_weight", None) is not None` can never evaluate to `False`, thus the `else` clause below is not needed? I think it would be good to have that fallback, in case we do miss something.
Reviewer: No, I meant that `q_proj_weight` is always `None` in our case (`_qkv_same_embed_dim = True`).
Author: Ah yes, sorry, you're right. This is there in case we add support for the other mode in the future.
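For reference, a standalone check of the PyTorch behaviour being discussed (not part of the PR): with matching embedding dims (`_qkv_same_embed_dim = True`) the projections are fused into `in_proj_weight` and `q_proj_weight` is registered as `None`; only with differing `kdim`/`vdim` do the separate per-projection weights exist.

```python
import torch.nn as nn

# Same embed dim for q/k/v: projections are fused into in_proj_weight,
# and q/k/v_proj_weight are registered as None.
mha_same = nn.MultiheadAttention(embed_dim=8, num_heads=2)
print(mha_same._qkv_same_embed_dim)   # True
print(mha_same.q_proj_weight)         # None
print(mha_same.in_proj_weight.shape)  # torch.Size([24, 8])

# Different key/value dims: separate projection weights are registered
# and in_proj_weight is None -- the mode not (yet) supported here.
mha_diff = nn.MultiheadAttention(embed_dim=8, num_heads=2, kdim=4, vdim=4)
print(mha_diff._qkv_same_embed_dim)   # False
print(mha_diff.q_proj_weight.shape)   # torch.Size([8, 8])
print(mha_diff.in_proj_weight)        # None
```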