Freeze manually #2

Open
G-JWLee opened this issue Mar 2, 2023 · 4 comments

G-JWLee commented Mar 2, 2023

Hi, thank you for your great work.

I want to use minlora in my experiment.

I understand that get_lora_params() passes only the LoRA parameters to the optimizer, but if the rest of the model still has requires_grad=True, won't gradients still be computed for it?

Would freezing the model be enough to use minlora without get_lora_params()?

Also, when merging LoRA into the model in order to add another LoRA module, do I need to set requires_grad=False on lora_A and lora_B before merging?

Thank you.

cccntu (Owner) commented Mar 2, 2023

Hi, thanks!

> I understand that get_lora_params() passes only the LoRA parameters to the optimizer, but if the rest of the model still has requires_grad=True, won't gradients still be computed for it?
> Would freezing the model be enough to use minlora without get_lora_params()?

Probably yes, but you need to make sure you don't accidentally freeze the lora parameters.
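
A minimal sketch of freezing everything except the LoRA parameters, in plain PyTorch. `ToyLoRALinear` and `freeze_all_but_lora` are hypothetical names made up for illustration, not part of minlora, and the sketch assumes the LoRA parameter names contain `lora_` (as with the `lora_A` and `lora_B` mentioned in this thread).

```python
import torch
import torch.nn as nn


class ToyLoRALinear(nn.Module):
    """Hypothetical stand-in for a LoRA-augmented layer (not minlora's implementation)."""

    def __init__(self, fan_in, fan_out, rank=2):
        super().__init__()
        self.base = nn.Linear(fan_in, fan_out)  # plays the role of the backbone
        self.lora_A = nn.Parameter(torch.randn(rank, fan_in) * 0.01)
        self.lora_B = nn.Parameter(torch.randn(fan_out, rank) * 0.01)

    def forward(self, x):
        # Backbone output plus the low-rank update x @ A^T @ B^T.
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T


def freeze_all_but_lora(model):
    # Assumption: LoRA parameters can be recognised by "lora_" in their name.
    for name, param in model.named_parameters():
        param.requires_grad = "lora_" in name


model = ToyLoRALinear(4, 3)
freeze_all_but_lora(model)
model(torch.randn(8, 4)).sum().backward()

trainable = sorted(n for n, p in model.named_parameters() if p.requires_grad)
print(trainable)               # names of the parameters that still train
print(model.base.weight.grad)  # frozen backbone: no gradient is stored
```

With the backbone frozen this way, passing `model.parameters()` or only the LoRA parameters to the optimizer should lead to the same updates, since frozen parameters never receive a gradient.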

> Also, when merging LoRA into the model in order to add another LoRA module, do I need to set requires_grad=False on lora_A and lora_B before merging?

Probably not. After merging, lora_A and lora_B will no longer exist.
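
To illustrate why the flags stop mattering after a merge, here is a sketch using PyTorch's `torch.nn.utils.parametrize`. It assumes, without confirmation from this thread, that the merge works like removing a parametrization while keeping the parametrized value. `ToyLoRA` is a hypothetical module written for this example, not minlora's class.

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize


class ToyLoRA(nn.Module):
    """Hypothetical low-rank parametrization: W -> W + B @ A."""

    def __init__(self, fan_in, fan_out, rank=2):
        super().__init__()
        self.lora_A = nn.Parameter(torch.randn(rank, fan_in) * 0.01)
        self.lora_B = nn.Parameter(torch.randn(fan_out, rank) * 0.01)

    def forward(self, W):
        return W + self.lora_B @ self.lora_A


layer = nn.Linear(4, 3)
parametrize.register_parametrization(layer, "weight", ToyLoRA(4, 3))
names_before = [n for n, _ in layer.named_parameters()]

# Effective weight (original + B @ A) while the parametrization is attached.
merged_expected = layer.weight.detach().clone()

# "Merge": bake the parametrized value into a plain weight parameter.
parametrize.remove_parametrizations(layer, "weight", leave_parametrized=True)
names_after = [n for n, _ in layer.named_parameters()]

print(names_before)  # includes entries ending in lora_A / lora_B
print(names_after)   # only the layer's own weight and bias remain
```

Once the low-rank tensors are gone from the module, there is nothing left on which to set `requires_grad`; only the merged `weight` itself would need freezing before a new LoRA module is added.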

G-JWLee (Author) commented Mar 2, 2023

Thank you for your kind reply.

However, the example at https://github.com/cccntu/LoRAnanoGPT/blob/master/train.py (line 236) uses DDP without the 'find_unused_parameters=True' argument.
When I run my own experiment in a different setting with DDP, the backbone has requires_grad=False, and unless I specify 'find_unused_parameters=True' I get an error saying that the backbone parameters are not used in the gradient computation.
Is there something I missed? I believe this API works with DDP.

Thank you!

cccntu (Owner) commented Mar 3, 2023

Honestly, I don't know. Does simply adding 'find_unused_parameters=True' solve it?

I've only used it on one GPU.

Or does using get_lora_params solve this issue?
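
A hedged sketch of the two things that could be checked here: which parameters are still trainable at the moment the model is wrapped, and how `find_unused_parameters=True` is passed to `DistributedDataParallel`. The helpers `trainable_names` and `wrap_for_ddp` are hypothetical, written for this example only. As far as I know, DDP only tracks parameters that require gradients at wrap time, so freezing before wrapping may matter; whether that explains the error reported above is not confirmed in this thread.

```python
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def trainable_names(model):
    # Parameters that still require gradients, i.e. the ones DDP would sync.
    return [n for n, p in model.named_parameters() if p.requires_grad]


def wrap_for_ddp(model, find_unused_parameters=False):
    # Only wrap when a process group exists; otherwise return the model as-is
    # so the sketch also runs in a single process.
    if dist.is_available() and dist.is_initialized():
        return DDP(model, find_unused_parameters=find_unused_parameters)
    return model


# Toy model: layer 0 plays the frozen backbone, layer 1 the trainable part.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
model[0].requires_grad_(False)  # freeze BEFORE wrapping

names = trainable_names(model)
print(names)  # inspect this list to spot trainable parameters unused in forward

model = wrap_for_ddp(model, find_unused_parameters=True)
```

If a name in that list belongs to a module that the forward pass never uses, that parameter is a likely candidate for the "unused parameters" error.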

justindachille commented

It looks like this method is correct in the sense that the optimizer only updates the parameters you pass to it, but PyTorch will still compute gradients for all weights, because requires_grad is still True for them, according to this thread:

https://discuss.pytorch.org/t/passing-a-subset-of-the-parameters-to-an-optimizer-equivalent-to-setting-requires-grad-of-subset-only-to-true/42866/2
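
The behaviour described there can be reproduced with a small sketch. `backbone` and `adapter` are hypothetical stand-ins for the pretrained weights and the LoRA parameters; no minlora code is used.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
backbone = nn.Linear(4, 4)  # stands in for the pretrained weights
adapter = nn.Linear(4, 2)   # stands in for the LoRA parameters

# Only the adapter's parameters are handed to the optimizer.
opt = torch.optim.SGD(adapter.parameters(), lr=0.1)

before = backbone.weight.detach().clone()
loss = adapter(backbone(torch.randn(8, 4))).pow(2).mean()
loss.backward()
opt.step()

# The backbone was not updated, but autograd still computed its gradient.
grad_was_computed = backbone.weight.grad is not None
weight_unchanged = torch.equal(backbone.weight, before)
print(grad_was_computed, weight_unchanged)
```

So selecting parameters for the optimizer controls what gets updated, while `requires_grad` controls what autograd computes and stores; skipping the extra gradient work requires setting `requires_grad=False` on the backbone.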
