Saving pytorch_model.bin with QLORA #123
Comments
Hi @grimulkan, kind of a late reply, but I've been facing the same issue when not using DeepSpeed, so I'm sharing what I did in case it helps you or anyone else who reads this. Specifically, I did two things:

If you or the repo's original authors found another method, let me know 😄
Thanks. I should have posted what I did earlier as well. Here is how I addressed it: when creating the LoRA config, I add the trainable embed and norm modules to modules_to_save. This will save them at the intermediate checkpoints based on the save strategy specified, the same as the LoRA adapter weights.
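Below is a minimal sketch of that setup, assuming a PEFT LoraConfig on a 4-bit bitsandbytes model; the base model name, target modules, and the embed_tokens/norm module names are assumptions for illustration, not taken verbatim from this thread:

```python
# Sketch only: PEFT LoRA config that also keeps full copies of the trainable
# (non-quantized, non-LoRA) embedding and norm layers in every adapter checkpoint.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumed base model; replace with yours
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    # Key part: store these trainable modules fully alongside the LoRA weights.
    modules_to_save=["embed_tokens", "norm"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```

With modules_to_save set, each checkpoint the Trainer writes should contain full copies of those modules next to the LoRA adapter weights, so the trained embed/norm layers are no longer lost between saves.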
Great solution! With a minor tweak to generalize the modules to save, I think this should be merged into the sftq script, @yukang2017, since it's a rather annoying issue that one only discovers once training is completed.
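One way to make the saved modules configurable rather than hard-coded, sketched here with a hypothetical --modules_to_save command-line flag (the flag name and defaults are made up for illustration):

```python
# Sketch only: expose modules_to_save as a comma-separated script argument.
import argparse
from peft import LoraConfig

parser = argparse.ArgumentParser()
parser.add_argument(
    "--modules_to_save",
    type=str,
    default="embed_tokens,norm",  # assumed defaults for LLaMA-style models
    help="Comma-separated modules to save fully in each adapter checkpoint.",
)
args = parser.parse_args()

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    modules_to_save=[m.strip() for m in args.modules_to_save.split(",") if m.strip()],
    task_type="CAUSAL_LM",
)
```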
Thanks for your great contribution. I will mention this in the README.md. @grimulkan, would you mind providing a PR to fix this? I will merge it into the main branch.
Will do.
@grimulkan, is the reason this works that the norm and embed layers are not quantized? If they were quantized, I assume that would run into saving issues, since saving 4-bit models is not supported.
Yes, they are being trained, so they are not quantized, and since they are small you don't need LoRA/PEFT for them. This also reminds me to actually submit that PR. I somehow forgot, so thanks for the reminder!
At least for me, the per-epoch/per-step saving of the transformers Trainer only saves the intermediate adapter_model.bin, which does not include the trainable embed and norm layers. Is there some other strategy to get those layers (or to force pytorch_model.bin to be saved)? Are the embed/norm layers also being trained with QLoRA?
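For what it's worth, one alternative sketch (not taken from this thread) is a custom TrainerCallback that dumps every trainable non-LoRA parameter next to each intermediate checkpoint; the class and file names here are hypothetical:

```python
# Sketch only: save trainable parameters that are not LoRA weights (e.g. the
# embed and norm layers) alongside each checkpoint the Trainer writes.
import os
import torch
from transformers import TrainerCallback


class SaveTrainableNonLoraCallback(TrainerCallback):  # hypothetical name
    def on_save(self, args, state, control, **kwargs):
        model = kwargs["model"]
        extra_state = {
            name: param.detach().cpu()
            for name, param in model.named_parameters()
            if param.requires_grad and "lora_" not in name
        }
        ckpt_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        os.makedirs(ckpt_dir, exist_ok=True)
        torch.save(extra_state, os.path.join(ckpt_dir, "trainable_non_lora.bin"))
        return control
```

Registering it with trainer.add_callback(SaveTrainableNonLoraCallback()) would then write trainable_non_lora.bin next to adapter_model.bin at every save, and the file can be loaded back later with load_state_dict(..., strict=False).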