Skip saving frozen parameters if using peft model with deepspeed #26503
Conversation
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Gentle ping @pacman100
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I believe this is resolved in #27825
@amyeroberts Hi Amy, you are right. We implemented a similar solution to save only a portion of the weights. However, I'm afraid an issue could arise when attempting to load the weights again, unless `deepspeed_load_checkpoint` is also updated to load the module non-strictly, e.g.:

```diff
 @@ def deepspeed_load_checkpoint(deepspeed_engine, checkpoint_path):
     if len(deepspeed_checkpoint_dirs) > 0:
         logger.info(f"Attempting to resume from {checkpoint_path}")
+
+        load_module_strict = True
+        if version.parse(deepspeed_version) > version.parse("0.10.0"):
+            if is_peft_available() and isinstance(deepspeed_engine.module, PeftModel):
+                load_module_strict = False
         # this magically updates self.optimizer and self.lr_scheduler
         load_path, _ = deepspeed_engine.load_checkpoint(
-            checkpoint_path, load_optimizer_states=True, load_lr_scheduler_states=True
+            checkpoint_path,
+            load_optimizer_states=True,
+            load_lr_scheduler_states=True,
+            load_module_strict=load_module_strict,
         )
```
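In other words, once the checkpoint's module state contains only the trainable (adapter) weights, strict loading would fail on the missing frozen keys; passing `load_module_strict=False` is what allows DeepSpeed to resume from a checkpoint saved without the frozen parameters.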
@VeryLazyBoy Thanks for pointing this out! @muellerzr @pacman100 could you give this a first review?
I think I am running into this issue when resuming.
I encountered the same issue previously. Moreover, when resuming from a DeepSpeed checkpoint, the lr_scheduler state is not restored: currently, Transformers delegates checkpointing to DeepSpeed if it is enabled, and in that scenario the lr_scheduler is not saved and loaded in checkpoints. I tweaked Transformers a bit to handle this scenario with minimal changes in this commit.
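For clarity, here is a minimal sketch of the idea (the helper names and the `lr_scheduler.pt` file name are placeholders, not taken from the commit above): persist the client-side scheduler next to the DeepSpeed checkpoint and restore it after `deepspeed_engine.load_checkpoint`.

```python
# Sketch only: persist the client-side lr_scheduler alongside the DeepSpeed checkpoint,
# since DeepSpeed does not save a scheduler it is not managing itself.
import os

import torch


def save_client_lr_scheduler(lr_scheduler, checkpoint_dir):
    # e.g. checkpoint_dir = "output/checkpoint-500"
    torch.save(lr_scheduler.state_dict(), os.path.join(checkpoint_dir, "lr_scheduler.pt"))


def load_client_lr_scheduler(lr_scheduler, checkpoint_dir):
    # Restore the scheduler state if it was saved with the checkpoint.
    path = os.path.join(checkpoint_dir, "lr_scheduler.pt")
    if os.path.isfile(path):
        lr_scheduler.load_state_dict(torch.load(path, map_location="cpu"))
```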
Hi @kazemf78, have you checked #25863? By the way, the issue you mentioned is not related to this PR; if there is indeed a bug, you can create a separate issue or PR :)
cc @pacman100 for first review
Hello, thank you @VeryLazyBoy for your PR. I'm extremely sorry that this PR got missed. However, this issue is fixed in #28746.
Thank you for providing this information. I am pleased to hear that the issue has been resolved. @pacman100
What does this PR do?
Currently, when using `PeftModel`, `transformers` only saves the adapter weights and supports resuming training from these saved weights. However, if `deepspeed` is used on top of `PeftModel`, the entire model weights are saved, which differs from the `PeftModel` behavior. This PR integrates a newly added DeepSpeed parameter, `exclude_frozen_weights`, to skip saving frozen weights when using PEFT.
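For context, a minimal sketch of the save-side behavior this PR targets, assuming the flag is exposed as `exclude_frozen_parameters` on `DeepSpeedEngine.save_checkpoint` in recent DeepSpeed releases (the version threshold and helper name below are illustrative, not the Trainer's actual code path):

```python
# Sketch only: skip frozen (non-adapter) weights when DeepSpeed saves a checkpoint.
# Assumes recent DeepSpeed releases accept `exclude_frozen_parameters`; older ones do not.
from packaging import version

import deepspeed
from peft import PeftModel


def save_deepspeed_checkpoint(deepspeed_engine, output_dir):
    # Only pass the flag when the installed DeepSpeed supports it and the wrapped model is a PeftModel.
    exclude_frozen = (
        version.parse(deepspeed.__version__) >= version.parse("0.10.1")
        and isinstance(deepspeed_engine.module, PeftModel)
    )
    if exclude_frozen:
        deepspeed_engine.save_checkpoint(output_dir, exclude_frozen_parameters=True)
    else:
        deepspeed_engine.save_checkpoint(output_dir)
```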
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@pacman100