DOC TST Document and test reproducibility with models using batch norm #1734
Conversation
Fixes huggingface#1732

After loading a model that was trained with PEFT on a base model with some kind of batch norm layer, the loaded model should produce the same output. Right now, this does not happen.

The reason is that during training, buffers for the running mean etc. are updated, but they are not saved when calling save_pretrained on the PeftModel instance. Normally in PEFT, we assume that during training, the base model parameters are kept constant, which is not the case with batch norm. We only save the PEFT parameters and assume that when the user loads the base model, all parameters are restored exactly. That way, the information in the buffers is lost completely.

This PR fixes the issue by saving the buffers of the batch norm layers. They are identified by checking for the presence of the track_running_stats attribute.

Note: One test for BOFT is currently failing, see the comment in the test file.
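A minimal sketch of the failure mode described above; the toy model, layer names, and save path are illustrative assumptions, not code from this PR:

import torch
from torch import nn
from peft import LoraConfig, get_peft_model

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)
        self.bn = nn.BatchNorm1d(8)  # holds running_mean / running_var buffers
        self.head = nn.Linear(8, 2)

    def forward(self, x):
        return self.head(self.bn(self.linear(x)))

peft_model = get_peft_model(Net(), LoraConfig(target_modules=["linear"]))

# Training in train mode updates the batch norm buffers on the base model ...
peft_model.train()
for _ in range(5):
    peft_model(torch.randn(16, 8)).sum().backward()

# ... but only the PEFT parameters are written out, so loading the adapter
# onto a freshly initialized base model restores stale running stats and the
# eval-mode outputs no longer match the model that was just trained.
peft_model.save_pretrained("/tmp/batch-norm-adapter")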
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
src/peft/utils/save_and_load.py (outdated)
# currently, we only deal with running stats from BatchNorm* modules
if not hasattr(module, "track_running_stats"):
    continue
for buffer_name, buffer in module.named_buffers():
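For context, a rough sketch of how a loop like this could collect the running-stats buffers; the function name and surrounding structure are assumptions, not the exact code from this PR (which was later dropped in favor of modules_to_save):

def collect_running_stats(model):
    buffers_to_save = {}
    for module_name, module in model.named_modules():
        # currently, we only deal with running stats from BatchNorm* modules
        if not hasattr(module, "track_running_stats"):
            continue
        for buffer_name, buffer in module.named_buffers():
            if buffer is not None:
                buffers_to_save[f"{module_name}.{buffer_name}"] = buffer
    return buffers_to_save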
Do we want to check if track_running_stats is set to True?
Good point. I considered this, but was hesitant. Say someone trains a model with tracking enabled, and then turns it off for some reason before saving. Then we would not save these buffers even though they're needed, do I see that right? Ideally, we would check whether they changed vis-à-vis the base model, but I don't see a way to monitor this except for storing a copy of all these buffers, which requires extra memory.

I guess we could decide to only save them if getattr(module, "track_running_stats", None) is True, issue a warning that they're not saved if getattr(module, "track_running_stats", None) is False, and just ignore the None case.
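A sketch of that conditional, with placeholder names; this variant was discussed but not adopted in the final PR:

import warnings

def maybe_collect_running_stats(module_name, module, buffers_to_save):
    # Save when tracking is enabled, warn when it was explicitly disabled,
    # and ignore modules without a track_running_stats attribute.
    track = getattr(module, "track_running_stats", None)
    if track is True:
        for buffer_name, buffer in module.named_buffers():
            buffers_to_save[f"{module_name}.{buffer_name}"] = buffer
    elif track is False:
        warnings.warn(
            f"Module {module_name} has track_running_stats=False, so its "
            "running stats buffers will not be saved."
        )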
Yes... true. OK, it's safer to save them, you're right. Someone could then turn it off, and the model would start to use the batch's summary statistics in inference mode...
@BenjaminBossan when merging the weights... I suppose currently it will work for …
Hmm, not sure if I understand. If I train a LoRA adapter and then merge it, its weights will be fused with the base model weights. When we load the LoRA adapter, the base model's running stats are replaced by whatever is saved in the LoRA checkpoint. As the running stats buffers are part of the base model and no LoRA is applied to them, they are not further affected by merging.

Based on your comment, I could think of another problematic case though: a user adds 2 adapters, first adapter …

There are probably more edge cases that result from the fact that we kinda assume that only the PEFT parameters ever change, whereas the base model parameters are fixed. I think for this particular one, we can accept this failure case for now, as the scenario I describe should be very rare (users would typically train …).
Oh, now I wonder if there isn't a different solution: ask users to add the batch norm layers to modules_to_save.
Just tested the modules_to_save approach, and it works.
Thank you @BenjaminBossan for the nice minimal reproduction of the issue with batch norm and this PR to fix it. After going through the thread, I see that passing batch norm layers to modules_to_save fixes this issue, right? No need to add extra code to save buffers in the checkpoint.
@pacman100 I removed the new functionality and instead changed the test to add the batch norm layers to modules_to_save.
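Roughly what that looks like on the user side; the module names follow the test config quoted further down, and base_model is a placeholder for a model containing batch norm layers:

from peft import LoraConfig, get_peft_model

# Listing the batch norm layers (named "normalization" in the test config
# below) in modules_to_save makes PEFT store full copies of these modules,
# so their updated running stats end up in the adapter checkpoint and are
# restored exactly when the adapter is loaded.
config = LoraConfig(
    target_modules=["convolution"],
    modules_to_save=["classifier", "normalization"],
)
peft_model = get_peft_model(base_model, config)  # base_model: placeholder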
Thank you, @BenjaminBossan, for adding detailed docs on handling BatchNorm, adding comments to the different sections of the save-checkpoint utility, and the extensive tests! ✨
# TODO: cannot use BOFT because some convolutional kernel dimensions are even (64) and others odd (147). There is no
# common denominator for the boft_block_size except 1, but using 1 results in an error in the fbd_cuda kernel:
# > Error in forward_fast_block_diag_cuda_kernel: an illegal memory access was encountered
# "boft": BOFTConfig(target_modules=["convolution"], modules_to_save=["classifier", "normalization"], boft_block_size=2),
@yfeng95 @Zeju1997 @YuliangXiu Any idea how I could fix the described issue?