While fine-tuning a model by substituting some nn.Linear layers with lora.Linear, I noticed that the evaluation results during training differ from those after loading a checkpoint. More specifically, performing a "load-infer-save" cycle on a checkpoint without conducting any training led to changes in the weight parameters of the lora.Linear layers. Other parameters such as bias and lora_A within lora.Linear did not exhibit this behavior.
Steps to Reproduce
1. Replace certain nn.Linear layers within the model with lora.Linear for fine-tuning (see the sketch after this list).
2. Save the entire model state, without differentiating between LoRA-specific parameters and pretrained model parameters.
3. Ensure the model is in train mode.
4. Load the saved checkpoint using load_state_dict.
5. Observe that the weight parameter of the lora.Linear layers changes after loading, leading to inconsistent evaluation outcomes.
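For step 1, the substitution is the standard loralib drop-in replacement; a minimal sketch (the layer size and rank here are arbitrary, chosen only for illustration):

```python
import torch.nn as nn
import loralib as lora

d = 768  # arbitrary hidden size for this sketch

plain = nn.Linear(d, d)         # the original layer
layer = lora.Linear(d, d, r=8)  # drop-in replacement adding a rank-8 LoRA update
```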
Root Cause Analysis
The problem appears to occur because when load_state_dict is called while the model is in train mode, it alters the weight parameters of lora.Linear layers. This alteration might be related to the merging and unmerging processes of LoRA parameters with the corresponding pretrained parameters.
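Assuming this is Microsoft's loralib (the issue uses lora.Linear and lora_A), the mode-dependent merge can be observed on a single layer. Here is a minimal demo; the normal_ initialization is only there to make the LoRA update non-zero, since lora_B starts at zero:

```python
import torch
import torch.nn as nn
import loralib as lora

layer = lora.Linear(16, 16, r=4, merge_weights=True)
nn.init.normal_(layer.lora_B)        # lora_B is zero-initialized; force a non-zero update
w0 = layer.weight.detach().clone()   # the "pretrained" (unmerged) weight

layer.eval()   # merges the LoRA update into weight in place
print(torch.allclose(layer.weight, w0))  # False: weight was modified
layer.train()  # unmerges the LoRA update again
print(torch.allclose(layer.weight, w0))  # True: weight is restored
```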
Solution Applied
To address this issue, switch the model to eval mode before invoking load_state_dict. This approach ensures that the weight parameters of lora.Linear layers remain stable both before and after loading. Moreover, switching between eval and train modes afterward does not result in anomalies.
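In code, the workaround is simply the following (Model and checkpoint_path are the same placeholders as in the reproduction script below):

```python
model = Model(...)  # contains lora.Linear layers
ckp = torch.load(checkpoint_path, map_location="cpu")

model.eval()        # switch to eval mode *before* loading
model.load_state_dict(ckp, strict=True)

# Switching modes afterwards no longer causes anomalies:
model.train()
model.eval()
```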
Is this behavior expected? If so, it would be helpful to document this behavior or adjust the implementation to prevent confusion among other users.
The following script may help reproduce the issue.
```python
import torch

def compare_model_weights(state_dict1, state_dict2):
    # Compare two state_dict objects: check that they have
    # the same keys and the same values.
    keys1 = set(state_dict1.keys())
    keys2 = set(state_dict2.keys())
    missing_in_model1 = keys2 - keys1  # Keys present in model2 but not in model1
    missing_in_model2 = keys1 - keys2  # Keys present in model1 but not in model2
    all_match = True
    if missing_in_model1 or missing_in_model2:
        all_match = False
        print("State dict keys do not match.\n")
        if missing_in_model1:
            print(f"Keys missing in model1: {missing_in_model1}\n")
        if missing_in_model2:
            print(f"Keys missing in model2: {missing_in_model2}\n")
    common_keys = keys1.intersection(keys2)
    for key in common_keys:
        if not torch.allclose(state_dict1[key], state_dict2[key]):
            all_match = False
            print(f"Weight mismatch found at layer: {key}\n")
            print(f"Model 1 tensor: {state_dict1[key]}\n")
            print(f"Model 2 tensor: {state_dict2[key]}\n")
            print("-" * 80 + "\n")
    if all_match:
        print("All weights match.")
    return all_match

# This checkpoint contains all the weights of the model, including those
# belonging to LoRA and those of the pre-trained model.
checkpoint_path = "..."
ckp = torch.load(checkpoint_path, map_location="cpu")

# The model contains layers of lora.Linear().
model = Model(...)

# Loading weights in training mode may lead to anomalies.
model.train()
model.load_state_dict(ckp, strict=True)
ckp2 = model.state_dict()

# This is very strange: if I execute model.eval(), ckp and ckp2 are
# different; if I remove it, they are the same.
model.eval()
compare_model_weights(ckp, ckp2)
```
A follow-up experiment: deep-copy both state dicts before switching the model to eval mode:

```python
# ... as above ...
import copy

ckp_copy = copy.deepcopy(ckp)
ckp2_copy = copy.deepcopy(ckp2)
model.eval()
compare_model_weights(ckp_copy, ckp2_copy)
```
The above code now reports that ckp_copy and ckp2_copy are identical. This observation indicates that switching the model to eval mode triggers a parameter merge that alters the model weights in place; and because state_dict() returns tensors that share storage with the live parameters, the merge also changes the tensors held in ckp2, which is why the earlier comparison failed. Consequently, the weights used for inference can differ from those obtained during training. This discrepancy might be due to saving and loading all parameters together, as opposed to handling the LoRA parameters and the pretrained parameters separately, as demonstrated in the README examples (see the sketch below).
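For comparison, the README keeps the two sets of parameters in separate checkpoints, along these lines (the file names here are placeholders):

```python
import torch
import loralib as lora

# Save only the LoRA parameters; the pretrained weights are stored once, separately.
torch.save(lora.lora_state_dict(model), "ckpt_lora.pt")

# To restore, load the pretrained weights first, then the LoRA weights, both non-strict.
model.load_state_dict(torch.load("ckpt_pretrained.pt"), strict=False)
model.load_state_dict(torch.load("ckpt_lora.pt"), strict=False)
```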
Of course, as mentioned in the "Solution Applied" section of the issue, switching the model to eval mode before calling load_state_dict prevents this problem. The underlying reason is still not entirely clear; one plausible explanation (assuming loralib's semantics) is that the internal merged flag is a plain attribute and is not part of the checkpoint, so calling eval() first sets merged = True, and once the checkpoint's already-merged weights are loaded, later eval() calls do not fold lora_B @ lora_A into the weights a second time. Confirmation from the maintainers would be appreciated.