Saving pytorch_model.bin with QLORA #123

grimulkan · 2023-11-03T17:33:34Z

At least for me, the per-epoch/step saving function of transformers trainer only saves the intermediate adapter_model.bin, but this does not include the trainable embed and norm layers. Is there some other strategy to get those layers (or force pytorch_model.bin to be saved)? Are the embed/norm layers also being trained with QLORA?

gianlucamacri · 2023-11-15T10:53:34Z

hi @grimulkan , kinda of a late reply but I've been facing the same issue when not using deepspeed, hence I'm sharing what I did if that could help you or anyone that will read this. Specifically, I did two things:

use 8bit quantization instead of 4bit, because model saving for 4bit quantized models is yet to be supported in an official release of bitsandbytes, but they are woking on it so this may be unnecessary in the future;
add the following code after the training model.base_model.save_pretrained(training_args.output_dir). This will save the pytorch_model.bin corresponding to the base model with the weights updated due to the training for the embed and norm layers .

If you or the repo original authors found another method, let me know 😄

grimulkan · 2023-11-15T17:46:20Z

Thanks. I should have posted what I did earlier also. Here is how I addressed it:

class SavePeftModelCallback(TrainerCallback):
    def on_save(
        self,
        args: TrainingArguments,
        state: TrainerState,
        control: TrainerControl,
        **kwargs,
    ):
        checkpoint_folder = os.path.join(args.output_dir, f"{PREFIX_CHECKPOINT_DIR}-{state.global_step}")

        modules_to_save = ["embed", "norm"]
        state_dict = kwargs["model"].state_dict()
        to_save = {}
        for key, value in state_dict.items():
            if any(module_name in key for module_name in modules_to_save):
                to_save[key.replace("base_model.model.", "")] = value
        torch.save(to_save, os.path.join(checkpoint_folder, "trainable_params.bin"))    

        return control

and then when creating the Trainer

        trainer = Trainer(
            ...
            callbacks=[SavePeftModelCallback],
                        )

This will save intermediate steps based on the save strategy specified, same as the LORA adapter weights.

gianlucamacri · 2023-11-16T08:56:06Z

grimulkan

great solution! with a minor tweak to generalize modules to save I think this should be merged into the sftq script @yukang2017 since it a rather annoying issue that one discovers only when training is completed

yukang2017 · 2023-11-19T07:21:54Z

Thanks for your great contribution. I will mention this in the README.md.

@grimulkan Would you mind provide a PR for fixing this? I will merge it into the main branch.

grimulkan · 2023-11-19T11:12:04Z

Will do

RonanKMcGovern · 2023-12-10T13:55:33Z

@grimulkan , is the reason this works because the norm and embed layers are not quantized?

If they were quantized, I assume that would run into saving issues as saving 4-bit models is not supported

grimulkan · 2023-12-10T19:55:12Z

Yes, they are being trained so they are not quantized, and they are small, so you don't need LORA/PEFT.

This also reminds me to actually submit this PR. I somehow forgot, so thanks for the reminder!

gianlucamacri mentioned this issue Nov 16, 2023

To save model in HF format after supervised-fine-tune-qlora #139

Open

yukang2017 closed this as completed Nov 19, 2023

GirinMan mentioned this issue Feb 11, 2024

Add callback for saving trainable parameters and model config #178

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Saving pytorch_model.bin with QLORA #123

Saving pytorch_model.bin with QLORA #123

grimulkan commented Nov 3, 2023

gianlucamacri commented Nov 15, 2023

grimulkan commented Nov 15, 2023 •

edited

Loading

gianlucamacri commented Nov 16, 2023

yukang2017 commented Nov 19, 2023

grimulkan commented Nov 19, 2023

RonanKMcGovern commented Dec 10, 2023

grimulkan commented Dec 10, 2023

Saving pytorch_model.bin with QLORA #123

Saving pytorch_model.bin with QLORA #123

Comments

grimulkan commented Nov 3, 2023

gianlucamacri commented Nov 15, 2023

grimulkan commented Nov 15, 2023 • edited Loading

gianlucamacri commented Nov 16, 2023

yukang2017 commented Nov 19, 2023

grimulkan commented Nov 19, 2023

RonanKMcGovern commented Dec 10, 2023

grimulkan commented Dec 10, 2023

grimulkan commented Nov 15, 2023 •

edited

Loading