What else needs to be prepared after training finishes? #50

fangyifei222 · 2024-03-08T02:11:26Z

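After LoRA fine-tuning the 13B model, I only got the files listed below. I converted the .pt file into a .bin model, but reusing the original configuration_baichuan.py, generation_config.json, and modeling_baichuan.py does not seem to work. Do these files need to be modified by hand for the fine-tuned model?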

ssbuild · 2024-03-08T02:23:10Z

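Normally, inference should be run through the corresponding files in the infer module.
The LoRA weights cannot be loaded directly with peft. You can merge the LoRA weights into the base model; the merged weights then match the official weights and can be loaded directly for inference.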
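
To make "merging the LoRA weights" concrete, here is a minimal pure-Python sketch of the arithmetic in the standard LoRA formulation, where the merged weight is W + (lora_alpha / r) * B @ A. The helper names `matmul` and `merge_lora` and the toy matrices are hypothetical and only for illustration; this is not the repository's merge code, which operates on full model tensors.

```python
# Minimal sketch of merging a LoRA update into a base weight matrix.
# Shapes: W is (out, in), B is (out, r), A is (r, in).
# matmul and merge_lora are hypothetical helpers, not part of any library.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(W, A, B, r, lora_alpha):
    """Return W + (lora_alpha / r) * (B @ A) as a new matrix."""
    scale = lora_alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy example with out=2, in=3, r=1.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # (r, in)
B = [[0.5], [0.0]]      # (out, r)
merged = merge_lora(W, A, B, r=1, lora_alpha=2)
print(merged)
```

After merging, the result has the same shape as W, which is why the merged checkpoint can be loaded like the original weights without any adapter-specific keys.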

fangyifei222 · 2024-03-08T02:50:32Z

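The model files I got after training total 27G. My dataset consists of question-answer pairs, so I made small changes to the data processing in the training script. I converted the .pt file produced by training into .bin format weights. Loading this model reports that files such as configuration_baichuan.py are missing. When I copy the corresponding files over from baichuan-2-13B, loading reports that some weights were not used.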
-rw-rw-r-- 1 fyf fyf 4918199 Mar 6 10:00 bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
-rw-rw-r-- 1 fyf fyf 4918199 Mar 6 10:00 bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt
-rw-rw-r-- 1 fyf fyf 13896988349 Mar 6 10:00 zero_pp_rank_0_mp_rank_00_model_states.pt
-rw-rw-r-- 1 fyf fyf 13896988349 Mar 6 10:00 zero_pp_rank_1_mp_rank_00_model_states.pt

ssbuild · 2024-03-08T03:05:14Z

You can't just change the file extension. DeepSpeed weights need to be converted as follows:
cd best_ckpt/last
python zero_to_fp32.py . ../last.ckpt
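
As a rough illustration of why renaming is not enough: the directory holds one set of shard files per rank rather than a single state dict. The sketch below only inspects filenames and groups them by kind; `summarize_zero_checkpoint` is a hypothetical helper, and the "needs consolidation" flag is a filename heuristic, not a check performed by DeepSpeed itself.

```python
import fnmatch


def summarize_zero_checkpoint(filenames):
    """Group DeepSpeed-style shard files by kind, using filename patterns only."""
    model = sorted(f for f in filenames if fnmatch.fnmatch(f, "*_model_states.pt"))
    optim = sorted(f for f in filenames if fnmatch.fnmatch(f, "*_optim_states.pt"))
    return {
        "model_state_shards": model,
        "optim_state_shards": optim,
        # Heuristic: per-rank optimizer shards suggest a sharded checkpoint
        # that should go through zero_to_fp32.py before loading.
        "needs_consolidation": bool(optim),
    }


# File names taken from the listing earlier in this thread.
files = [
    "bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt",
    "bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt",
    "zero_pp_rank_0_mp_rank_00_model_states.pt",
    "zero_pp_rank_1_mp_rank_00_model_states.pt",
]
summary = summarize_zero_checkpoint(files)
print(summary["needs_consolidation"], len(summary["model_state_shards"]))
```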

ssbuild · 2024-03-08T03:06:09Z

baichuan_finetuning/infer/infer_finetuning.py

Line 31 in 089367d

# For DeepSpeed weights, use the conversion script command

fangyifei222 · 2024-03-08T03:12:48Z

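I think I know where the problem is. I'm using a custom dataset. Does its format have to match the sample data in the data folder? My dataset looks like this:
{
"Question": "What are the dimensions (W × D × H) of the 8-port OC-12c/STM-4c POS-SFP flexible card?",
"Answer": "The dimensions (W × D × H) of the 8-port OC-12c/STM-4c POS-SFP flexible card are 169mm × 189.9mm × 18.4mm."
},
{
"Question": "What is the typical heat dissipation of the 4-port OC-3c/STM-1c POS-SFP flexible card?",
"Answer": "The typical heat dissipation of the 4-port OC-3c/STM-1c POS-SFP flexible card is 275.8 BTU/hour."
},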

ssbuild · 2024-03-08T03:21:55Z

Take a look at the data sample in the README, or at the examples in the data folder.
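
A small converter is usually enough to reshape Question/Answer pairs into whatever the training script expects. The sketch below assumes a conversation-style target with "id", "conversations", "from", and "value" fields; those field names are an assumption for illustration and are not confirmed by this thread, so adjust them to match the README's data sample. `qa_to_conversation` is a hypothetical helper.

```python
import json


def qa_to_conversation(record, idx):
    """Map one {"Question", "Answer"} record to a conversation-style record.

    The output field names are assumptions for illustration; change them
    to match the data sample shipped with the training code.
    """
    return {
        "id": idx,
        "conversations": [
            {"from": "human", "value": record["Question"]},
            {"from": "gpt", "value": record["Answer"]},
        ],
    }


records = [
    {"Question": "What is the typical heat dissipation of the card?",
     "Answer": "275.8 BTU/hour."},
]
converted = [qa_to_conversation(r, i) for i, r in enumerate(records)]
print(json.dumps(converted, ensure_ascii=False, indent=2))
```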

fangyifei222 · 2024-03-08T03:23:07Z

OK, thank you.

fangyifei222 · 2024-03-27T07:16:21Z

Hello, I have a question. I modified this block in fine-tune.py:
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["W_pack"],
    inference_mode=False,
    r=1,
    lora_alpha=32,
    lora_dropout=0.1,
)
I changed it to r=8 and lora_alpha=16, but loading the model after training fails with:
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.layers.0.self_attn.W_pack.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([15360, 8]).
size mismatch for base_model.model.model.layers.1.self_attn.W_pack.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([15360, 8]).
size mismatch for base_model.model.model.layers.2.self_attn.W_pack.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([15360, 8]).
size mismatch for base_model.model.model.layers.3.self_attn.W_pack.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([15360, 8]).
size mismatch for base_model.model.model.layers.4.self_attn.W_pack.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([15360, 8]).
Do you know how to fix this?
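
To read the error: for a LoRA adapter on a linear layer, lora_A has shape (r, in_features) and lora_B has shape (out_features, r), so with r=8 the model expects lora_B to be (15360, 8). A checkpoint shape of torch.Size([0]) means the saved tensor is empty; the thread does not establish why. The sketch below compares saved shapes against the expected ones using plain tuples. The helper names are hypothetical, and in_features=5120 is an assumption (15360 / 3), not a value stated in the error message.

```python
def expected_lora_shapes(in_features, out_features, r):
    """Expected weight shapes for the two LoRA matrices of one linear layer."""
    return {"lora_A": (r, in_features), "lora_B": (out_features, r)}


def find_mismatches(checkpoint_shapes, in_features, out_features, r):
    """Return (key, saved_shape, expected_shape) for every mismatching LoRA tensor."""
    expected = expected_lora_shapes(in_features, out_features, r)
    problems = []
    for key, shape in checkpoint_shapes.items():
        for name, want in expected.items():
            if f".{name}." in key and tuple(shape) != want:
                problems.append((key, tuple(shape), want))
    return problems


# Shapes as they might be read from a saved adapter; the lora_B entry mirrors
# the empty tensor in the error message, the lora_A entry is made up.
prefix = "base_model.model.model.layers.0.self_attn.W_pack"
checkpoint_shapes = {
    prefix + ".lora_A.default.weight": (8, 5120),
    prefix + ".lora_B.default.weight": (0,),
}
problems = find_mismatches(checkpoint_shapes, in_features=5120, out_features=15360, r=8)
for key, saved, want in problems:
    print(key, "saved", saved, "expected", want)
```

Running a check like this on the real adapter file shows whether only some tensors are empty or all of them, which helps narrow down whether the problem lies in saving or in loading.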

ssbuild · 2024-03-27T07:26:26Z

Which script did you use for inference?

fangyifei222 · 2024-03-27T07:29:17Z

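I trained with fine-tune.py and used cli_demo.py for loading and inference, with AutoPeftModelForCausalLM to load the model. The model trained with the default r=1 and lora_alpha=32 loaded fine. After changing these two parameters, the trained model hits the size mismatch.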

ssbuild · 2024-03-27T07:57:52Z

Please paste your inference script and I'll take a look.

fangyifei222 · 2024-03-27T08:00:55Z

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "/data/fyf/Baichuan2-main/fine-tune/outputr8a16-1"
device = "cuda:0"

# Load the LoRA adapter together with its base model.
model = AutoPeftModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model = model.to(device)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, trust_remote_code=True)

while True:
    user_input = input("Enter your question (or type 'exit' to end the conversation): ")
    if user_input == "exit":
        break

    # Single-turn chat: each question is sent without prior history.
    messages = [{"role": "user", "content": user_input}]
    response = model.chat(tokenizer, messages)
    print("Model reply:", response)

ssbuild · 2024-03-27T11:58:30Z

It can't be loaded directly; the names are slightly different. I'll add a weight conversion script later.

fangyifei222 · 2024-03-28T06:35:10Z

Does it need to be converted to .bin format? Then why could it be loaded directly with r=1 and lora_alpha=32?

ssbuild · 2024-03-28T07:19:18Z

No, no. I mean the weight keys, not the file names. I'll add it when I have time.

fangyifei222 · 2024-03-28T07:26:06Z

I'm using the https://github.com/baichuan-inc/Baichuan2 project. Are you a Baichuan developer? QvQ

ssbuild · 2024-04-23T05:47:36Z

deep_export --mode=hf --src adapter_model.bin --dst=.
@fangyifei222 The command above converts the weights to Hugging Face format.
