
Error when loading checkpoint after full-parameter fine-tuning of ChatGLM3 #1340

Closed
nansanhao opened this issue Nov 1, 2023 · 9 comments
Labels
duplicate This issue or pull request already exists

Comments

@nansanhao

```
tokenizer = AutoTokenizer.from_pretrained(model_file_path, trust_remote_code=True)
AttributeError: can't set attribute 'eos_token'
```
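A minimal sketch of why this assignment fails: the ChatGLM tokenizer code defines `eos_token` as a read-only `@property`, while newer `transformers` versions try to assign the `*_token` values found in tokenizer_config.json onto the tokenizer object. The class below is a toy stand-in, not the real ChatGLMTokenizer:

```python
# Toy stand-in (not the real ChatGLMTokenizer): a read-only @property
# cannot be assigned to, which is what "can't set attribute" means here.
class TokenizerWithReadOnlyToken:
    @property
    def eos_token(self) -> str:
        return "</s>"

tok = TokenizerWithReadOnlyToken()
try:
    # This mirrors what AutoTokenizer effectively does when it applies the
    # "eos_token" entry from tokenizer_config.json.
    tok.eos_token = "</s>"
except AttributeError as err:
    print(err)  # exact wording varies by Python version
```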

@AmeowCAT

AmeowCAT commented Nov 1, 2023

After merging a LoRA fine-tune I hit the same error on load. For now I can only get it to run by deleting the several `*_token` entries in the tokenizer_config.json under the merged directory.
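That workaround can be scripted; a sketch assuming the merged checkpoint lives in a placeholder directory `merged_model` (adjust the path and the key list to your files):

```python
import json
from pathlib import Path


def strip_token_keys(config_path: Path) -> None:
    """Drop the *_token entries that the tokenizer refuses to accept."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    for key in ("eos_token", "pad_token", "unk_token", "bos_token"):
        config.pop(key, None)  # remove if present, ignore if absent
    config_path.write_text(
        json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8"
    )


# "merged_model" is a placeholder for the merged checkpoint directory:
# strip_token_keys(Path("merged_model/tokenizer_config.json"))
```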

@nansanhao
Author

> After merging a LoRA fine-tune I hit the same error on load. For now I can only get it to run by deleting the several `*_token` entries in the tokenizer_config.json under the merged directory.

After deleting them I get this error: `assert self.padding_side == "left" AssertionError` @AmeowCAT

@hiyouga
Owner

hiyouga commented Nov 1, 2023

You need to manually change the padding side in the tokenizer config to left.
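A sketch of that edit, with `merged_model` again as a placeholder for the checkpoint directory:

```python
import json
from pathlib import Path


def set_left_padding(config_path: Path) -> None:
    """Force "padding_side": "left" in tokenizer_config.json."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["padding_side"] = "left"
    config_path.write_text(
        json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8"
    )


# Placeholder path:
# set_left_padding(Path("merged_model/tokenizer_config.json"))
```

Alternatively, `padding_side="left"` can usually be passed as a keyword to `AutoTokenizer.from_pretrained(...)` to override the stored config at load time.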

@yanyuze123

@hiyouga Hi, does this project currently support ChatGLM2? After I train and export the model I also get this error: AttributeError: can't set attribute 'eos_token'.

The tokenizer config contents are as follows:

```json
{
  "added_tokens_decoder": {},
  "auto_map": {
    "AutoTokenizer": [
      "tokenization_chatglm.ChatGLMTokenizer",
      null
    ]
  },
  "clean_up_tokenization_spaces": false,
  "do_lower_case": false,
  "eos_token": "",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "",
  "padding_side": "left",
  "remove_space": false,
  "split_special_tokens": false,
  "tokenizer_class": "ChatGLMTokenizer",
  "unk_token": ""
}
```

@hiyouga
Owner

hiyouga commented Nov 1, 2023

#1307 (comment)

@hiyouga added the solved and duplicate labels and then removed the solved label on Nov 1, 2023
@hiyouga hiyouga closed this as completed Nov 1, 2023
@CplusHua01

CplusHua01 commented Nov 1, 2023

The following property values cannot be set as attributes; you can comment them out to try for compatibility:
https://huggingface.co/THUDM/chatglm3-6b-32k/raw/main/tokenization_chatglm.py

```python
# @property
# def unk_token(self) -> str:
#     return "<unk>"

# @property
# def pad_token(self) -> str:
#     return "<unk>"

# @property
# def eos_token(self) -> str:
#     return "</s>"
```

@yanyuze123

@CplusHua01
Even after changing it directly, the exported trained model still contains these parameters.

Retraining then fails immediately with:
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'}).

Is it because I am training ChatGLM2?
Overwriting everything except the bin files and pytorch_model.bin.index.json, as the author suggested, does work.
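The ValueError message spells out its own fallback: give the tokenizer a pad token before training. A toy sketch, where `ToyTokenizer` is a stand-in (on the real ChatGLM tokenizer this only works once the read-only properties are dealt with, as discussed upthread):

```python
# Toy stand-in with settable special tokens, unlike the stock
# ChatGLMTokenizer whose *_token properties have no setter.
class ToyTokenizer:
    def __init__(self) -> None:
        self.eos_token = "</s>"
        self.pad_token = None  # no pad token configured yet


tokenizer = ToyTokenizer()
if tokenizer.pad_token is None:
    # Reuse the EOS token as the pad token, as the error message suggests.
    tokenizer.pad_token = tokenizer.eos_token
```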

@CplusHua

CplusHua commented Nov 2, 2023

https://huggingface.co/THUDM/chatglm3-6b-32k/raw/main/tokenization_chatglm.py After re-exporting, check whether the exported tokenization_chatglm.py needs the same modification.

@dragoncdj

> After merging a LoRA fine-tune I hit the same error on load. For now I can only get it to run by deleting the several `*_token` entries in the tokenizer_config.json under the merged directory.

After deleting them, the fine-tuned behavior is lost. Am I doing something wrong?

7 participants