
Error when loading checkpoint after full-parameter fine-tuning of ChatGLM3 #1340

Closed
nansanhao opened this issue Nov 1, 2023 · 9 comments
Labels
duplicate This issue or pull request already exists

Comments

@nansanhao

```
tokenizer = AutoTokenizer.from_pretrained(model_file_path, trust_remote_code=True)
AttributeError: can't set attribute 'eos_token'
```
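A minimal sketch of why this assignment fails: the ChatGLM tokenizer code defines `eos_token` as a read-only `@property`, while newer `transformers` versions try to assign the `*_token` values found in tokenizer_config.json onto the tokenizer object. The class below is a toy stand-in, not the real ChatGLMTokenizer:

```python
# Toy stand-in (not the real ChatGLMTokenizer): a read-only @property
# cannot be assigned to, which is what "can't set attribute" means here.
class TokenizerWithReadOnlyToken:
    @property
    def eos_token(self) -> str:
        return "</s>"

tok = TokenizerWithReadOnlyToken()
try:
    # This mirrors what AutoTokenizer effectively does when it applies the
    # "eos_token" entry from tokenizer_config.json.
    tok.eos_token = "</s>"
except AttributeError as err:
    print(err)  # exact wording varies by Python version
```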

@AmeowCAT

AmeowCAT commented Nov 1, 2023

After merging a LoRA fine-tune I hit the same error on load. For now I can only get it to run by deleting the several `*_token` entries in the tokenizer_config.json under the merged directory.
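That workaround can be scripted; a sketch assuming the merged checkpoint lives in a placeholder directory `merged_model` (adjust the path and the key list to your files):

```python
import json
from pathlib import Path


def strip_token_keys(config_path: Path) -> None:
    """Drop the *_token entries that the tokenizer refuses to accept."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    for key in ("eos_token", "pad_token", "unk_token", "bos_token"):
        config.pop(key, None)  # remove if present, ignore if absent
    config_path.write_text(
        json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8"
    )


# "merged_model" is a placeholder for the merged checkpoint directory:
# strip_token_keys(Path("merged_model/tokenizer_config.json"))
```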

@nansanhao
Author

> After merging a LoRA fine-tune I hit the same error on load. For now I can only get it to run by deleting the several `*_token` entries in the tokenizer_config.json under the merged directory.

After deleting them I get this error: `assert self.padding_side == "left" AssertionError` @AmeowCAT

@hiyouga
Owner

hiyouga commented Nov 1, 2023

You need to manually change the padding side in the tokenizer config to left.
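A sketch of that edit, with `merged_model` again as a placeholder for the checkpoint directory:

```python
import json
from pathlib import Path


def set_left_padding(config_path: Path) -> None:
    """Force "padding_side": "left" in tokenizer_config.json."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["padding_side"] = "left"
    config_path.write_text(
        json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8"
    )


# Placeholder path:
# set_left_padding(Path("merged_model/tokenizer_config.json"))
```

Alternatively, `padding_side="left"` can usually be passed as a keyword to `AutoTokenizer.from_pretrained(...)` to override the stored config at load time.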

@yanyuze123

@hiyouga Hi, does this project currently support ChatGLM2? After I train and export the model I also get this error: AttributeError: can't set attribute 'eos_token'.

The tokenizer config contents are as follows:

```json
{
  "added_tokens_decoder": {},
  "auto_map": {
    "AutoTokenizer": [
      "tokenization_chatglm.ChatGLMTokenizer",
      null
    ]
  },
  "clean_up_tokenization_spaces": false,
  "do_lower_case": false,
  "eos_token": "",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "",
  "padding_side": "left",
  "remove_space": false,
  "split_special_tokens": false,
  "tokenizer_class": "ChatGLMTokenizer",
  "unk_token": ""
}
```

@hiyouga
Owner

hiyouga commented Nov 1, 2023

#1307 (comment)

@hiyouga added the solved and duplicate labels and then removed the solved label on Nov 1, 2023
@hiyouga hiyouga closed this as completed Nov 1, 2023
@CplusHua01

CplusHua01 commented Nov 1, 2023

The following property values cannot be set as attributes; you can comment them out to try for compatibility:
https://huggingface.co/THUDM/chatglm3-6b-32k/raw/main/tokenization_chatglm.py

```python
# @property
# def unk_token(self) -> str:
#     return "<unk>"

# @property
# def pad_token(self) -> str:
#     return "<unk>"

# @property
# def eos_token(self) -> str:
#     return "</s>"
```

@yanyuze123

@CplusHua01
Even after changing it directly, the exported trained model still contains these parameters.

Retraining then fails immediately with:
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'}).

Is it because I am training ChatGLM2?
Overwriting everything except the bin files and pytorch_model.bin.index.json, as the author suggested, does work.
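The ValueError message spells out its own fallback: give the tokenizer a pad token before training. A toy sketch, where `ToyTokenizer` is a stand-in (on the real ChatGLM tokenizer this only works once the read-only properties are dealt with, as discussed upthread):

```python
# Toy stand-in with settable special tokens, unlike the stock
# ChatGLMTokenizer whose *_token properties have no setter.
class ToyTokenizer:
    def __init__(self) -> None:
        self.eos_token = "</s>"
        self.pad_token = None  # no pad token configured yet


tokenizer = ToyTokenizer()
if tokenizer.pad_token is None:
    # Reuse the EOS token as the pad token, as the error message suggests.
    tokenizer.pad_token = tokenizer.eos_token
```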

@CplusHua

CplusHua commented Nov 2, 2023

https://huggingface.co/THUDM/chatglm3-6b-32k/raw/main/tokenization_chatglm.py After re-exporting, check whether the exported tokenization_chatglm.py needs the same modification.

@dragoncdj

> After merging a LoRA fine-tune I hit the same error on load. For now I can only get it to run by deleting the several `*_token` entries in the tokenizer_config.json under the merged directory.

After deleting them, the fine-tuned behavior is lost. Am I doing something wrong?

7 participants