Single-machine multi-GPU ChatGLM3 PPO: resource contention when loading the RM model tokenizer #1570
Labels: duplicate (this issue or pull request already exists)

Comments
The same error also occurs on a single GPU:

```bash
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage ppo \
    --model_name_or_path /home/user123/model/chatglm3-6b-base \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --template chatglm3 \
    --finetuning_type full \
    --reward_model_type full \
    --checkpoint_dir /home/user123/model/chatglm3-alpaca-exp \
    --reward_model /home/user123/model/chatglm3-rm \
    --output_dir /home/user123/model/chatglm3-ppo \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16
```
This minimal demo triggers the same error:

```python
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer
# model = AutoModel.from_pretrained("/home/user123/model/chatglm3-rm", device_map="auto", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("/home/user123/model/chatglm3-rm", trust_remote_code=True)
# print(model)
print(tokenizer)
```

Error:

```
Traceback (most recent call last):
File "/home/user123/project/LLaMA-Factory-main/pred.py", line 5, in <module>
tokenizer = AutoTokenizer.from_pretrained("/home/user123/model/chatglm3-rm", trust_remote_code=True)
File "/home/user123/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 738, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home/user123/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2017, in from_pretrained
return cls._from_pretrained(
File "/home/user123/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2249, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/user123/.cache/huggingface/modules/transformers_modules/chatglm3-rm/tokenization_chatglm.py", line 93, in __init__
super().__init__(padding_side=padding_side, clean_up_tokenization_spaces=clean_up_tokenization_spaces, **kwargs)
File "/home/user123/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 363, in __init__
super().__init__(**kwargs)
File "/home/user123/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1604, in __init__
super().__init__(**kwargs)
File "/home/user123/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 861, in __init__
setattr(self, key, value)
AttributeError: can't set attribute 'eos_token'
```
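For context: this `AttributeError` usually means the special-token entries that a newer transformers version saved into the RM checkpoint's `tokenizer_config.json` are being passed back into ChatGLM3's bundled `tokenization_chatglm.py`, which exposes `eos_token` (and friends) as read-only properties, so the `setattr` in `SpecialTokensMixin.__init__` fails. A minimal workaround sketch, assuming that is the cause here (the key list is illustrative, not exhaustive):

```python
import json
from pathlib import Path

# Hypothetical workaround: drop the special-token entries that were written
# into the saved tokenizer_config.json, since the checkpoint's bundled
# tokenization_chatglm.py defines them as read-only properties and
# setattr() on them raises AttributeError under newer transformers.
cfg_path = Path("/home/user123/model/chatglm3-rm/tokenizer_config.json")
cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
for key in ("eos_token", "pad_token", "unk_token", "bos_token"):
    cfg.pop(key, None)  # no-op if a key is absent, so the file stays valid
cfg_path.write_text(json.dumps(cfg, ensure_ascii=False, indent=2), encoding="utf-8")
```

Alternatively, loading the tokenizer from the base model directory (`/home/user123/model/chatglm3-6b-base`) sidesteps the saved config entirely, which is consistent with the base model loading fine in the PPO command above.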
Reminder
Reproduction
During single-machine multi-GPU ChatGLM3 PPO training, loading the RM model tokenizer raises a resource-contention error.

RM model directory

Training arguments
```bash
deepspeed --num_gpus 8 src/train_bash.py \
    --deepspeed ds/ds_config.json \
    --stage ppo \
    --model_name_or_path /home/user123/model/chatglm3-6b-base \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --template chatglm3 \
    --resume_lora_training False \
    --finetuning_type full \
    --reward_model_type full \
    --checkpoint_dir /home/user123/model/chatglm3-alpaca-exp \
    --reward_model /home/user123/model/chatglm3-rm \
    --output_dir /home/user123/model/chatglm3-ppo/ \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16
```
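Not part of the original report, but a related sketch of the other common fix for this class of error: replace the RM checkpoint's tokenizer files with the ones from the base model, on the assumption that the base model's `tokenization_chatglm.py` matches the installed transformers version (the file names below are the usual ChatGLM3 tokenizer artifacts and may differ in a given checkpoint):

```python
import shutil
from pathlib import Path

base = Path("/home/user123/model/chatglm3-6b-base")
rm = Path("/home/user123/model/chatglm3-rm")
# Overwrite the tokenizer files saved with the RM checkpoint using the
# base model's copies; skip any file the base model does not ship.
for name in ("tokenization_chatglm.py", "tokenizer_config.json", "tokenizer.model"):
    src = base / name
    if src.exists():
        shutil.copy2(src, rm / name)
```

After copying, it may also be necessary to delete the cached module at `/home/user123/.cache/huggingface/modules/transformers_modules/chatglm3-rm/` (visible in the traceback above) so the stale `tokenization_chatglm.py` is not reused.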
Expected behavior
No response
System Info
No response
Others
No response