Is ChatGLM unable to do RM and RL training? #107
Comments
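Yes, ChatGLM is not a standard CausalLM. |
Understood, thanks! |
Can the newly added DPO method be used with ChatGLM2? |
I saw someone doing this on Bilibili. |
DPO can run on chatglm2-6b. |
Is ChatGLM supported? I mean version 1, not ChatGLM2. |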
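The DPO objective widens the gap between the two texts Q-A1 and Q-A2. It depends on the conditional probability of the generated text and has little to do with the model itself, so you can try swapping in the model using trl's code or this project's code. |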
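To illustrate the point above, here is a rough sketch of the standard DPO loss for a single preference pair, written in plain Python. It only needs the summed log-probability of each answer given the question, under the policy and under a frozen reference model, which is why the model architecture matters little. The function names `sequence_logprob` and `dpo_loss`, the argument names, and the default `beta=0.1` are assumptions for illustration, not this project's or trl's API.

```python
import math


def sequence_logprob(token_logprobs):
    """Sum per-token log-probabilities to get log p(answer | question)."""
    return sum(token_logprobs)


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (question, chosen answer, rejected answer) triple.

    Every argument is a sequence-level log-probability, e.g. policy_chosen is
    log p_policy(A1 | Q). How those values are produced is up to the model.
    """
    # How much more the policy prefers A1 over A2 than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) written as a numerically stable softplus.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Hypothetical per-token log-probs for the chosen answer under the policy.
    chosen = sequence_logprob([-0.5, -1.0, -0.5])
    print(dpo_loss(chosen, -6.0, -4.0, -4.0))
```

The loss shrinks as the policy assigns relatively more probability to the preferred answer Q-A1 than to Q-A2, so any model that can report per-token log-probabilities can in principle be plugged in.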
Great, thank you very much! |
Describe the Question
Please provide a clear and concise description of what the question is.
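Is ChatGLM2 unable to do PPO-related training? I trained the reward model (RM) with BERT, but the parameters could not be merged, and the step-4 RL training also reports that the ChatGLM2 model has no AutoModelForCausalLMWithValueHead. In this case, is switching to a different model the only option?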