Latest transformers version breaks checkpoint saving in multi-node training #1809
Comments
It's an issue related to transformers 4.36.0.
Solution 1: install the patched transformers library:
pip uninstall transformers
pip install git+https://github.com/hiyouga/transformers.git
or use a mirror hosted in China:
pip uninstall transformers
pip install git+https://hub.nuaa.cf/hiyouga/transformers.git
Solution 2: use the stable branch of the code and install transformers < 4.35.
Here you need to specify the version explicitly: pip install transformers==4.36.0
@qiuxin610 It was an issue with the forked version; it has been fixed.
@lyk0014 Fixed; please retry.
Is this the same issue? @hiyouga
Same issue.
Hello, I have already updated transformers to the latest version (4.38.2), but the same problem still occurs when saving checkpoints during multi-node training.
Has this been resolved? I'm seeing the same thing on my side.
Try installing the latest version of transformers (4.39.0.dev0).
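(A minimal sketch of how a 4.39.0.dev0 build is usually obtained, assuming the relevant fix has already landed on the main branch of transformers at that time:
pip uninstall -y transformers
pip install git+https://github.com/huggingface/transformers.git)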
The latest version removed a module that LLaMA-Factory imports, which triggers another error.
@trotsky1997 You also need to install trl from source.
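(A minimal sketch of a source install of trl, assuming the needed changes are on its main branch:
pip install git+https://github.com/huggingface/trl.git)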
Thanks, trying it now. Would you mind if I added you on WeChat?
@trotsky1997 There is an official discussion group.
OK, I found it, thanks!
Hello, I couldn't find the group invite card. Could you share it?
As described in the title, single-node multi-GPU fine-tuning fails with the following error (the older version trains successfully):