Sparse training on a cluster fails after pass 0 #660
Comments
@tianbingsz FYI. Since the sparse-related models were refactored, users have reported several problems, and these need in-depth analysis.
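I am currently using the icode version; its most recent CI run was on October 8 (* master 7c60b90 Merge "remove PserverForPython.h which is not used"). I don't know whether the refactoring had already landed by then. In addition, we tried the cluster version of sparse training with sparse-binary-vec and hit the same problem: the p-server fails to start.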
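The icode version is no longer maintained, and the GitHub trunk has since received several bugfixes for sparse training, so please update to the new code. A new version of the receiver is available internally (ask through internal channels); you only need to change the receiver configuration to enable the latest version.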
@CDDB Please refer to deeplearning.baidu.com, the usage documentation for Baidu employees, for information about the cluster.
Got it. Can you confirm that the problem I ran into is a known issue and that it has already been fixed?
We cannot confirm that yet.
OK, I will ask on the intranet and delete this issue.
One pass ran successfully, but the job crashed before the Eval results appeared. There seem to be several related problems?
Confirmed: if Test is disabled, the job runs to completion.
* fix_windows * Final update 1.3 (PaddlePaddle#653) * thorough clean * delete_DS_Store * update_1.3
This is a cluster configuration problem; moving the discussion to the intranet.