checkpoint doc #19
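4. During **distributed training**: every Trainer saves its own parameters in the checkpoint_dir directory (only Trainer 0 saves the model parameters). A **distributed file system (HDFS, etc.)** is needed to merge the data under the same checkpoint_dir into the complete data, and training must be resumed from that complete data.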
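
To illustrate the merge step in item 4, here is a minimal sketch for the case where each Trainer wrote to its own local copy of checkpoint_dir and no distributed file system is doing the merge. The function name `merge_trainer_checkpoints` and the per-trainer directory layout are assumptions made for illustration; they are not part of the checkpoint feature described in this PR.

```python
import shutil
from pathlib import Path


def merge_trainer_checkpoints(trainer_dirs, merged_dir):
    """Copy every file from each trainer's checkpoint directory into merged_dir.

    Hypothetical helper: assumes each trainer wrote files with distinct
    relative paths, and fails loudly if two trainers produced the same path.
    """
    merged = Path(merged_dir)
    merged.mkdir(parents=True, exist_ok=True)
    copied = []
    for trainer_dir in map(Path, trainer_dirs):
        for src in sorted(trainer_dir.rglob("*")):
            if not src.is_file():
                continue
            # Preserve the path relative to the trainer's own directory.
            dst = merged / src.relative_to(trainer_dir)
            if dst.exists():
                raise FileExistsError(f"{dst} was already written by another trainer")
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(dst)
    return copied
```

Training would then be resumed from the merged directory rather than from any single Trainer's partial copy.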
## Future plans
1. Support saving parameters through etcd.
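This is usage documentation; users do not care about future plans.
@seiriosPlus Please actually build the HTML and confirm that the document is displayed and referenced correctly!
@reyoung I will revise it. I was not familiar with how the docs should be written when I pushed. Thanks.
After generating the site, could you post a preview screenshot in this PR?
OK, I will rework the document to follow the conventions.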
This reverts commit 138401b.
Note: It is not yet clear whether the location of the Checkpoint feature will change, so I am not sure where index.rst should go.
Excellent
* add softnms, nonlocal, resnet200_vd_backbone * add CBNet * update model zoo
No description provided.