Model saving is very slow during multi-machine, multi-card training #15

Open
JarvisFei opened this issue Aug 3, 2023 · 1 comment

Comments

@JarvisFei

Have you tried training this model on multiple machines?

If you have tried, is there anything special to pay attention to in terms of environment settings and parameters?

@Guangxuan-Xiao
Collaborator

Yes, we have. It's important to ensure a good inter-node connection to prevent communication bottlenecks during training.
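
To give a feel for why the inter-node link matters, here is a rough back-of-the-envelope sketch. It assumes plain data-parallel training with a ring all-reduce for gradients, which this thread does not confirm, and the function name, model size, worker count, and link speeds are all hypothetical values chosen for illustration.

```python
def ring_allreduce_seconds(num_params, bytes_per_param, num_workers, bandwidth_gbit_per_s):
    """Rough lower bound on the time for one ring all-reduce of the gradients.

    Ignores latency, protocol overhead, and overlap with computation.
    """
    payload_bytes = num_params * bytes_per_param
    # In a ring all-reduce each worker sends about 2 * (N - 1) / N times the payload.
    traffic_bytes = 2 * (num_workers - 1) / num_workers * payload_bytes
    bandwidth_bytes_per_s = bandwidth_gbit_per_s * 1e9 / 8
    return traffic_bytes / bandwidth_bytes_per_s


# Hypothetical setup: 7e9 parameters, 2 bytes per gradient, 16 workers.
for name, gbit in [("10 Gbit/s link", 10), ("200 Gbit/s link", 200)]:
    seconds = ring_allreduce_seconds(7e9, 2, 16, gbit)
    print(f"{name}: ~{seconds:.2f} s per gradient sync")
```

Under these assumptions the slower link spends about 20 times longer on every gradient synchronization, so it can easily dominate step time.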
