KEP-2170: Create PyTorch multi-node distributed training runtime #2211

Open
Tracked by #2170
andreyvelich opened this issue Aug 14, 2024 · 1 comment
@andreyvelich (Member)

Related: #2170

We should create a ClusterTrainingRuntime for PyTorch multi-node distributed training.

/area runtime
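For illustration, here is a minimal sketch of what such a runtime could look like, based on the KEP-2170 proposal. The API group/version, the `mlPolicy`/`torch` fields, the `torch-distributed` name, and the container image are assumptions here and may differ from the final implementation:

```yaml
# Hypothetical sketch -- field names and API version are assumptions
# based on the KEP-2170 design, not a final implementation.
apiVersion: trainer.kubeflow.org/v1alpha1
kind: ClusterTrainingRuntime
metadata:
  name: torch-distributed
spec:
  mlPolicy:
    # Default number of training nodes; a TrainJob could override this.
    numNodes: 1
    torch:
      # One worker process per GPU on each node.
      numProcPerNode: auto
  template:
    spec:
      replicatedJobs:
        - name: node
          template:
            spec:
              template:
                spec:
                  containers:
                    - name: trainer
                      image: pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime
                      command:
                        - torchrun
                        - train.py
```

A TrainJob referencing this runtime would then only need to override the node count (and optionally the image/command) to run multi-node distributed training, while the runtime supplies the `torchrun` launch semantics.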

@yang20150702
I'm learning training-operator v1 and would like to work on this issue. Please give me some suggestions.
