Create Slurm runtime for model training using V2 APIs #2249

Open
andreyvelich opened this issue Sep 5, 2024 · 1 comment
Comments

@andreyvelich
Member

andreyvelich commented Sep 5, 2024

What you would like to be added?

As discussed during the last Training WG call, we want to design and implement a Training Runtime for Slurm so that users can leverage the Slurm workload manager for model training on Kubernetes.

Recording: https://youtu.be/IBDyYUbB0UA

We can continue discussions once we implement the Training Operator V2 APIs.

cc @kubeflow/wg-training-leads @catblade

/area runtime

Love this feature?

Give it a 👍. We prioritize the features with the most 👍.

@andreyvelich
Member Author

/remove-label lifecycle/needs-triage
