Comprehensive example task for running multislice TPU workloads with JobSet (and JobSet + Kueue) #428

Open
Tracked by #438 ...
danielvegamyhre opened this issue Feb 16, 2024 · 3 comments
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.

Comments

@danielvegamyhre
Contributor

danielvegamyhre commented Feb 16, 2024

What would you like to be added:
Comprehensive example tasks that run training workloads on TPUs with JobSet. Demonstrating the JobSet + Kueue integration would also be nice.

Why is this needed:
We need more comprehensive examples that will reduce friction for users trying out JobSet for real training workloads. Right now we mostly just have toy examples with "sleep" containers that demonstrate different features.

@danielvegamyhre added the "good first issue" label on Feb 16, 2024
@uroy-personal

/assign

@danielvegamyhre changed the title from "Comprehensive example tasks for running multislice TPU workloads with JobSet (and JobSet + Kueue)" to "Comprehensive example task for running multislice TPU workloads with JobSet (and JobSet + Kueue)" on Mar 13, 2024
@danielvegamyhre
Contributor Author

@uroy-personal since you already have multiple tasks, I'm going to assign this one to someone with some TPU experience to distribute the workload.

@jedwins1998
Contributor

/assign

3 participants