Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tests] add slow tests to GH workflow #304

Merged
merged 2 commits into from
Apr 25, 2023

Conversation

younesbelkada
Copy link
Contributor

This PR adds the workflow to run slow tests

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Apr 13, 2023

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada
Copy link
Contributor Author

Added the corresponding GCP machine to the list of hosted runners, I think we should be all good once we add the SLACK_API_TOKEN inside the repo's secret!

Copy link
Contributor

@muellerzr muellerzr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I'll note here that occasionally the nightly reporter bugs out, but for now this does the job :) (I'll ping when I figure out a better reporting setup :) )

Specifically it'll say X job failed/test if it failed a day before. You can still view the runner to see if it all passed


jobs:
run_all_tests_single_gpu:
runs-on: [self-hosted, docker-gpu, multi-gpu]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to use multi-GPU on a single GPU VM 😃 it's definitely a way to save money.

run_all_tests_single_gpu:
runs-on: [self-hosted, docker-gpu, multi-gpu]
env:
CUDA_VISIBLE_DEVICES: "0"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we really need this. If not, we can remove them, and just use a matrix thing to combine the 2 different jobs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with matrix (imo) is it's less clear, vs running them this way when we view the reports in GHA. And matrix get messy quick as well :)

But you are right, both options work. @younesbelkada definitely worth I think playing with both to see which you prefer more eventually :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though thinking on this more, probably worth doing on our nightlies. Good point @ydshieh. Most of my fear resided in matrix upon matrix, but the nightlies don't have split tests like our main does.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a necessary change for me, it is merely a suggestion :-) So your call 🤗

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the pointers! I will stick with the first solution proposed by @muellerzr and see if I face into any issue

run_all_tests_multi_gpu:
runs-on: [self-hosted, docker-gpu, multi-gpu]
env:
CUDA_VISIBLE_DEVICES: "0,1"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link

@ydshieh ydshieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 but with 2 nits for you to check.

Copy link
Contributor

@pacman100 pacman100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @younesbelkada, @muellerzr and @ydshieh for setting up the slow tests and Slack notifications! 🚀🤗✨

@younesbelkada
Copy link
Contributor Author

Thank you all for your comments!

@younesbelkada younesbelkada merged commit 49a20c1 into huggingface:main Apr 25, 2023
@younesbelkada younesbelkada deleted the add-slow-tests-ci branch April 25, 2023 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants