CI: frequent, intermittent failures to download testing data #8685
Comments
Based on observation, this seems to happen randomly from time to time, and I've just assumed that GitHub has some bad download days / brown-outs. FYI, when I started 5 simultaneous downloads, the first four failed right when I started the fifth. When I ran just two downloads in parallel at a time, I got 6 of 6 downloaded (in pairs) correctly with no failures. So maybe your 4-of-5 failure is the same as mine? If so, I'm guessing that GitHub detects it as spam and kills the downloads when it gets too many identical requests from the same client. I wonder if we happen to be hitting some limit like this on the CIs, too, where it thinks what we're doing is spam and kills it. Maybe we're doing something suboptimal in terms of the HTTP request and pooch will magically fix it... I put in a "fix" for this on CircleCI just by trying twice, and it did seem to decrease the number of problems we've had there. Maybe the "fix" here can be similar: just try twice or a few times.
pooch does have an auto-retry param. Hopefully that will be a good enough workaround. FWIW, when I tried 5 manual downloads and 4 of them failed, those attempts were serial, not parallel, so they were about a minute apart; thus I don't think it was some kind of server-side anti-bot or anti-spam thing.
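For reference, a minimal sketch of the "just try a few times" workaround around a pooch download; this is not MNE's actual fetching code, and the URL, filename, and retry count are illustrative assumptions (pooch's built-in retry option would accomplish much the same thing):

```python
import time

import pooch

# Illustrative sketch only: URL, filename, and retry count are assumptions,
# not what MNE actually uses.
url = "https://codeload.github.com/mne-tools/mne-testing-data/tar.gz/0.112"

for attempt in range(1, 4):
    try:
        # known_hash=None skips checksum verification for this illustration
        path = pooch.retrieve(url, known_hash=None,
                              fname="mne-testing-data-0.112.tar.gz")
        break
    except Exception as exc:
        if attempt == 3:
            raise
        print(f"Download failed ({exc!r}); retrying (attempt {attempt} of 3)")
        time.sleep(10)  # short pause before the next attempt
```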
Thinking more about this... could it be that it coincides with our switch away from Travis? I.e., the GitHub workflows seem to trigger all jobs at once (alongside the Azure jobs), so it may be that we're sending ~5-10 download requests in very quick succession. It would be nice if the data-fetching part of testing could just connect to a storage bucket somewhere, instead of making a bunch of HTTP calls.
maybe we could cache it and make the cache shared between workers?
Caching should be possible: https://docs.github.com/en/free-pro-team@latest/actions/guides/caching-dependencies-to-speed-up-workflows
any volunteer to give it a try?
I would like to try, if you don't mind. I read that the total size of all caches in a repository is limited to 5 GB. That should be enough for GitHub Actions, right?
our testing dataset is ~1 GB, so we should be good
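For context, a minimal sketch of what the actions/cache approach linked above could look like in a workflow; the path, cache key, and download command are assumptions for illustration, not the exact steps the project ended up using:

```yaml
# Illustrative caching step for a GitHub Actions workflow; path, key, and
# the download command are assumptions.
- name: Cache testing data
  id: cache-testing-data
  uses: actions/cache@v2
  with:
    path: ~/mne_data           # where the testing dataset gets unpacked
    key: testing-data-0.112    # bump when the dataset version changes
- name: Get testing data
  # only hit GitHub when the cache missed
  if: steps.cache-testing-data.outputs.cache-hit != 'true'
  run: python -c "import mne; mne.datasets.testing.data_path(verbose=True)"
```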
We're caching datasets in CIs nowadays, right? So this can be closed?
Yep!
There have been a lot of issues on the CIs today where the "get testing data" step has failed. This failure is typical:
https://github.com/mne-tools/mne-python/pull/8669/checks?check_run_id=1623277082#step:10:4609
Successful downloads of the testing data have been intermittent among these failures, so re-running a test will sometimes work on the second, third, or fourth attempt. I tried downloading the v0.112 testing data manually through my browser 5 times, and it failed 4 of those times. No obvious difference between codeload.github.com links versus regular github.../archive/filename.tar.gz links.