CI: frequent, intermittent failures to download testing data #8685

Closed
drammock opened this issue Dec 29, 2020 · 10 comments

@drammock
Member

There have been a lot of failures on the CIs today where the "get testing data" step fails. This failure is typical:

https://github.com/mne-tools/mne-python/pull/8669/checks?check_run_id=1623277082#step:10:4609

Successful downloads of the testing data have been intermittent among these failures, so re-running a job will sometimes work on the second, third, or fourth attempt. I tried downloading the v0.112 testing data manually through my browser 5 times, and it failed 4 of those times. There was no obvious difference between codeload.github.com links and regular GitHub .../archive/filename.tar.gz links.

@larsoner
Member

Based on observation this seems to happen randomly from time to time, and I've just assumed that GitHub has some bad download days / brown-outs. FYI, when I started 5 simultaneous downloads, the first four failed right when I started the fifth. When I ran only two downloads in parallel at a time, I got 6 of 6 downloaded (in pairs) with no failures. So maybe your 4-of-5 failure is the same as mine? If so, I'm guessing GitHub detects the downloads as spam and kills them when it gets too many identical requests from the same client.

I wonder if we happen to be hitting some limit like this on the CIs, too, where GitHub thinks what we're doing is spam and kills the download. Or maybe we're doing something suboptimal with the HTTP request and pooch will magically fix it...

I put in a "fix" for this for CircleCI just by trying twice, and it did seem to decrease the number of problems we've had there. Maybe the "fix" here can be similar, just try twice or a few times.

@drammock
Member Author

pooch does have an auto-retry parameter; hopefully that will be a good enough workaround. FWIW, when I tried 5 manual downloads and 4 of them failed, those attempts were serial, not parallel (about a minute apart), so I don't think it was some kind of server-side anti-bot or anti-spam thing.
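
Something along these lines is what I have in mind. This is only a sketch: the cache path, registry entry, and codeload URL layout are illustrative rather than our actual fetcher code, but if I'm reading the pooch docs right, `retry_if_failed` would handle the retries for us:

```python
import pooch

# Sketch only: fetch the testing-data archive with pooch's built-in retries.
# The base_url/registry below are illustrative; the real fetcher builds its
# own URLs and checks real hashes.
fetcher = pooch.create(
    path=pooch.os_cache("mne-testing-data"),
    base_url="https://codeload.github.com/mne-tools/mne-testing-data/tar.gz/",
    registry={"0.112": None},  # None skips hash checking in this sketch
    retry_if_failed=2,  # retry each failed download up to 2 more times
)
archive_path = fetcher.fetch("0.112")
```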

@drammock
Member Author

Thinking more about this... could it be that this coincides with our switch away from Travis? I.e., the GitHub workflows seem to trigger all jobs at once (alongside the Azure jobs), so we may be sending ~5-10 download requests in very quick succession.

It would be nice if the data-fetching part of testing could just connect to a storage bucket somewhere, instead of making a bunch of HTTP calls.

@agramfort
Member

agramfort commented Jan 4, 2021 via email

@cbrnr
Contributor

cbrnr commented Jan 4, 2021

Caching should be possible: https://docs.github.com/en/free-pro-team@latest/actions/guides/caching-dependencies-to-speed-up-workflows
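
A step roughly like the one below should do it (a rough sketch only: the path and cache key are illustrative, and the key would need to track the actual testing-data version so the cache invalidates when the dataset updates):

```yaml
# Sketch only: cache the MNE data directory between workflow runs.
- name: Cache testing data
  uses: actions/cache@v2
  with:
    path: ~/mne_data
    key: testing-data-0.112  # illustrative; should follow the dataset version
```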

@agramfort
Member

agramfort commented Jan 4, 2021 via email

@GuillaumeFavelier
Contributor

I would like to try this, if you don't mind.

I read that the total size of all caches in a repository is limited to 5 GB. That should be enough for GitHub Actions, right?

@larsoner
Member

larsoner commented Jan 4, 2021

Our testing dataset is ~1 GB, so we should be good.

@drammock
Member Author

We're caching datasets in CIs nowadays, right? So this can be closed?

@larsoner
Member

Yep!
