Travis jobs for slow tests will time out, blocking merge #7073
Comments
https://docs.travis-ci.com/user/customizing-the-build#Build-Timeouts documents the 50-minute timeout. Perhaps this limit could be raised by paying Travis, but that would still mean PR results take a very long time to come in, which is a problem in itself.

Idea: First run the tests exactly once and note how long that took. Only run the tests again if there's enough time left. After each run, use the longest of the previous runs to estimate whether another run is possible, and add some margin of safety so that we don't start an expected 8-minute run at the 40-minute mark. @bobholt @jgraham, WDYT?

We will probably still have cases where running all affected tests even once isn't possible in 50 minutes. Let's cross that bridge when we get there, however.
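The run-once-then-estimate idea above could be sketched roughly as follows. This is only an illustration, not the stability checker's actual code: `run_within_budget`, `run_once`, and the default 5-minute margin are hypothetical names and values chosen for the example, and the 10-run target is taken from the issue description.

```python
import time


def run_within_budget(run_once, max_runs=10, budget_s=50 * 60,
                      margin_s=5 * 60, clock=time.monotonic):
    """Repeat run_once() up to max_runs times while the time budget allows.

    The first run always happens. Before each later run, the longest run
    seen so far is used as the estimate for the next one, plus a safety
    margin. All names and defaults here are hypothetical.
    """
    start = clock()
    longest = 0.0
    completed = 0
    while completed < max_runs:
        elapsed = clock() - start
        # Skip the check for the first run: we have no estimate yet.
        if completed > 0 and elapsed + longest + margin_s > budget_s:
            break
        run_start = clock()
        run_once()
        longest = max(longest, clock() - run_start)
        completed += 1
    return completed
```

With 8-minute runs, a 50-minute budget, and a 5-minute margin, this sketch stops after five runs: at the 40-minute mark, 40 + 8 + 5 exceeds 50, so a sixth run is not started. The caller would still need to decide what a reduced run count means for the stability verdict.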
Regarding #7006, I just noticed that Firefox might have some configuration for disabling the test(s) that were split up for taking too long. Notably, https://bugzilla.mozilla.org/show_bug.cgi?id=1351890 is disabling registration.https.html on Windows due to timeout. #7006 split that test into multiple files, since it was also taking too long to run on Chrome. Mozilla will probably need to update their configuration to match the new test file names. @wanderview
The Chrome bug where there's no output is, I think, a problem with Chrome? I suppose I hadn't considered the possibility that it's just not outputting anything while the tests are running; maybe Gecko creates enough log spew that that doesn't happen. We could of course arrange things so that we only run the tests as many times as we can fit into the 50-minute timeout, but then if you update a lot of tests you end up bypassing the stability check, because they only run once. That doesn't seem ideal for test stability.
Is there any way around that, short of maintaining non-Travis infrastructure that is always able to run the full test suite 10 times in 50 minutes?
Well, with different infrastructure we could run for longer (e.g. if we hooked up to Mozilla's Taskcluster infrastructure, which I believe is possible [1], the default timeout is 3 hours). Of course there has to be some limit, and it's going to be pretty annoying if PRs take multiple hours per push to test. In theory one could parallelise, but that could be difficult given the limitations of Travis. Multiple parallel runs on a single machine might be possible, but that could itself cause intermittency, and if we are resource-limited it might not give any speedup. I don't have a great suggestion here.
Er, sorry, the default is 1 hour on TC, but it's configurable.
@lukebjerring, FYI, I had some ideas in #7073 (comment); see also #7660 for a similar problem with changing many tests.
Closing this in favor of #7660, since it comes down to the same thing: whether there are many fast tests or fewer slow tests, it's not always possible to finish running in 50 minutes.
In #7006 it looks like the changed tests taken together were too slow to finish running 10 times in the stability checker, and so all stability checker jobs failed.
The Firefox job ran for 49 minutes before:
The Chrome job ran for 19 minutes before:
It would probably ultimately fail in the same way as Firefox, but could a periodic watchdog "echo still running" allow it to run for longer?
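A minimal sketch of such a watchdog, assuming the job is driven from Python: a background thread prints a line at a fixed interval while the wrapped work runs. The `heartbeat` name, the 60-second default, and the message text are hypothetical and not part of any existing tooling.

```python
import sys
import threading
from contextlib import contextmanager


@contextmanager
def heartbeat(interval_s=60.0, message="still running", stream=None):
    """Write `message` to `stream` every `interval_s` seconds while the
    with-block is executing (hypothetical helper for illustration)."""
    out = stream if stream is not None else sys.stdout
    stop = threading.Event()

    def beat():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_s):
            out.write(message + "\n")
            out.flush()

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        yield
    finally:
        # Stop the heartbeat and wait for the thread to exit.
        stop.set()
        thread.join()
```

Usage would look like `with heartbeat(): run_the_tests()`, where `run_the_tests` stands in for whatever starts the browser run. This only keeps the log from going silent; it does nothing about the overall 50-minute limit.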
@bobholt