Improve the total time of our functional tests #3046
Comments
I'm sure we all agree with you @kinow. I'm slightly concerned that it is just difficult to quickly test a system that manages a bunch of other jobs in a workflow (i.e. the jobs can't all run at once). However, we must be able to do better than this. Good idea on recording individual test timings.
Didn't we get down to about 10 mins when we first started using chunks on Travis CI?
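For readers not familiar with the chunking: it just partitions the test files across parallel CI jobs. A minimal sketch of the idea, assuming TAP-style `.t` files under a `tests/` directory and hypothetical `CHUNK`/`N_CHUNKS` variables (this is not the actual `test-battery` mechanism):

```bash
# Sketch of chunking: each CI job runs an interleaved 1/N slice of the tests.
# CHUNK (1-based index of this job) and N_CHUNKS are hypothetical variables.
CHUNK="${CHUNK:-1}"
N_CHUNKS="${N_CHUNKS:-4}"
find tests -name '*.t' | sort |
    awk -v c="$CHUNK" -v n="$N_CHUNKS" 'NR % n == c - 1' |
    xargs prove --timer
```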
Coverage could have increased it. I will add an option to run coverage only against pull requests, and nightly on master too.
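On Travis CI that gating could look roughly like this, using the documented `TRAVIS_EVENT_TYPE` variable; the `COVERAGE` flag is a hypothetical stand-in for however the test scripts actually enable coverage:

```bash
# Sketch: only enable coverage for pull request builds and cron ("nightly")
# builds. TRAVIS_EVENT_TYPE is set by Travis CI (push/pull_request/cron/api);
# COVERAGE is a hypothetical flag, not the actual mechanism.
case "${TRAVIS_EVENT_TYPE:-}" in
    pull_request|cron) export COVERAGE=true ;;
    *)                 export COVERAGE=false ;;
esac
```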
Side note: GitHub is offering a beta service similar to GitLab and Travis-CI builds: GitHub Actions. It is based on containers, and worth a look if it offers the same functionality as Travis, plus better performance. One counter-argument for using it (seen in the wild in other projects considering migrating) is that Microsoft is behind GitHub now, and relying too much on it carries risks too, as they could close or block certain features. But Travis was acquired recently too, so that risk is not unique to GitHub.
Interesting, worth considering.
Had some spare time after dinner today, and had a post-it saying "Azure" here. Jinja2 and Python (CPython) are using Azure Pipelines, and I wanted to try it at least once and see what it looks like. Running just the functional tests (no unit tests), with 4 agents (servers), each agent consistently finishes in between 8 and 10 minutes. https://dev.azure.com/brunodepaulak/cylc/_build/results?buildId=13
GitHub Actions might be promising too. The git checkout takes way longer than on my machine - at least that was my impression. I think GitHub Actions might be as fast as Azure DevOps (hey, both are MS now?!), and maybe the checkout will be quicker. My feeling is that Travis CI has been taking between 20 and ~30 minutes, and it spends a long time trying to provision the servers for the jobs. So moving to another CI infrastructure might be an alternative? Cheers
Thanks for investigating @kinow - that looks really good to me. I was vaguely aware of Azure Pipelines but didn't know it was free for Open Source projects. Are there any downsides that you're aware of? Otherwise, if @cylc/core agree, let's move it over.
So far the only thing I disliked was having to learn their special syntax for certain operations. The pro is that you get some interesting functions, but the con is having to learn these expressions. There are some pre-defined variables, similar to Travis, but with different names. But if that's the price to pay for saving us nearly 20 minutes of build time... it looks OK to me, I guess.
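As an aside on the variable naming: Azure Pipelines exposes its pre-defined variables (e.g. `Build.SourceBranch`, `Build.BuildId`) to script steps as upper-cased environment variables with dots replaced by underscores. A sketch of a script step, not our actual pipeline:

```bash
# Sketch of a script step in an Azure Pipelines job (not our actual config).
# BUILD_SOURCEBRANCH / BUILD_BUILDID are the rough analogues of Travis CI's
# TRAVIS_BRANCH / TRAVIS_BUILD_ID, but with different names and formats.
set -eu
echo "Branch: ${BUILD_SOURCEBRANCH}"   # e.g. refs/heads/master
echo "Build:  ${BUILD_BUILDID}"
```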
Yeah, 20 minutes is a big saving. Less pain!
Current timing for a passing test battery on Travis CI on my fork is approx. 1 hr 20 min.
It is a shame GitHub Actions is still in beta. Really interested to see whether it would perform as well as Azure, but without the extra syntax. I subscribed to the beta when they announced it, but no reply yet. Otherwise I will update the Azure pipeline little by little until I can confirm I have an equivalent of Travis (right now it's only running the functional tests there, and I haven't done a thorough check to see if the tests are being executed correctly).
Nice web interface, similar to most GitHub screens. Fastest checkout (d'oh!), around 10 seconds. Took 5 seconds or less to start a job once I pushed a commit. Viewing the logs in the UI did not work: it kept complaining that it could not load the logs... but if you ask for the raw logs, or download all the logs as a zip, both work fine. I am stuck at the unit tests job, which is failing to find the coverage data. But this is the last part of the unit test job, and up until that point it took less than 2 minutes 35 seconds to complete the unit tests. Travis normally takes around 3 or 4 minutes. I am expecting the functional test jobs to give much better results, hopefully similar to Azure Pipelines.
Not sure if it's going to pass or not, but at least it managed to pass the unit test job. All jobs started immediately after the unit tests, without a delay to find agents/nodes/slaves/etc. It appears to be taking a bit longer than Azure Pipelines. GitHub Actions doesn't have global environment variables, which is a bit annoying...
The majority of our functional tests should really be implemented as unit tests or integration tests. The new integration testing framework #3616 should allow many more tests to be converted as and when. |
Besides the issue with flaky tests #2894, our `test-battery` is quite slow. I just had a look at one of the builds in Travis CI: it took 32 minutes and 8 seconds, of which 29 minutes and 51 seconds were spent running the functional tests. Tasks like the git clone, virtualenv creation, installing dependencies, and running coverage do not add up to 2 minutes (there is a cache in Travis for Python/Ubuntu dependencies).
A few possible alternatives:

- Move from polling (e.g. a `sleep` in a loop searching some output) to a more reactive approach when possible (a sketch of this polling pattern follows below).

Having a test suite with 4 builds running in parallel, taking up to 30 minutes, and most times needing a kick for the flaky tests is quite infuriating.
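To make the polling point concrete, here is a generic sketch of the pattern being referred to (the helper name is hypothetical, not part of the existing test framework): the loop re-reads a log and sleeps, so every wait costs up to the polling interval even after the condition has already become true, whereas a more reactive approach would block on an event from the scheduler instead.

```bash
# Generic sketch of the "sleep in a loop" pattern; poll_for_line is a
# hypothetical helper, not an existing function in the test framework.
# It polls a log file for a pattern, re-checking once per second until
# a timeout expires.
poll_for_line() {
    local pattern="$1" logfile="$2" timeout="${3:-60}"
    local i
    for ((i = 0; i < timeout; i++)); do
        grep -qF -- "$pattern" "$logfile" 2>/dev/null && return 0
        sleep 1
    done
    return 1
}

# Example usage:
# poll_for_line 'task foo.1 succeeded' "$SUITE_LOG" 120
```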
But before doing any of the above, I think we should measure. We can already measure the coverage of the functional tests. I think now we could measure how long each test is taking.
This can be easily done with `prove --timer`, which adds a prefix like `[14:59:56] /tmp/tests/cylc-submit/02-remote-with-shared-fs-bg ................. ok`. Doing that we should be able to parse the output and identify what tests are taking the most time, and check whether they can be the first ones to be improved.

For me, under 5 minutes is perfect. Under 10 minutes is OK. Under 15 minutes is still OK if the tests are stable.
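As a rough illustration of the parsing step - a sketch only, assuming the `[HH:MM:SS]` prefix format shown above, a serial (non-parallel) run, and a hypothetical `tests/` path:

```bash
# Sketch: capture a serial `prove --timer` run, then rank tests by elapsed
# time, computed as the difference between consecutive start-time prefixes.
# (The last test in the log gets no duration with this simple approach.)
prove --timer tests/ 2>&1 | tee prove-timer.log   # hypothetical invocation

awk -F'[][ ]+' '
    /^\[/ {
        split($2, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        # attribute the elapsed time to the previously started test
        if (prev_name != "") printf "%6d s  %s\n", now - prev, prev_name
        prev = now
        prev_name = $3
    }
' prove-timer.log | sort -rn | head -20
```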