Improve the total time of our functional tests #3046

Open
kinow opened this issue Mar 27, 2019 · 15 comments
@kinow
Member

kinow commented Mar 27, 2019

Besides the issue with flaky tests (#2894), our test battery is quite slow. I just had a look at one of the builds in Travis CI: it took 32 minutes and 8 seconds, of which 29 minutes and 51 seconds were spent running the functional tests.

Tasks like git clone, virtualenv creation, installing dependencies, and running coverage do not add up to 2 minutes (there is a cache in Travis for Python/Ubuntu dependencies).

A few possible alternatives:

  • Run more tests in parallel
  • Move some of the timer-based tests (e.g. a sleep in a loop searching for some output) to a more reactive approach where possible
  • Fix only the slowest tests
  • Pythonize some of the tests
  • Remove tests that are covered by unit tests and are not significant enough to be a functional test, or maybe demote/promote them to unit tests
  • Run coverage only against pull requests and run nightly on master too.
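As a rough illustration of the "more reactive approach" item, here is a minimal Python sketch that blocks on a child process's output and returns as soon as a pattern appears, instead of sleeping in a loop and re-checking. The function name `wait_for_output` and the demo command are hypothetical, not part of the existing test battery, and a real test would also need a timeout.

```python
import re
import subprocess
import sys
from typing import Optional, Sequence


def wait_for_output(cmd: Sequence[str], pattern: str) -> Optional[str]:
    """Run cmd and return the first output line matching pattern.

    Iterating over proc.stdout blocks until a line is available, so the
    caller reacts as soon as the line is printed rather than after the
    next sleep interval. Returns None if the process ends without a match.
    """
    regex = re.compile(pattern)
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            if regex.search(line):
                # Stop the child so leaving the "with" block does not
                # wait for it to run to completion.
                proc.terminate()
                return line.rstrip("\n")
    return None


if __name__ == "__main__":
    # Hypothetical stand-in for a command whose output a test watches.
    demo = [sys.executable, "-c", "print('starting'); print('task ready')"]
    print(wait_for_output(demo, r"ready"))
```

The trade-off is that the test must own the process (or a stream it can read from); tests that only see a log file written by something else would need a different mechanism.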

Having a test suite with 4 builds running in parallel, taking up to 30 minutes, and most times needing a kick for the flaky tests is quite infuriating.

But before doing any of the above, I think we should measure. We can already measure the coverage of the functional tests. I think now we could measure how long each test is taking.

This can be easily done with prove --timer, which adds a timestamp prefix to each result line, like [14:59:56] /tmp/tests/cylc-submit/02-remote-with-shared-fs-bg ................. ok. With that, we should be able to parse the output, identify which tests take the most time, and check whether they can be the first ones to be improved.
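A minimal Python sketch of that parsing step follows. It assumes each result line also ends with an elapsed time in milliseconds (for example "ok 1234 ms"), which is not shown in the sample line above; lines without that field are skipped. The name `slowest_tests` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as:
#   [14:59:56] /tmp/tests/foo/01-bar ........ ok     1234 ms
# The trailing "<n> ms" field is an assumption about the output format.
RESULT_LINE = re.compile(
    r"^\[\d{2}:\d{2}:\d{2}\]\s+"  # wall-clock prefix
    r"(?P<name>\S+)\s+\.+\s+"     # test path followed by dot padding
    r"(?P<status>\S.*?)"          # e.g. "ok"
    r"\s+(?P<ms>\d+)\s+ms\b"      # elapsed milliseconds (assumed)
)


def slowest_tests(lines: Iterable[str], top: int = 10) -> List[Tuple[str, int]]:
    """Return up to `top` (test name, elapsed ms) pairs, slowest first."""
    timings = []
    for line in lines:
        match = RESULT_LINE.match(line.strip())
        if match:
            timings.append((match.group("name"), int(match.group("ms"))))
    timings.sort(key=lambda item: item[1], reverse=True)
    return timings[:top]
```

Feeding it the saved output of a test run, e.g. `slowest_tests(open("prove.log"))` with a hypothetical log file name, would give a short list of candidates to look at first.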

For me, under 5 minutes is perfect. Under 10 minutes is OK. Under 15 minutes is still OK if the tests are stable.

@kinow kinow added the small label Mar 27, 2019
@kinow kinow added this to the cylc-8.0a1 milestone Mar 27, 2019
@kinow kinow self-assigned this Mar 27, 2019
@kinow kinow removed the small label Mar 27, 2019
@kinow kinow modified the milestones: cylc-8.0a1, later Mar 27, 2019
@kinow kinow changed the title from "Improve the speed of our functional tests" to "Improve the total time of our functional tests" Mar 27, 2019
@hjoliver
Member

hjoliver commented Mar 27, 2019

I'm sure we all agree with you @kinow.

I'm slightly concerned that it is just difficult to quickly test a system that manages a bunch of other jobs in a workflow (i.e. the jobs can't all run at once). However, we must be able to do better than this.

Good idea on recording individual test timings.

@hjoliver
Member

Didn't we get down to about 10 mins when we first started using chunks on Travis CI?

@kinow
Member Author

kinow commented Mar 27, 2019

> Didn't we get down to about 10 mins when we first started using chunks on Travis CI?

Coverage could have increased it. I will add an option to run coverage only against pull requests and nightly on master too.

@kinow
Member Author

kinow commented Apr 3, 2019

Side note: GitHub is offering a beta service similar to GitLab and Travis CI builds: GitHub Actions. It is based on containers. It would be worthwhile if it offers the same functionality as Travis, plus better performance.

One counter-argument for using it (seen in the wild in other projects considering migrating) is that Microsoft is behind GitHub now, and relying too much on it carries risks too, as they could close or block certain features.

But Travis was acquired recently too, so ¯\_(ツ)_/¯

@hjoliver
Member

hjoliver commented Apr 3, 2019

Interesting, worth considering.

@kinow
Member Author

kinow commented Aug 25, 2019

Had some spare time after dinner today, and had a post-it saying "Azure" here. Jinja2 and Python (CPython) are using Azure Pipelines, and I wanted to try it at least once and see what it looks like.

Running just the functional tests (no unit tests) on 4 agents (servers), each agent consistently finishes in 8 to 10 minutes.

https://dev.azure.com/brunodepaulak/cylc/_build/results?buildId=13

image

  • The setup is easy: you can provide a YAML file or use the interface (I used the interface)
  • It asks for full permissions to webhooks and services, and access to private repos in GitHub... unless you use a "Generic Git connection", which works just fine for me
  • The agent provisioning is really quick. During the hour or so that I was experimenting with it, I got 4 agents in under 30 seconds pretty much every time
  • There is some custom syntax for certain variables, which is not shell, not PowerShell, and not Windows batch script... you have expressions, and other things that exist only in Azure... might be useful, but it was annoying to find out how to get the "CHUNK=1/4" (it was with CHUNK="${SYSTEM_JOBPOSITIONINPHASE}/${SYSTEM_TOTALJOBSINPHASE}", not in the docs, found by running env in a test build)
  • Integration with GitHub appears to work fine.
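To make the CHUNK value concrete, here is a small Python sketch of how a "position/total" string could be built from the two Azure variables named above and then used to pick one agent's share of the tests. The function names are hypothetical, and the slicing strategy is an assumption; the real test battery may split its tests differently.

```python
from typing import List, Mapping, Sequence, Tuple


def chunk_from_env(env: Mapping[str, str]) -> str:
    """Build "position/total" from the Azure job variables, e.g. "1/4"."""
    return f'{env["SYSTEM_JOBPOSITIONINPHASE"]}/{env["SYSTEM_TOTALJOBSINPHASE"]}'


def parse_chunk(chunk: str) -> Tuple[int, int]:
    """Split "1/4" into (1, 4), rejecting out-of-range positions."""
    position_text, total_text = chunk.split("/")
    position, total = int(position_text), int(total_text)
    if not 1 <= position <= total:
        raise ValueError(f"invalid chunk: {chunk!r}")
    return position, total


def select_chunk(tests: Sequence[str], chunk: str) -> List[str]:
    """Return this agent's share: every total-th test, in sorted order.

    Assumption: a simple round-robin split over the sorted test names.
    """
    position, total = parse_chunk(chunk)
    return sorted(tests)[position - 1::total]
```

Sorting first keeps the split deterministic, so every agent computes the same partition without coordinating with the others.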

GitHub Actions might be promising too. The git checkout takes way longer than on my machine - at least that was my impression. I think GitHub Actions might be as fast as Azure DevOps (hey, both are MS now?!), and maybe the checkout will be quicker.

My feeling is that Travis CI has been taking between 20 and ~30 minutes, and it spends a long time trying to provision the servers for the jobs.

So moving to another CI infrastructure might be an alternative?

Cheers
Bruno

@hjoliver
Member

hjoliver commented Aug 25, 2019

Thanks for investigating @kinow - that looks really good to me. I was vaguely aware of Azure Pipelines but didn't know it was free for Open Source projects. Are there any downsides that you're aware of? Otherwise, if @cylc/core agree, let's move it over.

@kinow
Member Author

kinow commented Aug 25, 2019

So far the only thing I disliked was having to learn their special syntax for certain operations. The pro is that you have some interesting functions, but the con is having to learn these expressions, e.g. counter

There are some pre-defined variables, similar to Travis. These variables have names like Build.BuildId, which is very C#-like. In shell, you have to write echo $(Build.BuildId) to access its value... so your shell becomes shell with a pinch of Azure-Pipelines-Syntax.

But if that's the price to pay for saving us nearly 20 minutes of build time... it looks OK to me I guess.

@hjoliver
Member

Yeah, 20 minutes is a big saving. Less pain!

@hjoliver
Member

hjoliver commented Aug 25, 2019

Current timing for a passing test battery on Travis CI on my fork, approx 1 hr 20 min:

#3311 (comment)

@kinow
Member Author

kinow commented Aug 25, 2019

It is a shame GitHub Actions is still in beta. Really interested to see whether it would perform as well as Azure, but without the extra syntax. I signed up for the beta when they announced it, but no reply yet.

Otherwise will update the azure pipeline little by little until I can confirm I have an equivalent of Travis (right now it's only functional tests there, and I haven't done a thorough check to see if the tests are being executed correctly).

@kinow
Member Author

kinow commented Aug 27, 2019

Got GitHub Actions! Next test 🎉

Screen Shot 2019-08-27 at 18 33 44-fullpage

@kinow
Member Author

kinow commented Aug 27, 2019

Nice web interface, similar to most GitHub screens.

Fastest checkout (d'oh!), around 10 seconds. Took 5 seconds or less to start a job once I pushed a commit.

Viewing the logs in the UI did not work. It kept complaining that it could not load the logs... but if you ask for the raw logs, or to download all logs as zip, both work fine.

Screenshot_2019-08-27_21-14-07

I am stuck at the unit tests job. It is failing to find the coverage data. But this is the last part of the unit test job, and up until that point, it took less than 2 minutes 35 seconds to complete the unit tests. Travis normally takes around 3 or 4 minutes. I am expecting the functional tests jobs to give much better results, hopefully similar to Azure Pipelines.

@kinow
Member Author

kinow commented Aug 27, 2019

Not sure if it's going to pass or not, but at least managed to pass the unit test job. The (3.7, 1) is the result of a matrix in GitHub Actions, where 3.7 is the Python version and 1 is the value used to export CHUNK=1/4.

Screenshot_2019-08-27_23-02-12
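For readers unfamiliar with build matrices, the following Python sketch shows the general idea of how job labels like (3.7, 1) arise: each combination of a Python version and a chunk number becomes one job. This is only an illustration of the concept, not how GitHub Actions is implemented, and the function name `expand_matrix` is hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence


def expand_matrix(
    python_versions: Sequence[str], chunks: Sequence[int]
) -> List[Dict[str, str]]:
    """Return one job description per (python version, chunk) pair."""
    total = len(chunks)
    jobs = []
    for version, chunk in product(python_versions, chunks):
        jobs.append(
            {
                "name": f"({version}, {chunk})",
                "python": version,
                # The value each job would export, e.g. CHUNK=1/4.
                "CHUNK": f"{chunk}/{total}",
            }
        )
    return jobs
```

With one Python version and chunks 1 to 4, this yields four jobs, the first named "(3.7, 1)" with CHUNK set to "1/4".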

All jobs started immediately after the unit test, without a delay to find agents/nodes/slaves/etc. It appears to be taking a bit longer than Azure Pipelines.

GitHub Actions doesn't have global environment variables, which is a bit annoying...

@oliver-sanders
Member

The majority of our functional tests should really be implemented as unit tests or integration tests.

The new integration testing framework #3616 should allow many more tests to be converted as and when.

@oliver-sanders oliver-sanders modified the milestones: cylc-8.0.0, some-day Nov 12, 2020
@oliver-sanders oliver-sanders mentioned this issue Feb 15, 2021
7 tasks
@kinow kinow removed their assignment Oct 6, 2021

4 participants