Faster CI #682
Comments
We could also consider running the notebooks and integration tests on merge to master. I.e., these checks only run after a PR is merged, not during commits to an open PR. This would catch any failures earlier than if they were only run on a release. However, this would rely on someone getting notifications of the failures on master. We could of course do both, i.e., run these checks on merge to master, and as part of the release action to ensure releases aren't made unless these checks are passing.

@leehart what do you think? Should the notebooks and integration tests run...
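For illustration only, here's a minimal sketch of what a merge-to-master trigger for the GCS-dependent checks could look like. The workflow name, job name and steps below are hypothetical placeholders, not the project's actual configuration:

```yaml
# Hypothetical sketch: run GCS-dependent checks after a PR is merged to master,
# rather than on commits to an open PR.
name: checks-on-master

on:
  push:
    branches:
      - master   # fires on merge to master, not on PR commits

jobs:
  notebooks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # placeholder for the real steps that execute the notebooks against GCS
      - run: echo "run notebooks and integration tests here"
```

One caveat with this pattern, as noted above, is that someone still needs to notice (or be notified of) failures on master.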
Hi @alimanfoo. Forgive me, I don't like any of those options! Generally, things failing after they've been merged to master, resulting in a broken master, are frustrating. Ideally we only want to merge clean code to master, which is (in my view) the main benefit of the PR CI checks. Problems ought to be resolved during the PR, rather than raised as additional issues after a PR has been reviewed, approved and merged; a broken master can block releases and leave a mess for someone else to clean up.

It would be ideal if we could run the notebooks and other GCS-dependent checks for any set of changes we want, whenever we want to, and update accordingly, before merging into the sacred timeline. I suppose another option is to have a branch running parallel to the main master branch, into which we merge PRs that pass all other checks but have not had the GCS-dependent checks run. Then, once the GCS-dependent checks have run, we can fix any failures on that branch before merging into master. Maybe the branch could be called "staging".
As part of #366 we have been moving all Anopheles tests to use simulated data, rather than accessing data from GCS. The action which runs these new unit tests (`tests.yml`) is relatively fast, e.g., running in ~6 minutes. It also has the benefit that it doesn't need to access GCS data and so can be run from a PR from a fork.

The action which runs the remaining legacy tests that access GCS data is defined in `legacy_tests.yml`. These tests have been gradually removed during the refactoring work, and at some point we might be able to remove this action completely. Until then, it currently takes ~13m and should take less time as the refactoring continues (e.g., via #665).

The slowest action currently is `notebooks.yml`, which runs all Jupyter notebooks in the "notebooks" folder. This takes around 41m and accesses data from GCS, so it cannot be run in a PR from a fork.

In general, it would be great if the checks that run in PRs were faster and could be run from a PR from a fork.
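As an aside, a common way to express the fork restriction (sketched below with a hypothetical job name) is to skip GCS-dependent jobs when the PR head comes from a fork, since fork PRs don't get access to the repository's secrets:

```yaml
# Hypothetical sketch: skip a GCS-dependent job for PRs opened from forks.
jobs:
  legacy-tests:
    # only run when the head branch lives in this repository, not a fork
    if: github.event.pull_request.head.repo.full_name == github.repository
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "run tests that need GCS access (placeholder)"
```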
Here's a suggested approach:
1. Move the `notebooks.yml` action to only run as part of the `release.yml` action, i.e., we only run the notebooks when trying to make a release. Any notebook failures will block a release being uploaded to PyPI, and so we can then fix any problems and retag the release (see the sketch after this list).
2. Move the `legacy_tests.yml` action to only run as part of the `release.yml` action, for the same reason as above.
3. At a later date, when all of the Anopheles unit tests that access GCS data have been removed via the refactoring work, maybe remove the `legacy_tests.yml` action completely. Possibly some Plasmodium tests will remain that still access GCS data, in which case rename `legacy_tests.yml` to `integration_tests.yml` to better clarify its purpose.
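To make the first two points concrete, here is a rough sketch of how `release.yml` could gate the PyPI upload on the notebook and legacy test jobs. The trigger, job names and steps are placeholders (the actual release workflow may be structured differently):

```yaml
# Hypothetical sketch: run the slow GCS-dependent checks as part of the
# release workflow and block the PyPI upload unless they pass.
name: release

on:
  release:
    types: [published]

jobs:
  notebooks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "execute all notebooks (placeholder)"

  legacy-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "run legacy GCS-dependent tests (placeholder)"

  publish:
    needs: [notebooks, legacy-tests]   # upload only runs if both check jobs succeed
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "build and upload to PyPI (placeholder)"
```

If a notebook or legacy test fails, the `publish` job never runs, so the problem can be fixed and the release retagged.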