ci: Test pathogen repo CI builds with the final packages #27

Merged
tsibley merged 1 commit into main from trs/test-pathogen-repo-ci on May 12, 2023

tsibley
Member

@tsibley tsibley commented May 11, 2023

[ Commit message based on that of 12000a20 in nextstrain/docker-base.¹
Code changes also based on that commit, plus subsequent commits.² ]

A useful check for whether new packages will break our pathogen builds.

I included all pathogen repos that already use our pathogen-repo-ci reusable workflow. It should be minimal effort to maintain this list over time—I expect it to only grow—but perhaps in the future we will want to abstract it out into a shared list of known pathogen repos.

I don't like that we have to copy the build-args for a few of the repos here since it'll be easy for this copy to diverge from the repo's authoritative build-args, but it's necessary for now. Over time as we work towards increased automation of pathogen builds, I think we can get rid of this build-args copy by further standardizing how each repo configures itself for automation. For example, instead of specifying build-args in a repo's CI workflow, the args for CI could be stored in a broader workflow metadata file (e.g. nextstrain-workflow.yaml) read by pathogen-repo-ci, or defined by some other convention.
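As a rough illustration of the metadata-file idea, here is a minimal sketch of how a CI helper might read build args from a single per-repo source instead of a copy kept here. Everything in it is an assumption: nextstrain-workflow.yaml does not exist yet, the `ci` and `build-args` keys are invented for this example, and the dict stands in for the parsed file.

```python
import shlex


def ci_build_args(metadata):
    """Return the CI build args as a list of strings, or [] if none are set.

    `metadata` stands in for the parsed contents of a hypothetical
    nextstrain-workflow.yaml; the "ci" and "build-args" keys are invented
    for this sketch and are not an existing convention.
    """
    args = metadata.get("ci", {}).get("build-args", [])
    if isinstance(args, str):
        # Accept a single shell-style string as well as a list.
        args = shlex.split(args)
    return list(args)


# Made-up example metadata, not taken from any real pathogen repo.
example_metadata = {"ci": {"build-args": "--configfile ci.yaml"}}
example_args = ci_build_args(example_metadata)
```

With something like this, both the repo's own CI and this workflow could call the same lookup, so there would be no second copy to drift.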

An alternative to running pathogen-repo-ci directly against each repo here would be to trigger the CI workflows within each repo instead. The downside is that it would divorce the outcomes of those workflows from this one and hide them from PRs in this repo. It would also require updating each repo to support triggering and passing in additional parameters (i.e. for the package). Finally, those CI workflows sometimes run other jobs, like linting and other integration tests (e.g. with Cram), that aren't always necessary to run with a new package.

Related-to: nextstrain/docker-base#148
Related-to: nextstrain/docker-base#150
Related-to: nextstrain/docker-base#151
Related-to: nextstrain/docker-base#154

¹ nextstrain/docker-base@12000a20
² nextstrain/docker-base@bc22a0bc
  nextstrain/docker-base@0a20a474
  nextstrain/docker-base@75254e92

Testing

  • Checks pass

@tsibley
Member Author

tsibley commented May 11, 2023

Confirmed it's getting the right package.

@tsibley
Member Author

tsibley commented May 11, 2023

Two failing jobs seem to be issues with those pathogen repos? but they don't fail with the Docker runtime… so hmm.

ncov fails in augur export v2 with

ERROR: results/europe/rbd_levels.json did not contain either `nodes` or `branches`. Please check the formatting of this JSON!

This was also recently reported by a user. So something's up here… Conda runtime is a common factor.

seasonal-flu fails with

Traceback (most recent call last):
  File "/home/runner/work/conda-base/conda-base/scripts/annotate_haplotypes.py", line 62, in <module>
    if clade == "unassigned" or sequence_by_node[node.name] == sequence_by_clade[clade]:
KeyError: '3C.2'
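
For anyone reading the traceback: the failing line indexes two dicts directly, and the `'3C.2'` key presumably belongs to `sequence_by_clade`, since it looks like a clade name. Below is a minimal sketch of the same comparison with a tolerant lookup. It is not the fix that was applied to seasonal-flu; the function name and the sample data are made up, and treating a clade with no known sequence as "no match" is an assumption.

```python
def matches_clade_sequence(node_name, clade, sequence_by_node, sequence_by_clade):
    """Same comparison as the failing line, but tolerant of unknown clades."""
    if clade == "unassigned":
        return True
    # .get() returns None for a clade with no recorded sequence
    # instead of raising KeyError.
    clade_sequence = sequence_by_clade.get(clade)
    if clade_sequence is None:
        return False
    return sequence_by_node[node_name] == clade_sequence


# Made-up data for illustration only.
sequence_by_node = {"node_1": "ACGT"}
sequence_by_clade = {"3C.3": "ACGT"}

known = matches_clade_sequence("node_1", "3C.3", sequence_by_node, sequence_by_clade)
missing = matches_clade_sequence("node_1", "3C.2", sequence_by_node, sequence_by_clade)
```

A guard like this hides the symptom only; if the clade labels changed upstream, the mapping itself still needs updating.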

tsibley force-pushed the trs/test-pathogen-repo-ci branch from 5a1339a to abd21f3 on May 11, 2023 18:10
@tsibley
Member Author

tsibley commented May 11, 2023

Those failures should be investigated, but they shouldn't block merging this PR.

tsibley requested a review from a team on May 11, 2023 18:17
tsibley force-pushed the trs/test-pathogen-repo-ci branch from abd21f3 to 84ba40c on May 12, 2023 16:42
tsibley merged commit 223a24d into main on May 12, 2023
tsibley deleted the trs/test-pathogen-repo-ci branch on May 12, 2023 16:42
@huddlej
Contributor

huddlej commented May 12, 2023

The seasonal-flu issue was caused by a change to augur clades output in Augur 22.0.0 and was resolved by 42a351f.

@corneliusroemer
Member

corneliusroemer commented May 15, 2023

Excellent work @tsibley! This is super helpful!

The ncov failure is here:
https://github.com/nextstrain/conda-base/actions/runs/4961116874/jobs/8915514447#step:8:977

[Fri May 12 16:54:35 2023]
Job 4: Exporting data files for Auspice
Reason: Missing output files: results/europe/ncov_with_accessions.json, results/europe/ncov_with_accessions_root-sequence.json; Input files updated by another job: results/europe/logistic_growth.json, results/europe/colors.tsv, results/europe/tree.nwk, results/europe/epiweeks.json, results/europe/clades.json, results/europe/metadata_adjusted.tsv.xz, results/europe/branch_lengths.json, results/europe/nt_muts.json, results/europe/mutational_fitness.json, results/europe/rbd_levels.json, results/europe/recency.json, results/europe/distances.json, results/europe/description.md, results/europe/auspice_config.json, results/europe/traits.json, results/europe/emerging_lineages.json, results/europe/aa_muts.json


        augur export v2             --tree results/europe/tree.nwk             --metadata results/europe/metadata_adjusted.tsv.xz             --node-data results/europe/branch_lengths.json results/europe/nt_muts.json results/europe/aa_muts.json results/europe/emerging_lineages.json results/europe/clades.json results/europe/recency.json results/europe/traits.json results/europe/logistic_growth.json results/europe/mutational_fitness.json results/europe/distances.json results/europe/epiweeks.json results/europe/rbd_levels.json             --auspice-config results/europe/auspice_config.json             --include-root-sequence             --colors results/europe/colors.tsv             --lat-longs defaults/lat_longs.tsv             --title 'Genomic epidemiology of novel coronavirus - Europe-focused subsampling'             --description results/europe/description.md             --output results/europe/ncov_with_accessions.json 2>&1 | tee logs/export_europe.txt
        
ERROR: results/europe/rbd_levels.json did not contain either `nodes` or `branches`. Please check the formatting of this JSON!
Validating schema of 'results/europe/nt_muts.json'...
Validating schema of 'results/europe/aa_muts.json'...

@huddlej your fix does resolve it: I just reran the job and it now fails only for ncov, no longer for seasonal-flu.

@corneliusroemer
Member

Aha, the reason --docker doesn't fail here is that the latest Docker image is still at 21.1.0; see nextstrain/docker-base#155

corneliusroemer added a commit to nextstrain/augur that referenced this pull request May 15, 2023
Resolves #1215

Warn instead of error when a node data JSON has no nodes, fixing an issue introduced recently in PR #728

In PR #728, extra node data validation was introduced. In particular, files without information for either `nodes` or `branches` caused an error.

This is problematic for test scripts that may produce empty node data in test cases.

This PR removes the eager validation. In the future we could reintroduce it as a warning.
And possibly an error but with opt-out.

augur export previously errored on this type of node data JSON; it is now accepted again:

```json
{
  "nodes": {},
  "rbd_level_details": {}
}
```
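
To make the warn-versus-error distinction concrete, here is a minimal sketch of a lenient check on node data like the example above. It is not Augur's implementation: `check_node_data` is a hypothetical helper, and it only illustrates reporting an empty file through Python's `warnings` module instead of aborting.

```python
import json
import warnings


def check_node_data(node_data, source="<node data>"):
    """Return True when node_data has entries under 'nodes' or 'branches'.

    Hypothetical helper, not part of Augur: an empty file produces a
    warning rather than an exception, so the caller can keep going.
    """
    has_nodes = bool(node_data.get("nodes"))
    has_branches = bool(node_data.get("branches"))
    if not (has_nodes or has_branches):
        warnings.warn(f"{source} has no entries under 'nodes' or 'branches'")
        return False
    return True


# The empty node data from the example above.
empty = json.loads('{"nodes": {}, "rbd_level_details": {}}')

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    empty_ok = check_node_data(empty, "results/europe/rbd_levels.json")

# A made-up, non-empty node data dict for comparison.
populated_ok = check_node_data({"nodes": {"tip_a": {}}})
```

An opt-out for a stricter mode, as suggested above, could be layered on by raising instead of warning when a flag is set.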

Fixes the ncov pathogen-CI issue: nextstrain/conda-base#27 (comment)


- [x] nextstrain/conda-base#27 (comment) is fixed, export now accepts empty nodes dicts again