-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: don't error if node data file is empty #1214
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for checking into this, @corneliusroemer! Looking at the original commit that added the validation check, @jameshadfield's commit message says:
A side-effect of this work is that the requirement for node-data JSONs
to specify "nodes" has been relaxed (see [2] for an example); however
if neither "nodes" nor "branches" are defined then we raise a validation
error.
Based on this description, it seems the intention was the observed behavior where "nodes" and "branches" cannot both be empty. I agree though that it is possible and even common in some workflows to produce node data JSONs with no annotations.
If that is really the behavior we want as a default, then instead of removing the validation check, we might move the check into the conditional block below where the other validation logic lives. With some slight reorganization of that block, we could use the existing validation mode interface to allow this validation to be skipped.
I think this shouldn't be an error. I can think of scenarios where you annotate specific branches or tips with rare traits that are sometimes present, sometimes not. For example drug resistance mutations, insertions, etc. But I see how empty branches and nodes could also be flag that something went wrong. But if we don't require EVERY node or branch to be present, I don't think we should error when NONE are present. Instead of whole-sale deletion of the check, or making it conditional, maybe we can just provide a warning like here:
|
Given that this is a surprising breaking change that wasn't announced in the changelog, I'd suggest we merge this PR asap and make a bug fix release. This doesn't prejudice how we may want to handle this in the future. I'd be fine with warning and/or making it part of export validation, but I don't think we need to agree on the path forward to make sure we don't break things in unexpected ways. |
why not make it warning now? that would be the minimal change. |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #1214 +/- ##
==========================================
- Coverage 68.81% 68.81% -0.01%
==========================================
Files 64 64
Lines 6936 6935 -1
Branches 1692 1692
==========================================
- Hits 4773 4772 -1
Misses 1856 1856
Partials 307 307
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to get input form James, but this looks good to me (pending update of the PR description and changelog)
Resolves #1215 Warn instead error when no nodes in a node data json, fixing issue introduced recently in PR #728 In PR #728, extra node data validation was introduced. In particular, files without information for either `nodes` or `branches` caused erroring. This is problematic for test scripts that may produce empty node data in test cases. This PR removes the eager validation. In the future we could reintroduce it as a warning. And possibly an error but with opt-out. This type of node data json was previously errored on by augur export, it is now accepted again: ```json { "nodes": {}, "rbd_level_details": {} } ``` <!-- Start typing the name of a related issue and GitHub will auto-suggest the issue number for you. --> Fixes the ncov pathogen-CI issue: nextstrain/conda-base#27 (comment) What steps should be taken to test the changes you've proposed? If you added or changed behavior in the codebase, did you update the tests, or do you need help with this? - [x] nextstrain/conda-base#27 (comment) is fixed, export now accepts empty nodes dicts again
6f96100
to
b745538
Compare
@rneher I've added a changelog entry, so should be good to go from my end |
A warning is absolutely fine. There may have been a bug in my implementation, I wanted to allow empty dictionaries, but error when both dictionaries were missing from the JSON. |
It works in ncov pathogen CI now: https://github.com/nextstrain/augur/actions/runs/4985866782/jobs/8925984486 |
Specifically, my PR should have had something like: - if not self.nodes and not self.branches:
+ if self.attrs.get("branches") == None and self.attrs.get("nodes") == None:
raise AugurError(
f"{self.fname} did not contain either `nodes` or `branches`. Please check the formatting of this JSON!"
) But it's probably fine as-is -- we'll now get a warning, and maybe intermediates without nodes and without branches are normal behavior in some workflows. |
Resolves #1215
Partial revert of (over) eager validation introduced recently through PR #728
Description of proposed changes
In PR #728, extra node data validation was introduced. In particular, files without information for either
nodes
orbranches
caused erroring.This is problematic for test scripts that may produce empty node data in test cases.
This PR removes the eager validation. In the future we could reintroduce it as a warning.
And possibly an error but with opt-out.
This type of node data json was previously errored on by augur export, it is now accepted again:
Related issue(s)
Fixes the ncov pathogen-CI issue: nextstrain/conda-base#27 (comment)
Testing
What steps should be taken to test the changes you've proposed?
If you added or changed behavior in the codebase, did you update the tests, or do you need help with this?
Checklist