Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: don't error if node data file is empty #1214

Merged
merged 1 commit into from
May 15, 2023

Conversation

corneliusroemer
Copy link
Member

@corneliusroemer corneliusroemer commented May 15, 2023

Resolves #1215

Partial revert of (over) eager validation introduced recently through PR #728

Description of proposed changes

In PR #728, extra node data validation was introduced. In particular, files without information for either nodes or branches caused erroring.

This is problematic for test scripts that may produce empty node data in test cases.

This PR removes the eager validation. In the future we could reintroduce it as a warning.
And possibly an error but with opt-out.

This type of node data json was previously errored on by augur export, it is now accepted again:

{
  "nodes": {},
  "rbd_level_details": {}
}

Related issue(s)

Fixes the ncov pathogen-CI issue: nextstrain/conda-base#27 (comment)

Testing

What steps should be taken to test the changes you've proposed?
If you added or changed behavior in the codebase, did you update the tests, or do you need help with this?

Checklist

  • Add a message in CHANGES.md summarizing the changes in this PR that are end user focused. Keep headers and formatting consistent with the rest of the file.

Copy link
Contributor

@huddlej huddlej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for checking into this, @corneliusroemer! Looking at the original commit that added the validation check, @jameshadfield's commit message says:

A side-effect of this work is that the requirement for node-data JSONs
to specify "nodes" has been relaxed (see [2] for an example); however
if neither "nodes" nor "branches" are defined then we raise a validation
error.

Based on this description, it seems the intention was the observed behavior where "nodes" and "branches" cannot both be empty. I agree though that it is possible and even common in some workflows to produce node data JSONs with no annotations.

If that is really the behavior we want as a default, then instead of removing the validation check, we might move the check into the conditional block below where the other validation logic lives. With some slight reorganization of that block, we could use the existing validation mode interface to allow this validation to be skipped.

@rneher
Copy link
Member

rneher commented May 15, 2023

I think this shouldn't be an error. I can think of scenarios where you annotate specific branches or tips with rare traits that are sometimes present, sometimes not. For example drug resistance mutations, insertions, etc. But I see how empty branches and nodes could also be flag that something went wrong. But if we don't require EVERY node or branch to be present, I don't think we should error when NONE are present.

Instead of whole-sale deletion of the check, or making it conditional, maybe we can just provide a warning like here:

print_err(f"WARNING: {msg}")

@corneliusroemer
Copy link
Member Author

Given that this is a surprising breaking change that wasn't announced in the changelog, I'd suggest we merge this PR asap and make a bug fix release.

This doesn't prejudice how we may want to handle this in the future. I'd be fine with warning and/or making it part of export validation, but I don't think we need to agree on the path forward to make sure we don't break things in unexpected ways.

@corneliusroemer corneliusroemer requested review from a team, tsibley and joverlee521 May 15, 2023 18:17
@rneher
Copy link
Member

rneher commented May 15, 2023

why not make it warning now? that would be the minimal change.

@codecov
Copy link

codecov bot commented May 15, 2023

Codecov Report

Patch coverage: 66.66% and project coverage change: -0.01 ⚠️

Comparison is base (aade8fd) 68.81% compared to head (6f96100) 68.81%.

❗ Current head 6f96100 differs from pull request most recent head b745538. Consider uploading reports for the commit b745538 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1214      +/-   ##
==========================================
- Coverage   68.81%   68.81%   -0.01%     
==========================================
  Files          64       64              
  Lines        6936     6935       -1     
  Branches     1692     1692              
==========================================
- Hits         4773     4772       -1     
  Misses       1856     1856              
  Partials      307      307              
Impacted Files Coverage Δ
augur/util_support/node_data_file.py 85.45% <66.66%> (-0.26%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@rneher rneher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to get input form James, but this looks good to me (pending update of the PR description and changelog)

Resolves #1215

Warn instead error when no nodes in a node data json, fixing issue introduced recently in PR #728

In PR #728, extra node data validation was introduced. In particular, files without information for either `nodes` or `branches` caused erroring.

This is problematic for test scripts that may produce empty node data in test cases.

This PR removes the eager validation. In the future we could reintroduce it as a warning.
And possibly an error but with opt-out.

This type of node data json was previously errored on by augur export, it is now accepted again:

```json
{
  "nodes": {},
  "rbd_level_details": {}
}
```

<!-- Start typing the name of a related issue and GitHub will auto-suggest the issue number for you.  -->
Fixes the ncov pathogen-CI issue: nextstrain/conda-base#27 (comment)

What steps should be taken to test the changes you've proposed?
If you added or changed behavior in the codebase, did you update the tests, or do you need help with this?

- [x] nextstrain/conda-base#27 (comment) is fixed, export now accepts empty nodes dicts again
@corneliusroemer corneliusroemer force-pushed the remove-overeager-validation branch from 6f96100 to b745538 Compare May 15, 2023 20:27
@corneliusroemer
Copy link
Member Author

@rneher I've added a changelog entry, so should be good to go from my end

@jameshadfield
Copy link
Member

A warning is absolutely fine. There may have been a bug in my implementation, I wanted to allow empty dictionaries, but error when both dictionaries were missing from the JSON.

@corneliusroemer corneliusroemer merged commit 497a5ea into master May 15, 2023
@corneliusroemer corneliusroemer deleted the remove-overeager-validation branch May 15, 2023 22:15
@corneliusroemer
Copy link
Member Author

@jameshadfield
Copy link
Member

Specifically, my PR should have had something like:

- if not self.nodes and not self.branches:
+ if self.attrs.get("branches") == None and self.attrs.get("nodes") == None:
     raise AugurError(
         f"{self.fname} did not contain either `nodes` or `branches`. Please check the formatting of this JSON!"
      )

But it's probably fine as-is -- we'll now get a warning, and maybe intermediates without nodes and without branches are normal behavior in some workflows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Export complains if node data json contains only empty dicts for nodes and branches
4 participants