fix: don't error if node data file is empty #1214

corneliusroemer · 2023-05-15T16:35:27Z

Resolves #1215

Partial revert of the overeager validation recently introduced in PR #728

Description of proposed changes

In PR #728, extra node data validation was introduced. In particular, files without information for either `nodes` or `branches` caused an error.

This is problematic for test scripts that may produce empty node data in test cases.

This PR removes the eager validation. In the future we could reintroduce it as a warning, and possibly as an error with an opt-out.

This type of node data JSON previously caused an error in augur export; it is now accepted again:

{
  "nodes": {},
  "rbd_level_details": {}
}

Related issue(s)

Fixes the ncov pathogen-CI issue: nextstrain/conda-base#27 (comment)

Testing

What steps should be taken to test the changes you've proposed?
If you added or changed behavior in the codebase, did you update the tests, or do you need help with this?

The failure reported in "ci: Test pathogen repo CI builds with the final packages" (conda-base#27 (comment)) is fixed; export now accepts empty nodes dicts again

Checklist

Add a message in CHANGES.md summarizing the changes in this PR that are end user focused. Keep headers and formatting consistent with the rest of the file.

huddlej

Thank you for checking into this, @corneliusroemer! Looking at the original commit that added the validation check, @jameshadfield's commit message says:

A side-effect of this work is that the requirement for node-data JSONs
to specify "nodes" has been relaxed (see [2] for an example); however
if neither "nodes" nor "branches" are defined then we raise a validation
error.

Based on this description, it seems the intention was the observed behavior where "nodes" and "branches" cannot both be empty. I agree though that it is possible and even common in some workflows to produce node data JSONs with no annotations.

If that is really the behavior we want as a default, then instead of removing the validation check, we might move the check into the conditional block below where the other validation logic lives. With some slight reorganization of that block, we could use the existing validation mode interface to allow this validation to be skipped.
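A minimal sketch of how such a mode-dependent check could look. The names `ValidationMode`, `NodeDataError`, and `check_has_annotations` are hypothetical stand-ins for illustration and are not taken from the augur codebase:

```python
import sys
from enum import Enum


class ValidationMode(Enum):
    """Hypothetical validation modes, for illustration only."""
    ERROR = "error"
    WARN = "warn"
    SKIP = "skip"


class NodeDataError(Exception):
    """Hypothetical error type used only in this sketch."""


def check_has_annotations(node_data, fname, mode=ValidationMode.ERROR):
    """Run the 'no annotations' check according to the validation mode.

    Returns True if the file has annotations, False if it does not
    (and the mode allowed processing to continue).
    """
    has_annotations = bool(node_data.get("nodes") or node_data.get("branches"))
    if has_annotations or mode is ValidationMode.SKIP:
        return has_annotations
    msg = f"{fname} has no annotations in `nodes` or `branches`."
    if mode is ValidationMode.ERROR:
        raise NodeDataError(msg)
    # Remaining case is WARN: report the problem but keep going.
    print(f"WARNING: {msg}", file=sys.stderr)
    return False
```

With this arrangement a workflow that legitimately produces empty node data could opt out by choosing the skip or warn mode, while the default stays strict.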

rneher · 2023-05-15T17:41:51Z

I think this shouldn't be an error. I can think of scenarios where you annotate specific branches or tips with rare traits that are sometimes present, sometimes not. For example drug resistance mutations, insertions, etc. But I see how empty branches and nodes could also be a flag that something went wrong. But if we don't require EVERY node or branch to be present, I don't think we should error when NONE are present.

Instead of wholesale deletion of the check, or making it conditional, maybe we can just provide a warning like here:

augur/augur/util_support/node_data_file.py

Line 93 in 99f1711

print_err(f"WARNING: {msg}")
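As a rough sketch of the warning-only variant, assuming a `print_err` helper that simply writes to stderr (defined here as a stand-in) and a hypothetical function name `warn_if_empty`:

```python
import sys


def print_err(*args):
    """Stand-in for the helper referenced above: print to stderr."""
    print(*args, file=sys.stderr)


def warn_if_empty(node_data, fname):
    """Warn, rather than raise, when `nodes` and `branches` are both empty or absent.

    Returns True if a warning was emitted, False otherwise.
    """
    if node_data.get("nodes") or node_data.get("branches"):
        return False
    print_err(f"WARNING: {fname} has empty or missing `nodes` and `branches`.")
    return True
```

The caller continues either way, so an empty file stays visible in the logs without stopping the build.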

corneliusroemer · 2023-05-15T18:17:22Z

Given that this is a surprising breaking change that wasn't announced in the changelog, I'd suggest we merge this PR asap and make a bug fix release.

This doesn't prejudice how we may want to handle this in the future. I'd be fine with warning and/or making it part of export validation, but I don't think we need to agree on the path forward to make sure we don't break things in unexpected ways.

rneher · 2023-05-15T18:20:13Z

Why not make it a warning now? That would be the minimal change.

codecov · 2023-05-15T19:54:40Z

Codecov Report

Patch coverage: 66.66% and project coverage change: -0.01 ⚠️

Comparison is base (aade8fd) 68.81% compared to head (6f96100) 68.81%.

❗ Current head 6f96100 differs from pull request most recent head b745538. Consider uploading reports for the commit b745538 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1214      +/-   ##
==========================================
- Coverage   68.81%   68.81%   -0.01%     
==========================================
  Files          64       64              
  Lines        6936     6935       -1     
  Branches     1692     1692              
==========================================
- Hits         4773     4772       -1     
  Misses       1856     1856              
  Partials      307      307

Impacted Files	Coverage Δ
augur/util_support/node_data_file.py	`85.45% <66.66%> (-0.26%)`	⬇️


rneher

Would be good to get input from James, but this looks good to me (pending update of the PR description and changelog)

corneliusroemer · 2023-05-15T20:32:38Z

@rneher I've added a changelog entry, so should be good to go from my end

jameshadfield · 2023-05-15T21:15:48Z

A warning is absolutely fine. There may have been a bug in my implementation: I wanted to allow empty dictionaries, but to error when both dictionaries were missing from the JSON.

corneliusroemer · 2023-05-15T22:47:51Z

It works in ncov pathogen CI now: https://github.com/nextstrain/augur/actions/runs/4985866782/jobs/8925984486

jameshadfield · 2023-05-15T22:54:27Z

Specifically, my PR should have had something like:

- if not self.nodes and not self.branches:
+ if self.attrs.get("branches") is None and self.attrs.get("nodes") is None:
      raise AugurError(
          f"{self.fname} did not contain either `nodes` or `branches`. Please check the formatting of this JSON!"
      )

But it's probably fine as-is -- we'll now get a warning, and maybe intermediates without nodes and without branches are normal behavior in some workflows.
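To make the difference between the two conditions concrete, here is a small sketch that applies both to plain dicts. The function names are hypothetical, and `attrs` stands for the parsed JSON as in the snippet above:

```python
def falsy_check(attrs):
    """Truthiness-based check: fires when both entries are empty OR missing."""
    return not attrs.get("nodes") and not attrs.get("branches")


def missing_check(attrs):
    """Presence-based check: fires only when both keys are absent (or null)."""
    return attrs.get("nodes") is None and attrs.get("branches") is None


cases = {
    "empty dicts": {"nodes": {}, "branches": {}},
    "only empty nodes": {"nodes": {}, "rbd_level_details": {}},
    "both keys missing": {"rbd_level_details": {}},
    "annotated nodes": {"nodes": {"tip1": {"clade": "A"}}},
}

for label, attrs in cases.items():
    print(f"{label}: falsy={falsy_check(attrs)}, missing={missing_check(attrs)}")
```

The two checks agree when both keys are missing and when annotations exist. They disagree on the empty-dict cases, which is the situation reported in #1215.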

corneliusroemer requested review from jameshadfield, a team and victorlin May 15, 2023 16:36

corneliusroemer mentioned this pull request May 15, 2023

BUG: Export complains if node data json contains only empty dicts for nodes and branches #1215

Closed

corneliusroemer linked an issue May 15, 2023 that may be closed by this pull request

BUG: Export complains if node data json contains only empty dicts for nodes and branches #1215

Closed

huddlej reviewed May 15, 2023


corneliusroemer requested review from a team, tsibley and joverlee521 May 15, 2023 18:17

rneher reviewed May 15, 2023


corneliusroemer force-pushed the remove-overeager-validation branch from 6f96100 to b745538 Compare May 15, 2023 20:27

jameshadfield approved these changes May 15, 2023


corneliusroemer merged commit 497a5ea into master May 15, 2023

corneliusroemer deleted the remove-overeager-validation branch May 15, 2023 22:15

This was referenced May 16, 2023

Update workflow for new augur clades version nextstrain/ncov#1000

Merged

Fixes to augur export #1218

Merged

j23414 mentioned this pull request Jan 16, 2024

Add E gene trees nextstrain/dengue#18

Closed

1 task

joverlee521 mentioned this pull request Mar 18, 2024

[ancestral, translate] node data validation improvements #1440

Merged
