Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(tests): Fix false negative cram tests #1666

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

corneliusroemer
Copy link
Member

@corneliusroemer corneliusroemer commented Nov 9, 2024

Description of proposed changes

  • Fix Missing escape of square brackets causes diff_jsons.py in some cram tests to be false negative #1665 by properly escaping square brackets
  • Add a warning to diff_jsons.py when exclude-regex-path is used incorrectly like in issue 1665 to prevent this issue from reoccurring silently
  • Broaden default exclude-paths to now be ["root['generated_by']", "root['meta']['updated']"] instead of ["root['generated_by']['version']"]. This allows simplification of a lot of invocations of diff_jsons.py and in turn reduces chance of errors similar to issue 1665. Anything within generated_by can be ignored by cram tests, meta.updated is only used in export, but no harm excluding by default as it isn't used in any other tests. If there's opposition, happy to revert that change.
  • Make the tests pass that should have failed but weren't failing due to issue 1665, by updating the test files. @jameshadfield or @huddlej should check that the behaviour is as intended, but the changes seems small and reasonable so I'm not too worried. Update: @jameshadfield confirmed that the tests didn't mask any unintended behavior.

Checklist

  • Automated checks pass
  • Check if you need to add a changelog message
  • Check if you need to add tests
  • Check if you need to update docs

@corneliusroemer corneliusroemer marked this pull request as ready for review November 9, 2024 20:40
Copy link
Member

@victorlin victorlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for using --exclude-paths over --exclude-regex-paths (7c99d29) and for the added defaults (21d2227, bd8d892).

I haven't into validity of the fixed false negatives, but the fixes to the diff script make them seem reasonable.

Comment on lines +25 to +26
# Test for most fatal errors in regex path usage
# Exclude regexes should never match `'`, otherwise the diff is always going to pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this include " as well? I'm assuming root['generated_by'] and root["generated_by"] are both valid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the path that deep diff uses to match against uses a single quote not double quote. Hence excluding only single quote. Does that make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tested with both ['seqid'] and ["seqid"]:

   $ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" "$TESTDIR/../data/clades.json" clades_no-labels.json --exclude-regex-paths "['seqid']"
-  {'dictionary_item_removed': [root['branches']]}
+  Traceback (most recent call last):
+    File "/Users/vlin/Dropbox/repos/nextstrain/augur/tests/functional/clades/cram/../../../../scripts/diff_jsons.py", line 30, in <module>
+      raise Exception(
+  Exception: Exclude regex ['seqid'] matches `'` which means this diff will always pass which is probably not what you want.
+  You probably forgot to escape something in your regex. See for example: https://stackoverflow.com/a/79173188/7483211
+  [1]
   $ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" "$TESTDIR/../data/clades.json" clades_no-labels.json --exclude-regex-paths '["seqid"]'
-  {'dictionary_item_removed': [root['branches']]}
+  {}

It looks like " should get the same treatment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, there's a bunch more things that should not not be allowed to be matched, things like [ and ] and any single letter. But in a way it's not super clearcut as someone might want to exclude everything but a single path (they should use an include path instead then) - so I don't know how far we want to go here. Including " sounds good though, no harm I can see.

scripts/diff_jsons.py Outdated Show resolved Hide resolved
parser.add_argument("--exclude-regex-paths", nargs="+", action="extend", help="list of path regular expressions to exclude from consideration when performing a diff")
parser.add_argument("--ignore-numeric-type-changes", action="store_true", help="ignore numeric type changes in the diff (e.g., int of 1 to float of 1.0)")

args = parser.parse_args()

# Test for most fatal errors in regex path usage
# Exclude regexes should never match `'`, otherwise the diff is always going to pass
for regex in args.exclude_regex_paths or []:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Adding or [] (e596b50) doesn't seem useful:

  • We don't run mypy/pyright on this file
  • A better solution would be to set default=[] on add_argument()

Copy link
Member Author

@corneliusroemer corneliusroemer Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because we don't run mypy doesn't mean that we shouldn't prevent type errors from happening? The error actually happens irrespective of running mypy. It's not mypy erroring it's a real runtime error.

Why is changing the defaults better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is changing the defaults better?

Setting the default up top allows for a valid assumption that args.exclude_regex_paths is always iterable. Otherwise we would have to do another args.exclude_regex_paths or [] if using it as an iterable somewhere else in the code for whatever reason.

Co-authored-by: Victor Lin <13424970+victorlin@users.noreply.github.com>
Copy link

codecov bot commented Nov 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.29%. Comparing base (29188d4) to head (27d18b4).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1666      +/-   ##
==========================================
+ Coverage   72.26%   72.29%   +0.02%     
==========================================
  Files          79       79              
  Lines        8272     8276       +4     
  Branches     1691     1691              
==========================================
+ Hits         5978     5983       +5     
+ Misses       2009     2008       -1     
  Partials      285      285              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@corneliusroemer
Copy link
Member Author

This might be worth an internal changelog entry

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing escape of square brackets causes diff_jsons.py in some cram tests to be false negative
3 participants