Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modified ANDed read filter output message for readability #6315

Merged
merged 4 commits into from
Dec 20, 2019

Conversation

KevinCLydon
Copy link
Member

Added logic for generating a simplified read filter output in the case of multiple filters joined by AND

Fixes #3520

@KevinCLydon KevinCLydon requested a review from droazen December 12, 2019 13:32
@droazen droazen self-assigned this Dec 13, 2019
Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to @KevinCLydon with some comments

{
String simplifiedSummary = getSummaryLineForLevelAllAndsSimplified();
if(!simplifiedSummary.isEmpty())
return simplifiedSummary;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always include curly braces around if statement bodies, even in the one-line case, to guard against hypothetical bugs when this method is modified in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

return "No reads filtered by: " + getName();
}
else if(indentLevel == 0)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seem to be using a mix of curly brace styles here -- both styles are fine, but you should pick one and use it consistently. Most of us prefer to have the opening curly brace on the same line as the if, etc. statement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bad habit of mine. I've corrected it here and I'll try to stick with same line in the future.

while(!unread.empty())
{
curFilter = unread.pop();
if(curFilter.getFilteredCount() > 0) {
Copy link
Contributor

@droazen droazen Dec 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you should return empty String and fall back to the verbose/old reporting style if any OR filters are present in the tree at all, regardless of what their respective filter counts are -- so I recommend removing this check. You already check the filter count in your else if clause below when deciding whether to append a summary line for a node, so this higher-level check seems unnecessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, fixed

unread.push(((CountingAndReadFilter) curFilter).lhs);
unread.push(((CountingAndReadFilter) curFilter).rhs);
} else if (curFilter instanceof CountingBinopReadFilter) {
return "";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about NOTs (CountingNegateReadFilter, which does not extend CountingBinopReadFilter)? Are you currently handling those correctly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to account for those. I've added a check for those so it will properly consider the filter not all ANDs if it finds a NOT

@@ -219,6 +220,52 @@ public boolean test(final GATKRead read) {
return accept;
}

@Override
protected String getSummaryLineForLevel(final int indentLevel) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add unit tests for the new CountingAndReadFilter.getSummaryLineForLevel() method. The test should have a DataProvider that constructs various trees of read filters and passes them into the @Test method along with an expected output String. Include at least the following test cases:

  • A tree with 2 filters joined by AND, both of which have a filter count > 0
  • A tree with 2 filters joined by AND, only one of which has a filter count > 0
  • A tree with 2 filters joined by AND, both with filter count == 0
  • A multi-level tree with > 2 filters joined by AND, and a mix of 0 and non-0 filter counts
  • A tree with an OR at a lower level below the AND
  • A tree with a NOT at a lower level below the AND

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@@ -219,6 +220,52 @@ public boolean test(final GATKRead read) {
return accept;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you (very sensibly) chose to implement the simplified output logic in CountingAndReadFilter, could you also check the case of a single standalone read filter (ie., no AND operator present), and make sure that the output in the single-filter case is stylistically consistent with your new output?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran a couple tests with single filters and it looks to be consistent

@droazen droazen assigned KevinCLydon and unassigned droazen Dec 18, 2019
…D filter summary generation to correct minor quirks, and corrected some stylistic inconsistencies from previous commit
@KevinCLydon KevinCLydon assigned droazen and unassigned KevinCLydon Dec 19, 2019
Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One final comment/request @KevinCLydon, then this should be good to go

@Override
protected String getSummaryLineForLevel(final int indentLevel) {
if (0 == filteredCount) {
return "No reads filtered by: " + getName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case where we're the root node (indentLevel == 0), and there are no reads filtered at all, I think we want to try to output a simplified summary as well. Eg., we want something like:

0 read(s) filtered by: Filter1
0 read(s) filtered by: Filter2
0 read(s) filtered by: Filter3
0 total reads filtered

I think you can achieve this by just swapping the order of the if and else if below, and removing the check on the filtered count in getSummaryLineForLevelAllAndsSimplified(). It's true that we would then output stats for filters that filter no reads, but I think it's actually useful for the user to see the complete list of filters, and to get reassurance that certain filters had no effect.

Once you do this, you'll need to update the expected output for some of your new tests (which look good!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(If you do take my suggestion to swap the order of the checks, you'll need to remove the else in order to handle the case where simplified output can't be generated due to ORs/NOTs, and 0 total reads were filtered at the root level)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your suggestion. I just pushed a change that will include that. As part of making the change, I also made a minor change to the way summary output is printed for all types of CountingReadFilter to make it consistent (basically "0 read(s)" instead of "No reads"), and updated the existing test cases accordingly (after first forgetting to do so).

@droazen droazen assigned KevinCLydon and unassigned droazen Dec 20, 2019
@gatk-bot
Copy link

gatk-bot commented Dec 20, 2019

Travis reported job failures from build 28449
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 28449.13 logs
unit openjdk8 28449.3 logs

@KevinCLydon KevinCLydon assigned droazen and unassigned KevinCLydon Dec 20, 2019
Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Latest version looks good @KevinCLydon -- merging!

@droazen droazen merged commit 946f395 into master Dec 20, 2019
@droazen droazen deleted the kl_read_filter_summary_output_readable_issue3520 branch December 20, 2019 22:59
@vdauwera
Copy link
Contributor

Thank you for getting this done!!

@lbergelson
Copy link
Member

First PR merged!?! Congrats! Welcome to the GATK :)

@KevinCLydon
Copy link
Member Author

Thanks!

lbergelson pushed a commit that referenced this pull request Jan 6, 2020
* Modified read filter output message generation to be simplified for when filters are joined using all ANDs

* Added unit tests for AND filter summary line, made some changes to AND filter summary generation to correct minor quirks, and corrected some stylistic inconsistencies

* Updated output of simplified AND read filters to also display filters that filtered 0 reads

* Updated test cases that were failing as a result of new output changes

Fixes #3520
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make ReadFilter summary output readable
5 participants