Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add experimental FilterFuncotations tool #4991

Merged
merged 8 commits into from
Aug 17, 2018

Conversation

danxmoran
Copy link
Contributor

The tool is currently hard-coded to work with funcotations from v3 of the AoU data source, but opening for early feedback / discussion of the best way to generalize things.

Copy link
Contributor

@LeeTL1220 LeeTL1220 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danxmoran Nothing huge, but a bunch of little things.

if (funcotationHeaderLine != null) {
funcotationKeys = FuncotatorUtils.extractFuncotatorKeysFromHeaderDescription(funcotationHeaderLine.getDescription());
outputVcfWriter = createVCFWriter(outputFile);
vcfHeader.addMetaDataLine(new VCFFilterHeaderLine(NOT_CLINSIG_FILTER, "Filter for clinically insignificant variants."));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make a new constants class for the description and other header info.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}

abstract class FuncotationFiltrationRule {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Classes should go in a separate file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and separate/sub-package)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


static final String ONE_LINE_SUMMARY = "Filter variants based on clinically-significant Funcotations.";
static final String SUMMARY = ONE_LINE_SUMMARY +
" Proof-of-concept hard-coded to look for specific Funcotations from ClinVar, ExAC, and LMM.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should cite Laboratory for Molecular Medicine (LMM) (http://personalizedmedicine.partners.org/laboratory-for-molecular-medicine/)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And cite the others as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a doc line that this tool is an example of parsing and using the VCF output of Funcotator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

});
});

String clinicalSignificance = matchingFilters.isEmpty() ? "NONE" : String.join(",", matchingFilters);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

AFR, AMR, EAS, FIN, NFE, OTH, SAS
}

private static Logger logger = LogManager.getLogger(ClinVarFilter.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger should be FuncotationFiltrationRule.class, right? Or you should create a new one in each subclass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was copy-paste typo, but removed now

return result;
}

double getMaxMinorAlleleFreq(final Map<String, String> funcotations) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public? private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved, now private

}

@Override
public void apply(VariantContext variant, ReadsContext readsContext, ReferenceContext referenceContext, FeatureContext featureContext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameters should be final

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

outputVcfWriter.add(applyFilters(variant));
}

private VariantContext applyFilters(VariantContext variant) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameter should be final

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

this.ruleName = ruleName;
}

boolean checkRule(final VariantContext variant,final Map<String, String> prunedTranscriptFuncotations) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add space after comma

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

final List<FuncotationFiltrationRule> lofFiltrationRules = new ArrayList<>();
// 1) 1) Variant classification is FRAME_SHIFT_*, NONSENSE, START_CODON_DEL, and SPLICE_SITE
// (within 2 bases on either side of exon or intron) on any transcript.
// TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the TODO? Can this be removed or addressed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover, removed

@danxmoran danxmoran force-pushed the dm-filter-funcotations-merge-back-DSDEGP-2550 branch from cd1dfc0 to bacc647 Compare July 18, 2018 21:35
@danxmoran
Copy link
Contributor Author

@LeeTL1220 I think I hit all your comments. While moving stuff around I massaged the abstract rule class into a SAM type, which I think cleaned things up pretty well. I added a bunch of javadoc too.

/**
* Check if a set of Funcotations matches this rule.
*/
boolean checkRule(final Map<String, String> funcotations);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically Predicate<Map<String, String>> now... worth refactoring to just use that type, or is the custom naming helpful for understanding?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I say leave it in for now.

Copy link
Contributor

@LeeTL1220 LeeTL1220 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danxmoran Apologies, I found a few more small items. Generally, we want all public methods to have complete javadocs, including the parameters (and whether null is accepted, for example). And you can use Utils.nonNull(...) to enforce some of the rules.

/**
* Check all of this filter's rules against a set of Funcotations.
*
* @return true if the Funcotations match all of this filter's rules, and false otherwise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java doc needs to be complete for a public method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* Build a {@link FuncotationFiltrationRule} matching Funcotations with a MAF less than
* the given value across all sub-populations of ExAC.
*/
public static FuncotationFiltrationRule buildExacMaxMafRule(final double maxMaf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please finish javadocs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/**
* Variant classifications which should be matched by this filter.
*/
private static final List<String> CONSTANT_LOF_CLASSIFICATIONS = Arrays.asList(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you tie these to the constants actually used by Funcotator?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically you can use the serialized versions of the enums in GencodeFuncotation.VariantClassification (except for START_CODON_DEL).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

@jonn-smith jonn-smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few relatively minor things in addition to @LeeTL1220 comments.

"\n * Exome Aggregation Consortium (ExAC) (http://exac.broadinstitute.org/)" +
"\n * Laboratory for Molecular Medicine (LMM) (http://personalizedmedicine.partners.org/laboratory-for-molecular-medicine/)";

public enum ReferenceVersion {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a javadoc for the public enum?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

public enum ReferenceVersion {
hg19(19), hg38(27);

private final int gencodeVersion;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinning to an internal gencode version may not be the best idea - some reference versions don't have an official gencode release (like b37).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mapping doesn't have to be one-to-one, so reference versions without an official gencode release would point at whatever gencode version they do use in Funcotator.

Bigger-picture, the only reason this is needed is to derive names for the gencode funcotations on each variant. hg19 vs. hg38 seemed more user-friendly than taking the raw gencode version (which is buried in the data source). Are there any utils for extracting out the gencode stuff from a FuncotationMap which abstract over the version number? I could find code for building/writing, but not reading/parsing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danxmoran Ah, I see.

I don't think there are any utilities for that right now.

In the mean time, can you add an entry in here for the b37 reference? It would correspond to Gencode 19 as well.


private VariantContextWriter outputVcfWriter;
private String[] funcotationKeys;
private List<FuncotationFilter> funcotationFilters = new ArrayList<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this final.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Mark the individual filters that make the variant significant.
final String clinicalSignificance = matchingFilters.isEmpty() ?
FilterFuncotationsConstants.CLINSIG_INFO_NOT_SIGNIFICANT :
String.join(",", matchingFilters);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a constant in the VcfOutputRenderer that you can use instead of hard-coding the comma.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There wasn't one that I could find... should I create one?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danxmoran Oops. My mistake. Yeah - I think adding one in FilterFuncotations itself is sufficient.

final VariantContextBuilder variantContextBuilder = new VariantContextBuilder(variant);

// Mark the individual filters that make the variant significant.
final String clinicalSignificance = matchingFilters.isEmpty() ?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this method you check matchingFilters.isEmpty twice. It might simplify the code if you had one if statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked

/**
* Check if a set of Funcotations matches this rule.
*/
boolean checkRule(final Map<String, String> funcotations);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I say leave it in for now.

/**
* Variant classifications which should be matched by this filter.
*/
private static final List<String> CONSTANT_LOF_CLASSIFICATIONS = Arrays.asList(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically you can use the serialized versions of the enums in GencodeFuncotation.VariantClassification (except for START_CODON_DEL).

return classification.startsWith(FRAME_SHIFT_PREFIX) || CONSTANT_LOF_CLASSIFICATIONS.contains(classification);
},
funcotations -> funcotations.getOrDefault(LOF_GENE_FUNCOTATION, "").equals("YES"),
FilterFuncotationsExacUtils.buildExacMaxMafRule(0.01));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be worth pulling this constant out into a variable in case it changes down the road.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

final Set<String> expectedFilters,
final Set<String> expectedAnnotations) throws IOException {

final Path tmpOut = Files.createTempFile(vcfName + ".filtered", ".vcf");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use BaseTest::createTempFile for this - it'll automatically delete the file on exit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@danxmoran
Copy link
Contributor Author

@LeeTL1220 @jonn-smith I think I covered everything, back to you!

@codecov-io
Copy link

codecov-io commented Jul 23, 2018

Codecov Report

Merging #4991 into master will increase coverage by 0.012%.
The diff coverage is 93.893%.

@@               Coverage Diff               @@
##              master     #4991       +/-   ##
===============================================
+ Coverage     86.654%   86.667%   +0.012%     
- Complexity     28997     29045       +48     
===============================================
  Files           1800      1808        +8     
  Lines         134531    134662      +131     
  Branches       14930     14935        +5     
===============================================
+ Hits          116577    116707      +130     
+ Misses         12546     12543        -3     
- Partials        5408      5412        +4
Impacted Files Coverage Δ Complexity Δ
.../tools/funcotator/FilterFuncotationsConstants.java 0% <0%> (ø) 0 <0> (?)
.../funcotator/filtrationRules/FuncotationFilter.java 100% <100%> (ø) 4 <4> (?)
...ools/funcotator/filtrationRules/ClinVarFilter.java 100% <100%> (ø) 5 <5> (?)
...er/tools/funcotator/filtrationRules/LmmFilter.java 100% <100%> (ø) 2 <2> (?)
...er/tools/funcotator/filtrationRules/LofFilter.java 100% <100%> (ø) 5 <5> (?)
.../funcotator/FilterFuncotationsIntegrationTest.java 100% <100%> (ø) 5 <5> (?)
...r/filtrationRules/FilterFuncotationsExacUtils.java 82.353% <82.353%> (ø) 7 <7> (?)
...ellbender/tools/funcotator/FilterFuncotations.java 93.103% <93.103%> (ø) 18 <18> (?)
... and 8 more

Copy link
Contributor

@LeeTL1220 LeeTL1220 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danxmoran One more minor comment. Feel free to merge if Jonn is okay and my last comment is addressed.

I have not commented on how we could do less hardcoding/make this tool more versatile, since that is better handled in an issue. We can discuss whenever you like, here or in person.

* Build a {@link FuncotationFiltrationRule} matching Funcotations from variants with a
* maximum MAF less than some threshold.
*
* @param maxMaf the MAF threshold to check in the rule
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any parameter checks that need to happen here? Such as >0 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See ParamUtils ...

Copy link
Collaborator

@jonn-smith jonn-smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I thought this was going to be accepted in my absence.

Three very minor changes. Good to merge after and when tests pass.

* Used to derive names of Gencode Funcotations.
*/
public enum ReferenceVersion {
hg19(19), hg38(27);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in an above response to a comment - can you add a b37 reference version here as well? It would correspond to Gencode version 19 as well.

VCFHeaderLineType.String, FilterFuncotationsConstants.CLINSIG_INFO_KEY_DESCRIPTION));
outputVcfWriter.writeHeader(vcfHeader);
} else {
throw new UserException.BadInput("Input VCF does not have Funcotator annotations.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you mention in this exception that it's an error in the header as opposed to the data itself?


// Mark the individual filters that make the variant significant, if any.
final String clinicalSignificance =
isSignificant ? String.join(",", matchingFilters) : FilterFuncotationsConstants.CLINSIG_INFO_NOT_SIGNIFICANT;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in a comment above, can you extract the comma here into a constant in this class?

@jonn-smith jonn-smith assigned danxmoran and unassigned jonn-smith Aug 17, 2018
@danxmoran danxmoran force-pushed the dm-filter-funcotations-merge-back-DSDEGP-2550 branch from b967d91 to ab78b93 Compare August 17, 2018 16:24
@danxmoran danxmoran merged commit 12104fc into master Aug 17, 2018
@danxmoran danxmoran deleted the dm-filter-funcotations-merge-back-DSDEGP-2550 branch August 17, 2018 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants