-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add experimental FilterFuncotations tool #4991
Add experimental FilterFuncotations tool #4991
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danxmoran Nothing huge, but a bunch of little things.
if (funcotationHeaderLine != null) { | ||
funcotationKeys = FuncotatorUtils.extractFuncotatorKeysFromHeaderDescription(funcotationHeaderLine.getDescription()); | ||
outputVcfWriter = createVCFWriter(outputFile); | ||
vcfHeader.addMetaDataLine(new VCFFilterHeaderLine(NOT_CLINSIG_FILTER, "Filter for clinically insignificant variants.")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make a new constants class for the description and other header info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
} | ||
|
||
abstract class FuncotationFiltrationRule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Classes should go in a separate file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and separate/sub-package)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
static final String ONE_LINE_SUMMARY = "Filter variants based on clinically-significant Funcotations."; | ||
static final String SUMMARY = ONE_LINE_SUMMARY + | ||
" Proof-of-concept hard-coded to look for specific Funcotations from ClinVar, ExAC, and LMM."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should cite Laboratory for Molecular Medicine (LMM) (http://personalizedmedicine.partners.org/laboratory-for-molecular-medicine/)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And cite the others as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a doc line that this tool is an example of parsing and using the VCF output of Funcotator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
}); | ||
}); | ||
|
||
String clinicalSignificance = matchingFilters.isEmpty() ? "NONE" : String.join(",", matchingFilters); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
AFR, AMR, EAS, FIN, NFE, OTH, SAS | ||
} | ||
|
||
private static Logger logger = LogManager.getLogger(ClinVarFilter.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger should be FuncotationFiltrationRule.class
, right? Or you should create a new one in each subclass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was copy-paste typo, but removed now
return result; | ||
} | ||
|
||
double getMaxMinorAlleleFreq(final Map<String, String> funcotations) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public? private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved, now private
} | ||
|
||
@Override | ||
public void apply(VariantContext variant, ReadsContext readsContext, ReferenceContext referenceContext, FeatureContext featureContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parameters should be final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
outputVcfWriter.add(applyFilters(variant)); | ||
} | ||
|
||
private VariantContext applyFilters(VariantContext variant) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parameter should be final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
this.ruleName = ruleName; | ||
} | ||
|
||
boolean checkRule(final VariantContext variant,final Map<String, String> prunedTranscriptFuncotations) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add space after comma
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
final List<FuncotationFiltrationRule> lofFiltrationRules = new ArrayList<>(); | ||
// 1) 1) Variant classification is FRAME_SHIFT_*, NONSENSE, START_CODON_DEL, and SPLICE_SITE | ||
// (within 2 bases on either side of exon or intron) on any transcript. | ||
// TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the TODO? Can this be removed or addressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover, removed
cd1dfc0
to
bacc647
Compare
@LeeTL1220 I think I hit all your comments. While moving stuff around I massaged the abstract rule class into a SAM type, which I think cleaned things up pretty well. I added a bunch of javadoc too. |
/** | ||
* Check if a set of Funcotations matches this rule. | ||
*/ | ||
boolean checkRule(final Map<String, String> funcotations); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically Predicate<Map<String, String>>
now... worth refactoring to just use that type, or is the custom naming helpful for understanding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - I say leave it in for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danxmoran Apologies, I found a few more small items. Generally, we want all public methods to have complete javadocs, including the parameters (and whether null is accepted, for example). And you can use Utils.nonNull(...) to enforce some of the rules.
/** | ||
* Check all of this filter's rules against a set of Funcotations. | ||
* | ||
* @return true if the Funcotations match all of this filter's rules, and false otherwise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java doc needs to be complete for a public method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
* Build a {@link FuncotationFiltrationRule} matching Funcotations with a MAF less than | ||
* the given value across all sub-populations of ExAC. | ||
*/ | ||
public static FuncotationFiltrationRule buildExacMaxMafRule(final double maxMaf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please finish javadocs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/** | ||
* Variant classifications which should be matched by this filter. | ||
*/ | ||
private static final List<String> CONSTANT_LOF_CLASSIFICATIONS = Arrays.asList( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you tie these to the constants actually used by Funcotator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically you can use the serialized versions of the enums in GencodeFuncotation.VariantClassification
(except for START_CODON_DEL
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few relatively minor things in addition to @LeeTL1220 comments.
"\n * Exome Aggregation Consortium (ExAC) (http://exac.broadinstitute.org/)" + | ||
"\n * Laboratory for Molecular Medicine (LMM) (http://personalizedmedicine.partners.org/laboratory-for-molecular-medicine/)"; | ||
|
||
public enum ReferenceVersion { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a javadoc for the public enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
public enum ReferenceVersion { | ||
hg19(19), hg38(27); | ||
|
||
private final int gencodeVersion; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pinning to an internal gencode version may not be the best idea - some reference versions don't have an official gencode release (like b37
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mapping doesn't have to be one-to-one, so reference versions without an official gencode release would point at whatever gencode version they do use in Funcotator
.
Bigger-picture, the only reason this is needed is to derive names for the gencode funcotations on each variant. hg19 vs. hg38 seemed more user-friendly than taking the raw gencode version (which is buried in the data source). Are there any utils for extracting out the gencode stuff from a FuncotationMap
which abstract over the version number? I could find code for building/writing, but not reading/parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danxmoran Ah, I see.
I don't think there are any utilities for that right now.
In the mean time, can you add an entry in here for the b37
reference? It would correspond to Gencode 19 as well.
|
||
private VariantContextWriter outputVcfWriter; | ||
private String[] funcotationKeys; | ||
private List<FuncotationFilter> funcotationFilters = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make this final
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Mark the individual filters that make the variant significant. | ||
final String clinicalSignificance = matchingFilters.isEmpty() ? | ||
FilterFuncotationsConstants.CLINSIG_INFO_NOT_SIGNIFICANT : | ||
String.join(",", matchingFilters); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a constant in the VcfOutputRenderer
that you can use instead of hard-coding the comma.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There wasn't one that I could find... should I create one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danxmoran Oops. My mistake. Yeah - I think adding one in FilterFuncotations
itself is sufficient.
final VariantContextBuilder variantContextBuilder = new VariantContextBuilder(variant); | ||
|
||
// Mark the individual filters that make the variant significant. | ||
final String clinicalSignificance = matchingFilters.isEmpty() ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this method you check matchingFilters.isEmpty
twice. It might simplify the code if you had one if statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworked
/** | ||
* Check if a set of Funcotations matches this rule. | ||
*/ | ||
boolean checkRule(final Map<String, String> funcotations); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - I say leave it in for now.
/** | ||
* Variant classifications which should be matched by this filter. | ||
*/ | ||
private static final List<String> CONSTANT_LOF_CLASSIFICATIONS = Arrays.asList( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically you can use the serialized versions of the enums in GencodeFuncotation.VariantClassification
(except for START_CODON_DEL
).
return classification.startsWith(FRAME_SHIFT_PREFIX) || CONSTANT_LOF_CLASSIFICATIONS.contains(classification); | ||
}, | ||
funcotations -> funcotations.getOrDefault(LOF_GENE_FUNCOTATION, "").equals("YES"), | ||
FilterFuncotationsExacUtils.buildExacMaxMafRule(0.01)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be worth pulling this constant out into a variable in case it changes down the road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
final Set<String> expectedFilters, | ||
final Set<String> expectedAnnotations) throws IOException { | ||
|
||
final Path tmpOut = Files.createTempFile(vcfName + ".filtered", ".vcf"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use BaseTest::createTempFile
for this - it'll automatically delete the file on exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@LeeTL1220 @jonn-smith I think I covered everything, back to you! |
Codecov Report
@@ Coverage Diff @@
## master #4991 +/- ##
===============================================
+ Coverage 86.654% 86.667% +0.012%
- Complexity 28997 29045 +48
===============================================
Files 1800 1808 +8
Lines 134531 134662 +131
Branches 14930 14935 +5
===============================================
+ Hits 116577 116707 +130
+ Misses 12546 12543 -3
- Partials 5408 5412 +4
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danxmoran One more minor comment. Feel free to merge if Jonn is okay and my last comment is addressed.
I have not commented on how we could do less hardcoding/make this tool more versatile, since that is better handled in an issue. We can discuss whenever you like, here or in person.
* Build a {@link FuncotationFiltrationRule} matching Funcotations from variants with a | ||
* maximum MAF less than some threshold. | ||
* | ||
* @param maxMaf the MAF threshold to check in the rule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any parameter checks that need to happen here? Such as >0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See ParamUtils
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. I thought this was going to be accepted in my absence.
Three very minor changes. Good to merge after and when tests pass.
* Used to derive names of Gencode Funcotations. | ||
*/ | ||
public enum ReferenceVersion { | ||
hg19(19), hg38(27); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in an above response to a comment - can you add a b37
reference version here as well? It would correspond to Gencode version 19 as well.
VCFHeaderLineType.String, FilterFuncotationsConstants.CLINSIG_INFO_KEY_DESCRIPTION)); | ||
outputVcfWriter.writeHeader(vcfHeader); | ||
} else { | ||
throw new UserException.BadInput("Input VCF does not have Funcotator annotations."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you mention in this exception that it's an error in the header as opposed to the data itself?
|
||
// Mark the individual filters that make the variant significant, if any. | ||
final String clinicalSignificance = | ||
isSignificant ? String.join(",", matchingFilters) : FilterFuncotationsConstants.CLINSIG_INFO_NOT_SIGNIFICANT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in a comment above, can you extract the comma here into a constant in this class?
Co-authored-by: Jay Carey <jcarey@broadinstitute.org>
b967d91
to
ab78b93
Compare
The tool is currently hard-coded to work with funcotations from v3 of the AoU data source, but opening for early feedback / discussion of the best way to generalize things.