Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve VCF fields in MAF output #4872

Merged
merged 7 commits into from
Jun 15, 2018
Merged

Conversation

LeeTL1220
Copy link
Contributor

@LeeTL1220 LeeTL1220 commented Jun 8, 2018

@codecov-io
Copy link

codecov-io commented Jun 11, 2018

Codecov Report

Merging #4872 into master will increase coverage by 0.106%.
The diff coverage is 58.929%.

@@               Coverage Diff               @@
##              master     #4872       +/-   ##
===============================================
+ Coverage     80.453%   80.559%   +0.106%     
- Complexity     17815     18060      +245     
===============================================
  Files           1089      1092        +3     
  Lines          64168     64761      +593     
  Branches       10338     10491      +153     
===============================================
+ Hits           51625     52171      +546     
- Misses          8486      8513       +27     
- Partials        4057      4077       +20
Impacted Files Coverage Δ Complexity Δ
...tils/annotatedinterval/AnnotatedIntervalCodec.java 79.31% <ø> (ø) 18 <0> (ø) ⬇️
...itute/hellbender/tools/funcotator/Funcotation.java 33.333% <ø> (ø) 3 <0> (ø) ⬇️
...ataSources/xsv/SimpleKeyXsvFuncotationFactory.java 87.097% <100%> (ø) 27 <0> (ø) ⬇️
...dataSources/gencode/GencodeFuncotationBuilder.java 96.774% <100%> (+0.053%) 30 <1> (ø) ⬇️
...dataSources/gencode/GencodeFuncotationFactory.java 83.634% <100%> (+0.025%) 164 <5> (+1) ⬆️
...ncotator/mafOutput/MafOutputRendererConstants.java 98.98% <100%> (ø) 1 <0> (ø) ⬇️
...ataSources/xsv/LocatableXsvFuncotationFactory.java 85.185% <100%> (ø) 24 <0> (ø) ⬇️
...te/hellbender/tools/funcotator/FuncotationMap.java 84.211% <100%> (ø) 28 <0> (ø) ⬇️
.../tools/funcotator/vcfOutput/VcfOutputRenderer.java 85.714% <100%> (+0.468%) 16 <4> (+4) ⬆️
...ava/org/broadinstitute/hellbender/utils/Utils.java 80.476% <100%> (+0.235%) 145 <3> (+3) ⬆️
... and 24 more

Copy link
Collaborator

@jonn-smith jonn-smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some relatively minor comments and a few questions.

@@ -250,57 +252,57 @@
optional = true,
doc = "Ignore/drop variants that have been filtered in the input. These variants will not appear in the output file."
)
protected boolean removeFilteredVariants = false;
private boolean removeFilteredVariants = false;

@Argument(
fullName = FuncotatorArgumentDefinitions.TRANSCRIPT_SELECTION_MODE_LONG_NAME,
optional = true,
doc = "Method of detailed transcript selection. This will select the transcript for detailed annotation (CANONICAL or BEST_EFFECT)."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the doc here to include the new ALL option?

@@ -228,19 +230,19 @@
fullName = FuncotatorArgumentDefinitions.REFERENCE_VERSION_LONG_NAME,
doc = "The version of the Human Genome reference to use (e.g. hg19, hg38, etc.). This will correspond to a sub-folder of each data source corresponding to that data source for the given reference."
)
protected String referenceVersion;
private String referenceVersion;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change to private for the input args? (I have no objection.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got rid of IntelliJ warnings.

No action.

@@ -2022,7 +2024,7 @@ public TranscriptCodingSequenceException( final String msg, final Throwable thro
*/
public static String sanitizeFuncotationForVcf(final String individualFuncotation) {
Utils.nonNull(individualFuncotation);
return StringUtils.replaceEach(individualFuncotation, new String[]{",", ";", "=", "\t", "|"}, new String[]{"_%2C_", "_%3B_", "_%3D_", "_%09_", "_%7C_"});
return StringUtils.replaceEach(individualFuncotation, new String[]{",", ";", "=", "\t", "|", " "}, new String[]{"_%2C_", "_%3B_", "_%3D_", "_%09_", "_%7C_", "_%20_"});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the VCF files are TSVs, do we need to escape a space in the values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was getting errors when I left it in...

No action.

@@ -2075,5 +2077,43 @@ public static boolean isGencodeFuncotation(final Funcotation f) {
public static boolean areAnyGencodeFuncotation(final List<Funcotation> funcotations) {
return funcotations.stream().anyMatch(FuncotatorUtils::isGencodeFuncotation);
}

/**
* Create funcotations (one for each alt allele) corresponding to the given variant context.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this more descriptive? I understand what this is doing but the documentation here is pretty sparse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -29,6 +31,8 @@ public GencodeFuncotationBuilder() {
}

public GencodeFuncotation build() {
// TODO: File an issue to give mechanism for populating the metadata (https://github.com/broadinstitute/gatk/issues/4857)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment can be updated since issue now exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}

@VisibleForTesting
static String determineFinalFieldName(final String funcotationFactoryName, final String fieldName) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could even be shorter if named createFinalFieldName.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/**
* @return Immutable copy of the output field map used by this MafOutputRenderer.
*/
public ImmutableMap<String, List<String>> getOutputFieldNameMap() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be used anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted.

Done

public final static String UNKNOWN_DESCRIPTION = "Unknown";

/**
* @param fieldNames Never {@code null}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you complete the Javadoc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and added null check.

import java.util.function.Function;
import java.util.stream.Collectors;

public class VcfFuncotationMetadata implements FuncotationMetadata {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add Javadoc for the class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1367,7 +1369,16 @@ void testCreateFuncotations(final String expectedGeneName,
genomeChangeCorrect && strandCorrect && cDnaChangeCorrect && codonChangeCorrect && proteinChangeCorrect,
errorMessageStringBuilder.toString() + "\n"
);


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you refactor these test cases to be similar to the test cases above (checks with individual error messages and one assert at the end)? That way if there is any regression it is clear how many and which of the checks fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I kept up with the spirit of this comment, but I did implement it slightly different.

@jonn-smith

Done, unless I misinterpreted.

@jonn-smith jonn-smith assigned LeeTL1220 and unassigned jonn-smith Jun 12, 2018
@jonn-smith
Copy link
Collaborator

Back to you, @LeeTL1220

…endering. This is very useful for internal consistency.

Made TableFuncotation constructors private, since the logic may get more complex.
Putting unknown metadata into TableFuncotations.
Populating metadata for Gencode Funcotations (unknown) and adding a test.
Added space to the disallowed characters to be sanitized, since it was causing other parsing code in the GATK to fail.
Fixed regression where test was looking for a space, which is disallowed.
Fixed issue where an extra field was being rendered if afuncotation factory had no annotations.
Adding test for FuncotationMetadata generated from VCFs.
Addressed testing of other fields in the metadata from a VcfFuncotationFactory
Added a reverse map utility and added it to MafOutputRenderer so that we can get aliasing from unit tests.
@LeeTL1220 LeeTL1220 force-pushed the ll_VCF_MAF_field_preservation branch from de2200e to ade975d Compare June 13, 2018 22:11
@LeeTL1220
Copy link
Contributor Author

LeeTL1220 commented Jun 13, 2018

@jonn-smith Back to you.... Tell me when/if I can merge.

Copy link
Collaborator

@jonn-smith jonn-smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LeeTL1220 looks good. Merge away when tests are good.

@LeeTL1220
Copy link
Contributor Author

@jonn-smith I'll fix the tests and notify you.

…t had "END" column which was throwing off MAF processing.
@LeeTL1220
Copy link
Contributor Author

@jonn-smith Assuming the tests pass, please review latest changes. If the tests have failed, I will see it tomorrow and address.

Copy link
Collaborator

@jonn-smith jonn-smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor change then I think it looks good.

@@ -1934,7 +1934,7 @@ private GencodeFuncotation createFuncotationForSpanningDeletion(final VariantCon
* @param refAllele The reference {@link Allele} for this variant.
* @param altAllele The alternate {@link Allele} for this variant.
* @return A {@link GencodeFuncotation.VariantType} representing the variation type between the given reference and alternate {@link Allele}.
* Spanning deletions and no calls will get a type of {@link org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotation.VariantType.NA}
* Spanning deletions and no calls will get a type of NA
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked having the actual link to the NA VariantType here.

@LeeTL1220
Copy link
Contributor Author

Offline...Was given permission to merge if tests pass.

@LeeTL1220 LeeTL1220 merged commit 41788f5 into master Jun 15, 2018
@LeeTL1220 LeeTL1220 deleted the ll_VCF_MAF_field_preservation branch June 15, 2018 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Funcotator - Must preserve other VCF fields / annotations in both MAF and VCF
3 participants