-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve VCF fields in MAF output #4872
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4872 +/- ##
===============================================
+ Coverage 80.453% 80.559% +0.106%
- Complexity 17815 18060 +245
===============================================
Files 1089 1092 +3
Lines 64168 64761 +593
Branches 10338 10491 +153
===============================================
+ Hits 51625 52171 +546
- Misses 8486 8513 +27
- Partials 4057 4077 +20
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some relatively minor comments and a few questions.
@@ -250,57 +252,57 @@ | |||
optional = true, | |||
doc = "Ignore/drop variants that have been filtered in the input. These variants will not appear in the output file." | |||
) | |||
protected boolean removeFilteredVariants = false; | |||
private boolean removeFilteredVariants = false; | |||
|
|||
@Argument( | |||
fullName = FuncotatorArgumentDefinitions.TRANSCRIPT_SELECTION_MODE_LONG_NAME, | |||
optional = true, | |||
doc = "Method of detailed transcript selection. This will select the transcript for detailed annotation (CANONICAL or BEST_EFFECT)." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the doc here to include the new ALL
option?
@@ -228,19 +230,19 @@ | |||
fullName = FuncotatorArgumentDefinitions.REFERENCE_VERSION_LONG_NAME, | |||
doc = "The version of the Human Genome reference to use (e.g. hg19, hg38, etc.). This will correspond to a sub-folder of each data source corresponding to that data source for the given reference." | |||
) | |||
protected String referenceVersion; | |||
private String referenceVersion; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the change to private
for the input args? (I have no objection.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got rid of IntelliJ warnings.
No action.
@@ -2022,7 +2024,7 @@ public TranscriptCodingSequenceException( final String msg, final Throwable thro | |||
*/ | |||
public static String sanitizeFuncotationForVcf(final String individualFuncotation) { | |||
Utils.nonNull(individualFuncotation); | |||
return StringUtils.replaceEach(individualFuncotation, new String[]{",", ";", "=", "\t", "|"}, new String[]{"_%2C_", "_%3B_", "_%3D_", "_%09_", "_%7C_"}); | |||
return StringUtils.replaceEach(individualFuncotation, new String[]{",", ";", "=", "\t", "|", " "}, new String[]{"_%2C_", "_%3B_", "_%3D_", "_%09_", "_%7C_", "_%20_"}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the VCF files are TSVs, do we need to escape a space in the values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was getting errors when I left it in...
No action.
@@ -2075,5 +2077,43 @@ public static boolean isGencodeFuncotation(final Funcotation f) { | |||
public static boolean areAnyGencodeFuncotation(final List<Funcotation> funcotations) { | |||
return funcotations.stream().anyMatch(FuncotatorUtils::isGencodeFuncotation); | |||
} | |||
|
|||
/** | |||
* Create funcotations (one for each alt allele) corresponding to the given variant context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this more descriptive? I understand what this is doing but the documentation here is pretty sparse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -29,6 +31,8 @@ public GencodeFuncotationBuilder() { | |||
} | |||
|
|||
public GencodeFuncotation build() { | |||
// TODO: File an issue to give mechanism for populating the metadata (https://github.com/broadinstitute/gatk/issues/4857) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment can be updated since issue now exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
} | ||
|
||
@VisibleForTesting | ||
static String determineFinalFieldName(final String funcotationFactoryName, final String fieldName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could even be shorter if named createFinalFieldName
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/** | ||
* @return Immutable copy of the output field map used by this MafOutputRenderer. | ||
*/ | ||
public ImmutableMap<String, List<String>> getOutputFieldNameMap() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be used anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted.
Done
public final static String UNKNOWN_DESCRIPTION = "Unknown"; | ||
|
||
/** | ||
* @param fieldNames Never {@code null} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you complete the Javadoc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done and added null check.
import java.util.function.Function; | ||
import java.util.stream.Collectors; | ||
|
||
public class VcfFuncotationMetadata implements FuncotationMetadata { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add Javadoc for the class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -1367,7 +1369,16 @@ void testCreateFuncotations(final String expectedGeneName, | |||
genomeChangeCorrect && strandCorrect && cDnaChangeCorrect && codonChangeCorrect && proteinChangeCorrect, | |||
errorMessageStringBuilder.toString() + "\n" | |||
); | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you refactor these test cases to be similar to the test cases above (checks with individual error messages and one assert at the end)? That way if there is any regression it is clear how many and which of the checks fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I kept up with the spirit of this comment, but I did implement it slightly different.
Done, unless I misinterpreted.
Back to you, @LeeTL1220 |
…endering. This is very useful for internal consistency. Made TableFuncotation constructors private, since the logic may get more complex. Putting unknown metadata into TableFuncotations. Populating metadata for Gencode Funcotations (unknown) and adding a test. Added space to the disallowed characters to be sanitized, since it was causing other parsing code in the GATK to fail. Fixed regression where test was looking for a space, which is disallowed. Fixed issue where an extra field was being rendered if afuncotation factory had no annotations. Adding test for FuncotationMetadata generated from VCFs. Addressed testing of other fields in the metadata from a VcfFuncotationFactory Added a reverse map utility and added it to MafOutputRenderer so that we can get aliasing from unit tests.
de2200e
to
ade975d
Compare
@jonn-smith Back to you.... Tell me when/if I can merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LeeTL1220 looks good. Merge away when tests are good.
@jonn-smith I'll fix the tests and notify you. |
…t had "END" column which was throwing off MAF processing.
@jonn-smith Assuming the tests pass, please review latest changes. If the tests have failed, I will see it tomorrow and address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor change then I think it looks good.
@@ -1934,7 +1934,7 @@ private GencodeFuncotation createFuncotationForSpanningDeletion(final VariantCon | |||
* @param refAllele The reference {@link Allele} for this variant. | |||
* @param altAllele The alternate {@link Allele} for this variant. | |||
* @return A {@link GencodeFuncotation.VariantType} representing the variation type between the given reference and alternate {@link Allele}. | |||
* Spanning deletions and no calls will get a type of {@link org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotation.VariantType.NA} | |||
* Spanning deletions and no calls will get a type of NA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked having the actual link to the NA
VariantType
here.
Offline...Was given permission to merge if tests pass. |
FuncotationMetadata
class. An instance of this will appear as an attribute on all Funcotations. This describes the fields in the Funcotation. This is also useful for when you need to know all possible field names in formats where these may not be specified in every variant (e.g. VCF)FuncotationMetadata
only supports INFO fields in a VCF.tumor_f
was being mapped to an incorrect field. This has been removed, but not fixed. See Funcotator - Must implement t_alt_count / t_ref_count, alt_count/ref_count #4634 and Does FuncotationMetadata need to support FORMAT fields in a VCF input/datasource? #4871VcfOutputRenderer
ignore VCF input funcotations.