Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes parsing variant annotations for multi-allelic rows #1346

Conversation

majkiw
Copy link

@majkiw majkiw commented Jan 10, 2017

When I tried to run the code in master I had java.lang.ArrayIndexOutOfBoundsException exceptions in org.bdgenomics.adam.converters.VariantContextConverter#fromArrayExtractor whenever multi-allelic row was encountered.
It turned out the index was off by 1 when accessing multivalued annotations.

For example in gvcf_dir/gvcf_multiallelic.g.vcf there is a line:
chr22 18030096 . TAAA T,TA,TAA,<NON_REF> 564.73 . BaseQRankSum=-0.133;ClippingRankSum=-1.438;DP=114;MLEAC=0,1,1,0;MLEAF=0.00,0.500,0.500,0.00;MQ=69.72;MQ0=0;MQRankSum=-0.686;ReadPosRankSum=-0.013 GT:AD:DP:GQ:PL 2/3:13,3,17,17,0:50:86:602,508,1628,86,678,553,137,342,0,281,467,744,353,309,659

ALT is T,TA,TAA,<NON_REF> with alt allele indexes (as returned by vc.getAlleleIndex(allele)): 1,2,3,4
When accessing annotation MLEAC=0,1,1,0 it would load indexes 1,2,3 for T,TA,TAA resulting in 1,1,0.
However there is never a value for reference allele here! So instead it should be loading indexes 0,1,2 resulting in values 0,1,1.

Incidentally test using gvcf_dir/gvcf_multiallelic.g.vcf wasn't breaking for you because there was always extra 0 as the last value corresponding to <NON_REF> - which was effecting in silent shift of values.
This becomes much easier to break in "real" VCF where there is no <NON_REF> anymore.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@heuermh
Copy link
Member

heuermh commented Jan 10, 2017

Jenkins, test this please

@heuermh
Copy link
Member

heuermh commented Jan 10, 2017

Thanks for reporting this, @majkiw!

The test looks good; give me a bit to review the fix, to make sure there aren't other cases where we're off. That's probably the only method in VariantContextConverter that wasn't rewritten in the refactor.

If you could extract a test file from a non-gVCF with multiple alts, that would be helpful. Additionally we should probably craft one with some pathological cases.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1726/
Test PASSed.

@heuermh heuermh modified the milestones: 0.21.1, 0.22.0 Jan 10, 2017
@fnothaft
Copy link
Member

Thanks for the bugfix, @majkiw! @heuermh, I ran out of time today to make a review pass but this is first in line on my docket tomorrow.

Copy link
Member

@fnothaft fnothaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for the fix, @majkiw! I will merge this manually.

@fnothaft
Copy link
Member

Merged manually as 5f0ef28. Thank you for the contribution, @majkiw!

@fnothaft fnothaft closed this Jan 11, 2017
@heuermh
Copy link
Member

heuermh commented Jan 11, 2017

Thank you, @majkiw!

@majkiw
Copy link
Author

majkiw commented Jan 12, 2017

Thank you guys for processing this PR so quickly :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants