WARN VariantContextConverter:924 - Ran into Array Out of Bounds when accessing indices 0,1,2 of genotype . #2024

Closed
rezacsedu opened this issue Aug 1, 2018 · 6 comments

@rezacsedu

rezacsedu commented Aug 1, 2018

Hi there,

I'm trying to extract genomic data from VCF files with ADAM, but I keep getting the following warning:

2018-08-01 16:31:25 WARN VariantContextConverter:924 - Ran into Array Out of Bounds when accessing indices 0,1,2 of genotype [NA21126 A* GQ 99 PL 0,677 {CN=1, CNL=-1000,0,-67.74, CNP=-1000,0,-70.53, CNQ=99, GP=0,-70.53}].

I'm seeing thousands of warnings like this. Is this a serious problem, or does it only mean that some genotype quality fields will be missing from the output? Please note that I'm using the following software versions:

Apache Spark: v2.3.0
H2O: v3.14.0.1
Sparkling Water: v1.2.5
ADAM: v0.22.0
Scala: v2.11.8

@heuermh
Member

heuermh commented Aug 1, 2018

Generally, you can adjust validation for flat file formats using ValidationStringency:

val genotypes = sc.loadGenotypes("sample0.vcf", stringency = ValidationStringency.SILENT)

As to that specific warning, the line number on the 0.22.0 release branch doesn't look like what I would expect. Perhaps you are on a different version?

In any case, if you could extract a short bit of VCF that demonstrates the warning, we could investigate further.
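
One way to find such a snippet is to scan the VCF text for samples whose PL carries fewer than three values, as in the genotype from the warning (PL 0,677). The following is a minimal sketch in plain Scala, assuming tab-separated VCF data lines with a FORMAT column; `PlCardinalityCheck` and `shortPlSamples` are hypothetical names and not part of ADAM.

```scala
// Hypothetical helper (not ADAM API): flags samples whose PL has too few values.
object PlCardinalityCheck {

  /**
   * Returns (sampleIndex, plValue) pairs for samples on one VCF data line
   * whose PL field has fewer than `minValues` comma-separated entries.
   * Header lines and lines without sample columns yield an empty result.
   */
  def shortPlSamples(vcfLine: String, minValues: Int = 3): Seq[(Int, String)] = {
    val cols = vcfLine.split("\t", -1)
    if (vcfLine.startsWith("#") || cols.length < 10) {
      Seq.empty
    } else {
      // Column 9 (index 8) is FORMAT; sample columns start at index 9.
      val plIndex = cols(8).split(":").indexOf("PL")
      val samples = cols.drop(9).toSeq.zipWithIndex
      for {
        (sample, i) <- samples
        fields = sample.split(":")
        if plIndex >= 0 && plIndex < fields.length
        pl = fields(plIndex)
        if pl != "." && pl.split(",").length < minValues
      } yield (i, pl)
    }
  }
}
```

Running each line of the file through this helper and keeping the lines with a non-empty result should give a small VCF fragment to attach to the issue.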

@rezacsedu
Author

@heuermh Sorry, I listed the wrong versions earlier. Here are the exact versions I'm using:

	<properties>
		<spark.version>2.2.1</spark.version>
		<scala.version>2.11.12</scala.version>
		<h2o.version>3.16.0.2</h2o.version>
		<sparklingwater.version>2.2.6</sparklingwater.version>
		<adam.version>0.23.0</adam.version>
	</properties>

@heuermh
Member

heuermh commented Aug 1, 2018

Yep, with 0.23.0 the line number matches the warning: https://github.com/bigdatagenomics/adam/blob/maint_spark2_2.11-0.23.0/adam-core/src/main/scala/org/bdgenomics/adam/converters/VariantContextConverter.scala#L924

It appears your data might have the wrong number of values for the PL FORMAT field, which the VCF spec declares as Number=G (one value per possible genotype):
https://github.com/samtools/hts-specs/blob/master/VCFv4.3.tex#L399

Or perhaps we're doing something wrong. 😄
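
For reference, a Number=G field carries one value per possible genotype, which for `n` alleles (reference plus alternates) and ploidy `p` works out to C(n + p - 1, p). The sketch below computes that count in plain Scala; `GenotypeCardinality`, `choose`, and `expectedNumberG` are illustrative names, not ADAM or htsjdk API. A diploid biallelic call expects three values (hence indices 0,1,2), whereas a haploid call expects two, which would be consistent with the single-allele genotype in the warning and its PL of 0,677.

```scala
// Illustrative only: expected value count for a VCF Number=G field.
object GenotypeCardinality {

  /** Binomial coefficient C(n, k), built up iteratively to avoid factorials. */
  def choose(n: Int, k: Int): Long = {
    require(n >= 0 && k >= 0 && k <= n, "need 0 <= k <= n")
    // After step i the accumulator equals C(n - k + i, i), so each division is exact.
    (1 to k).foldLeft(1L) { (acc, i) => acc * (n - k + i) / i }
  }

  /**
   * Number of possible unordered genotypes for `alleleCount` alleles
   * (reference plus alternates) at the given ploidy.
   */
  def expectedNumberG(alleleCount: Int, ploidy: Int): Long =
    choose(alleleCount + ploidy - 1, ploidy)
}
```

For example, `GenotypeCardinality.expectedNumberG(2, 2)` gives 3 for a diploid biallelic site, and `GenotypeCardinality.expectedNumberG(2, 1)` gives 2 for a haploid one.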

@rezacsedu
Author

Actually, I'm using the genetic variants data from the 1000 Genomes Project. However, when I use the following old versions, I don't experience such warnings:

    <properties>
        <spark.version>1.2.0</spark.version>
        <h2o.version>3.0.0.8</h2o.version>
        <sparklingwater.version>1.2.5</sparklingwater.version>
        <adam.version>0.16.0</adam.version>
    </properties>

@heuermh
Member

heuermh commented Aug 1, 2018

> I'm using the genetic variants data from the 1000 Genomes Project

That doesn't mean they're adhering correctly to the VCF specification. ;)

Which 1000G file(s) are you looking at, specifically?

> when I use the following old versions, I don't experience such warnings

There have been 868 commits since version 0.16.0; I'd recommend trying newer versions rather than older ones.

@heuermh
Member

heuermh commented Jan 6, 2020

Closing due to lack of context for the error. Please reopen if you can provide an example 1000G file that causes the issue.

heuermh closed this as completed Jan 6, 2020
heuermh added this to the 0.31.0 milestone Jan 6, 2020