I'm having some trouble indexing a VCF of structural variants when I specify an END position for interchromosomal BND sites. The indexing assumes that the END position is the end of a region on the primary chromosome. As a result, breakends that don't lie in a query interval are returned, and breakends that do are excluded.
Here's an example VCF.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1
22 16872532 var0 C ]X:153010136]TC 999 PASS CHR2=X;END=153010136;SVTYPE=BND GT:GL:GQ:DR:PE:RR:SR 0/1:975,0,999:925.0:38:3:41:29
22 16945961 var1 A A[Y:13529835[ 999 MaxDepth CHR2=Y;END=13529835;SVTYPE=BND GT:GL:GQ:DR:PE:RR:SR 0/1:999,0,999:999.0:69:1:45:51
22 16945961 var2 A A[Y:13529835[ 999 MaxDepth CHR2=Y;SVTYPE=BND GT:GL:GQ:DR:PE:RR:SR 0/1:999,0,999:999.0:69:1:45:51
22 16945961 var3 A <DEL> 999 MaxDepth CHR2=22;END=16946061;SVTYPE=DEL GT:GL:GQ:DR:PE:RR:SR 0/1:999,0,999:999.0:69:1:45:51
22 17010316 var4 C [Y:28549365[AC 999 MaxDepth CHR2=Y;END=28549365;SVTYPE=BND GT:GL:GQ:DR:PE:RR:SR 0/1:999,0,999:999.0:46:0:36:35
I'd expect a tabix query on the region 22:16900000-17100000 to return variants var1, var2, var3, and var4. However, var1 is excluded despite sharing the same coordinate as var2 and var3, and var0 is included despite lying outside the query region.
As far as I can tell, this is due to tabix parsing the coordinates specified in the END INFO fields as the intervals 22:16872532-153010136 and 22:16945961-13529835, which span the query region and are a null interval, respectively.
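To make the suspected behavior concrete, here is a minimal Python sketch of the overlap test described above. It is not tabix's actual code; it only assumes that each record is treated as the interval POS to END on the CHROM contig (falling back to POS when no END is present), and the `hits` helper and `RECORDS` list are names made up for this illustration.

```python
# Records from the example VCF: (ID, POS, END or None if INFO has no END).
RECORDS = [
    ("var0", 16872532, 153010136),
    ("var1", 16945961, 13529835),
    ("var2", 16945961, None),
    ("var3", 16945961, 16946061),
    ("var4", 17010316, 28549365),
]


def hits(records, q_start, q_end, use_end=True):
    """Return IDs whose [start, stop] interval overlaps [q_start, q_end].

    With use_end=True the stop coordinate is taken from END (the assumed
    tabix behavior); with use_end=False only POS is used.
    """
    found = []
    for rec_id, pos, end in records:
        stop = end if (use_end and end is not None) else pos
        if pos <= q_end and stop >= q_start:
            found.append(rec_id)
    return found


# Query region 22:16900000-17100000
print(hits(RECORDS, 16900000, 17100000))                 # ['var0', 'var2', 'var3', 'var4']
print(hits(RECORDS, 16900000, 17100000, use_end=False))  # ['var1', 'var2', 'var3', 'var4']
```

Under that assumption the first call reproduces what I observe (var0 returned, var1 missing), while the POS-only call gives the result I'd expect.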
Is there any way to turn this behavior off and force tabix to index only on the POS column? I tried indexing with tabix -b2 -e2 $vcf instead of tabix -p vcf $vcf, but observed the same issue. (EDIT: the two tabix commands produce identical index files. Is this expected behavior?) I'm using tabix version 1.3-48-g1afaf0c.
Thanks!
pd3 added a commit to pd3/htslib that referenced this issue Oct 24, 2016
Hi,
thank you for the test case. Yes, tabix is not ready for SVs going in the opposite direction. It is not easy to fix this directly, because the VCF records are sorted by the POS coordinate, and indexing by END for records where POS>END would break the order.
So the only possibility is to fix tabix so that the file format auto-detection is turned off when -s, -b, or -e is given.
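Until tabix behaves that way, one possible workaround (not something proposed in this thread, just a sketch) is to drop the END tag from interchromosomal BND records before compressing and indexing, since the mate position is still recorded in the ALT field. The Python filter below assumes tab-separated VCF lines and the CHR2 and SVTYPE INFO tags used in the example above; `strip_interchrom_end` is a hypothetical helper name.

```python
def strip_interchrom_end(line):
    """Drop END from INFO when a BND's mate (CHR2) is on another chromosome.

    Header lines and all other records are returned unchanged
    (apart from normalising the trailing newline).
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    info = fields[7].split(";")
    # Key=value INFO tags only; flag-style tags are left untouched.
    tags = dict(item.split("=", 1) for item in info if "=" in item)
    if tags.get("SVTYPE") == "BND" and tags.get("CHR2", fields[0]) != fields[0]:
        info = [item for item in info if not item.startswith("END=")]
        fields[7] = ";".join(info) or "."
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    import sys

    # Reads an uncompressed VCF on stdin and writes the filtered VCF to stdout.
    for vcf_line in sys.stdin:
        sys.stdout.write(strip_interchrom_end(vcf_line))
```

The output would still need to be bgzip-compressed and indexed as usual. Intrachromosomal records such as the DEL in var3 keep their END, so interval queries over them are unaffected.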