Ref Prefixes #284

Closed
denisemauldin opened this issue Jun 26, 2015 · 12 comments

@denisemauldin

I'm getting the following error:

The REF prefixes differ: c vs C (1,1)
Failed to merge alleles at chrX:60001

This seems to say that the reference is 'c'?

My files are:
chrX 60001 . C . 0.00 LowGQX END=60008;BLOCKAVG_min30p3a GT:GQX:DP:DPF 0/0:10:4:0
chrX 60001 . C . 0.00 LowGQX . GT:GQX:DP:DPF 0/0:7:3:0
and a third file that is LowGQX through that region:
chrX 1 . N . 0.00 LowGQX END=60028;BLOCKAVG_min30p3a GT:GQX:DP:DPF .:.:0:0

How do I fix this error? I have 674 files that have this error on chromosome X, mostly at position 60001, but also at other positions (2699521, 154931044)

@jmarshall
Member

What version of bcftools are you using? There was a fix in 1.2 to ignore uppercase/lowercase differences in the REF column when merging events (see #157).
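For readers unfamiliar with the message: a minimal Python sketch of the kind of check that "The REF prefixes differ" implies, assuming the two REF alleles are compared over their shared prefix length and, since the 1.2 fix, without regard to case. The function name `ref_prefixes_match` is hypothetical and this is not the bcftools source.

```python
def ref_prefixes_match(ref_a: str, ref_b: str, ignore_case: bool = True) -> bool:
    """Compare two REF alleles over the length of the shorter one.

    Assumption: this mirrors the idea behind the merge check, not its exact code.
    """
    n = min(len(ref_a), len(ref_b))
    a, b = ref_a[:n], ref_b[:n]
    if ignore_case:
        # Treat soft-masked (lowercase) reference bases as equal to uppercase.
        a, b = a.upper(), b.upper()
    return a == b


print(ref_prefixes_match("c", "C"))                     # case-insensitive comparison
print(ref_prefixes_match("c", "C", ignore_case=False))  # strict comparison
print(ref_prefixes_match("T", "C"))                     # genuinely different bases
```

With a case-insensitive comparison, `c` vs `C` is not a conflict, whereas `T` vs `C` still is.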

@denisemauldin
Author

Latest version as far as I know.

bcftools --version
bcftools 1.2
Using htslib 1.2.1
Copyright (C) 2015 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

@mcshane
Contributor

mcshane commented Jul 3, 2015

Looking at this more closely, I see you are merging gVCFs. Support for this is limited at the moment. @pd3 has worked on this in his experimental branch, but it is not ready for the main release yet.

However, I can't reproduce this with the current develop bcftools nor with 1.2. Making 3 VCFs with your example records and merging with bcftools merge gives me this result:

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chrX,length=155270560>
##INFO=<ID=BLOCKAVG_min30p3a,Type=Flag,Description="BLOCKAVG_min30p3a">
##INFO=<ID=END,Number=1,Type=Integer,Description="END">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="DP">
##FORMAT=<ID=DPF,Number=1,Type=Integer,Description="DPF">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQX,Number=1,Type=Integer,Description="GQX">
##FILTER=<ID=LowGQX,Description="LowGQX">
##bcftools_mergeVersion=1.2+htslib-1.2.1
##bcftools_mergeCommand=merge t1.vcf.gz t2.vcf.gz t3.vcf.gz
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NA00001 NA00002 NA00003
chrX    1   .   N   .   0   LowGQX  END=60028;BLOCKAVG_min30p3a GT:GQX:DP:DPF   .:.:.:. .:.:.:. .:.:0:0
chrX    60001   .   C   .   0   LowGQX  END=60008;BLOCKAVG_min30p3a GT:GQX:DP:DPF   0/0:10:4:0  0/0:7:3:0   ./.:.:.:.

The two gVCF blocks are not merged, but that is to be expected with what we currently support.

Perhaps your error is caused by something else? Can you send example VCFs (email address on my profile page)?

@mcshane
Contributor

mcshane commented Jul 17, 2015

This should be fixed on develop now. Please reopen if not.

@denisemauldin
Author

This is fixed for this case. Thanks!

@denisemauldin
Author

Hi there,

I installed the latest GitHub version (1.2-157) and attempted to run bcftools merge on VCF files. I'm getting:

The REF prefixes differ: T vs C (1,1)
Failed to merge alleles at chrY:195 in

That file contains:
chrY 1 . N . . nc END=10000;NS=1;AN=0 GT .

Other files I'm attempting to merge with:
chrY 1 . N . 0 nc END=2918288;BLOCKAVG_min30p3a GT:GQX:DP:DPF .:.:0:0
chrY 1 . N . 0 nc END=2918275;BLOCKAVG_min30p3a GT:GQX:DP:DPF .:.:0:0
chrY 1 . N . 0 nc END=2918278;BLOCKAVG_min30p3a GT:GQX:DP:DPF .:.:0:0
chrY 1 . N . 0 nc END=2918303;BLOCKAVG_min30p3a GT:GQX:DP:DPF .:.:0:0
chrY 1 . N . 0 nc END=2918280;BLOCKAVG_min30p3a GT:GQX:DP:DPF .:.:0:0
chrY 1 . N . 0 nc END=2649520;BLOCKAVG_min30p3a GT:GQX:DP:DPF .:.:0:0
chrY 1 . N . 0 nc END=2649533;BLOCKAVG_min30p3a GT:GQX:DP:DPF .:.:0:0
chrY 1 . N . 0 nc END=2918274;BLOCKAVG_min30p3a GT:GQX:DP:DPF .:.:0:0
chrY 1 . N . 0 nc END=2649520;BLOCKAVG_min30p3a GT:GQX:DP:DPF .:.:0:0
and one file that doesn't have a chrY because it contains a female genome.

Why is it failing in a no-call region?

@pd3
Member

pd3 commented Oct 20, 2015

The error message suggests that the problem is at position Y:195. Can you show what is there? A small test case would be very helpful.
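One way to see what each file holds at a reported position is to list every record whose span covers it, including gVCF blocks that start earlier and extend past it via END. A rough Python sketch, assuming uncompressed VCF text lines; the helper name `records_covering` is hypothetical.

```python
def records_covering(vcf_lines, chrom, pos):
    """Yield (POS, END, REF) for records on `chrom` whose span includes `pos`.

    Fields are split on any whitespace so that both tab-delimited VCF lines
    and the space-separated records pasted in this thread are accepted.
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        if len(fields) < 8 or fields[0] != chrom:
            continue
        start, ref, info = int(fields[1]), fields[3], fields[7]
        end = start + len(ref) - 1  # default span when INFO has no END
        for item in info.split(";"):
            if item.startswith("END="):
                end = int(item[4:])
        if start <= pos <= end:
            yield start, end, ref


# Record quoted earlier in this thread.
example = ["chrY 1 . N . . nc END=10000;NS=1;AN=0 GT ."]
print(list(records_covering(example, "chrY", 195)))
```

Running this over each input file for the reported position gives the small test case asked for above.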

@denisemauldin
Author

Hi there,

My error message includes the information that's at that position. It's a huge no-call region from position Y:1 to position Y:10000 in the file that has the error and the no-call region is larger in the other files, except where it doesn't exist in the last file. I'm not sure where bcftools is getting the reference from or why it's not matching.

Thanks,
Denise


@mcshane
Contributor

mcshane commented Oct 22, 2015

I've made a bunch of VCF files with the records you have above:

https://gist.github.com/mcshane/2416a548a5c23f4d5b4f

(for local people, they are here: ~sm15/dev/bcftools-main/284/*.vcf.gz)

However, I don't get your error message. The command works fine, although the END value in the output is the one from the first VCF file (the --info-rules END:max option could be used to take the largest value instead). This is to be expected as we are not supporting gVCF merging yet.
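To illustrate the difference between keeping the first file's END and taking the maximum, here is a small Python sketch over three of the chrY records quoted earlier. The file order is an assumption made for illustration, and `end_value` is a hypothetical helper, not part of bcftools.

```python
def end_value(record):
    """Return the integer END from a VCF record's INFO column, or None."""
    info = record.split()[7]
    for item in info.split(";"):
        if item.startswith("END="):
            return int(item[4:])
    return None


# Three records from this thread, in an assumed (hypothetical) file order.
records = [
    "chrY 1 . N . . nc END=10000;NS=1;AN=0 GT .",
    "chrY 1 . N . 0 nc END=2918288;BLOCKAVG_min30p3a GT:GQX:DP:DPF .:.:0:0",
    "chrY 1 . N . 0 nc END=2649520;BLOCKAVG_min30p3a GT:GQX:DP:DPF .:.:0:0",
]
ends = [end_value(r) for r in records]
first_end = ends[0]                                # what the merged record carries by default
max_end = max(e for e in ends if e is not None)    # what an END:max rule would select
print(first_end, max_end)
```

Neither value describes every sample correctly, because the blocks end at different positions; that is the limitation of merging gVCF blocks mentioned above.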

@mcshane mcshane reopened this Oct 22, 2015
@denisemauldin
Author

Hi Shane,

I went to make you some example VCFs, which produce a slightly different error from the one above:

The REF prefixes differ: G vs A (1,1)
Failed to merge alleles at chrY:73 in B02.vcf.gz

I think it's actually failing to merge on chromosome M. So I think it's proper to throw an error because the chrM REF alleles are different (A versus G), but I don't know why it thinks they're on chrY instead of chrM.

https://gist.github.com/denisemauldin/cfd12e9ad6a34040a58c

These VCF files have M before X and Y because I ran vcf-sort -c on them and that's how they came out. I wanted to make sure they all had a standard sort order. If I create files with X, Y and M, merge fails. If I create files with X, Y and 22, merge works. If I create files with X and Y, merge works. So it's only if I include M that the merge fails.
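A REF disagreement like this can be located independently of the merge error by comparing REF at every shared CHROM and POS across the inputs. The Python sketch below assumes uncompressed VCF text lines; the function name `ref_conflicts`, the file name A01.vcf, and the two records are made up for illustration and are not taken from the gist.

```python
from collections import defaultdict


def ref_conflicts(files):
    """Map (CHROM, POS) -> {REF: [file names]} for sites with more than one REF.

    `files` maps a file name to an iterable of VCF text lines.
    REF is uppercased first, so case-only differences are not reported.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for name, lines in files.items():
        for line in lines:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and header lines
            chrom, pos, _id, ref = line.split()[:4]
            seen[(chrom, int(pos))][ref.upper()].append(name)
    return {site: dict(refs) for site, refs in seen.items() if len(refs) > 1}


# Hypothetical records that reproduce an A-versus-G REF clash on chrM.
files = {
    "A01.vcf": ["chrM 73 . A G 50 PASS . GT 1/1"],
    "B02.vcf": ["chrM 73 . G . 50 PASS . GT 0/0"],
}
print(ref_conflicts(files))
```

This only compares records that start at the same position, so it is a quick screen rather than a full reimplementation of what merge checks.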

@pd3
Member

pd3 commented Oct 30, 2015

The incorrect chromosome was caused by a small error in error reporting. This part is fixed by the commit b1f04d6.

@denisemauldin
Author

Thanks for the quick commit, Petr.
