Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

-i and -e are not complement when I use query subcommand #1783

Closed
takosa opened this issue Aug 29, 2022 · 6 comments
Closed

-i and -e are not complement when I use query subcommand #1783

takosa opened this issue Aug 29, 2022 · 6 comments

Comments

@takosa
Copy link

takosa commented Aug 29, 2022

I noticed the result of

bcftools query -i 'FMT/DP>=20' -f '[%CHROM %POS %SAMPLE\n]' input.vcf

and

bcftools query -e 'FMT/DP<20' -f '[%CHROM %POS %SAMPLE\n]' input.vcf

are not complement.

This occur if only I specified %SAMPLE but %FORMAT tags, so I can avoid it by specifying other %FORMAT tags but it is a little inconvenient and confusing.

I think this is because the process is skipped by the code shown below when I use -e and the condition is met in some samples.
If only %SAMPLE is specified, not %FMT/DP, etc., the flag BCF_UN_FMT is not set. When I use -i, this checking and skipping do not occur.

if ( !(max_convert_unpack & BCF_UN_FMT) ) continue;

The simple solutions I think is to omit the above code or to change the code below to set the flag.

case T_SAMPLE: fmt->handler = &process_sample; break;

However, I'm not sure how this change will affect the rest of the code.

Thank you,

@pd3
Copy link
Member

pd3 commented Sep 8, 2022

Can you provide a specific example to demonstrate the behavior please?

@takosa
Copy link
Author

takosa commented Sep 9, 2022

I explain the issue by using this example vcf file named input.vcf.

##fileformat=VCFv4.1
##contig=<ID=1,assembly=test,length=100>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample1	sample2
1	10	snp1	G	C	.	.	.	GT:DP	0/0:10	0/1:20
1	20	snp2	G	C	.	.	.	GT:DP	0/0:10	0/1:20
1	30	snp3	G	C	.	.	.	GT:DP	0/0:10	0/1:20

I ran bcftools query command and this is the outputs.

$ bcftools query -i 'FMT/DP>=20' -f '[%ID %SAMPLE\n]' input.vcf
snp1 sample2
snp2 sample2
snp3 sample2
$ bcftools query -e 'FMT/DP<20'  -f '[%ID %SAMPLE\n]' input.vcf # no outputs
$ bcftools query -i 'FMT/DP<20'  -f '[%ID %SAMPLE\n]' input.vcf
snp1 sample1
snp2 sample1
snp3 sample1
$ bcftools query -e 'FMT/DP>=20'  -f '[%ID %SAMPLE\n]' input.vcf # no outputs

However, I think the outputs should be like this.

$ bcftools query -i 'FMT/DP>=20' -f '[%ID %SAMPLE\n]' input.vcf
snp1 sample2
snp2 sample2
snp3 sample2
$ bcftools query -e 'FMT/DP<20'  -f '[%ID %SAMPLE\n]' input.vcf
snp1 sample2
snp2 sample2
snp3 sample2
$ bcftools query -i 'FMT/DP<20'  -f '[%ID %SAMPLE\n]' input.vcf
snp1 sample1
snp2 sample1
snp3 sample1
$ bcftools query -e 'FMT/DP>=20'  -f '[%ID %SAMPLE\n]' input.vcf
snp1 sample1
snp2 sample1
snp3 sample1

In addition, if I add DP tag in my query, the outputs is like this. The number of rows are changed even though I don't change the filtering option.

$ bcftools query -e 'FMT/DP<20'  -f '[%ID %SAMPLE %DP\n]' input.vcf
snp1 sample2 20
snp2 sample2 20
snp3 sample2 20

Thank you for your consideration.

@pd3
Copy link
Member

pd3 commented Sep 10, 2022

I see. It's probably an older version of bcftools, there were some bugs fixed over the time. The latest gives the expected output:

$ cat rmme.vcf
##fileformat=VCFv4.1
##contig=<ID=1,assembly=test,length=100>
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	smpl10	smpl20
1	10	.	G	C	.	.	.	DP	10	20

$ bcftools query -f '[%SAMPLE %DP\n]' rmme.vcf -i 'FMT/DP>=20' 
smpl20 20

$ bcftools query -f '[%SAMPLE %DP\n]' rmme.vcf -e 'FMT/DP>=20' 
smpl10 10

$ bcftools query -f '[%SAMPLE %DP\n]' rmme.vcf -e 'FMT/DP<20' 
smpl20 20

$ bcftools query -f '[%SAMPLE %DP\n]' rmme.vcf -i 'FMT/DP<20' 
smpl10 10

@pd3 pd3 closed this as completed Sep 10, 2022
@takosa
Copy link
Author

takosa commented Sep 11, 2022

Thank you for your help.
I think I use latest version and outputs is like this.
I don't know, but something seems wrong... or Is this a specification?
(DP tag is not needed for my analysis.)

$ cat rmme.vcf
##fileformat=VCFv4.1
##contig=<ID=1,assembly=test,length=100>
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	smpl10	smpl20
1	10	.	G	C	.	.	.	DP	10	20

$ bcftools --version
bcftools 1.16-6-gfd9470e
Using htslib 1.16-10-g6366029
Copyright (C) 2022 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ bcftools query -f '[%SAMPLE\n]' rmme.vcf -i 'FMT/DP>=20'
smpl20

$ bcftools query -f '[%SAMPLE\n]' rmme.vcf -e 'FMT/DP>=20'

$ bcftools query -f '[%SAMPLE\n]' rmme.vcf -e 'FMT/DP<20'

$ bcftools query -f '[%SAMPLE\n]' rmme.vcf -i 'FMT/DP<20'
smpl10

@pd3 pd3 reopened this Sep 12, 2022
@pd3
Copy link
Member

pd3 commented Sep 12, 2022

I was able to reproduce the problem now

$ bcftools query -f '[%SAMPLE %DP\n]' rmme.vcf -e 'FMT/DP>=20'
smpl10 10
$ bcftools query -f '[ %SAMPLE\n]' rmme.vcf -e 'FMT/DP>=20'

@pd3 pd3 closed this as completed in f4dee4b Sep 12, 2022
@pd3
Copy link
Member

pd3 commented Sep 12, 2022

I believe the bug is now fixed by f4dee4b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants