Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bcf_hdr_idinfo_exists(hdr,type,int_id) on binary VCFs with gapped BCF_DT_ID tables #1538

Closed
freeseek opened this issue Dec 14, 2022 · 0 comments · Fixed by #1544
Closed

bcf_hdr_idinfo_exists(hdr,type,int_id) on binary VCFs with gapped BCF_DT_ID tables #1538

freeseek opened this issue Dec 14, 2022 · 0 comments · Fixed by #1544
Assignees

Comments

@freeseek
Copy link

Let's suppose I generate a binary VCF with a gapped BCF_DT_ID table:

$ echo -e "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO" | \
  bcftools view --no-version -Ou -c0 | \
  bcftools annotate --no-version -Ou -x INFO/AC
BCF�##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed",IDX=0>
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes",IDX=2>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO

Where now the IDX=1 entry is gone.

Now I generate a simple executable:

$ echo '#include <htslib/vcf.h>
int main() {
  htsFile *in = hts_open("-", "r");
  bcf_hdr_t *hdr = bcf_hdr_read(in);
  fprintf(stderr, "hdr->n[BCF_DT_ID]=%d\n", hdr->n[BCF_DT_ID]);
  for (int int_id = 0; int_id < hdr->n[BCF_DT_ID]; int_id++) {
    fprintf(stderr, "hdr->id[BCF_DT_ID][%d].val=%p\n", int_id, hdr->id[BCF_DT_ID][int_id].val);
    fprintf(stderr, "bcf_hdr_idinfo_exists(hdr, BCF_HL_INFO, %d)=%d\n", int_id, bcf_hdr_idinfo_exists(hdr, BCF_HL_INFO, int_id));
  }
  bcf_hdr_destroy(hdr);
  hts_close(in);
  return 0;
}' > /tmp/main.c
$ gcc /tmp/main.c -o /tmp/main -lhts

If I pipe the previous binary VCF into this executable:

$ echo -e "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO" | \
  bcftools view --no-version -Ou -c0 | \
  bcftools annotate --no-version -Ou -x INFO/AC | \
  /tmp/main
hdr->n[BCF_DT_ID]=3
hdr->id[BCF_DT_ID][0].val=0x5617da06f918
bcf_hdr_idinfo_exists(hdr, BCF_HL_INFO, 0)=0
hdr->id[BCF_DT_ID][1].val=(nil)
Segmentation fault (core dumped)

I get that bcf_hdr_idinfo_exists(hdr,type,int_id) causes segmentation fault. This macro is defined in htslib/vcf.h as follows:

    #define bcf_hdr_id2coltype(hdr,type,int_id) (uint32_t)((hdr)->id[BCF_DT_ID][int_id].val->info[type] & 0xf)
    #define bcf_hdr_idinfo_exists(hdr,type,int_id)  ((int_id)>=0 && bcf_hdr_id2coltype((hdr),(type),(int_id))!=0xf)

And the problem is that it tries to access (hdr)->id[BCF_DT_ID][int_id].val without first verifying that it exists. And there is no attempt at verifying that int_id might refer to a gap in the BCF_DT_ID table.

Is this the intended behavior or should the bcf_hdr_idinfo_exists(hdr,type,int_id) macro be amended to first check that (hdr)->id[BCF_DT_ID][int_id].val indeed exists? Or what should be the proper way to loop through all the INFO fields in the header?

pd3 added a commit to pd3/htslib that referenced this issue Jan 6, 2023
Do not rely on int_id being return value of bcf_hdr_id2int()

Resolves samtools#1538
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants