When bcftools encounters a contig name that it cannot parse (for instance, one containing a comma), it prints an error message:
[E::bcf_hdr_parse_line] Could not parse the header line: "##contig=<ID=BVAB1.CP049781.1_Clostridiales_genomosp._BVAB1_isolate_UAB071_chromosome,_complete_genome,length=1649642>"
The output BCF, however, is still produced. Apparently, after the parsing error, some internal pointer into the contigs of the BAM header gets out of sync, and the resulting BCF reports the variants not on the correct contig, but on the next contig in the BAM header.
My example
The BAM header contains many contig names, but all reads in the BAM are mapped to one contig called right.contig.
The FASTA file contains only the contig right.contig.
The resulting BCF file contains variants annotated with contig wrong.contig. Why?
Sample of the BAM header, with the correct and incorrect contigs next to each other:
... somewhere above is the contig with the forbidden comma ...
@SQ SN:right.contig LN:876514
@SQ SN:wrong.contig LN:100745
...
Apparently, the pointer shifts by one when encountering the parsing error.
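To make the suspected mechanism concrete, here is a minimal Python sketch of how dropping one unparseable contig could shift every later name by one. It is a hypothetical model only, not bcftools or htslib code: the name `bad,contig`, the helper `parse_contigs`, and the assumption that reads are looked up by their numeric position in the BAM header are all invented for illustration.

```python
# Hypothetical model of the suspected off-by-one; NOT actual bcftools/htslib code.

# Contig order as listed in the BAM header ("bad,contig" is a made-up stand-in
# for the real name containing a comma).
bam_contigs = ["bad,contig", "right.contig", "wrong.contig"]


def parse_contigs(names):
    """Keep only the names a comma-intolerant header parser would accept."""
    return [name for name in names if "," not in name]


# Assumption: each read refers to its contig by position in the BAM header.
read_index = bam_contigs.index("right.contig")

# The output header ends up one entry shorter because the bad contig was skipped.
output_contigs = parse_contigs(bam_contigs)

# Reusing the BAM position against the shorter list picks the next contig.
reported = output_contigs[read_index]
print(reported)
```

Under these assumptions the lookup returns wrong.contig, which matches the symptom described above; whether this is what actually happens inside bcftools is unconfirmed.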
When the bad contig name is changed to remove the forbidden comma, everything works as expected (the BCF file's variants are annotated as right.contig).
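Until this is fixed, a pre-flight check on the BAM header can catch the offending names before running mpileup. The following is a rough sketch, assuming the header is available as tab-separated SAM text (for example from `samtools view -H`); the function name `contigs_with_commas` and the sample header are hypothetical, and only the comma case reported here is checked.

```python
def contigs_with_commas(sam_header_text):
    """Return the SN values of @SQ lines whose contig name contains a comma."""
    bad = []
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # only sequence dictionary lines carry contig names
        for field in line.split("\t")[1:]:
            if field.startswith("SN:") and "," in field:
                bad.append(field[3:])  # strip the "SN:" tag prefix
    return bad


# Made-up header for illustration; "bad,contig" stands in for the real name.
header = (
    "@SQ\tSN:bad,contig\tLN:10\n"
    "@SQ\tSN:right.contig\tLN:876514\n"
    "@SQ\tSN:wrong.contig\tLN:100745\n"
)
print(contigs_with_commas(header))
```

If the returned list is non-empty, the contigs can be renamed before variant calling, which is the workaround that worked in my case.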
Suggestion
When a parsing issue happens, either don't generate the BCF (because it may be erroneously annotated) or fix the pointer issue that results in the wrong contig being written to the BCF. Thanks!
Command used:
bcftools mpileup -q 30 -f {fasta} {bam} -o {bcf}