bcftools norm: Why did it "normalize" this deletion by shifting REF 5' by 29 bp? #2336

Closed
toddajohnson opened this issue Dec 13, 2024 · 2 comments

Comments

@toddajohnson

Tool: bcftools norm (ver 1.21)
Problem: The 276 bp REF sequence of a 275 bp deletion was shifted 29 bp to the left, so the normalized DEL does not appear to represent the original record. BLAT of both the original and the "normalized" REF matches 100% at their respective locations.

Original record
chr9 125780340 . GGACGGGGCGGCTGGCTGGGCGGGGAGCTGACCCCACCTCCCTCCCAGACGGGGCGGCTGGCCGGGCGGGGGGCTGACCCCCCCCCCACCTCCCTCCCAGACGGGGCGGCTGGCCGGGCGGGGGGCTGACCCCCCCCACCTCCCTCCCGGACGGGGCGTCTGGCCGGGTGGGGGGCTAACTCCCCCACCTCCCTCCTGGACGGGGTGGCTGGCGAGGCAGAGGGGCTCCTCACTTCCCAGTAGGGGAGGCTGGGCAGAGGCGCCCCTCACCTCCCG G . PASS GT:AD:AF:DP:F1R2:F2R1 0/0:17,1:0.055:18:11,1:6,0 0/1:28,12:0.193:40:15,5:13,7

bcftools norm -m-any -f ${HOME}/reference/GATK/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta
chr9 125780311 . CGGCTGGGCAGAGGCGCCCCTCACCTCCCGGACGGGGCGGCTGGCTGGGCGGGGAGCTGACCCCACCTCCCTCCCAGACGGGGCGGCTGGCCGGGCGGGGGGCTGACCCCCCCCCCACCTCCCTCCCAGACGGGGCGGCTGGCCGGGCGGGGGGCTGACCCCCCCCACCTCCCTCCCGGACGGGGCGTCTGGCCGGGTGGGGGGCTAACTCCCCCACCTCCCTCCTGGACGGGGTGGCTGGCGAGGCAGAGGGGCTCCTCACTTCCCAGTAGGGGA C . PASS GT:AD:AF:DP:F1R2:F2R1 0/0:17,1:0.055:18:11,1:6,0 0/1:28,12:0.193:40:15,5:13,7

@davmlaw

davmlaw commented Dec 20, 2024

If you remove the first base, which is shared by REF and ALT, the bases deleted by the original variant are:

GACGGGGCGGCTGGCTGGGCGGGGAGCTGACCCCACCTCCCTCCCAGACGGGGCGGCTGGCCGGGCGGGGGGCTGACCCCCCCCCCACCTCCCTCCCAGACGGGGCGGCTGGCCGGGCGGGGGGCTGACCCCCCCCACCTCCCTCCCGGACGGGGCGTCTGGCCGGGTGGGGGGCTAACTCCCCCACCTCCCTCCTGGACGGGGTGGCTGGCGAGGCAGAGGGGCTCCTCACTTCCCAGTAGGGGAGGCTGGGCAGAGGCGCCCCTCACCTCCCG

Broken into two parts, these are:

A = "GACGGGGCGGCTGGCTGGGCGGGGAGCTGACCCCACCTCCCTCCCAGACGGGGCGGCTGGCCGGGCGGGGGGCTGACCCCCCCCCCACCTCCCTCCCAGACGGGGCGGCTGGCCGGGCGGGGGGCTGACCCCCCCCACCTCCCTCCCGGACGGGGCGTCTGGCCGGGTGGGGGGCTAACTCCCCCACCTCCCTCCTGGACGGGGTGGCTGGCGAGGCAGAGGGGCTCCTCACTTCCCAGTAGGGGA"
B = "GGCTGGGCAGAGGCGCCCCTCACCTCCCG"

There is another copy of "B" immediately upstream, so the reference reads BAB and the variant is the change BAB>B. Your record gets there by deleting the final AB, leaving B.

The second record describes the same BAB>B change, but deletes the first BA to leave B. It is the equivalent deletion, left-aligned.
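To make the equivalence concrete, here is a minimal Python sketch. It uses short toy sequences (hypothetical, not the real A and B from this issue), and the helper `apply_variant` is made up for illustration. It applies both records to the same reference and checks that they produce the same haplotype. The two start positions differ by exactly the length of B, which matches the records above: 125780340 - 125780311 = 29, and B is 29 bp long.

```python
# Toy stand-ins for the real sequences (hypothetical, much shorter).
A = "GACGTT"
B = "GGCTC"
anchor = "C"   # base immediately 5' of the first copy of B
flank = "TTT"  # arbitrary downstream flank

# Reference layout: anchor, B, A, B, flank
ref_seq = anchor + B + A + B + flank


def apply_variant(seq, pos, ref, alt):
    """Apply a VCF-style REF>ALT change at 0-based offset pos."""
    assert seq[pos:pos + len(ref)] == ref, "REF does not match the reference"
    return seq[:pos] + alt + seq[pos + len(ref):]


# Right-shifted record: anchored on the last base of the first B, deletes A+B.
pos_right = len(anchor) + len(B) - 1
right = apply_variant(ref_seq, pos_right, B[-1] + A + B, B[-1])

# Left-aligned record: anchored on the base before the first B, deletes B+A.
pos_left = 0
left = apply_variant(ref_seq, pos_left, anchor + B + A, anchor)

# Both records leave a single copy of B behind.
assert left == right == anchor + B + flank
# The positions differ by exactly len(B).
assert pos_right - pos_left == len(B)
```

Both REF alleles match the reference at their own positions, which is why BLAT reports a 100% match for each of them.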

The sequence deleted by the normalized record (after removing the shared first base) is:

B = "GGCTGGGCAGAGGCGCCCCTCACCTCCCG"
A = "GACGGGGCGGCTGGCTGGGCGGGGAGCTGACCCCACCTCCCTCCCAGACGGGGCGGCTGGCCGGGCGGGGGGCTGACCCCCCCCCCACCTCCCTCCCAGACGGGGCGGCTGGCCGGGCGGGGGGCTGACCCCCCCCACCTCCCTCCCGGACGGGGCGTCTGGCCGGGTGGGGGGCTAACTCCCCCACCTCCCTCCTGGACGGGGTGGCTGGCGAGGCAGAGGGGCTCCTCACTTCCCAGTAGGGGA"
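For intuition on why the record moves by exactly the length of B, below is a simplified Python sketch of the usual left-align-and-trim idea: drop a shared trailing base, and whenever an allele becomes empty, extend both alleles one base to the left. This is an illustration under my own assumptions, not the actual bcftools code; the function name `left_align` and the toy sequences are hypothetical.

```python
def left_align(ref_seq, pos, ref, alt):
    """Left-align one REF/ALT pair; pos is the 0-based offset of ref in ref_seq."""
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, pull in the preceding reference base.
        if (not ref or not alt) and pos > 0:
            pos -= 1
            ref = ref_seq[pos] + ref
            alt = ref_seq[pos] + alt
            changed = True
    # Drop shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with the same anchor + B + A + B layout (hypothetical sequences).
A = "GACGTT"
B = "GGCTC"
ref_seq = "C" + B + A + B + "TTT"

# Right-shifted deletion of A+B, anchored on the last base of the first B.
start = len(B)  # 0-based offset of that anchor base
pos, ref, alt = left_align(ref_seq, start, B[-1] + A + B, B[-1])

assert (pos, ref, alt) == (0, "C" + B + A, "C")
assert start - pos == len(B)  # shifted left by exactly one copy of B
```

The shift stops after one copy of B because the base before B ("C" here) differs from the last base of A, so no further trailing base can be dropped. The same holds for the real record, where the base before B is C and A ends in A.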

@pd3
Member

pd3 commented Jan 8, 2025

I believe this can be closed. Please shout if not.

pd3 closed this as completed Jan 8, 2025