Stop producing empty alleles #17
Comments
The issue here is to do with multiple levels of nesting, which can make it really hard to "get the previous letter and add to both". E.g. suppose the allele is at level 3, but the bubble it's embedded in has no string before the start of the nested split. It gets very messy with the recursion very quickly.
On the other hand, in pandora at least, it's pretty easy to fix this so that there are no empty-string alleles, once the template VCF has been created from the graph and given the VCF reference.
If you can find a fix, great!
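To make the "get the previous letter and add to both" idea concrete, here is a minimal sketch of the downstream fix, assuming a flat (non-nested) site whose position on a linear reference is already known. It is not code from make_prg or pandora: the function name `pad_empty_alleles`, its arguments, and the 0-based coordinate convention are all assumptions made for illustration.

```python
from typing import List, Tuple


def pad_empty_alleles(reference: str, pos: int, alleles: List[str]) -> Tuple[int, List[str]]:
    """Return (new_pos, new_alleles) such that no allele is the empty string.

    Hypothetical helper. `pos` is the 0-based start of the first allele (REF)
    on `reference`; `alleles[0]` is REF and the rest are ALTs.
    """
    if all(alleles):
        # Nothing to fix: every allele already has at least one base.
        return pos, list(alleles)
    if pos > 0:
        # Usual case: prepend the preceding reference base to every allele
        # and shift the site start one base to the left.
        previous_base = reference[pos - 1]
        return pos - 1, [previous_base + allele for allele in alleles]
    # Site starts at the first reference base: pad with the following base instead.
    end = pos + len(alleles[0])
    if end >= len(reference):
        raise ValueError("no flanking reference base available for padding")
    next_base = reference[end]
    return pos, [allele + next_base for allele in alleles]
```

For example, with reference `GCATTA`, a deletion of `AT` at 0-based position 2 (`["AT", ""]`) becomes `["CAT", "C"]` at position 1. The sketch only works because the flanking base is available on a linear reference; inside nested sites there may be no such base, which is exactly the difficulty described above.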
-------- Original message --------
From: Brice Letcher <notifications@github.com>
Date: 11/08/2020 11:16 (GMT+00:00)
To: rmcolq/make_prg <make_prg@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Subject: [rmcolq/make_prg] Stop producing empty alleles (#17)
make_prg can produce sites with i) direct deletions (e.g. REF AT, ALT "") and ii) direct insertions (e.g. REF "", ALT AT). By REF I mean the first allele in the site; that's how we embed a 'reference' in gramtools.
Though @mbhall88 has rightly pointed out that a site made by this tool does not have to translate to one in pandora/gramtools, I argue that if we fix this problem here, there's no need to deal with it there. This is especially relevant for gramtools, as by default each site produced here is a variant site in the output of genotype. It is also important since the VCF spec (https://github.com/samtools/hts-specs/blob/master/VCFv4.3.pdf, section 1.6.1) states that neither REF nor ALT should be empty.
I will have a look at how to fix this.
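As a small illustration of the two cases named above, the sketch below flags sites that contain an empty allele. It is not taken from make_prg: the function name `classify_empty_allele_sites` and the representation of a site as a list of allele strings (first allele = REF) are assumptions for this example only.

```python
from typing import Dict, List


def classify_empty_allele_sites(sites: List[List[str]]) -> Dict[int, str]:
    """Map site index -> problem type for every site that has an empty allele.

    Hypothetical helper. Each site is a list of allele strings whose first
    element is treated as REF, following the convention described above.
    """
    flagged: Dict[int, str] = {}
    for index, alleles in enumerate(sites):
        ref, alts = alleles[0], alleles[1:]
        if ref == "":
            # REF "" with a non-empty ALT: a direct insertion.
            flagged[index] = "direct insertion"
        elif "" in alts:
            # Non-empty REF with an empty ALT: a direct deletion.
            flagged[index] = "direct deletion"
    return flagged
```

Calling it on `[["AT", ""], ["", "AT"], ["A", "T"]]` flags site 0 as a direct deletion and site 1 as a direct insertion, and leaves the SNP-like site 2 unflagged.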
Thanks for pointing this out @rmcolq, indeed it looks like we can't easily prepend/append sequence to non-match intervals to guarantee the recursive clustering won't eventually hit an empty sequence. However, I'd like to at least try to enforce that guarantee at 'level 1'. WIP!
bricoletc added a commit that referenced this issue Aug 12, 2020
Merged
bricoletc added a commit that referenced this issue Nov 18, 2020
bricoletc added a commit that referenced this issue Nov 18, 2020