De-Brujin graph flag #49

Open
heringerp opened this issue Oct 16, 2024 · 9 comments
Labels
discussion Needs to be discussed more before it can be implemented enhancement New feature or request

Comments

@heringerp
Collaborator

It might make sense to introduce a parameter that tells panacus whether a graph is a variation graph or a de Bruijn graph. With this, we could make sure all commands work for both types of graphs, or at least tell the user if something won't work. We could also make the debug/warning statements we discussed on 2024-10-15 conditional on the graph type.

Do you agree with this @lucaparmigiani? Is there anything else to think about?

@heringerp heringerp added enhancement New feature or request discussion Needs to be discussed more before it can be implemented labels Oct 16, 2024
@lucaparmigiani
Collaborator

Yes, I think this is a very nice idea.

Running panacus on a compacted de Bruijn graph requires the parameter k. Of course, we could retrieve it from the overlaps of the links in the GFA, but it might be better to just give it as a parameter. Apart from this, I don't think there is any difference for the user.

@danydoerr
Member

danydoerr commented Oct 16, 2024

If the user informs panacus that the input graph is a c(c)DBG, then, for what it's worth, we can assume that k is equal to the length of the shortest node in the graph. That can be identified very quickly, so there is no need for the user to specify it explicitly.

@lucaparmigiani
Collaborator

This is not necessarily the case. Since it is a compacted graph, all nodes may be longer than k.
The most reliable way is to use the links:
L 3073 - 758274 + 10M
L 3073 - 962680 + 10M
...
All links will have the same overlap. In this case, for example, 10M means that k is 11, since adjacent nodes in a de Bruijn graph overlap by k-1 characters.
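
To make the idea concrete, here is a minimal sketch of reading k from the link overlaps. It is only an illustration: `infer_k_from_links` is a hypothetical helper, not an existing panacus function, and it assumes tab-separated GFA1 `L` lines whose sixth field is either `*` or a single `<n>M` CIGAR.

```python
import re

# Matches an overlap made of a single match operation, e.g. "10M".
_SINGLE_MATCH = re.compile(r"^(\d+)M$")


def infer_k_from_links(gfa_lines):
    """Return k inferred from L-line overlaps, or None if no CIGAR is given.

    Hypothetical helper for illustration; not part of panacus.
    """
    overlaps = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # GFA1 link: L <from> <fromOrient> <to> <toOrient> <overlap>
        if fields[0] != "L" or len(fields) < 6:
            continue
        cigar = fields[5]
        if cigar == "*":
            # Overlap not specified on this link; nothing to learn from it.
            continue
        match = _SINGLE_MATCH.match(cigar)
        if match is None:
            raise ValueError(f"unexpected overlap {cigar!r}")
        overlaps.add(int(match.group(1)))
    if not overlaps:
        return None
    if len(overlaps) > 1:
        raise ValueError(f"links disagree on overlap length: {sorted(overlaps)}")
    # Nodes of a de Bruijn graph overlap by k-1 characters.
    return overlaps.pop() + 1


example = [
    "L\t3073\t-\t758274\t+\t10M",
    "L\t3073\t-\t962680\t+\t10M",
]
print(infer_k_from_links(example))  # 11
```

Checking every link rather than only the first one costs a single pass over the file and also catches inputs where the overlaps are inconsistent.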

@heringerp
Collaborator Author

But this can still be checked very fast, right? So there would still be no need to let the user specify k?

@lucaparmigiani
Collaborator

Yes, but the problem is that the matching cigar is optional:

http://gfa-spec.github.io/GFA-spec/GFA1.html

@danydoerr
Member

Thanks for pointing this out, @lucaparmigiani! Does Bifrost output the CIGAR string?

@lucaparmigiani
Collaborator

Yes. Bifrost outputs the cigar. I am not so sure about other tools.

@danydoerr
Member

If no cigar string is given, the default assumption in GFA format is that the edge is blunt. I feel like we should assume that if c(c)DBGs are given, they must provide the cigar.

But yeah, an option that specifies k in absence of the cigar won't hurt either, would it?

@lucaparmigiani
Collaborator

I agree. So if the parameter is passed, we just assume we want to parse the graph as a cDBG; otherwise we rely on the CIGAR of the first link.
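
One possible reading of this rule, as a rough sketch: `resolve_k` and its `k_param` argument are hypothetical names, not existing panacus options, and treating a missing, `*`, or `0M` overlap as a blunt (non-cDBG) graph is an assumption based on the discussion above.

```python
def resolve_k(gfa_lines, k_param=None):
    """Return k if the graph should be treated as a cDBG, else None.

    Hypothetical helper for illustration; not part of panacus.
    """
    # An explicitly passed k always wins and implies cDBG parsing.
    if k_param is not None:
        return k_param
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "L" or len(fields) < 6:
            continue
        # Only the first link is inspected, as proposed in the thread.
        cigar = fields[5]
        if cigar.endswith("M") and cigar[:-1].isdigit():
            overlap = int(cigar[:-1])
            # Assumption: a positive overlap of k-1 marks a cDBG.
            return overlap + 1 if overlap > 0 else None
        # "*" or any other CIGAR: assume blunt edges.
        return None
    # No links at all: nothing indicates a cDBG.
    return None


print(resolve_k(["L\t3073\t-\t758274\t+\t10M"]))      # 11
print(resolve_k(["L\t1\t+\t2\t+\t*"], k_param=31))    # 31
```

Giving the explicit parameter precedence keeps graphs without CIGAR strings usable, while tools that do write the overlap (Bifrost, as noted above) need no extra input.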
