Feature request: split fasta cluster output into separate files #406

Closed
joelb123 opened this issue Feb 5, 2021 · 4 comments

Comments

@joelb123

joelb123 commented Feb 5, 2021

Again, this is an easy script, but one might as well ask for what one wants and have it done centrally.

A typical downstream use of the clusters is to do multiple-sequence-alignment calculations and then look at some stats on those, such as the fraction missing and the fraction parsimony-informative. This means making a directory to hold a fasta of each cluster and then running one's favorite MSA/treebuilder algorithm on it (MUSCLE, in my case), then doing some descriptive statistics on them.
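The splitting step described above could be sketched roughly as follows in Python. This is only an illustration, not part of MMseqs2: the function names `split_clusters` and `write_clusters` are made up here, and the sketch assumes a FASTA-like cluster file in which a header line directly followed by another header line marks the start of a new cluster. If your file is laid out differently, the detection rule needs to change.

```python
from pathlib import Path


def split_clusters(lines):
    """Yield (representative_id, fasta_text) for each cluster.

    Assumption: a header line directly followed by another header line
    is a cluster marker, and its first word is the representative ID.
    Records that appear before the first marker are ignored.
    """
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    rep, records = None, []
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.startswith(">") and nxt.startswith(">"):
            # Marker line: flush the previous cluster, start a new one.
            if rep is not None:
                yield rep, "\n".join(records) + "\n"
            rep = (line[1:].split() or ["cluster"])[0]
            records = []
        else:
            records.append(line)
    if rep is not None:
        yield rep, "\n".join(records) + "\n"


def write_clusters(fasta_path, out_dir):
    """Write one <representative>.fasta file per cluster into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(fasta_path) as handle:
        for rep, text in split_clusters(handle):
            # Slashes in an ID would otherwise be read as directories.
            safe = rep.replace("/", "_")
            (out / f"{safe}.fasta").write_text(text)
            count += 1
    return count
```

Calling `write_clusters("clusters_all_seqs.fasta", "clusters/")` (both paths are placeholders) would then leave one FASTA file per cluster in `clusters/`, ready to be handed to an aligner.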

It would be nice if optionally mmseqs would do this splitting on its own. Nicer still if it would spawn the MSA/treebuilder with a user-specified set of arguments and do the summary stat calculation with output to TSV. I note that MUSCLE is public-domain and pretty fast.
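For the summary-stat part, a minimal sketch of the two numbers mentioned above might look like this. It is an illustration under stated assumptions, not anything MMseqs2 or MUSCLE provides: `alignment_stats` and `write_tsv` are hypothetical names, `-` and `?` are assumed to be the missing-data characters, and a column is counted as parsimony-informative when at least two different states each occur in at least two sequences.

```python
import csv
import sys
from collections import Counter


def alignment_stats(seqs, missing="-?"):
    """Return (fraction_missing, fraction_parsimony_informative).

    seqs is a list of aligned sequences of equal length.
    Assumption: characters in `missing` are treated as missing data.
    """
    if not seqs or not seqs[0] or len({len(s) for s in seqs}) != 1:
        raise ValueError("expected a non-empty alignment of equal-length rows")
    n_cols = len(seqs[0])
    total_missing = sum(s.count(c) for s in seqs for c in missing)
    informative = 0
    for column in zip(*seqs):
        counts = Counter(ch.upper() for ch in column if ch not in missing)
        # Informative: at least two states, each seen at least twice.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return total_missing / (len(seqs) * n_cols), informative / n_cols


def write_tsv(rows, out=sys.stdout):
    """Write (cluster, n_seqs, frac_missing, frac_informative) rows as TSV."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["cluster", "n_seqs", "frac_missing", "frac_informative"])
    writer.writerows(rows)
```

One row per aligned cluster file can then be collected and passed to `write_tsv`; whether ambiguity codes such as `X` or `N` should also count as missing is a choice left to the `missing` argument.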

@milot-mirdita
Member

You probably want to use mmseqs apply for this purpose (see: https://github.com/soedinglab/hh-suite/blob/master/scripts/createdb.sh#L37).

We don't have a module to unpack a database into separate files. You can use ffindex_unpack from the HH-suite. MMseqs2 could use the same functionality.

@joelb123
Author

joelb123 commented Feb 9, 2021

I'm not asking how to write a script to do this splitting; doing so takes only a few minutes. I'm pointing out that the splitting is a workflow much more common than the default. Instead of having a sizeable fraction of your users have to each cough up their own scripts, this would make a fine option.

@milot-mirdita
Member

I added a module unpackdb for this purpose.

@joelb123
Author

Thanks, I'll use it!
