Feature request: split fasta cluster output into separate files #406

joelb123 · 2021-02-05T17:07:26Z

Again, this is an easy script, but one might as well ask for what one wants and have it done centrally.

A typical downstream use of the clusters is to compute a multiple-sequence alignment for each one and then look at statistics on those alignments, such as the fraction missing and the fraction of parsimony-informative sites. This means making a directory to hold one fasta file per cluster, running one's favorite MSA/treebuilder algorithm on each file (MUSCLE, in my case), and then doing some descriptive statistics on the results.
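For reference, the kind of splitting script being described could look roughly like the sketch below. It assumes a two-column cluster TSV (representative, then member, tab-separated) and a fasta file whose header IDs match the member column; the function names and the `cluster_NNNNNN.fasta` naming scheme are made up for illustration and are not part of mmseqs.

```python
from collections import defaultdict
from pathlib import Path


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_clusters(tsv_lines, fasta_lines, out_dir):
    """Write one FASTA file per cluster; return {representative: path}.

    Assumption: each TSV line is "representative<TAB>member", and the
    first whitespace-delimited token of a FASTA header is the member ID.
    """
    member_to_rep = {}
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue
        rep, member = line.split("\t")[:2]
        member_to_rep[member] = rep

    clusters = defaultdict(list)
    for header, seq in read_fasta(fasta_lines):
        seq_id = header.split(None, 1)[0] if header.strip() else ""
        rep = member_to_rep.get(seq_id)
        if rep is not None:  # sequences absent from the TSV are skipped
            clusters[rep].append((header, seq))

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = {}
    # Hypothetical naming scheme: numbered files, sorted by representative.
    for index, (rep, records) in enumerate(sorted(clusters.items())):
        path = out / f"cluster_{index:06d}.fasta"
        with path.open("w") as handle:
            for header, seq in records:
                handle.write(f">{header}\n{seq}\n")
        written[rep] = path
    return written
```

Both inputs are plain iterables of lines, so open file handles can be passed directly, e.g. `split_clusters(open("clusters.tsv"), open("seqs.fasta"), "per_cluster")` with hypothetical file names.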

It would be nice if mmseqs could optionally do this splitting on its own. Nicer still if it would spawn the MSA/treebuilder with a user-specified set of arguments and write the summary statistics to a TSV file. I note that MUSCLE is public-domain and pretty fast.
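To make the two statistics concrete, here is a minimal sketch of how they might be computed from an already-aligned cluster. The definitions are assumptions on my part: "missing" is taken to mean gap or unknown characters (`-` and `?` by default), and a site counts as parsimony-informative when at least two different states each occur in at least two sequences. The function name is hypothetical.

```python
from collections import Counter


def alignment_stats(seqs, missing="-?"):
    """Return (fraction_missing, fraction_parsimony_informative).

    seqs: list of aligned sequences, all of the same non-zero length.
    missing: characters treated as missing data (an assumption; adjust
    for your alphabet, e.g. add "N" for DNA or "X" for protein).
    """
    if not seqs:
        raise ValueError("alignment has no sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must share one non-zero length")

    # Fraction of all alignment cells that hold a missing character.
    n_missing = sum(ch in missing for s in seqs for ch in s)
    fraction_missing = n_missing / (len(seqs) * length)

    # A column is informative if >= 2 states each appear >= 2 times.
    informative = 0
    for column in zip(*seqs):
        counts = Counter(ch.upper() for ch in column if ch not in missing)
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return fraction_missing, informative / length
```

For the four-sequence alignment `["AAGT", "AAGT", "ACCT", "AC-T"]`, only the second column has two states that each appear twice, and one of the sixteen cells is a gap, so the function returns `(0.0625, 0.25)`.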

milot-mirdita · 2021-02-05T17:26:17Z

You probably want to use mmseqs apply for this purpose (see: https://github.com/soedinglab/hh-suite/blob/master/scripts/createdb.sh#L37).

We don't have a module to unpack a database into separate files. You can use ffindex_unpack from the HH-suite. MMseqs2 could use the same functionality.

joelb123 · 2021-02-09T01:16:34Z

I'm not asking how to write a script to do this splitting; doing so takes only a few minutes. I'm pointing out that the splitting is a workflow much more common than the default. Instead of having a sizeable fraction of your users have to each cough up their own scripts, this would make a fine option.

milot-mirdita · 2021-02-15T13:23:16Z

I added a module unpackdb for this purpose.

joelb123 · 2021-02-16T23:26:45Z

Thanks, I'll use it!

milot-mirdita added a commit that referenced this issue Feb 15, 2021

Add unpackdb to split a database into separate files #406

0cc7e67

joelb123 closed this as completed Feb 16, 2021
