Feature request: split fasta cluster output into separate files #406
Comments
You probably want to use We don't have a module to unpack a database into separate files. You can use
I'm not asking how to write a script to do this splitting; doing so takes only a few minutes. I'm pointing out that splitting is a much more common workflow than the default. Rather than have a sizeable fraction of your users each cough up their own script, this would make a fine option.
I added a module
Thanks, I'll use it!
Again, this is an easy script, but one might as well ask for what one wants and have it done centrally.
A typical downstream use of the clusters is to run multiple-sequence-alignment calculations and then look at some stats on the alignments, such as the fraction missing and the fraction parsimony-informative. This means making a directory to hold a fasta of each cluster, running one's favorite MSA/treebuilder algorithm on each one (MUSCLE, in my case), and then computing some descriptive statistics on the results.
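For reference, the kind of splitting script meant here might look like the sketch below. It assumes the combined cluster fasta marks the start of each cluster with a bare header line that is immediately followed by the first member's own header (two `>` lines in a row); that layout is an assumption and should be checked against the actual file. The function name `split_clusters` and the output naming scheme are made up for illustration.

```python
import re
from pathlib import Path


def split_clusters(fasta_path, out_dir):
    """Write one FASTA file per cluster and return the paths written.

    Assumed input layout: a header line directly followed by another
    header line marks the start of a new cluster; the first of the two
    is only a marker and is not written out as a record.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    pending = None  # header seen, not yet known to be a record or a marker
    try:
        with open(fasta_path) as fh:
            for raw in fh:
                line = raw.rstrip("\n")
                if not line:
                    continue
                if line.startswith(">"):
                    if pending is not None:
                        # Two headers in a row: `pending` was a cluster marker.
                        if out is not None:
                            out.close()
                        tokens = pending[1:].split()
                        first = tokens[0] if tokens else "cluster"
                        name = re.sub(r"[^A-Za-z0-9._-]", "_", first)
                        path = out_dir / f"{name}.fasta"
                        out = open(path, "w")
                        written.append(path)
                    pending = line
                else:
                    if out is None:
                        raise ValueError("sequence found before any cluster marker")
                    if pending is not None:
                        # `pending` was a real record header; emit it first.
                        out.write(pending + "\n")
                        pending = None
                    out.write(line + "\n")
    finally:
        if out is not None:
            out.close()
    return written
```

Each output file is named after the first token of the cluster marker, with characters that are awkward in file names replaced by underscores. The resulting directory can then be fed, one file at a time, to whatever aligner one prefers.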
It would be nice if mmseqs could optionally do this splitting on its own. Nicer still if it would spawn the MSA/treebuilder with a user-specified set of arguments and do the summary stat calculation, with output to TSV. I note that MUSCLE is public-domain and pretty fast.
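To make the two summary statistics concrete, here is a minimal sketch of how they could be computed from an already-aligned cluster. The definitions are assumptions: "fraction missing" is taken as the share of alignment cells that are gap or unknown characters, and a column counts as parsimony-informative when at least two distinct states each occur in at least two sequences. The name `alignment_stats` and the default set of missing characters are hypothetical; for nucleotide data one might add `N`, which is deliberately not the default because `N` is a valid residue in protein alignments.

```python
from collections import Counter


def alignment_stats(seqs, missing_chars="-?"):
    """Return (fraction_missing, fraction_parsimony_informative).

    `seqs` is a list of equal-length aligned sequences (strings).
    `missing_chars` is an assumed set of gap/unknown symbols.
    """
    if not seqs or len(seqs[0]) == 0:
        raise ValueError("empty alignment")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences are not aligned to the same length")

    missing = set(missing_chars)
    total_cells = length * len(seqs)
    n_missing = sum(ch in missing for s in seqs for ch in s)

    informative = 0
    for column in zip(*seqs):
        # Count each observed state, ignoring gaps and unknowns.
        counts = Counter(ch.upper() for ch in column if ch not in missing)
        # Informative: at least two states, each seen in two or more sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1

    return n_missing / total_cells, informative / length
```

Calling this once per aligned cluster and writing the cluster name plus the two numbers on a tab-separated line would give the TSV described above.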