[BUG] Missing USearch in installation instruction and workflow DAG declarations #2309

Open
lmrodriguezr opened this issue Jul 23, 2024 · 1 comment

Comments

@lmrodriguezr

Short description of the problem

DAS Tool needs USearch, but it's not listed in the requirements.

anvi'o version

Anvi'o .......................................: marie (v8-dev)
Python .......................................: 3.10.8

Profile database .............................: 40
Contigs database .............................: 23
Pan database .................................: 17
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

OS: Rocky Linux 8.6 (Green Obsidian). Installed using the instructions for developer version.

Detailed description of the issue

I'm not sure if this is an issue for the conda recipe of DAS Tool, or for Anvi'o. However, since Anvi'o explicitly uses --search_engine usearch, I'm reporting it here: The bioconda installation of DAS Tool does not include usearch (see recipe), and Anvi'o fails with the message:

Config Error: One of the critical output files is missing ('OUTPUT_DASTool_contig2bin.tsv').
              Please take a look at the log file: /tmp/tmpwmogo2i1/logs.txt                 

If you lose access to the temporary directory (e.g., on a cluster infrastructure), this would be pretty hard to debug. But the actual error is simply a missing usearch:

DAS Tool 1.1.6 
Error:  Cannot find dependencies: usearch 
Execution halted
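
For anyone hitting the same wall, here is a minimal sketch of how the dependency line could be pulled out of such a log before the temporary directory disappears. The function name `find_missing_dependencies` is hypothetical and not part of anvi'o or DAS Tool. The pattern is based only on the single message shown above, and the assumption that several names would be separated by commas or spaces is a guess.

```python
import re


def find_missing_dependencies(log_text):
    """Return program names reported after 'Cannot find dependencies:'.

    Assumption: the message format matches the one DAS Tool 1.1.6 printed
    above; multiple names, if any, are split on commas or whitespace.
    """
    pattern = re.compile(r"Cannot find dependencies:\s*(.+)")
    missing = []
    for line in log_text.splitlines():
        match = pattern.search(line)
        if match:
            names = re.split(r"[,\s]+", match.group(1).strip())
            missing.extend(name for name in names if name)
    return missing


# Example with the log excerpt quoted in this issue.
example_log = (
    "DAS Tool 1.1.6 \n"
    "Error:  Cannot find dependencies: usearch \n"
    "Execution halted\n"
)
print(find_missing_dependencies(example_log))
```

Surfacing this line in the error message itself, rather than only pointing to the log path, would make the failure self-explanatory.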

Perhaps this could be documented somewhere in the installation? Or at least the workflow could be aware of that dependency, as it currently doesn't list it when building the DAG:

Shell programs for the workflow
===============================================
Needed .......................................: gunzip, anvi-script-reformat-fasta, anvi-script-reformat-fasta, anvi-gen-contigs-database, anvi-import-functions, anvi-get-sequences-for-gene-calls, centrifuge, anvi-import-taxonomy-for-genes, anvi-run-hmms, anvi-run-pfams, anvi-run-kegg-kofams, anvi-run-ncbi-cogs, anvi-run-scg-taxonomy, anvi-scan-trnas, anvi-get-sequences-for-gene-calls, iu-gen-configs, iu-filter-quality-minoche, gzip, bowtie2-build, bowtie2, samtools, anvi-init-bam, anvi-profile, echo, anvi-import-collection, anvi-script-add-default-collection, anvi-summarize, anvi-split, mv, krakenuniq, krakenuniq-mpa-report, anvi-import-taxonomy-for-layers, anvi-cluster-contigs
Missing ......................................: None
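
As a rough illustration of the kind of check that would have caught this, the sketch below looks up each required program on the `PATH` with Python's standard `shutil.which`. It is not how the anvi'o workflow builds its list. The helper name `missing_programs` is hypothetical, and the executable names `DAS_Tool` and `usearch` are assumptions about how the tools are installed on a given system.

```python
import shutil


def missing_programs(programs):
    """Return the programs that cannot be found on PATH, without duplicates."""
    seen = set()
    missing = []
    for name in programs:
        if name in seen:
            continue  # the workflow list above contains repeated entries
        seen.add(name)
        if shutil.which(name) is None:
            missing.append(name)
    return missing


# Assumed executable names; adjust to the local installation.
required = ["anvi-cluster-contigs", "DAS_Tool", "usearch"]
print("Missing:", ", ".join(missing_programs(required)) or "None")
```

Running a check like this before submitting the workflow reports a missing usearch up front instead of after the binning step fails.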

In any case, the solution is pretty simple: install usearch :)

Thank you!
Miguel.

@lmrodriguezr lmrodriguezr changed the title [BUG] Insert a short but descriptive title (leave the '[BUG]' part) [BUG] Missing USearch in installation instruction and workflow DAG declarations Jul 23, 2024
@meren
Member

meren commented Jul 24, 2024

Dear @lmrodriguezr, I'm sorry you are running into issues with anvi-cluster-contigs :/

To be honest, we have often considered removing that program and the underlying structure completely from anvi'o. We had started that project with high hopes, but the diversity of binning algorithms, their changing input/output formats from one version to the next, and the lack of proper APIs for almost ANY of them made us realize that perhaps it is best if the user does the automatic binning outside of anvi'o and brings their bins into the anvi'o system with anvi-import-collection for refinement efforts, or anything else downstream.

If we had someone interested in pushing the automatic binning capabilities of anvi'o, we would happily give them full access to the codebase so they could do whatever they wanted, fix the issues, update the documentation, and so on. But currently every core developer is dealing with much more immediate needs, so anvi-cluster-contigs and the workflows linked to it start accumulating bugs, as you noticed.

Best wishes,
Meren
