Decouple CAT prepare from nr #66

Merged (13 commits) on Jan 19, 2022

papanikos
Contributor

Hi @bastiaanvonmeijenfeldt,

This PR rewrites the prepare module so that it works with any input, not only nr. It should make building custom databases easier, which several users (myself included) seem interested in. I think being explicit about the inputs, instead of having to "trick" CAT into thinking the input is nr, makes the module more useful.

Most notable changes

  • Only prepare.py and shared.py are affected.
  • The prepare interface has changed: users must now provide all inputs explicitly (fasta, names, nodes, accession2taxid).
  • All automatic downloading of nr has been removed. I hope to turn it into a separate module that does only that: download and preprocess nr and GTDB, as we discussed.
  • The --fresh and --existing flags are gone; every run is a fresh one. If any of the .dmnd, .fastaid2LCAtaxid or .taxids_with_multiple_offspring files already exist, their creation is skipped.
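The skip rule in the last bullet boils down to "build each derived file only if it is not already on disk". A minimal Python sketch of that pattern, assuming the three files share one path prefix inside the database directory; `build_if_missing`, the `db` prefix and the `touch` stand-in builder are hypothetical names for illustration, not code from prepare.py.

```python
import os
from typing import Callable, Dict, List


def build_if_missing(prefix: str,
                     builders: Dict[str, Callable[[str], None]]) -> List[str]:
    """Create each derived file unless it already exists.

    Hypothetical helper: `builders` maps a file suffix (e.g. ".dmnd") to a
    callable that writes that file at the given path. Returns the suffixes
    that were actually built, in the order they were processed.
    """
    built = []
    for suffix, build in builders.items():
        path = prefix + suffix
        if os.path.exists(path):
            # Left over from an earlier run: skip its creation.
            continue
        build(path)
        built.append(suffix)
    return built


if __name__ == "__main__":
    import tempfile

    def touch(path: str) -> None:
        # Stand-in for the real (expensive) builder, e.g. a DIAMOND makedb call.
        with open(path, "w") as handle:
            handle.write("placeholder\n")

    with tempfile.TemporaryDirectory() as db_dir:
        prefix = os.path.join(db_dir, "db")  # hypothetical file prefix
        steps = {
            ".dmnd": touch,
            ".fastaid2LCAtaxid": touch,
            ".taxids_with_multiple_offspring": touch,
        }
        print(build_if_missing(prefix, steps))
        # → ['.dmnd', '.fastaid2LCAtaxid', '.taxids_with_multiple_offspring']
        print(build_if_missing(prefix, steps))
        # → []
```

Checking existence alone means a file left half-written by an interrupted run would also be skipped; writing to a temporary name and renaming it on success is a common guard against that.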

There are many more minor changes here and there; I'm glad to walk you through them if you want.

I have also included a few minimal test sets in a newly created tests directory at the root of the project.

I tested the changes on these sets with:

# Create an output dir
$ mkdir tests/output

# Create the small db
$ ./CAT_pack/CAT prepare --db_fasta tests/data/prepare/small.fa.gz --names tests/data/prepare/names.dmp --nodes tests/data/prepare/nodes.dmp --acc2tax tests/data/prepare/prot2acc.txt --db_dir tests/output/prepare

# Test contigs run
$ ./CAT_pack/CAT contigs -c tests/data/contigs/small_contigs.fa -d tests/output/prepare/ -t tests/output/prepare/db -o tests/output/contigs/out

Everything seems to run as expected.
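The prepare call in the test commands passes every input explicitly. To illustrate that kind of interface, here is a minimal argparse sketch that makes the same five options mandatory. The option names are taken from that command; the parser itself, its help strings and the `build_parser`/`parse` names are hypothetical and not CAT's actual prepare.py.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for a prepare step with explicit inputs."""
    parser = argparse.ArgumentParser(prog="CAT prepare")
    # Every input is required: nothing is downloaded or assumed to be nr.
    parser.add_argument("--db_fasta", required=True,
                        help="protein FASTA to build the database from")
    parser.add_argument("--names", required=True, help="taxonomy names.dmp")
    parser.add_argument("--nodes", required=True, help="taxonomy nodes.dmp")
    parser.add_argument("--acc2tax", required=True,
                        help="accession-to-taxid mapping file")
    parser.add_argument("--db_dir", required=True,
                        help="output directory for the database files")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:].
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse([
        "--db_fasta", "tests/data/prepare/small.fa.gz",
        "--names", "tests/data/prepare/names.dmp",
        "--nodes", "tests/data/prepare/nodes.dmp",
        "--acc2tax", "tests/data/prepare/prot2acc.txt",
        "--db_dir", "tests/output/prepare",
    ])
    print(args.db_dir)  # → tests/output/prepare
```

Leaving out any of the five options makes argparse print a usage error and exit with status 2, rather than silently falling back to a default.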
