ValueError: 1 taxid not found #65

Closed
bioprojects opened this issue Aug 29, 2020 · 4 comments

@bioprojects

bioprojects commented Aug 29, 2020

Thank you very much for your kind instructions on installing mob_suite.

I've installed mob_suite on a server, but the test command below fails with "ValueError: 1 taxid not found". Could you let me know how to fix it?

(mob_suite) [k-yahara@gc016 plasmid-jp_Ohsumi]$ mob_typer
usage: mob_typer [-h] -i INFILE -o OUT_FILE [-a ANALYSIS_DIR] [-n NUM_THREADS] [-s SAMPLE_ID] [-f] [-x] [--min_rep_evalue MIN_REP_EVALUE]
[--min_mob_evalue MIN_MOB_EVALUE] [--min_con_evalue MIN_CON_EVALUE] [--min_length MIN_LENGTH] [--min_rep_ident MIN_REP_IDENT]
[--min_mob_ident MIN_MOB_IDENT] [--min_con_ident MIN_CON_IDENT] [--min_rep_cov MIN_REP_COV] [--min_mob_cov MIN_MOB_COV]
[--min_con_cov MIN_CON_COV] [--min_overlap MIN_OVERLAP] [-k] [--debug] [--plasmid_mash_db PLASMID_MASH_DB] [-m PLASMID_META]
[--plasmid_db_type PLASMID_DB_TYPE] [--plasmid_replicons PLASMID_REPLICONS] [--repetitive_mask REPETITIVE_MASK]
[--plasmid_mob PLASMID_MOB] [--plasmid_mpf PLASMID_MPF] [--plasmid_orit PLASMID_ORIT] [-d DATABASE_DIRECTORY]
[--primary_cluster_dist PRIMARY_CLUSTER_DIST] [--secondary_cluster_dist SECONDARY_CLUSTER_DIST] [-V]

$ mob_typer --infile test_rep.fas --out_file test_rep.mob_typer.out
2020-08-30 06:25:41,501 mob_suite.mob_typer INFO: Running Mob-typer version 3.0.0 [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_typer.py:163]
2020-08-30 06:25:41,501 mob_suite.mob_typer INFO: Processing fasta file test_plasmid.fas [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_typer.py:165]
2020-08-30 06:25:41,502 mob_suite.mob_typer INFO: SUCCESS: Found program blastn at /home/k-yahara/miniconda3/bin/blastn [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/utils.py:571]
2020-08-30 06:25:41,502 mob_suite.mob_typer INFO: SUCCESS: Found program makeblastdb at /home/k-yahara/miniconda3/bin/makeblastdb [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/utils.py:571]
2020-08-30 06:25:41,502 mob_suite.mob_typer INFO: SUCCESS: Found program tblastn at /home/k-yahara/miniconda3/bin/tblastn [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/utils.py:571]
2020-08-30 06:25:41,502 root INFO: Creating Lock file /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/databases/ETE3_DB.lock [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/utils.py:438]
2020-08-30 06:25:41,503 root INFO: Testing ETE3 taxonomy db /yshare2/home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/databases/taxa.sqlite [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/utils.py:441]
Traceback (most recent call last):
  File "/home/k-yahara/miniconda3/bin/mob_typer", line 10, in <module>
    sys.exit(main())
  File "/home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_typer.py", line 316, in main
    dbstatus = ETE3_db_status_check(1, ETE3_LOCK_FILE, ETE3DBTAXAFILE, logging)
  File "/home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/utils.py", line 444, in ETE3_db_status_check
    lineage = ncbi.get_lineage(taxid)
  File "/home/k-yahara/miniconda3/lib/python3.7/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 238, in get_lineage
    raise ValueError("%s taxid not found" %taxid)
ValueError: 1 taxid not found

The input file test_rep.fas is as follows:
>test_rep
TTGAAAAAAATATGTGTACTTATGAAGAAGGAACTTGTTGTCAAAGACAATGCACTAATAAATGCCAGTTATAATTTAGACCTTTCAGAACAACGTCTAATATTGTTAGCAATCCTTGAAGCTAGACAATCAAACACACCCAATGATAAAGATTTAACAATTCATGCTGAAAGCTATATCAACCATTTTAACGTTCATAGAAATACAGCCTATAAAGTCCTTAAAGATGCATGTAAGAGTCTATTTGATCGTAGATTCAGCTATCAAAAACTAACTCAGAAGGGCAACATTGAAAATGTAATAAGCCGATGGGTACAACGCATATCTTATGTTGAGAATGAAGCTCTTGTTCGTATTAAGTTTTCTGATGATGTTGTACCGTTGATTACAAACTTAGAAAAACACTTCACCAGTTATGAATTAGAACAAGTCAGTAGTTTAACCAGTGTTTACGCTATACGCTTATATGAATTGCTTATTGCATGGCGTAGTACTGGTAAAGTCATTTTGGTAGAGCTAGAAGAACTTAGATTAAAACTAGGTATAGAATCCCATGAATATAAGAGAATGGGGCAATTTAAAGAAAAAGTTTTACACCTTGCTATTGATCAAATAAACAAATACACCGATATAAAAGCAGAGTATGAACAACACAAACGTGGCCGTTCGATTATTGGCTTTTCATTTAAGTTTAAACAGAAACAACAACCCCAAAAAGCAGATTCCAAGCGAGCCCCTAACACCCCAGACTTCTTTGTCAAAATGACCGATGCACAACGCCATCTATTCGCCAATAAAATGTCTGAGATGCCTGAAATGAGCAAATATTCACAAGGCACAGAAAGCTATCAACAGTTTGCTATCCGTATCGCTGACATGCTTTTAGAGCCTGAAAAGTTTAGAGAGCTTTATCCAATCTTAGAAAAAGCAGGGTTTAAAGGTTAA

Many thanks again.

Koji Yahara

@jrober84
Collaborator

MOB-suite uses the ETE3 NCBI taxonomy database and queries taxid 1 as a simple check that the database is in good health. If the lookup fails on that taxid, I suspect something is wrong with your ETE3 database. Can you run mob_init and see whether it completes successfully? If it does, try running the command again; if not, please post the error message.
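
The health check described here can also be run by hand. The following is a minimal sketch, not MOB-suite's actual code: `taxonomy_db_is_healthy` and `check_with_ete3` are hypothetical helper names, while `NCBITaxa` and `get_lineage` come from ete3. It assumes that an unknown taxid surfaces as a `ValueError`, as it does in the traceback above.

```python
def taxonomy_db_is_healthy(get_lineage, probe_taxid=1):
    """Return True if the lineage lookup for the probe taxid succeeds.

    get_lineage is any callable that maps a taxid to a list of ancestor taxids.
    """
    try:
        lineage = get_lineage(probe_taxid)
    except ValueError:
        # ete3 raises ValueError("<taxid> taxid not found") for unknown taxids
        return False
    return bool(lineage) and probe_taxid in lineage


def check_with_ete3(dbfile):
    """Run the probe against a taxa.sqlite file (requires ete3 to be installed)."""
    from ete3 import NCBITaxa  # lazy import: the helper above has no dependency

    ncbi = NCBITaxa(dbfile=dbfile)
    return taxonomy_db_is_healthy(ncbi.get_lineage)
```

Calling `check_with_ete3` with the taxa.sqlite path from the log should return `False` while the database is incomplete, and `True` once mob_init has rebuilt it.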

@bioprojects
Author

bioprojects commented Aug 31, 2020

Thank you very much for your quick response. The output of "mob_init" is shown below. It ends with the error "Init of ete3 library failed with error UNIQUE constraint failed: synonym.spname, synonym.taxid", raised while inserting synonyms.

(mob_suite) [k-yahara@gc017 ~]$ mob_init
2020-09-01 05:02:04,100 mob_suite.utils INFO: Database directory folder already exists at /yshare2/home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/databases [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_init.py:131]
2020-09-01 05:02:04,119 mob_suite.utils INFO: Placed lock file at /yshare2/home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/databases/.lock [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_init.py:142]
2020-09-01 05:02:04,119 mob_suite.utils INFO: Initializing databases...this will take some time [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_init.py:165]
2020-09-01 05:02:04,119 mob_suite.utils INFO: Downloading databases...this will take some time [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_init.py:178]
2020-09-01 05:02:04,119 mob_suite.utils INFO: Trying mirror https://share.corefacility.ca/index.php/s/rYaAH7oxrSVtilN/download [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_init.py:182]
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 450M 100 450M 0 0 15.0M 0 0:00:29 0:00:29 --:--:-- 16.8M
2020-09-01 05:02:34,249 mob_suite.utils INFO: Downloading databases successful, now building databases at /yshare2/home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/databases [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_init.py:191]
2020-09-01 05:02:34,249 mob_suite.utils INFO: Decompressing /yshare2/home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/databases/data.tar.gz [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_init.py:107]
2020-09-01 05:02:49,926 mob_suite.utils INFO: Building repetitive mask database [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_init.py:204]
2020-09-01 05:02:50,874 mob_suite.utils INFO: Building complete plasmid database [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_init.py:208]
2020-09-01 05:03:13,742 mob_suite.utils INFO: Sketching complete plasmid database [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_init.py:212]
2020-09-01 05:03:21,819 mob_suite.utils INFO: Init ete3 library ... [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_init.py:224]
Downloading taxdump.tar.gz from NCBI FTP site (via HTTP)...
Done. Parsing...
Loading node names...
2269838 names loaded.
226442 synonyms loaded.
Loading nodes...
2269838 nodes loaded.
Linking nodes...
Tree is loaded.
Updating database: /yshare2/home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/databases/taxa.sqlite ...
2269000 generating entries...
Uploading to /yshare2/home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/databases/taxa.sqlite

Inserting synonyms: 135000 2020-09-01 05:04:29,745 mob_suite.utils ERROR: Init of ete3 library failed with error UNIQUE constraint failed: synonym.spname, synonym.taxid. Removing lock file [in /home/k-yahara/miniconda3/lib/python3.7/site-packages/mob_suite/mob_init.py:230]
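
The message at the end of this log is SQLite's standard wording for a uniqueness violation. Below is a small sketch of how it arises when the same (spname, taxid) pair is inserted twice. The table definition and the sample rows are hypothetical, not ete3's real schema or data. `INSERT OR IGNORE` is shown only as one general way to tolerate duplicates, not as a description of how ete3 handles them.

```python
import sqlite3


def insert_synonyms(rows, tolerate_duplicates=False):
    """Insert (spname, taxid) rows into a throwaway in-memory table.

    Returns the number of rows stored. Raises sqlite3.IntegrityError on a
    duplicate pair unless tolerate_duplicates is True.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the synonym table named in the error message.
    conn.execute(
        "CREATE TABLE synonym (taxid INT, spname TEXT, UNIQUE (spname, taxid))"
    )
    verb = "INSERT OR IGNORE" if tolerate_duplicates else "INSERT"
    try:
        conn.executemany(f"{verb} INTO synonym (spname, taxid) VALUES (?, ?)", rows)
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM synonym").fetchone()[0]
    finally:
        conn.close()


rows = [("Example name", 42), ("Example name", 42)]  # made-up duplicate pair
try:
    insert_synonyms(rows)
except sqlite3.IntegrityError as exc:
    print(exc)  # SQLite reports the violated uniqueness constraint
```

With `tolerate_duplicates=True` the second row is skipped and the load completes, which shows that a single duplicate is enough to abort an otherwise valid bulk insert.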

@kbessonov1984
Collaborator

kbessonov1984 commented Sep 1, 2020

Hello, I hit a similar error with the ete3 library. There is apparently a bug in that library when it builds the taxonomy SQLite database, detailed here and here. The easiest fix is to install ete3>=3.1.2, which incorporates the bug fix for duplicated taxids (ete3 issue 469). This ete3 version was released only 3 days ago. In theory the conda recipe for mob-suite should pull the newest ete3, but because the release is so recent, install it manually first. After installing, check that the version is correct (conda list | grep ete3) and that the channel is etetoolkit (not bioconda); the output should show ete3 3.1.2 pyh39e3cac_0 etetoolkit. The newest ete3 version is currently available only from the etetoolkit conda channel and will eventually appear in the bioconda channel.

conda create -n mob_suite-3.0.0 -y
# activate the new environment so the installs below go into it
conda activate mob_suite-3.0.0
conda install -c etetoolkit ete3
conda install -c bioconda mob_suite=3.0.0=py_2 -y
mob_init

After running these commands there should be no more issues. I was able to initialize the mob-suite databases successfully, as shown below:

...
Updating database: /Drives/K/kbessono/.conda/envs/mob_suite-3.0.0/lib/python3.8/site-packages/mob_suite/databases/taxa.sqlite ...
 2271000 generating entries... 
Uploading to /Drives/K/kbessono/.conda/envs/mob_suite-3.0.0/lib/python3.8/site-packages/mob_suite/databases/taxa.sqlite

Inserting synonyms:      225000 
Inserting taxid merges:  55000 
Inserting taxids:       2270000 
2020-09-01 23:23:25,811 mob_suite.utils INFO: Removed residual taxdump.tar.gz as ete3 is not doing proper cleaning job. [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.0/lib/python3.8/site-packages/mob_suite/mob_init.py:236]
2020-09-01 23:23:25,818 mob_suite.utils INFO: MOB init completed successfully [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.0/lib/python3.8/site-packages/mob_suite/mob_init.py:248]
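
The ete3>=3.1.2 requirement can also be checked from Python instead of with conda list. This is only a sketch: `version_at_least` and `installed_ete3_version` are hypothetical helper names, the comparison handles plain dotted numeric versions only, and `importlib.metadata` assumes Python 3.8 or newer.

```python
def version_at_least(installed, minimum="3.1.2"):
    """Compare dotted numeric versions such as '3.1.1' and '3.1.2'."""

    def as_tuple(text):
        return tuple(int(part) for part in text.split("."))

    return as_tuple(installed) >= as_tuple(minimum)


def installed_ete3_version():
    """Return the installed ete3 version string (needs Python 3.8+ and ete3)."""
    from importlib.metadata import version  # lazy import, stdlib from 3.8

    return version("ete3")
```

In the activated environment, `version_at_least(installed_ete3_version())` should be true before mob_init is run.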

@bioprojects
Author

Thank you so much for your quick response. Yes, it worked!
