Error occurred while ETE3 updating NCBI taxonomy database #2

Closed
SamLMG opened this issue Aug 27, 2020 · 5 comments
Labels
Dependency issue Not a bug in the code, but the dependency. Fixed

Comments

@SamLMG

SamLMG commented Aug 27, 2020

Hello,
I'm having the following issue with ete3 when running the ./ncbi.py script:
Do you have any idea what the issue might be?
Thanks in advance,
Sam

(mitoflex) [leeming@l33 test1]$ ../ncbi.py
Filesystem status:
Total: 109.00 GB
Free: 98.00 GB

If the free disk space is too low (<1G), database updating can be failed!
Downloading taxdump.tar.gz from NCBI FTP site (via HTTP)...
Done. Parsing...
Loading node names...
2269854 names loaded.
226194 synonyms loaded.
Loading nodes...
2269854 nodes loaded.
Linking nodes...
Tree is loaded.
Updating database: /home/lv71312/leeming/.etetoolkit/taxa.sqlite ...
2269000 generating entries...
Uploading to /home/lv71312/leeming/.etetoolkit/taxa.sqlite

Inserting synonyms: 60000
Errors occured when fetching data from NCBI database, falling back to the last fetched database.
Loading node names...
2269574 names loaded.
225837 synonyms loaded.
Loading nodes...
2269574 nodes loaded.
Linking nodes...
Tree is loaded.
Updating database: /home/lv71312/leeming/.etetoolkit/taxa.sqlite ...
2269000 generating entries...
Uploading to /home/lv71312/leeming/.etetoolkit/taxa.sqlite
Traceback (most recent call last):
File "../ncbi.py", line 65, in
ncbi.update_taxonomy_database()
File "/home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 129, in update_taxonomy_database
update_db(self.dbfile)
File "/home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 760, in update_db
upload_data(dbfile)
File "/home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 802, in upload_data
db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
sqlite3.IntegrityError: UNIQUE constraint failed: synonym.spname, synonym.taxid

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "../ncbi.py", line 80, in
ncbi = NCBITaxa(taxdump_file=os.path.abspath(dump_file))
File "/home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 106, in __init__
self.update_taxonomy_database(taxdump_file)
File "/home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 131, in update_taxonomy_database
update_db(self.dbfile, taxdump_file)
File "/home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 760, in update_db
upload_data(dbfile)
File "/home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 791, in upload_data
db.execute(cmd)
sqlite3.OperationalError: database is locked

@Prunoideae
Owner

Prunoideae commented Aug 27, 2020

Hello SamLMG,
This error may be caused by something malformed in the taxonomy database, or by a case not covered in ete3's code.
There is an error report and a solution for this on the ETE Toolkit's Google Group, and it is already an open issue on ete3's repository [etetoolkit/ete#469].

If you need a fix right away, I suggest the following:

  1. Open ncbiquery.py, located at /home/lv71312/leeming/miniconda3/envs/mitoflex/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py as shown in the traceback.
  2. Go to line 802, which should read db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname)).
  3. Change it to db.execute("INSERT OR REPLACE INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname)).
  4. Delete the SQLite database at ~/.etetoolkit/taxa.sqlite and rerun ncbi.py.
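
To see why step 3 helps, the snippet below reproduces the failure in an in-memory SQLite database. It is a minimal sketch, not ete3's code: the table definition is an assumption modelled on ete3's synonym table (spname declared COLLATE NOCASE, with (spname, taxid) as the primary key, which matches the columns named in the IntegrityError above), and the two sample rows are hypothetical.

```python
import sqlite3

# Assumed schema, modelled on ete3's synonym table: spname compares
# case-insensitively and the (spname, taxid) pair must be unique.
SCHEMA = """
CREATE TABLE synonym (
    taxid INT,
    spname VARCHAR(50) COLLATE NOCASE,
    PRIMARY KEY (spname, taxid)
)
"""

# Hypothetical rows that differ only in letter case.
ROWS = [(9606, "Homo sapiens"), (9606, "homo sapiens")]


def load_synonyms(rows, or_replace):
    """Insert rows into a fresh in-memory table.

    Returns (stored_rows, error_message_or_None).
    """
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    verb = "INSERT OR REPLACE" if or_replace else "INSERT"
    sql = f"{verb} INTO synonym (taxid, spname) VALUES (?, ?);"
    error = None
    try:
        for taxid, spname in rows:
            db.execute(sql, (taxid, spname))
    except sqlite3.IntegrityError as exc:
        # Plain INSERT stops here, as in the traceback above.
        error = str(exc)
    stored = db.execute("SELECT taxid, spname FROM synonym").fetchall()
    db.close()
    return stored, error


if __name__ == "__main__":
    print(load_synonyms(ROWS, or_replace=False))
    print(load_synonyms(ROWS, or_replace=True))
```

With plain INSERT the second row raises "UNIQUE constraint failed"; with INSERT OR REPLACE SQLite deletes the conflicting row and inserts the new one, so a single row remains, holding whichever spelling was inserted last.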

This may be caused by a change in NCBI's taxonomy data that unexpectedly produced duplicate entries and broke the code.
The plan is to update ete3 once that issue is fixed and closed; I will keep this issue open until then.

@Prunoideae Prunoideae added Dependency issue Not a bug in the code, but the dependency. Pending update Waiting for next update to do something further. labels Aug 27, 2020
@Prunoideae
Owner

Prunoideae commented Aug 27, 2020

Alternatively, you can reuse the taxdump.tar.gz that has already been downloaded.

To do so:

  1. Delete the SQLite database (path given above).
  2. Launch the Python interpreter in MitoFlex's root directory.
  3. Enter the following:
from ete3 import NCBITaxa
from os import path
ncbi = NCBITaxa(taxdump_file=path.abspath('taxdump.tar.gz'))

This method reuses the previously downloaded taxdump.tar.gz, so it should not be affected by NCBI's current change. Newly added taxonomy records will be missing, but this is sufficient for most of the program's functions.
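
The same steps can be scripted. This is a hedged sketch: the helper names prepare_offline_rebuild and rebuild_from_local_dump are hypothetical, the database path is the default ~/.etetoolkit/taxa.sqlite mentioned in this thread, and the only ete3 call used is the NCBITaxa(taxdump_file=...) constructor from the steps above, which (as the second traceback shows) goes on to call update_taxonomy_database with the local dump.

```python
import os
from pathlib import Path

# Default ete3 database location, as given in this thread.
DEFAULT_DB = Path.home() / ".etetoolkit" / "taxa.sqlite"


def prepare_offline_rebuild(taxdump="taxdump.tar.gz", dbfile=DEFAULT_DB):
    """Step 1 (hypothetical helper): delete the stale database.

    Returns the absolute path of the previously downloaded dump, or raises
    FileNotFoundError if the dump is not where we expect it.
    """
    dump = os.path.abspath(taxdump)
    if not os.path.isfile(dump):
        raise FileNotFoundError(f"{dump} not found; start from MitoFlex's root directory")
    db = Path(dbfile)
    if db.exists():
        db.unlink()  # remove the old, possibly half-written database
    return dump


def rebuild_from_local_dump(taxdump="taxdump.tar.gz"):
    """Step 3: rebuild the database from the local dump (requires ete3)."""
    # Imported here so prepare_offline_rebuild() works without ete3 installed.
    from ete3 import NCBITaxa
    return NCBITaxa(taxdump_file=prepare_offline_rebuild(taxdump))
```

Calling rebuild_from_local_dump() from MitoFlex's root directory corresponds to steps 1 and 3; the existence check guards against silently deleting the database when the dump is missing.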

Prunoideae added a commit that referenced this issue Aug 27, 2020
Reference to #2: this adds an option to directly use the already downloaded file in the repo, without downloading it again.
@SamLMG
Author

SamLMG commented Aug 28, 2020

Hi,
Thanks for the fix. The database now updates successfully.

@Prunoideae Prunoideae pinned this issue Aug 28, 2020
@Prunoideae Prunoideae changed the title "error updating ncbi database" to "Error occurred while ETE3 updating NCBI taxonomy database" Aug 28, 2020
Prunoideae added a commit that referenced this issue Aug 29, 2020
To prevent problems from #2, since ete3 is likely to be inactive in the future.
@Prunoideae Prunoideae added Fixed and removed Pending update Waiting for next update to do something further. labels Aug 29, 2020
@Prunoideae
Owner

Fixed by patching the ETE3 module from within ncbi.py, replacing the faulty database query method with a corrected one.

The main part of MitoFlex does not carry a patch like this to guard against this kind of instability inside ete3, so running ncbi.py remains a necessary installation step.
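
The general technique, rebinding a function in an already-imported module so that its callers pick up the replacement, can be sketched as below. This is not MitoFlex's actual patch: the stand-in module, its function names, and the simplified bodies are hypothetical, built with exec so the sketch runs without ete3. In the thread's setting, the analogous target would be upload_data in ete3.ncbi_taxonomy.ncbiquery, which update_db calls by name according to the traceback.

```python
import sqlite3
import types

# Hypothetical stand-in for ete3.ncbi_taxonomy.ncbiquery: update_db() looks
# up insert_synonyms() in the module's globals at call time.
STAND_IN_SOURCE = '''
def insert_synonyms(db, rows):
    for taxid, spname in rows:
        db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))

def update_db(db, rows):
    insert_synonyms(db, rows)
'''
ncbiquery = types.ModuleType("ncbiquery_stand_in")
exec(STAND_IN_SOURCE, ncbiquery.__dict__)


def patched_insert_synonyms(db, rows):
    """Replacement that tolerates duplicate (spname, taxid) pairs."""
    for taxid, spname in rows:
        db.execute(
            "INSERT OR REPLACE INTO synonym (taxid, spname) VALUES (?, ?);",
            (taxid, spname),
        )


# The patch itself: rebind the module attribute before update_db() runs.
ncbiquery.insert_synonyms = patched_insert_synonyms


def demo():
    """Run the stand-in update_db() on colliding rows; return the row count."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE synonym (taxid INT, spname VARCHAR(50) COLLATE NOCASE, "
        "PRIMARY KEY (spname, taxid))"
    )
    # Hypothetical rows that collide under COLLATE NOCASE.
    ncbiquery.update_db(db, [(9606, "Homo sapiens"), (9606, "homo sapiens")])
    count = db.execute("SELECT COUNT(*) FROM synonym").fetchone()[0]
    db.close()
    return count
```

Because update_db resolves the name at call time, it uses the patched function without any change to its own code; the patch must be applied before the call and must keep the original signature.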

@JanDrouaud

Hi all,
I think this happens because of the "COLLATE NOCASE" clause used when creating the synonym table. Since letter case is ignored in comparisons, taxid/synonym pairs that differ only in case are treated as identical, which causes an SQLite insertion error.
So you can either delete this clause, or wrap the line
db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
in a try/except block like this:
try:
    db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
except sqlite3.IntegrityError:
    print(i, taxid, spname)
That way you don't change the expected output of ete3, and you keep track of the taxid/synonym pairs that were skipped.
Jan
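
Both of these options can be tried on a toy table. A minimal sketch, assuming the synonym schema below (modelled on ete3's, with the COLLATE NOCASE clause made optional) and hypothetical sample rows; the hypothetical skip_duplicates helper mirrors the try/except approach, catching only sqlite3.IntegrityError so that unrelated errors still surface, and returning the skipped (i, taxid, spname) triples instead of printing them.

```python
import sqlite3


def make_synonym_table(nocase=True):
    """Create an in-memory synonym table; COLLATE NOCASE is optional."""
    collate = " COLLATE NOCASE" if nocase else ""
    db = sqlite3.connect(":memory:")
    db.execute(
        f"CREATE TABLE synonym (taxid INT, spname VARCHAR(50){collate}, "
        "PRIMARY KEY (spname, taxid))"
    )
    return db


def skip_duplicates(db, rows):
    """Insert rows, skipping and recording pairs that violate the key."""
    skipped = []
    for i, (taxid, spname) in enumerate(rows):
        try:
            db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
        except sqlite3.IntegrityError:
            skipped.append((i, taxid, spname))
    return skipped


# Hypothetical rows that differ only in letter case.
ROWS = [(9606, "Homo sapiens"), (9606, "homo sapiens")]

if __name__ == "__main__":
    # Option 1: drop COLLATE NOCASE, so both spellings are stored.
    db = make_synonym_table(nocase=False)
    print(skip_duplicates(db, ROWS), db.execute("SELECT COUNT(*) FROM synonym").fetchone())
    # Option 2: keep COLLATE NOCASE, skip and record the colliding pair.
    db = make_synonym_table(nocase=True)
    print(skip_duplicates(db, ROWS), db.execute("SELECT COUNT(*) FROM synonym").fetchone())
```

The trade-off: dropping the clause keeps every spelling but makes name comparisons against the table case-sensitive, while the try/except route keeps case-insensitive matching and drops only the later duplicate.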
