CCMetagen.py: mysql error when setting up the taxonomy database #14

Closed
walshaw opened this issue Aug 17, 2020 · 5 comments
walshaw commented Aug 17, 2020

Thanks for producing this software. I'm having a problem running CCMetagen.py. The previous step of running kma seems to have worked fine, and the kma output files look normal. The problem seems to be with CCMetagen.py setting up the taxonomy database on first use, rather than with the input data. It appears that a row being inserted violates the uniqueness of the compound key on the synonym table.

Python 3.8.5, Debian 8.7. pandas and ete3 were installed via conda as per tutorial.

$ SOMEPATH/CCMetagen.py -i mySample_out_kma.res -o mySample_out_ccmetagen > mySample_ccmetagen_1.stdout 2> mySample_ccmetagen_1.stderr &

STDOUT content as follows:

Loading node names...
2267145 names loaded.
225185 synonyms loaded.
Loading nodes...
2267145 nodes loaded.
Linking nodes...
Tree is loaded.
Updating database: /myhomedir/.etetoolkit/taxa.sqlite ...
 2267000 generating entries...
Uploading to /myhomedir/.etetoolkit/taxa.sqlite

STDERR:

$ cat mySample_ccmetagen_1.stderr
NCBI database not present yet (first time used?)
Downloading taxdump.tar.gz from NCBI FTP site (via HTTP)...
Done. Parsing...
Inserting synonyms:      60000 Traceback (most recent call last):
  File "../../CCMetagen/CCMetagen.py", line 170, in <module>
    NCBITaxa()
  File "/myhomedir/miniconda3/envs/ccmetagen/lib/python3.8/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 110, in __init__
    self.update_taxonomy_database(taxdump_file)
  File "/myhomedir/miniconda3/envs/ccmetagen/lib/python3.8/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 129, in update_taxonomy_database
    update_db(self.dbfile)
  File "/myhomedir/miniconda3/envs/ccmetagen/lib/python3.8/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 760, in update_db
    upload_data(dbfile)
  File "/myhomedir/miniconda3/envs/ccmetagen/lib/python3.8/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 802, in upload_data
    db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
sqlite3.IntegrityError: UNIQUE constraint failed: synonym.spname, synonym.taxid
@vrmarcelino
Owner

Hi!

Thanks for reporting this issue.
This is a new problem with ete3 (or with one of its libraries), a module that CCMetagen uses.
While we wait for the ete3 team to release a new version with this fixed, there seems to be a temporary solution here: etetoolkit/ete#469
I asked for more information there, as I couldn't find the file that needs to be edited. I will keep you posted.

@vrmarcelino
Owner

Hi again,

Already got a response (impressive!)

In your case, you'll need to edit the file "/myhomedir/miniconda3/envs/ccmetagen/lib/python3.8/site-packages/ete3/ncbi_taxonomy/ncbiquery.py"

At line 785, delete the words "COLLATE NOCASE" so that the line reads:
CREATE TABLE synonym (taxid INT,spname VARCHAR(50), PRIMARY KEY (spname, taxid));
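
Why removing "COLLATE NOCASE" helps can be illustrated with a minimal `sqlite3` sketch. It assumes that the unpatched statement declared `spname` with `COLLATE NOCASE` (inferred from the instruction above, not copied from ete3), and the two sample rows are hypothetical. With case-insensitive collation, two synonyms of the same taxid that differ only in letter case collide on the compound primary key:

```python
import sqlite3

# Assumed shape of the unpatched statement (COLLATE NOCASE on spname).
WITH_NOCASE = (
    "CREATE TABLE synonym (taxid INT, spname VARCHAR(50) COLLATE NOCASE, "
    "PRIMARY KEY (spname, taxid));"
)
# The patched statement quoted in the comment above.
WITHOUT_NOCASE = (
    "CREATE TABLE synonym (taxid INT,spname VARCHAR(50), "
    "PRIMARY KEY (spname, taxid));"
)

def try_insert(create_sql, rows):
    """Create an in-memory synonym table, insert rows, return the error text or None."""
    db = sqlite3.connect(":memory:")
    try:
        db.execute(create_sql)
        for taxid, spname in rows:
            db.execute(
                "INSERT INTO synonym (taxid, spname) VALUES (?, ?);",
                (taxid, spname),
            )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)
    finally:
        db.close()

# Hypothetical rows: same taxid, names differing only in case.
rows = [(1, "Bacillus"), (1, "bacillus")]

error_with_nocase = try_insert(WITH_NOCASE, rows)
error_without_nocase = try_insert(WITHOUT_NOCASE, rows)

print("with COLLATE NOCASE:   ", error_with_nocase)
print("without COLLATE NOCASE:", error_without_nocase)
```

In this sketch only the first table definition rejects the second row, which matches the `IntegrityError` in the traceback above.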

@vrmarcelino
Owner

Fixed, please update ete3 if you encounter this issue.
Thanks @jhcepas!
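
For anyone unsure whether their installed ete3 still carries the affected line, the following is a rough sketch of a check, not part of CCMetagen or ete3. The helper names `find_ncbiquery` and `lines_with_nocase` are made up for illustration, and the check assumes the affected statement sits on a single line containing both "CREATE TABLE synonym" and "COLLATE NOCASE":

```python
import importlib.util
from pathlib import Path

def find_ncbiquery():
    """Return the path of the installed ete3 ncbiquery.py, or None if ete3 is not importable."""
    try:
        spec = importlib.util.find_spec("ete3.ncbi_taxonomy.ncbiquery")
    except ImportError:
        return None
    if spec is None or not spec.origin:
        return None
    return Path(spec.origin)

def lines_with_nocase(path):
    """Return (line_number, text) for synonym-table lines that still mention COLLATE NOCASE."""
    hits = []
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        if "CREATE TABLE synonym" in line and "COLLATE NOCASE" in line:
            hits.append((number, line.strip()))
    return hits

if __name__ == "__main__":
    path = find_ncbiquery()
    if path is None:
        print("ete3 does not appear to be installed in this environment")
    else:
        hits = lines_with_nocase(path)
        print(path)
        print(hits if hits else "no COLLATE NOCASE on the synonym table")
```

Run it inside the same conda environment that CCMetagen uses, since each environment has its own copy of ete3.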

@Ahmed-Shibl

Hi! I'm having a very similar issue with ete3/ncbi_taxonomy/ncbiquery.py

The command I ran was this (after a successful kma run):
CCMetagen.py -i ~/miniconda3/envs/ccmetagen/CD73/CD73_kma_output.res -o ~/miniconda3/envs/ccmetagen/CD73/CD73_ccmetagen_output --mapstat ~/miniconda3/envs/ccmetagen/CD73/CD73_kma_output.mapstat -ef y

and the output/error was:

Reading file ~/miniconda3/envs/ccmetagen/CD73/CD73_kma_output.res

~/miniconda3/lib/python3.8/site-packages/ete3/ncbi_taxonomy/ncbiquery.py:243: UserWarning: taxid 70781 was translated into 676058
  warnings.warn("taxid %s was translated into %s" %(taxid, merged_conversion[taxid]))
~/miniconda3/lib/python3.8/site-packages/ete3/ncbi_taxonomy/ncbiquery.py:243: UserWarning: taxid 431157 was translated into 2558455
  warnings.warn("taxid %s was translated into %s" %(taxid, merged_conversion[taxid]))
~/miniconda3/lib/python3.8/site-packages/ete3/ncbi_taxonomy/ncbiquery.py:243: UserWarning: taxid 293213 was translated into 2558453
  warnings.warn("taxid %s was translated into %s" %(taxid, merged_conversion[taxid]))
~/miniconda3/lib/python3.8/site-packages/ete3/ncbi_taxonomy/ncbiquery.py:243: UserWarning: taxid 1550498 was translated into 154029
  warnings.warn("taxid %s was translated into %s" %(taxid, merged_conversion[taxid]))
~/miniconda3/lib/python3.8/site-packages/ete3/ncbi_taxonomy/ncbiquery.py:243: UserWarning: taxid 1720309 was translated into 2652724
  warnings.warn("taxid %s was translated into %s" %(taxid, merged_conversion[taxid]))
csv file saved as ~/miniconda3/envs/ccmetagen/CD73/CD73_ccmetagen_output.csv

Writing ~/miniconda3/envs/ccmetagen/CD73/CD73_ccmetagen_output.html...
krona file saved as ~/miniconda3/envs/ccmetagen/CD73/CD73_ccmetagen_output.html

calculating read mapping stats...

Stats file saved as ~/miniconda3/envs/ccmetagen/CD73/CD73_ccmetagen_output_stats.csv

Proportion of reads mapped to the database: 18.412085%

I made the modification to the file as you suggested above (at line 785, delete the words "COLLATE NOCASE") but still got the same messages.

In any case, I ended up with the expected .csv file, the .html file, the .tsv file, and the _stats.csv files - does this error have an effect on the results in any way?

Please let me know if you need any more information.
Thanks!

@vrmarcelino
Owner

Hi!

This is not an error, it is just a common warning saying that a taxon nomenclature has changed, and CCMetagen is using the most up-to-date taxonomy. Your results will not be affected.

All the best,
Vanessa
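
As an illustration of how these messages behave, here is a small sketch using only Python's standard `warnings` module. The function `translate_taxid` is a hypothetical stand-in that reuses the message format shown in the log above; it is not ete3's code. The sketch shows that the warning is emitted while a usable (updated) taxid is still returned, and that the message can be silenced with a filter if the output is too noisy:

```python
import warnings

def translate_taxid(taxid, merged_conversion):
    """Hypothetical stand-in: return the current taxid, warning if it was merged."""
    if taxid in merged_conversion:
        warnings.warn(
            "taxid %s was translated into %s" % (taxid, merged_conversion[taxid])
        )
        return merged_conversion[taxid]
    return taxid

# One merged taxid taken from the log above.
merged = {70781: 676058}

# Default behaviour: a UserWarning is raised, but a result is still returned.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    new_id = translate_taxid(70781, merged)

# With a message filter in place, the same call is silent.
with warnings.catch_warnings(record=True) as caught_filtered:
    warnings.simplefilter("always")
    warnings.filterwarnings("ignore", message=r"taxid \d+ was translated into")
    new_id_filtered = translate_taxid(70781, merged)

print(new_id, len(caught), len(caught_filtered))
```

Filtering is optional and cosmetic. The translated taxid is the same either way.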
