creating custom db causes classification issues #257

najoshi · 2023-02-23T23:42:52Z

So I have created a custom database by taking the RefSeq proteins and adding proteins from a database called RUG2. When I run classification against the regular RefSeq database on one of my samples, I get about 18M classified reads. When I run the same sample against RefSeq plus RUG2, only about 11K reads are classified. I don't understand why adding proteins to an existing database should result in so many fewer classifications. I'm happy to share any files you need to debug the issue. Any help would be highly appreciated.

pmenzel · 2023-02-24T15:27:42Z

Some points you can check:

- The taxonomy must also work out for the RUG2 database: does your FASTA file have proper headers with proper taxonomy IDs that are also contained in your names.dmp / nodes.dmp?
- What happens when you build a kaiju index of only the RUG2 database and classify the reads?
- Take one of the sequences from your DB and give it as input to kaiju -p. It should (obviously) be classified.
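The first check can be scripted. The sketch below collects the tax IDs from the first column of nodes.dmp and reports every FASTA header whose tax ID is not among them. It assumes each header's first word ends in the tax ID after the last underscore (e.g. `>1_1234`, or just `>1234`); that convention and the function names are illustrative assumptions, so compare against kaiju's own documentation for custom databases and adjust `taxid_from_header` to your headers.

```python
from typing import Iterable, List, Set


def load_taxids(nodes_lines: Iterable[str]) -> Set[str]:
    """Collect tax IDs from an NCBI-style nodes.dmp (fields separated by '\\t|\\t')."""
    taxids = set()
    for line in nodes_lines:
        # The first field of each nodes.dmp row is the tax ID.
        field = line.split("|", 1)[0].strip()
        if field:
            taxids.add(field)
    return taxids


def taxid_from_header(header: str) -> str:
    """ASSUMPTION: the tax ID is the part of the header's first word after the
    last '_' ('>1_1234' -> '1234'); a header without '_' ('>1234') is used whole."""
    words = header.lstrip(">").split()
    return words[0].rsplit("_", 1)[-1] if words else ""


def missing_taxid_headers(fasta_lines: Iterable[str], taxids: Set[str]) -> List[str]:
    """Return every FASTA header whose tax ID does not occur in `taxids`."""
    missing = []
    for line in fasta_lines:
        if line.startswith(">"):
            header = line.rstrip("\n")
            if taxid_from_header(header) not in taxids:
                missing.append(header)
    return missing


def check_files(nodes_path: str, fasta_path: str) -> List[str]:
    """Hypothetical file-based wrapper around the checks above."""
    with open(nodes_path) as nodes, open(fasta_path) as fasta:
        return missing_taxid_headers(fasta, load_taxids(nodes))
```

An empty list from `check_files("nodes.dmp", "refseq_rug2.faa")` (file names hypothetical) means every header maps to a node in the taxonomy; anything else lists the headers to fix or drop before building the index.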

najoshi · 2023-03-01T05:32:40Z

So if I have some headers in my custom fasta file that do NOT have tax IDs that occur in nodes.dmp... will that cause problems?
When I run kaiju using only the RUG2 database, I get very few classifications.
When I get one of the RUG2 protein sequences and run it against my custom DB with kaiju -p, it DOES NOT classify it. So that's obviously a problem.

najoshi · 2023-03-01T11:18:43Z

Looks like if there is any header whose tax ID does not occur in nodes.dmp, it screws up the database. Once I took out the proteins with tax IDs that don't occur in nodes.dmp (and proteins with X's in them), the database built properly and it seems to be classifying reads well.
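The cleanup described above can be sketched as a small filter: keep a protein only if its tax ID occurs in nodes.dmp and its sequence contains no `X`. As in the header check, the `_<taxid>` header convention and all function names here are assumptions for illustration, not kaiju's own tooling.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def load_taxids(nodes_lines: Iterable[str]) -> Set[str]:
    """Collect tax IDs (first '|'-separated field) from nodes.dmp lines."""
    return {f for f in (line.split("|", 1)[0].strip() for line in nodes_lines) if f}


def taxid_from_header(header: str) -> str:
    """ASSUMPTION: tax ID follows the last '_' in the header's first word."""
    words = header.lstrip(">").split()
    return words[0].rsplit("_", 1)[-1] if words else ""


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_records(fasta_lines: Iterable[str], taxids: Set[str]) -> List[Tuple[str, str]]:
    """Drop records with an unknown tax ID or any 'X' residue in the sequence."""
    return [
        (h, s)
        for h, s in read_fasta(fasta_lines)
        if taxid_from_header(h) in taxids and "X" not in s.upper()
    ]


def format_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Serialize records back to FASTA, one sequence line per record."""
    return "".join(f"{h}\n{s}\n" for h, s in records)


def filter_files(nodes_path: str, fasta_in: str, fasta_out: str) -> int:
    """Hypothetical wrapper: write the kept records and return how many were kept."""
    with open(nodes_path) as nodes, open(fasta_in) as src:
        kept = filter_records(src, load_taxids(nodes))
    with open(fasta_out, "w") as dst:
        dst.write(format_fasta(kept))
    return len(kept)
```

Dropping whole proteins that contain `X` mirrors what was done here; masking or splitting them instead would be a different choice and is not tested in this thread.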
