Question on kaiju classification from PacBio data #246

DaRinker · 2022-11-09T21:57:18Z

Looking for insight/confirmation that Kaiju is operating as expected. My results so far suggest that Kaiju is misclassifying reads that have very strong BLAST hits to divergent taxa.

Specifically, I ran Kaiju with the nr_euk database on ~300,000 PacBio reads using Kaiju's default parameters. While spot-checking, I happened to look at a 6.6 kb PacBio read that Kaiju classified as best matching txid 36630. However, that taxon doesn't appear even among the top 100 BLAST hits (NCBI nr), and the top NCBI BLAST hit (88% coverage, 99% sequence identity) corresponds to NCBI:txid5061, a divergent taxon that should also be present in nr_euk.

Conversely, if I take another read that Kaiju classified as belonging to that divergent taxon (i.e. txid 5061), it actually BLASTs very strongly (99% match over 7.7 kb) to a different taxon (NCBI:txid746128), which is also very divergent from the taxon Kaiju assigned.
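To spot-check individual assignments like these, one can look each read up directly in Kaiju's output file. Below is a minimal sketch, assuming the tab-separated layout described in the Kaiju README (column 1 is `C` or `U` for classified/unclassified, column 2 is the read name, column 3 is the assigned NCBI taxon ID). The helpers `kaiju_assignment` and `classified_fraction` are hypothetical, not part of Kaiju itself.

```python
from typing import Iterable, Optional, Tuple


def kaiju_assignment(lines: Iterable[str], read_name: str) -> Optional[Tuple[str, int]]:
    """Return (status, taxon_id) for read_name, or None if the read is absent.

    Assumes Kaiju's tab-separated output: status (C/U), read name, taxon ID.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, name, taxid = fields[0], fields[1], fields[2]
        if name == read_name:
            return status, int(taxid)
    return None


def classified_fraction(lines: Iterable[str]) -> float:
    """Fraction of reads whose status column is 'C' (classified)."""
    total = classified = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields or not fields[0]:
            continue
        total += 1
        if fields[0] == "C":
            classified += 1
    return classified / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical output lines, for illustration only.
    example = ["C\tread1\t36630\n", "U\tread2\t0\n", "C\tread3\t5061\n"]
    print(kaiju_assignment(example, "read1"))
    print(classified_fraction(example))
```

With a real file, pass an open handle, e.g. `kaiju_assignment(open("kaiju.out"), "my_read")`. The taxon ID returned can then be compared with the taxon of the top BLAST hit for the same read.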

Any ideas what might be going on? Could Kaiju be having problems with these long reads? I'm now concerned that I can't trust any of the classifications I'm getting back.

EDIT: I'm guessing I need to adjust some default parameters. I'll begin by specifying a higher minimum score.
EDIT2: That did not fix the problem. I tried running Kaiju with more stringent score requirements (-s 500 and -s 1000), which resulted in the same misclassification and in no classification, respectively. Any advice on what to try next?
EDIT3: I have now tried truncating all my PacBio reads to just the first 500 bp (on the assumption that the long reads were somehow "breaking" Kaiju). This (sort of) helped: I'm no longer seeing flagrant misclassifications, but I can now classify only ~25% of my reads, so I'm still not there yet.
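The truncation step from EDIT3 could be done with a short script like the following. This is a minimal sketch, assuming the reads are in FASTA format (PacBio reads delivered as FASTQ or BAM would need a different parser). The function names are hypothetical, and the 500 bp cutoff is the value from the post. Note that everything past the cutoff is simply discarded.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(handle: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def truncate_reads(records: Iterable[Record], max_len: int = 500) -> Iterator[Record]:
    """Keep only the first max_len bases of each read."""
    for header, seq in records:
        yield header, seq[:max_len]


def write_fasta(records: Iterable[Record], out: TextIO, width: int = 80) -> None:
    """Write records as FASTA, wrapping sequences at `width` characters."""
    for header, seq in records:
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Tiny in-memory example; swap in open(...) handles for real files.
    src = io.StringIO(">r1\nACGT\nACGT\n>r2\nAC\n")
    out = io.StringIO()
    write_fasta(truncate_reads(read_fasta(src), max_len=5), out)
    print(out.getvalue(), end="")
```

Because the records are streamed through generators, the whole read set never has to be held in memory at once, which matters for ~300,000 long reads.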

pguenzi-tiberi · 2023-07-24T14:12:54Z

Hello @DaRinker,

Have you continued your trials with Kaiju and long reads? Did you manage to improve the classification scores?

DaRinker · 2023-07-24T19:25:19Z

No. I decided it was the wrong tool for the job. Maybe there have been developments since then? Right now I'm most confident using Kaiju on Illumina reads.

pguenzi-tiberi · 2023-07-25T09:19:41Z

Thank you very much for your reply!
