Question on kaiju classification from PacBio data #246

Open
DaRinker opened this issue Nov 9, 2022 · 3 comments

Comments

@DaRinker

DaRinker commented Nov 9, 2022

Looking for insight/confirmation that Kaiju is operating as expected. My results thus far suggest that Kaiju is misclassifying reads: reads with very strong BLAST hits are being assigned to divergent taxa.

Specifically, I ran Kaiju with default parameters and the nr_euk database over ~300,000 PacBio reads. I happened to spot-check a 6.6 kb PacBio read that Kaiju classified as best matching txid 36630. However, that taxon doesn't appear even among the top 100 BLAST hits (NCBI nr), and the top NCBI BLAST hit (88% coverage at 99% sequence identity) corresponds to NCBI:txid5061, a divergent taxon that should also be present in nr_euk.

Conversely, if I then take another read that Kaiju has classified as belonging to that divergent taxon (i.e. txid 5061), it actually BLASTs very strongly (99% match over 7.7 kb) to a different taxon (NCBI:txid746128), which is also very divergent from the taxon assigned by Kaiju.
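The spot-check described above (comparing Kaiju's assigned taxon with the top BLAST hit) can be automated across many reads. Below is a minimal Python sketch, assuming Kaiju's standard tab-separated output (column 1 `C`/`U`, column 2 the read name, column 3 the taxon ID, which is 0 for unclassified reads) and NCBI's `nodes.dmp` for the child-to-parent map. The top-hit taxon per read is assumed to have been extracted separately (for example from tabular BLAST output that includes the `staxids` column) into a plain dict, and all function names are hypothetical. Because Kaiju can legitimately report a higher-rank taxon (e.g. a lowest common ancestor when several database hits tie), the sketch flags a read only when Kaiju's taxon is neither the BLAST taxon nor one of its ancestors.

```python
import io
from typing import Dict, List, Optional, TextIO, Tuple


def read_kaiju_assignments(handle: TextIO) -> Dict[str, Optional[int]]:
    """Map read name -> Kaiju taxon ID, or None for unclassified reads.

    Assumes Kaiju's standard tab-separated output: column 1 is 'C' or 'U',
    column 2 the read name and column 3 the assigned taxon ID.
    """
    assignments: Dict[str, Optional[int]] = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, read_name, taxid = fields[0], fields[1], fields[2]
        assignments[read_name] = int(taxid) if status == "C" else None
    return assignments


def read_parent_map(handle: TextIO) -> Dict[int, int]:
    """Parse NCBI nodes.dmp (fields separated by tab-pipe-tab) into child -> parent."""
    parents: Dict[int, int] = {}
    for line in handle:
        fields = line.split("\t|\t")
        if len(fields) >= 2:
            parents[int(fields[0])] = int(fields[1])
    return parents


def lineage(taxid: int, parents: Dict[int, int]) -> List[int]:
    """Return taxid followed by its ancestors, stopping at the root (its own parent)."""
    path = [taxid]
    while taxid in parents and parents[taxid] != taxid:
        taxid = parents[taxid]
        path.append(taxid)
    return path


def find_conflicts(
    kaiju: Dict[str, Optional[int]],
    blast_top: Dict[str, int],
    parents: Dict[int, int],
) -> List[Tuple[str, int, int]]:
    """Return (read, kaiju_taxid, blast_taxid) where Kaiju's taxon is not on
    the lineage of the BLAST top-hit taxon."""
    conflicts = []
    for read, blast_taxid in blast_top.items():
        kaiju_taxid = kaiju.get(read)
        if kaiju_taxid is None:
            continue  # unclassified by Kaiju, or absent from its output
        if kaiju_taxid not in lineage(blast_taxid, parents):
            conflicts.append((read, kaiju_taxid, blast_taxid))
    return conflicts


if __name__ == "__main__":
    # Toy, made-up taxonomy: 1 is the root; 2 and 20 are children of 1;
    # 10 and 11 are children of 2. A real run would open nodes.dmp instead.
    edges = [(1, 1), (2, 1), (20, 1), (10, 2), (11, 2)]
    nodes = "".join(f"{c}\t|\t{p}\t|\tno rank\t|\n" for c, p in edges)
    kaiju_out = "C\tread1\t10\nC\tread2\t2\nC\tread3\t20\nU\tread4\t0\n"
    blast_top = {"read1": 10, "read2": 11, "read3": 11, "read4": 10}  # hypothetical
    conflicts = find_conflicts(
        read_kaiju_assignments(io.StringIO(kaiju_out)),
        blast_top,
        read_parent_map(io.StringIO(nodes)),
    )
    print(conflicts)  # [('read3', 20, 11)]
```

For the toy data, read2 is not flagged because Kaiju's taxon 2 is an ancestor of the BLAST taxon 11, while read3 is flagged because taxon 20 sits on a different branch.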

Any ideas what might be going on? Might kaiju be having problems with these long reads? I'm now concerned that I cannot trust any of the classifications I'm getting back.

EDIT: I'm guessing I need to adjust some default parameters. I'll begin by specifying a higher minimum score.
EDIT2: That did not fix the problem. I tried running Kaiju with more stringent score requirements (-s 500 and -s 1000); those resulted in either the same misclassification or no classification, respectively. Any advice on what to try next?
EDIT3: I have now tried truncating all my PacBio reads to just the first 500 bp (under the assumption that the long reads were somehow "breaking" Kaiju). This (sort of) helped, and I'm no longer seeing flagrant misclassifications; however, I'm now only able to classify ~25% of my reads, so I'm still not there yet.
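The truncation workaround from EDIT3, and the classification rate it is measured by, can be sketched in Python as follows. This assumes the reads are in FASTA format (reads delivered as FASTQ would need a different parser) and that each line of Kaiju's output starts with `C` or `U`; the function names are hypothetical, and the 500 bp default simply mirrors the value tried above.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequences may span several lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def truncate_fasta(src: TextIO, dst: TextIO, max_len: int = 500) -> int:
    """Write every read cut down to its first max_len bases; return the read count."""
    count = 0
    for header, seq in read_fasta(src):
        dst.write(f">{header}\n{seq[:max_len]}\n")
        count += 1
    return count


def classified_fraction(kaiju_out: TextIO) -> float:
    """Fraction of non-empty Kaiju output lines whose first column is 'C'."""
    total = classified = 0
    for line in kaiju_out:
        if not line.strip():
            continue
        total += 1
        if line.split("\t", 1)[0] == "C":
            classified += 1
    return classified / total if total else 0.0


if __name__ == "__main__":
    out = io.StringIO()
    truncate_fasta(io.StringIO(">r1\nACGTACGT\nACGT\n>r2\nAC\n"), out, max_len=5)
    print(out.getvalue(), end="")  # each header followed by at most 5 bases
    toy_kaiju = "C\tr1\t10\nU\tr2\t0\nU\tr3\t0\nU\tr4\t0\n"
    print(classified_fraction(io.StringIO(toy_kaiju)))  # 0.25
```

Truncation keeps only the first `max_len` bases of each read, so Kaiju never sees the remainder of a long read.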

@pguenzi-tiberi

Hello @DaRinker,

Have you continued your trials with Kaiju and long reads? Did you manage to improve the classification scores?

@DaRinker

No. I decided it was the wrong tool for the job. Maybe there have been developments since? Right now I'm most confident when using Kaiju on Illumina reads.

@pguenzi-tiberi

Thank you very much for your reply!
