Looking for insight/confirmation that Kaiju is operating as expected. My results so far suggest that Kaiju is misclassifying reads that have very strong BLAST hits to divergent taxa.
Specifically, I ran kaiju using the nr_euk database over ~300000 PacBio reads with kaiju default parameters. I just happened to spot-check a 6.6kb PacBio read that was classified by kaiju as best matching txid 36630. However, that taxon doesn't appear even among the top 100 BLAST hits (NCBI nr), and the top NCBI BLAST hit (88% coverage with 99% sequence identity) corresponds to NCBI:txid5061 (a divergent taxon that should also be present in nr_euk).
Conversely, if I then take another read that Kaiju has classified as belonging to that divergent taxon (i.e. txid 5061), it actually BLASTs very strongly (99% match over 7.7kb) to a different taxon (NCBI:txid746128). That taxon is also very divergent from the taxon assigned by kaiju.
Any ideas what might be going on? Might kaiju be having problems with these long reads? I'm now concerned that I cannot trust any of the classifications I'm getting back.
EDIT: I'm guessing I need to adjust some default parameters. I will begin by specifying a higher minimum score.
EDIT2: Did not fix the problem. I tried running kaiju with more stringent score requirements (-s 500 and -s 1000) and those resulted in either the same misclassification or no classification (respectively). Any advice on what to try next?
EDIT3: I have now tried truncating all my PacBio reads to just the first 500bp (under the assumption that the long reads were somehow "breaking" kaiju). This (sort of) helped and I'm no longer seeing flagrant misclassifications; however, I'm now only able to classify ~25% of my reads, so still not there yet.
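For anyone who wants to reproduce the truncation step from EDIT3, here is a minimal sketch. It assumes the reads are in FASTA format (the format isn't stated above), and the function names `read_fasta`, `truncate_reads` and `write_fasta` are hypothetical helpers for illustration, not part of kaiju.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def truncate_reads(lines: Iterable[str], max_len: int = 500) -> Iterator[Tuple[str, str]]:
    """Keep only the first max_len bases of every read (assumed FASTA input)."""
    for header, seq in read_fasta(lines):
        yield header, seq[:max_len]


def write_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Serialise (header, sequence) pairs back to FASTA text, one line per sequence."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

Usage would be along the lines of opening the reads file, passing the file handle to `truncate_reads`, and writing the result of `write_fasta` to a new file that is then given to kaiju. Reads shorter than `max_len` pass through unchanged.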
No. I decided it was the wrong tool for the job. Maybe there have been developments since? Right now I'm most confident when using Kaiju on Illumina reads.