Lowered barcode recognition of bonito basecalled data with Bonito 0.4.0 #176
Dear @iiSeymour, any update on this yet? Thanks a lot!
Hey @menickname, see #175. Can you try with `--ctc-min-coverage 0.99`?
Dear @iiSeymour, unfortunately this does not result in better demultiplexing. Only the highest-quality and longest reads were used for model training. One of my datasets ends up with only 33.84% of reads classified (versus 70-80% for the original Guppy-basecalled dataset), with or without the `--ctc-min-coverage 0.99` option, so the issue does not seem to be solved this way. I am also surprised that Bonito basecalling would reduce the number of high-quality reads: I have verified my model on separate (single-isolate, non-multiplexed) files, and it produced significantly more accurate genomes of my species of interest, so I am rather surprised this is happening during demultiplexing. Any other thoughts?
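For reference, the coverage option discussed above applies to the step that generates CTC training chunks. A minimal sketch of that step, assuming a Bonito 0.4.x install and hypothetical paths (`reads/`, `ref.mmi`, `ctc-data/`):

```shell
# Generate CTC training data while keeping only chunks whose alignment
# covers at least 99% of the reference (flag as suggested in #175).
# Paths and the model name are placeholders, not from the thread.
bonito basecaller dna_r9.4.1 reads/ \
    --save-ctc \
    --ctc-min-coverage 0.99 \
    --reference ref.mmi > ctc-data/basecalls.sam
```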
I have seen a similar issue when using a bonito-trained model with guppy. I lose a HUGE amount of reads to the dreaded "unclassified" bin. Have you managed to find any way of recovering these lost reads @menickname?
Hi @mbhall88 and @menickname, I am experiencing the exact same problem. Do you have any update on this issue? Any progress/experience will be greatly appreciated. Thank you very much!
Hi @CWYuan08, sadly no. I tried a lot of different things, e.g. chopping raw signal off the start and end before training, but to no avail. I basically had to abandon the project as I couldn't justify losing so many reads to demultiplexing.
Thank you @mbhall88 for sharing your update, sorry to hear you had to stop there.
Hi @CWYuan08 and @mbhall88, I have indeed not found a solution for the Bonito demultiplexing itself. To still make use of Bonito, I use demultiplexed files from MinKNOW as input. Since we sequence on a GridION device, we perform real-time super-accurate basecalling and demultiplexing with Guppy (within MinKNOW), which generates both fastq and fast5 files per barcode. I then simply use the demultiplexed fast5 files as input for Bonito. Not the most efficient solution, but it lets me keep using Bonito for basecalling with custom models.
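The workaround above can be sketched as a small loop, assuming Guppy/MinKNOW has already written per-barcode fast5 directories (directory layout, model path, and output names are assumptions, not from the thread):

```shell
# Basecall each Guppy-demultiplexed barcode directory separately
# with a custom Bonito model, writing one fastq per barcode.
mkdir -p bonito_fastq
for bc in fast5_pass/barcode*/; do
    name=$(basename "$bc")
    bonito basecaller ./custom-model "$bc" > "bonito_fastq/${name}.fastq"
done
```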
Would using an existing model and improving it ( |
Dear @iiSeymour
I am experiencing a similar issue to issue #26 from earlier. After training a new model with Bonito 0.4.0, the demultiplexing (qcat) step assigns only 40-70% of the reads to the correct barcode. I am using a subset of my dataset that is already a selection of a single barcode within the initial fast5 files, so I would expect a significantly higher proportion (>80%) of reads to be assigned to my barcode.
Thank you in advance.
Regards,
Nick
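For context, the qcat demultiplexing step described above might look like the following, assuming qcat's standard CLI; the input file name and barcoding kit are placeholders, not from the thread:

```shell
# Demultiplex Bonito basecalls with qcat: reads are sorted into
# per-barcode fastq files under barcoded/, with adapters trimmed.
qcat -f bonito_basecalls.fastq -b barcoded/ --trim -k NBD104/NBD114
```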