
Lowered barcode recognition of bonito basecalled data with Bonito 0.4.0 #176

Open
menickname opened this issue Aug 30, 2021 · 9 comments

@menickname

Dear @iiSeymour

I am experiencing a similar issue to the one described earlier in issue #26. After training a new model with the Bonito 0.4.0 software, demultiplexing with qcat assigns only 40-70% of the reads to the correct barcode. I am using a subset of my dataset that was already selected for a single barcode within the initial fast5 files, so I would expect a significantly higher proportion (>80%) of reads assigned to that barcode.

Thank you in advance.
Regards,
Nick

@menickname
Author

Dear @iiSeymour

Any update on this yet?

Thanks a lot!

@iiSeymour iiSeymour self-assigned this Sep 14, 2021
@iiSeymour
Member

Hey @menickname

See #175 - can you try with --ctc-min-coverage 0.99? Filtering out any lower-quality reads should also help.
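A minimal sketch of the quality filtering suggested above, assuming plain 4-line FASTQ records and a Phred+33 quality encoding (the threshold of 9 is an arbitrary example, not a value from this thread):

```python
import math

def mean_quality(qual: str) -> float:
    """Mean Phred quality, averaged in error-probability space
    (a plain arithmetic mean of Phred scores overestimates quality)."""
    probs = [10 ** ((ord(c) - 33) / -10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))

def filter_fastq(lines, min_q=9.0):
    """Yield 4-line FASTQ records whose mean read quality is >= min_q."""
    for i in range(0, len(lines) - 3, 4):
        record = lines[i:i + 4]
        if mean_quality(record[3].strip()) >= min_q:
            yield record
```

In practice, tools such as NanoFilt or filtlong do this (and more) off the shelf.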

@menickname
Author

Dear @iiSeymour

Unfortunately, this does not result in better demultiplexing. Only the highest-quality and longest reads were used for model training. One of my datasets has only 33.84% of the reads classified (versus 70-80% for the original Guppy-basecalled dataset), with or without the --ctc-min-coverage 0.99 option. The issue does not seem to be solved this way.

I am also surprised that Bonito basecalling would reduce the number of high-quality reads. I validated my model on separate (single-isolate, non-multiplexed) files by generating more accurate genomes of my species of interest, and it gave a significant improvement, so I am rather surprised this is happening during demultiplexing.

Any other thoughts?
Thank you in advance.

@mbhall88

mbhall88 commented May 3, 2022

I have seen a similar issue when using a bonito-trained model with guppy. I lose a HUGE number of reads to the dreaded "unclassified" bin.

Have you managed to find any way of recovering these lost reads @menickname?

@CWYuan08

Hi @mbhall88 and @menickname, I am experiencing exactly the same problem. Do you have any update on this issue? Any progress/experience will be greatly appreciated. Thank you very much!

@mbhall88

Hi @CWYuan08, sadly no. I tried a lot of different things - e.g., chopping raw signal off the start and end before training - but to no avail.

I basically had to abandon the project, as I couldn't justify losing so many reads to demultiplexing.
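For context, the signal-chopping experiment described above could look like the following sketch (the trim lengths are arbitrary placeholders, not values from this thread):

```python
def trim_signal(signal, trim_start=200, trim_end=200):
    """Drop samples from both ends of a raw nanopore signal, e.g. to
    exclude adapter/barcode regions before building a CTC training set.
    Returns an empty slice if the read is too short to trim."""
    end = len(signal) - trim_end
    return signal[trim_start:end] if end > trim_start else signal[:0]
```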

@CWYuan08

Thank you @mbhall88 for sharing your update, sorry to hear you had to stop there.

@menickname
Author

Hi @CWYuan08 and @mbhall88, I have indeed not found a solution for the Bonito demultiplexing itself. To still make use of Bonito, I use demultiplexed files from MinKNOW as input. Since we are using a GridION sequencing device, we perform real-time super-accuracy basecalling and demultiplexing with Guppy (within MinKNOW), which generates both fastq and fast5 files per barcode. I then simply use the demultiplexed fast5 files as input for Bonito. Not the most efficient solution, but it lets me keep using Bonito for basecalling with custom models.
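The workaround above could be sketched as a small driver that runs Bonito once per MinKNOW barcode directory (the directory layout and exact command shape are assumptions; adjust to your setup):

```python
from pathlib import Path

def bonito_commands(demux_root, model_dir, out_dir):
    """Build one `bonito basecaller` invocation per barcode directory
    produced by MinKNOW/Guppy live demultiplexing, paired with the
    FASTQ path its stdout should be redirected to."""
    cmds = []
    for barcode_dir in sorted(Path(demux_root).glob("barcode*")):
        fastq = Path(out_dir) / f"{barcode_dir.name}.fastq"
        cmds.append((["bonito", "basecaller", str(model_dir), str(barcode_dir)], fastq))
    return cmds
```

Each (command, output path) pair could then be run with subprocess, redirecting stdout into the per-barcode FASTQ file.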

@fergsc

fergsc commented Jun 2, 2023

Would using an existing model and improving it (--pretrained) for our species of interest be a better strategy?
