Inconsistency in read numbers between Bonito input and output #407
When you prepare data with … there are two things that cause the difference between the number of input reads and the number of output chunks:
Thank you so much for the reply! Please correct me if I've misunderstood. Based on this, I have the following questions:
Thanks again! Best,
Dear bonito team,
Thank you for providing this powerful tool! I’m a new user and recently encountered an issue regarding inconsistent read numbers between my input and output after basecalling with Bonito.
Initially, I extracted read IDs from POD5 files using:
This resulted in all_reads.txt containing 1,837,954 reads.
Due to limited RAM on our workstation, I performed chunk-wise basecalling by splitting the read IDs:
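The exact splitting command is not shown here. As a hypothetical sketch (the function name `split_read_ids`, the `reads_chunk_NNN.txt` naming, and the chunk size are illustrative assumptions, not the command actually used), splitting a one-ID-per-line file into fixed-size chunk files could look like this. Checking that the chunk files concatenate back to the full list rules out losing reads at the splitting step.

```python
from pathlib import Path


def split_read_ids(id_file, out_dir, chunk_size):
    """Hypothetical helper: split a one-read-ID-per-line file into chunk files.

    Returns the list of chunk file paths, in order.
    """
    # Keep non-empty lines only, preserving the original order.
    ids = [line.strip() for line in Path(id_file).read_text().splitlines() if line.strip()]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(0, len(ids), chunk_size):
        # Zero-padded index so the files sort in the order they were written.
        path = out / f"reads_chunk_{i // chunk_size:03d}.txt"
        path.write_text("\n".join(ids[i:i + chunk_size]) + "\n")
        paths.append(path)
    return paths
```

For example, `split_read_ids("all_reads.txt", "read_chunks", 200_000)` on 1,837,954 IDs would write 10 files: nine with 200,000 IDs and a last one with 37,954.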
I then ran the following Bonito basecalling command:
However, when examining the outputs (references.npy), I found discrepancies in the total number of processed reads:
The total number of reads after basecalling is 1,772,052, which is 65,902 fewer than the original 1,837,954 reads.
The training step reports even fewer reads: 1,718,890.
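To put numbers on the gap, the rows of each `references.npy` can be summed and compared with the input count. This is a sketch under assumptions: it assumes each chunk's output directory holds a `references.npy` whose first dimension is the number of training chunks (which, per the reply above, need not equal the number of reads), and that the array is not an object array so it can be memory-mapped. The directory layout and function names are hypothetical.

```python
from pathlib import Path

import numpy as np


def count_reference_rows(chunk_dirs):
    """Sum the first dimension of references.npy across output directories."""
    total = 0
    for d in chunk_dirs:
        # mmap_mode="r" memory-maps the file, so the full array is not loaded into RAM.
        refs = np.load(Path(d) / "references.npy", mmap_mode="r")
        total += refs.shape[0]
    return total


def shortfall(n_input, n_output):
    """Return (missing count, missing percentage) relative to the input count."""
    missing = n_input - n_output
    return missing, 100.0 * missing / n_input
```

With the counts above, `shortfall(1_837_954, 1_772_052)` gives 65,902 missing (about 3.6%), and `shortfall(1_837_954, 1_718_890)` gives 119,064 missing (about 6.5%).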
Could you please help me understand the potential reasons for this inconsistency?
Thank you for your assistance!
Best regards,
Kun