Which part of over 200X HiFi reads is selected? #2241
Comments
At that point in the process it is picking random reads. The readSamplingCoverage parameter controls this behavior: https://canu.readthedocs.io/en/latest/parameter-reference.html#readsamplingcoverage. Increasing the coverage here will greatly slow down correction, and the only real benefit is if there are sequences (e.g., plasmids) in the input. After correction, reads are further down-sampled to (roughly) 40x, but this does take into account 'rare' sequences. This is done because overlap-layout-consensus methods seem to suffer with excessive coverage. The parameter for this step is corOutCoverage: https://canu.readthedocs.io/en/latest/faq.html?highlight=coroutcoverage#why-is-my-assembly-is-missing-my-favorite-short-plasmid
Thanks for your timely reply! I know Canu only outputs 40X corrected reads, from when I was assembling with PacBio CLR reads. Is there also a 'correction' process for HiFi reads? I had assumed HiCanu would assemble HiFi reads directly.
Oops, I missed that you had HiFi data! Sorry for the confusion. There is no correction phase for HiFi reads; they're also assumed to be pre-trimmed.
There is still default subsampling for HiFi reads (it should have been 50x, not 200x) because higher coverage doesn't help overlap-layout type algorithms. I would suggest subsampling your reads to 50x randomly. Definitely do not select the longest reads with HiFi, as those will be the lowest quality.
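The random subsampling suggested above could be sketched roughly as follows. This is only an illustration, not how Canu itself selects reads: the function name `subsample_to_coverage`, the fixed seed, and the idea of working from a list of read lengths are all assumptions made for the example.

```python
import random

def subsample_to_coverage(read_lengths, genome_size, target_coverage, seed=0):
    """Pick reads in random order until roughly target_coverage is reached.

    Hypothetical helper for illustration; returns (kept_indices, kept_bases).
    """
    target_bases = genome_size * target_coverage
    order = list(range(len(read_lengths)))
    # Shuffle so that selection does not depend on read length or file order.
    random.Random(seed).shuffle(order)

    kept, total = [], 0
    for i in order:
        if total >= target_bases:
            break
        kept.append(i)
        total += read_lengths[i]
    return kept, total

# Toy input: 1000 reads of 1 kb each against a 10 kb "genome" (100x in total).
lengths = [1000] * 1000
kept, total = subsample_to_coverage(lengths, genome_size=10_000, target_coverage=50)
print(len(kept), total)  # 500 500000
```

The key point is that reads are taken in shuffled order rather than sorted by length, so the longest (lowest quality) HiFi reads are not preferentially kept.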
There was a bug introduced in v2.2 that set the HiFi subsampling to 200x instead of 50x. I suggest using maxInputCoverage=50 for HiFi data until the next release.
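As a sketch of how that workaround might be passed on the command line, the snippet below assembles a Canu invocation with maxInputCoverage=50. The prefix `asm`, the output directory `asm-out`, and the file name `reads.fastq.gz` are placeholders, and the helper `hifi_canu_command` is hypothetical; only the command is built and printed, nothing is executed.

```python
import shlex

def hifi_canu_command(prefix, out_dir, genome_size, reads, max_input_coverage=50):
    """Build (but do not run) a Canu command line for HiFi reads.

    Hypothetical helper; prefix, out_dir and reads are placeholder values.
    """
    return [
        "canu",
        "-p", prefix,
        "-d", out_dir,
        f"genomeSize={genome_size}",
        f"maxInputCoverage={max_input_coverage}",  # workaround for the v2.2 default
        "-pacbio-hifi", reads,
    ]

cmd = hifi_canu_command("asm", "asm-out", "28m", "reads.fastq.gz")
print(shlex.join(cmd))
```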
Hi, I am assembling a microorganism genome with over 200X HiFi reads, but Canu told me that only 200X was selected for assembly. Do you have any idea which part of the data was selected?
Canu says this:
For genome size of 28000000 bases,
retain 5600000000 bases (200.00X coverage).
Found 1106878 reads with 20999061795 bases (749.97X coverage).
Dropped 811663 reads with 15399054876 bases (549.97X coverage).
Retained 295215 reads with 5600006919 bases (200.00X coverage).
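As a quick sanity check of the numbers in this log, the coverage figures can be recomputed from the base counts and the genome size. The variable names are just for this example.

```python
genome_size = 28_000_000
target_coverage = 200

found_bases = 20_999_061_795
dropped_bases = 15_399_054_876
retained_bases = 5_600_006_919

# Bases Canu aims to retain at the target coverage.
target_bases = genome_size * target_coverage
print(target_bases)  # 5600000000

# Coverage = total bases / genome size.
print(round(found_bases / genome_size, 2))     # 749.97
print(round(dropped_bases / genome_size, 2))   # 549.97
print(round(retained_bases / genome_size, 2))  # 200.0

# Dropped and retained reads together account for every base found.
print(dropped_bases + retained_bases == found_bases)  # True
```

So the input is roughly 750X, and just enough reads are kept to pass the 200X target of 5.6 Gbp.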