Very subtle differences from minimap2 #75
Comments
I don't think quality scores are used in minimap2 for mapping. The mapping functions are here: The second function is mm_map_frag, which has a qlens argument, but I believe that holds query lengths rather than quality lengths. Happy to be wrong, but I don't see it. As for passing in the query name, I can fix that. I think I was just trying to minimize string copies everywhere, and conversions to/from CStr.
Hi @jguhlin, Thanks for the feedback on this. As I said, I'm not certain what the exact causes of the small discrepancies are. Essentially, what I observe is that out of 10M (simulated) long reads, both native minimap2 and minimap2-rs (with compatible settings) return the same number of mapped reads. The difference is that standard minimap2 reports approximately 6 more supplementary alignments than similarly configured minimap2-rs. This discrepancy is so small (and the assessed results using these mappings for quantification so similar) that I do not have any major concern. Yet it's strange to me that there should be any difference once the seed is set to be the same, as I thought the algorithm was deterministic. Indeed, I observe exactly the same numbers under multiple runs of either standard minimap2 or the minimap2-rs library: they consistently differ by this tiny amount, but are perfectly replicable over many runs. I also agree with your design choice around the --Rob
Hi @jguhlin, OK, so I've figured this out: it is indeed caused by cases where chains tie, and the hash of the read name breaks those ties in a way that makes certain alignments supplementary and others not. I've been able to achieve exact concordance with command-line Thanks!
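To make the tie-breaking effect described above concrete, here is a toy sketch in Rust. It is not minimap2's actual code: the `Chain` struct, the `tie_key` and `rank_chains` functions, and the `mix` integer mixer are all hypothetical and exist only for illustration. The sketch assumes that a per-read key is derived from the read name and the seed, and that equal-scoring chains are ordered by that key, so a caller that omits the name can end up with a different (but still fully reproducible) ordering.

```rust
/// Hypothetical stand-in for an alignment chain; not a minimap2 type.
#[derive(Debug, Clone, PartialEq)]
struct Chain {
    id: u32,
    score: i32,
}

/// X31-style string hash (h = h * 31 + byte) over the read name.
fn x31_hash(name: &str) -> u32 {
    name.bytes()
        .fold(0u32, |h, b| h.wrapping_mul(31).wrapping_add(b as u32))
}

/// An invertible 32-bit mixer, used here purely for illustration.
fn mix(mut x: u32) -> u32 {
    x ^= x >> 16;
    x = x.wrapping_mul(0x85eb_ca6b);
    x ^= x >> 13;
    x = x.wrapping_mul(0xc2b2_ae35);
    x ^= x >> 16;
    x
}

/// Per-read tie-breaking key: depends on the seed and, when given, the name.
/// A missing name contributes 0, which is the situation the issue describes.
fn tie_key(name: Option<&str>, seed: u32) -> u32 {
    let name_hash = name.map_or(0, x31_hash);
    mix(name_hash ^ mix(seed))
}

/// Order chains by descending score; ties are broken by the per-read key.
/// In this toy model the first id is "primary" and the rest "supplementary".
fn rank_chains(chains: &[Chain], name: Option<&str>, seed: u32) -> Vec<u32> {
    let key = tie_key(name, seed);
    let mut order: Vec<&Chain> = chains.iter().collect();
    order.sort_by(|a, b| {
        b.score
            .cmp(&a.score)
            .then_with(|| mix(a.id ^ key).cmp(&mix(b.id ^ key)))
    });
    order.iter().map(|c| c.id).collect()
}
```

Calling `rank_chains` twice with the same name and seed always yields the same order, which matches the observation that each tool is perfectly replicable on its own. Passing `None` instead of the name changes the key, so tied chains may be ordered differently even though the seed is identical.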
So this should be fixed now, but if not, let's work on a test for it and I'll try to figure out what went wrong. Thanks for all your work on this!
I'm just trying to track down the very subtle differences in aligning a dataset between the minimap2-rs bindings and what is output by the reference minimap2 implementation.
Mostly everything is the same, but I noticed some small differences (mostly in the number of supplementary alignments so far, so it's unlikely to affect anything important). However, it raises a broader issue about the degree to which reproducibility might be expected.
For example, one thing I noticed is that the current codebase has the ability to map a read against the index, but (1) it doesn't take as input (and therefore doesn't consider) the quality values of the read. What would be required to have a `map_with_qual` function that also takes the quality string? Also, I'm not sure it's related to my issue, but (2) the `map` function also ignores the read name. However, the `--seed` parameter of minimap2 mentions this is used in tie breaking procedures: is this something that is done at the level of an ffi function currently exposed, and if so, what would be required to have another `map_with_name_and_qual` function that takes all 3 of these slices as input?

Anyway, these are very minor details, and so far, using minimap2-rs has been a breeze! Check out the `dev` branch of `oarfish` if you want to see how we're currently using it; specifically here.

--Rob
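As a rough illustration of what forwarding a read name across an FFI boundary could involve, the sketch below converts an optional name slice into a NUL-terminated C string. It is not part of minimap2-rs: `FfiName` and its methods are hypothetical names, and the only real API used is `std::ffi::CString`. The sketch assumes the C side accepts a null pointer when no name is supplied.

```rust
use std::ffi::{CString, NulError};
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical helper that owns the C-compatible copy of a read name
/// for as long as the pointer handed to C needs to stay valid.
struct FfiName(Option<CString>);

impl FfiName {
    /// Copies the name into a NUL-terminated buffer. This is the one
    /// allocation per read that a name-aware mapping call would need.
    /// Fails if the name contains an interior NUL byte.
    fn new(name: Option<&[u8]>) -> Result<Self, NulError> {
        let owned = name.map(|n| CString::new(n)).transpose()?;
        Ok(FfiName(owned))
    }

    /// Pointer to pass to C: null when no name was given (an assumption
    /// about what the callee tolerates), otherwise the owned buffer.
    fn as_ptr(&self) -> *const c_char {
        self.0.as_ref().map_or(ptr::null(), |c| c.as_ptr())
    }
}
```

Keeping the `CString` inside a wrapper ties the pointer's validity to the wrapper's lifetime, so the pointer cannot be used after the buffer is freed as long as the wrapper outlives the C call.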