[MRG] speed up prefetch by adjusting calls to use len(minhash)
#2132
Several comparison dataclasses were introduced in #1955 and #2050 to facilitate gather/prefetch output and ANI calculation. Some of the methods in these dataclasses use `len(minhash.hashes)`, which creates a new Python object and is typically quite slow.

This PR replaces `len(minhash.hashes)` in several key locations with `len(minhash)`, which returns the same result but computes it directly in Rust. On at least one benchmark, the speedup is about 10x (see numbers below).

Fixes #2124
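The cost difference can be illustrated with a small, self-contained toy model. `ToySketch` below is hypothetical and is not sourmash's `MinHash`. It assumes, for illustration only, that every access to `.hashes` materializes a fresh Python dict (standing in for copying hashes out of the Rust side), while `__len__` answers directly from internal storage without building anything.

```python
import timeit


class ToySketch:
    """Hypothetical stand-in for a MinHash-like object (NOT sourmash's real class)."""

    def __init__(self, hashvals):
        # Internal storage; in sourmash the equivalent data lives on the Rust side.
        self._store = {h: 1 for h in hashvals}

    @property
    def hashes(self):
        # Assumption for illustration: each access builds a brand-new dict
        # copying every hash, which is what makes len(obj.hashes) expensive.
        return dict(self._store)

    def __len__(self):
        # Answers the size question directly, without creating a new object.
        return len(self._store)


sketch = ToySketch(range(10_000))

# Both expressions give the same answer...
assert len(sketch.hashes) == len(sketch)

# ...but only one of them copies 10,000 entries per call.
slow = timeit.timeit(lambda: len(sketch.hashes), number=200)
fast = timeit.timeit(lambda: len(sketch), number=200)
print(f"len(sketch.hashes): {slow:.4f}s   len(sketch): {fast:.4f}s")
```

Absolute timings depend on the machine and are not the benchmark numbers reported for this PR; the point is only that `len(obj)` avoids the per-call copy.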
timings
Using the benchmark from #1771 (comment), which was also used here, we find:
This PR:

`latest` branch / v4.4.2:

flamegraph profile for this PR

flamegraph profile for `latest` branch (equiv. v4.4.2)