-
Notifications
You must be signed in to change notification settings - Fork 91
Description
arXiv:1709.07821 Roaring Bitmaps: Implementation of an Optimized Software Library (Section 4.1.1) (Section 3.2)
Cardinality tracking was added in #127, however we're leaving some perf on the floor.
We also find that if we use bit-manipulation instructions and hand-tuned assembly, we can roughly
double the speed at which we can change bits in a bitset, compared to compiler-generated machine
code with all optimization flags activated. It is true not only when setting bits, but also when clearing
or flipping them—in which cases we need to use the btr and btc instructions instead. Informally,
we tested various C compilers and could not find any that could perform nearly as well as hand-tuned
assembly code. This may, of course, change in the future.
Planning
- Do we want to implement this optimization? 2x is a huge speed up, but only on x86
- The compiler is sufficiently smart to use
bts
but notsbb
, perhaps we can write rust that compiles to the asm we want- https://doc.rust-lang.org/std/primitive.u32.html#method.borrowing_sub
- Portable, but it may compile to more / higher latency μops on some architectures, we should be careful this is not a pessimization for them
- https://doc.rust-lang.org/core/arch/x86_64/fn._subborrow_u64.html
- Not portable, but that may be desirable (see above). Available in stable rust.
- https://doc.rust-lang.org/std/primitive.u32.html#method.borrowing_sub
- If not there's always the
asm!
macro (nightly only), or if want it in stable then we can write a small C lib for the asm and link it