-
Notifications
You must be signed in to change notification settings - Fork 584
a two-level succinct data structure that provides constant-time rank operations for dense RoaringBitmaps #813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…operations for dense RoaringBitmaps
|
|
||
| for (int j = 1; j < superblockLimit; j++) { | ||
| packed |= (blockCumulative & BLOCK_MASK) << (BITS_PER_PACKED_BLOCK * (j - 1)); | ||
| blockCumulative += Long.bitCount(highBits[wordIdx + j]); |
Check failure
Code scanning / CodeQL
Array index out of bounds Error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a new succinct rank data structure for RoaringBitmaps that provides constant-time rank operations while minimizing memory overhead. The implementation uses a two-level indexing scheme with lookup tables for efficient queries on dense bitmaps.
Key Changes
- Added
SuccinctRankclass that builds an immutable rank structure from a RoaringBitmap, optimized for memory-constrained environments with read-heavy workloads - Implements two query strategies: linear scan for small bitmaps (≤16 containers) and a succinct two-level index for larger bitmaps
- Includes comprehensive test coverage with 894 lines of tests covering edge cases, container types, and validation against RoaringBitmap's native rank implementation
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| roaringbitmap/src/main/java/org/roaringbitmap/SuccinctRank.java | Core implementation of the succinct rank data structure with build() method, rank() queries, and optimized bitmap container handling |
| roaringbitmap/src/test/java/org/roaringbitmap/TestSuccinctRank.java | Comprehensive test suite covering empty bitmaps, sparse/dense patterns, multiple container types, edge cases, and correctness validation |
| jmh/src/jmh/java/org/roaringbitmap/RankBenchmark.java | JMH benchmark comparing SuccinctRank against FastRankRoaringBitmap and native RoaringBitmap rank operations across different data distributions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /** | ||
| * Succinct rank structure for RoaringBitmap. | ||
| * | ||
| * @author gerald.green | ||
| * @since Dec-2025 | ||
| */ |
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class-level documentation is very minimal. For a new data structure class, consider adding more detailed documentation explaining:
- What a succinct rank structure is
- What problem it solves compared to existing solutions (e.g.,
FastRankRoaringBitmap) - When to use it (the PR description mentions "memory-constrained environments" and "read-heavy workloads")
- Trade-offs (e.g., immutability requirement, build cost vs. query performance)
- Basic usage example
This would help users understand when and how to use this class effectively.
| } | ||
|
|
||
| public RoaringBitmap snapshot() { | ||
| return this.bitmap; |
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The snapshot() method returns the internal bitmap reference directly without cloning it. This is misleading because:
- The method name "snapshot" typically implies a copy or immutable view
- Users receiving this reference can modify the bitmap, which would break the rank structure's invariants
- The build() method's warning about not modifying the bitmap is not enforced
Consider either:
- Returning
bitmap.clone()to provide a true snapshot - Renaming the method to something like
getUnderlyingBitmap()with clear documentation that modification is unsafe - Making the method package-private if it's only for testing
| return this.bitmap; | |
| return this.bitmap.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geraldpgreen Can you review this criticism, it seems valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
part of the assumption in this implementation is that the underlying bitmap is immutable. and for our use case, we always create new ones so it allows us to build it faster. let me adjust the name of the method to make it more clear
|
|
||
| return rank; | ||
| } | ||
|
|
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing javadoc for public method cardinality(). Consider adding documentation explaining that this returns the total number of set bits in the bitmap.
| /** | |
| * Returns the total number of set bits in the bitmap. | |
| * | |
| * @return the cardinality (number of set bits) of the bitmap | |
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geraldpgreen All public methods need to be documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
| public boolean usesLinearScan() { | ||
| return this.highBits == null; | ||
| } | ||
|
|
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing javadoc for public method containerCount(). Consider adding documentation explaining that this returns the number of containers in the underlying bitmap.
| /** | |
| * Returns the number of containers in the underlying bitmap. | |
| * | |
| * @return the number of containers in the underlying bitmap | |
| */ |
| public long cardinality() { | ||
| return this.cumulativePerContainer[containerCount()]; | ||
| } | ||
|
|
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing javadoc for public method snapshot(). Consider adding documentation explaining what this method returns and its intended usage. Given the method name, clarify whether this is a copy or a reference to the underlying bitmap.
| /** | |
| * Returns a reference to the underlying {@link RoaringBitmap} used by this {@code SuccinctRank}. | |
| * <p> | |
| * Note: This does <b>not</b> return a copy. Modifications to the returned bitmap may affect | |
| * the state of this {@code SuccinctRank} instance. | |
| * | |
| * @return the underlying {@link RoaringBitmap} instance | |
| */ |
| public RoaringBitmap snapshot() { | ||
| return this.bitmap; | ||
| } | ||
|
|
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing javadoc for public method usesLinearScan(). Consider adding documentation explaining when linear scan is used (bitmaps with <= 16 containers) vs. the succinct structure, and what performance implications this has.
| /** | |
| * Returns {@code true} if this rank structure uses a linear scan for rank queries, | |
| * which occurs when the bitmap contains {@code <= 16} containers. In this case, | |
| * rank queries are performed by scanning each container sequentially, which is | |
| * efficient for small bitmaps but less performant for larger ones. | |
| * <p> | |
| * For bitmaps with more than 16 containers, a succinct rank structure is used, | |
| * enabling faster rank queries via precomputed data structures at the cost of | |
| * additional memory usage. | |
| * <p> | |
| * The choice between linear scan and succinct structure impacts both query | |
| * performance and memory footprint. | |
| * | |
| * @return {@code true} if linear scan is used; {@code false} if succinct structure is used | |
| */ |
| * @since Dec-2025 | ||
| */ | ||
| public class SuccinctRank { | ||
|
|
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a comment explaining why 16 is the threshold value for choosing between linear scan and succinct structure. This would help future maintainers understand the rationale behind this magic number.
| /** | |
| * Threshold for choosing between linear scan and succinct structure. | |
| * For bitmaps with <= 16 containers, linear scan is faster and uses less memory. | |
| * This value was chosen empirically based on performance benchmarks. | |
| */ |
| source, highBits, highRankCount, cumulativePerContainer, containerCumulativeRanks); | ||
| } | ||
|
|
||
| // Two-level rank index: superblock counts + packed block counts |
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider expanding this comment to explain what the two-level rank index is used for and how it enables O(1) rank queries. This would help maintainers understand the data structure better. For example: "Two-level rank index for high-key lookups: superblocks store absolute counts, packed blocks store relative counts within each superblock, enabling O(1) rank queries."
| // Two-level rank index: superblock counts + packed block counts | |
| /** | |
| * Two-level rank index for high-key lookups: | |
| * - Superblocks store absolute counts of set bits up to the start of each superblock. | |
| * - Packed blocks store relative counts of set bits within each superblock. | |
| * This structure enables O(1) rank queries by allowing fast computation of the number | |
| * of set bits up to any given key: first by retrieving the superblock's absolute count, | |
| * then adding the relative count from the packed block, and finally counting bits within | |
| * the target word. This is crucial for efficient high-key rank queries in RoaringBitmap. | |
| */ |
|
|
||
| for (int j = 1; j < superblockLimit; j++) { | ||
| packed |= (blockCumulative & BLOCK_MASK) << (BITS_PER_PACKED_BLOCK * (j - 1)); | ||
| blockCumulative += Long.bitCount(highBits[wordIdx + j]); |
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This array access might be out of bounds, as the index might be equal to the array length + 6.
|
@geraldpgreen Thanks. We are likely to merge this. I think it is a good PR. AI found minor issues which I think you can quickly address. |
|
@lemire I think the feedback has been addressed in the latest iteration and let me know if it looks good for merging. |
|
Running tests. |
…
SUMMARY
When to use SuccinctRank:
Automated Checks
./gradlew testand made sure that my PR does not break any unit test.