Skip to content

Conversation

@geraldpgreen
Copy link
Contributor

@geraldpgreen geraldpgreen commented Dec 7, 2025

SUMMARY

When to use SuccinctRank:

  • Memory-constrained environments where rank performance matters
  • Read-heavy workloads on immutable bitmaps (build once, query many)

Automated Checks

  • I have run ./gradlew test and made sure that my PR does not break any unit test.


for (int j = 1; j < superblockLimit; j++) {
packed |= (blockCumulative & BLOCK_MASK) << (BITS_PER_PACKED_BLOCK * (j - 1));
blockCumulative += Long.bitCount(highBits[wordIdx + j]);

Check failure

Code scanning / CodeQL

Array index out of bounds Error

This array access might be out of bounds, as the index might be equal to the array length + 6.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new succinct rank data structure for RoaringBitmaps that provides constant-time rank operations while minimizing memory overhead. The implementation uses a two-level indexing scheme with lookup tables for efficient queries on dense bitmaps.

Key Changes

  • Added SuccinctRank class that builds an immutable rank structure from a RoaringBitmap, optimized for memory-constrained environments with read-heavy workloads
  • Implements two query strategies: linear scan for small bitmaps (≤16 containers) and a succinct two-level index for larger bitmaps
  • Includes comprehensive test coverage with 894 lines of tests covering edge cases, container types, and validation against RoaringBitmap's native rank implementation

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

File Description
roaringbitmap/src/main/java/org/roaringbitmap/SuccinctRank.java Core implementation of the succinct rank data structure with build() method, rank() queries, and optimized bitmap container handling
roaringbitmap/src/test/java/org/roaringbitmap/TestSuccinctRank.java Comprehensive test suite covering empty bitmaps, sparse/dense patterns, multiple container types, edge cases, and correctness validation
jmh/src/jmh/java/org/roaringbitmap/RankBenchmark.java JMH benchmark comparing SuccinctRank against FastRankRoaringBitmap and native RoaringBitmap rank operations across different data distributions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 5 to 10
/**
* Succinct rank structure for RoaringBitmap.
*
* @author gerald.green
* @since Dec-2025
*/
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class-level documentation is very minimal. For a new data structure class, consider adding more detailed documentation explaining:

  • What a succinct rank structure is
  • What problem it solves compared to existing solutions (e.g., FastRankRoaringBitmap)
  • When to use it (the PR description mentions "memory-constrained environments" and "read-heavy workloads")
  • Trade-offs (e.g., immutability requirement, build cost vs. query performance)
  • Basic usage example

This would help users understand when and how to use this class effectively.

Copilot uses AI. Check for mistakes.
}

public RoaringBitmap snapshot() {
return this.bitmap;
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The snapshot() method returns the internal bitmap reference directly without cloning it. This is misleading because:

  1. The method name "snapshot" typically implies a copy or immutable view
  2. Users receiving this reference can modify the bitmap, which would break the rank structure's invariants
  3. The build() method's warning about not modifying the bitmap is not enforced

Consider either:

  • Returning bitmap.clone() to provide a true snapshot
  • Renaming the method to something like getUnderlyingBitmap() with clear documentation that modification is unsafe
  • Making the method package-private if it's only for testing
Suggested change
return this.bitmap;
return this.bitmap.clone();

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geraldpgreen Can you review this criticism, it seems valid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

part of the assumption in this implementation is that the underlying bitmap is immutable. and for our use case, we always create new ones so it allows us to build it faster. let me adjust the name of the method to make it more clear


return rank;
}

Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing javadoc for public method cardinality(). Consider adding documentation explaining that this returns the total number of set bits in the bitmap.

Suggested change
/**
* Returns the total number of set bits in the bitmap.
*
* @return the cardinality (number of set bits) of the bitmap
*/

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geraldpgreen All public methods need to be documented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

public boolean usesLinearScan() {
return this.highBits == null;
}

Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing javadoc for public method containerCount(). Consider adding documentation explaining that this returns the number of containers in the underlying bitmap.

Suggested change
/**
* Returns the number of containers in the underlying bitmap.
*
* @return the number of containers in the underlying bitmap
*/

Copilot uses AI. Check for mistakes.
public long cardinality() {
return this.cumulativePerContainer[containerCount()];
}

Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing javadoc for public method snapshot(). Consider adding documentation explaining what this method returns and its intended usage. Given the method name, clarify whether this is a copy or a reference to the underlying bitmap.

Suggested change
/**
* Returns a reference to the underlying {@link RoaringBitmap} used by this {@code SuccinctRank}.
* <p>
* Note: This does <b>not</b> return a copy. Modifications to the returned bitmap may affect
* the state of this {@code SuccinctRank} instance.
*
* @return the underlying {@link RoaringBitmap} instance
*/

Copilot uses AI. Check for mistakes.
public RoaringBitmap snapshot() {
return this.bitmap;
}

Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing javadoc for public method usesLinearScan(). Consider adding documentation explaining when linear scan is used (bitmaps with <= 16 containers) vs. the succinct structure, and what performance implications this has.

Suggested change
/**
* Returns {@code true} if this rank structure uses a linear scan for rank queries,
* which occurs when the bitmap contains {@code <= 16} containers. In this case,
* rank queries are performed by scanning each container sequentially, which is
* efficient for small bitmaps but less performant for larger ones.
* <p>
* For bitmaps with more than 16 containers, a succinct rank structure is used,
* enabling faster rank queries via precomputed data structures at the cost of
* additional memory usage.
* <p>
* The choice between linear scan and succinct structure impacts both query
* performance and memory footprint.
*
* @return {@code true} if linear scan is used; {@code false} if succinct structure is used
*/

Copilot uses AI. Check for mistakes.
* @since Dec-2025
*/
public class SuccinctRank {

Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a comment explaining why 16 is the threshold value for choosing between linear scan and succinct structure. This would help future maintainers understand the rationale behind this magic number.

Suggested change
/**
* Threshold for choosing between linear scan and succinct structure.
* For bitmaps with <= 16 containers, linear scan is faster and uses less memory.
* This value was chosen empirically based on performance benchmarks.
*/

Copilot uses AI. Check for mistakes.
source, highBits, highRankCount, cumulativePerContainer, containerCumulativeRanks);
}

// Two-level rank index: superblock counts + packed block counts
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider expanding this comment to explain what the two-level rank index is used for and how it enables O(1) rank queries. This would help maintainers understand the data structure better. For example: "Two-level rank index for high-key lookups: superblocks store absolute counts, packed blocks store relative counts within each superblock, enabling O(1) rank queries."

Suggested change
// Two-level rank index: superblock counts + packed block counts
/**
* Two-level rank index for high-key lookups:
* - Superblocks store absolute counts of set bits up to the start of each superblock.
* - Packed blocks store relative counts of set bits within each superblock.
* This structure enables O(1) rank queries by allowing fast computation of the number
* of set bits up to any given key: first by retrieving the superblock's absolute count,
* then adding the relative count from the packed block, and finally counting bits within
* the target word. This is crucial for efficient high-key rank queries in RoaringBitmap.
*/

Copilot uses AI. Check for mistakes.

for (int j = 1; j < superblockLimit; j++) {
packed |= (blockCumulative & BLOCK_MASK) << (BITS_PER_PACKED_BLOCK * (j - 1));
blockCumulative += Long.bitCount(highBits[wordIdx + j]);
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This array access might be out of bounds, as the index might be equal to the array length + 6.

Copilot uses AI. Check for mistakes.
@lemire
Copy link
Member

lemire commented Dec 8, 2025

@geraldpgreen Thanks.

We are likely to merge this. I think it is a good PR. AI found minor issues which I think you can quickly address.

@geraldpgreen
Copy link
Contributor Author

@lemire I think the feedback has been addressed in the latest iteration and let me know if it looks good for merging.

@lemire
Copy link
Member

lemire commented Dec 15, 2025

Running tests.

@lemire lemire merged commit 921047c into RoaringBitmap:master Dec 15, 2025
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants