perf(rollup): use NSplit API from sroar to improve rollup performance #8092

Merged
NamanJain8 merged 39 commits into master from naman/perf-split on Nov 30, 2021

NamanJain8 (Contributor) commented on Nov 10, 2021

This PR improves rollup performance and fixes memory issues in the bulk loader.

  • Use the optimized split API from sroar (NSplit, per the PR title) to split the bitmap during rollup. This replaces the recursive binary split, which was very slow.
  • In the bulk loader, use the HandoverSkipList API instead of WriteBatch to avoid the memory constraints of a huge batch write.
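
To illustrate the first point, here is a minimal sketch of the two splitting strategies. It works on a plain sorted `[]uint64` rather than an sroar bitmap, and `splitBinary`, `splitN`, and `maxPerPart` are hypothetical names for illustration only; this is not the sroar API or the code in this PR. It only shows the shape of each approach (repeated halving versus a single pass), not their real cost on compressed bitmaps.

```go
package main

import "fmt"

// splitBinary recursively halves uids until every part holds at most
// maxPerPart entries. Hypothetical stand-in for the old recursive binary split.
func splitBinary(uids []uint64, maxPerPart int) [][]uint64 {
	if maxPerPart < 1 {
		maxPerPart = 1
	}
	if len(uids) == 0 {
		return nil
	}
	if len(uids) <= maxPerPart {
		return [][]uint64{uids}
	}
	mid := len(uids) / 2
	left := splitBinary(uids[:mid], maxPerPart)
	right := splitBinary(uids[mid:], maxPerPart)
	return append(left, right...)
}

// splitN cuts uids into consecutive parts of at most maxPerPart entries
// in a single pass. Hypothetical stand-in for an N-way split.
func splitN(uids []uint64, maxPerPart int) [][]uint64 {
	if maxPerPart < 1 {
		maxPerPart = 1
	}
	var parts [][]uint64
	for start := 0; start < len(uids); start += maxPerPart {
		end := start + maxPerPart
		if end > len(uids) {
			end = len(uids)
		}
		parts = append(parts, uids[start:end])
	}
	return parts
}

func main() {
	uids := []uint64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	fmt.Println("binary parts:", len(splitBinary(uids, 4)))
	fmt.Println("n-way parts: ", len(splitN(uids, 4)))
}
```

Both functions return slices that share the input's backing array, so neither copies the data in this sketch.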

It also introduces BitForbidPosting, which prevents storing posting lists so large that their number of splits exceeds the max-splits value of the --limit flag. Once a posting list is marked as forbidden, it cannot be recovered.
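
A rough sketch of how such a check could look is below. Everything in it is an assumption made for illustration: the `postingList` type, the `forbidIfTooManySplits` helper, the bit value, and the choice to drop the split data when the flag is set are hypothetical and are not taken from the Dgraph code.

```go
package main

import "fmt"

// bitForbidPosting is a hypothetical flag value used only in this sketch.
const bitForbidPosting byte = 0x20

// postingList is a simplified, hypothetical stand-in for a rolled-up list.
type postingList struct {
	splitStarts []uint64 // start UID of each split
	meta        byte     // metadata bits for the stored key
}

// forbidIfTooManySplits marks the list as forbidden when it has more splits
// than maxSplits. The sketch assumes the split data is dropped at that point,
// which is one way the list would become unrecoverable.
func forbidIfTooManySplits(pl *postingList, maxSplits int) bool {
	if len(pl.splitStarts) <= maxSplits {
		return false
	}
	pl.meta |= bitForbidPosting
	pl.splitStarts = nil
	return true
}

func main() {
	pl := &postingList{splitStarts: []uint64{1, 1000, 2000}}
	forbidden := forbidIfTooManySplits(pl, 2)
	fmt.Println("forbidden:", forbidden)
}
```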



github-actions bot added the area/bulk-loader label (Issues related to bulk loading.) on Nov 10, 2021
aman-bansal (Contributor) commented

The default value of max-splits is probably missing in alpha. Also, can we make the default value of max-splits equal to nquad-limits, so that it won't affect customers' currently running workflows? They can then set this variable to improve performance.

NamanJain8 merged commit da9655b into master on Nov 30, 2021
NamanJain8 deleted the naman/perf-split branch on November 30, 2021 07:45
Labels
area/bulk-loader Issues related to bulk loading.
4 participants