[opt](merge-on-write) avoid to check delete bitmap while lookup rowkey in some situation to reduce CPU cost #41480

Open
wants to merge 1 commit into
base: master
from

Conversation

zhannngchen
Contributor

@zhannngchen zhannngchen commented Sep 29, 2024

Proposed changes

Issue Number: close #xxx

During data loading, merge-on-write (MoW) looks up each key in the primary key index. When a key is found in the index, it goes on to check whether that key has already been marked for deletion. This check is usually cheap.
However, in some scenarios users perform high-frequency real-time updates on a large table, and most of the writes update existing data. The table's version count then grows very quickly, and the delete bitmap becomes dense because duplicate keys are written continuously.
In this scenario, the check becomes very costly:

  1. For almost every rowset version hit by an imported key, the `contains` method of the roaring bitmap is called to check whether the key has been marked for deletion.
  2. Because imports are so frequent, there are typically thousands of versions that have not yet been merged by base compaction.
  3. Because the duplication rate is high, almost every key is found in the index.
  4. As a result, for almost every imported key, the check loops up to thousands of times.
  5. This overhead becomes severe when a table is loaded at roughly 100,000+ rows per second.
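
The cost described in the list above can be illustrated with a simplified model. This is only a sketch, not the Doris implementation: `VersionedDeleteBitmap`, `CheckResult`, `is_marked_deleted`, and `worst_case_example` are hypothetical names, and a `std::unordered_set` stands in for the roaring bitmap. It assumes one delete-mark set per unmerged version and counts how many `contains`-style probes a single key lookup needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_set>

// Hypothetical model: delete marks of one rowset, one set per load version.
// In the real system each set would be a roaring bitmap.
using VersionedDeleteBitmap = std::map<int64_t, std::unordered_set<uint32_t>>;

struct CheckResult {
    bool deleted;                // true if any version marks the row deleted
    std::size_t contains_calls;  // number of per-version probes performed
};

// Probe every version's delete marks until one contains row_id.
// A row that is still live forces a probe of every version.
CheckResult is_marked_deleted(const VersionedDeleteBitmap& bitmaps, uint32_t row_id) {
    CheckResult result{false, 0};
    for (const auto& entry : bitmaps) {
        ++result.contains_calls;
        if (entry.second.count(row_id) > 0) {
            result.deleted = true;
            break;
        }
    }
    return result;
}

// Usage: 1000 unmerged versions, each marking a different row as deleted.
void worst_case_example() {
    VersionedDeleteBitmap bitmaps;
    for (int64_t version = 1; version <= 1000; ++version) {
        bitmaps[version].insert(static_cast<uint32_t>(version));
    }
    // Row 5000 is live, so all 1000 versions are probed for this one key.
    CheckResult live = is_marked_deleted(bitmaps, 5000);
    assert(!live.deleted);
    assert(live.contains_calls == 1000);
}
```

Under this model, a load of 100,000 keys per second against 1,000 unmerged versions would perform on the order of 100 million probes per second, which is consistent with the CPU cost described above.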

Here's a flame graph for this scenario:
[image: flame graph]

For tables that don't use sequence columns, and for loads that are not partial column updates, this check can be skipped. Even if a key is already marked for deletion, it is harmless to mark it for deletion again, as if the row still existed.
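
A minimal sketch of the skip condition and of why re-marking is harmless. The names `LoadContext`, `can_skip_delete_bitmap_check`, and `mark_deleted` are hypothetical and not taken from the patch, and a `std::unordered_set` again stands in for the roaring bitmap. The reason given in the comment for excluding sequence columns and partial updates is an assumption, not something stated in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical description of the current load.
struct LoadContext {
    bool has_sequence_column;  // table defines a sequence column
    bool is_partial_update;    // load updates only a subset of columns
};

// The check can be skipped only when neither feature is in use.
// Assumption: those two paths need to know whether the old row is still
// live, so they keep the check.
bool can_skip_delete_bitmap_check(const LoadContext& ctx) {
    return !ctx.has_sequence_column && !ctx.is_partial_update;
}

// Marking a row as deleted is idempotent: setting a bit that is already
// set changes nothing. Returns true only if the mark is new.
bool mark_deleted(std::unordered_set<uint32_t>& delete_marks, uint32_t row_id) {
    return delete_marks.insert(row_id).second;
}

// Usage: marking the same row twice leaves exactly one mark.
void idempotent_mark_example() {
    std::unordered_set<uint32_t> delete_marks;
    mark_deleted(delete_marks, 42);
    mark_deleted(delete_marks, 42);  // already marked, no effect
    assert(delete_marks.size() == 1);
}
```

Because the second mark is a no-op, skipping the check trades a possibly redundant bitmap write for the removal of the per-version probe loop.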

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@zhannngchen zhannngchen changed the title [opt](merge-on-write) avoid to check delete bitmap while lookup rowke… [opt](merge-on-write) avoid to check delete bitmap while lookup rowkey in some situation to reduce CPU cost Sep 29, 2024
@zhannngchen
Contributor Author

run buildall

@zhannngchen
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.29% (9627/25815)
Line Coverage: 28.69% (79688/277771)
Region Coverage: 28.12% (41210/146535)
Branch Coverage: 24.74% (20981/84818)
Coverage Report: http://coverage.selectdb-in.cc/coverage/3a5b803fbd90693a22e6998e5d9537dc31e6d7c7_3a5b803fbd90693a22e6998e5d9537dc31e6d7c7/report/index.html

Labels
None yet
Projects
None yet
2 participants