
Mas i370 deletepending #377

Merged
merged 3 commits into from
May 24, 2022

Conversation

martinsumner (Owner) commented May 18, 2022

Previously, delete_confirmation was blocked on work_ongoing.

However, if the penciller has a work backlog, work_ongoing may be a recurring problem, and some files may remain undeleted long after their last use - lifetimes for L0 files in particular have been seen to rise from 10-15s to 5m+.

Letting L0 files linger can have a significant impact on memory. In put-heavy tests (e.g. when testing riak-admin transfers), the memory footprint of a riak node has been observed peaking more than 80% above the levels seen with this patch.

This PR allows deletes to be confirmed even when there is work ongoing, by postponing the manifest update until the manifest is next returned from the clerk.

Previously, if there was ongoing work (i.e. the clerk had control over the manifest), the penciller could not confirm deletions. Now it may confirm them, and defer the required manifest update to a later point (prompted by the next delete confirmation request).
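The deferred-update approach described above can be sketched roughly as follows. This is an illustrative sketch only - the function, record, and field names here are hypothetical and do not reflect the actual leveled_penciller/leveled_pmanifest API:

```erlang
%% Hypothetical sketch of deferred delete confirmation.
%% All names (handle_delete_confirmation/2, pending_removals, etc.) are
%% illustrative, not the real leveled API.

handle_delete_confirmation(Filename, State = #state{work_ongoing = true}) ->
    %% The clerk currently owns the manifest: confirm the delete to the
    %% file process now, but queue the manifest change so it can be
    %% applied once the manifest is handed back by the clerk.
    Pending = [Filename | State#state.pending_removals],
    {confirmed, State#state{pending_removals = Pending}};
handle_delete_confirmation(Filename, State = #state{work_ongoing = false}) ->
    %% No ongoing work: apply this removal together with any removals
    %% that were deferred while the clerk held the manifest.
    Removals = [Filename | State#state.pending_removals],
    Manifest = remove_from_manifest(Removals, State#state.manifest),
    {confirmed, State#state{manifest = Manifest, pending_removals = []}}.
```

The key point is that confirming the delete to the file process is decoupled from updating the manifest, so an L0 file no longer has to wait out a long-running merge before it can be removed.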
martinsumner (Owner, Author) commented May 18, 2022

The following chart shows memory usage on the joining node over a 15-hour node join process (with a transfer-limit of 6), comparing runs with 3.0.9, 3.0.10 (without this PR) and 3.0.10 (with this PR, i.e. +leveled370):

[Chart: RiakMemory_NodeJoin]

To demonstrate the difference, the following chart shows the same three tests, but this time comparing the average lifetime of the L0 SST files (in seconds):

[Chart: L0LifeTime_NodeJoin]

ThomasArts (Contributor) left a comment

Looks good

src/leveled_pmanifest.erl - review comment (resolved)
Co-authored-by: Thomas Arts <thomas.arts@quviq.com>
@martinsumner martinsumner merged commit 2648c9a into develop-3.0 May 24, 2022
martinsumner added a commit that referenced this pull request May 24, 2022