test(pageserver): quantify compaction outcome #7867
3268 tests run: 3116 passed, 0 failed, 152 skipped (full report)
Code coverage* (full report)
* collected from Rust tests only
The comment gets automatically updated with the latest test results.
45cad40 at 2024-06-05T19:26:39.506Z
Signed-off-by: Alex Chi Z <chi@neon.tech>
c16c201 to fd46dc5
Updated the tool to become an HTTP interface so that the Python tests can read it.
Signed-off-by: Alex Chi Z <chi@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
daf73bb to f8ed5a7
This PR adds code to determine, for a given layer map snapshot, the number of delta layers that need to be visited before we hit an image layer when reconstructing any key in the layer map.
That metric is what I'd loosely call delta layer stack height.
It is a rough proxy metric for random getpage@lsn IO amplification, under the assumption of uniform density along the key and LSN dimensions among all delta layers in the layer map. I.e., the probability of finding X amount of information about a random (key, lsn) \in key-lsn-range(L) of a given layer L is the same for all layers L.
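To make the metric concrete, here is a minimal sketch of how a delta layer stack height could be computed range by range. It is not the code in this PR: the `Layer` enum, `key_range`, and `delta_stack_heights` are hypothetical names, keys and LSNs are plain `u64`s, and the sketch assumes that a delta layer only needs to be visited if its LSN range ends above the newest image layer covering the key range.

```rust
use std::ops::Range;

/// Hypothetical, simplified layer description (not the pageserver's real types).
#[derive(Debug, Clone)]
enum Layer {
    /// Image layer: materialized values for `keys` at a single `lsn`.
    Image { keys: Range<u64>, lsn: u64 },
    /// Delta layer: changes for `keys` within the half-open `lsns` range.
    Delta { keys: Range<u64>, lsns: Range<u64> },
}

/// Returns the key range covered by a layer of either kind.
fn key_range(layer: &Layer) -> &Range<u64> {
    match layer {
        Layer::Image { keys, .. } | Layer::Delta { keys, .. } => keys,
    }
}

/// For every key interval between two adjacent layer boundaries, counts the
/// delta layers that sit above the newest image layer covering that interval.
fn delta_stack_heights(layers: &[Layer]) -> Vec<(Range<u64>, usize)> {
    // Every layer boundary splits the key space; between two adjacent
    // boundaries all keys see exactly the same set of layers.
    let mut bounds: Vec<u64> = layers
        .iter()
        .flat_map(|l| {
            let r = key_range(l);
            [r.start, r.end]
        })
        .collect();
    bounds.sort_unstable();
    bounds.dedup();

    let mut out = Vec::new();
    for w in bounds.windows(2) {
        let (start, end) = (w[0], w[1]);
        // A layer is relevant if its key range fully contains the interval.
        let covers = |r: &Range<u64>| r.start <= start && end <= r.end;

        // Newest image layer covering this interval, if any.
        let newest_image = layers
            .iter()
            .filter_map(|l| match l {
                Layer::Image { keys, lsn } if covers(keys) => Some(*lsn),
                _ => None,
            })
            .max();

        // Assumption: deltas ending at or below the image LSN are never visited.
        let height = layers
            .iter()
            .filter(|l| match l {
                Layer::Delta { keys, lsns } => {
                    covers(keys) && newest_image.map_or(true, |img| lsns.end > img)
                }
                _ => false,
            })
            .count();

        out.push((start..end, height));
    }
    out
}
```

Taking the maximum (or a histogram) of the returned heights gives one number per layer map snapshot. Working on intervals between layer boundaries rather than on individual keys is what keeps the analysis cheap.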
While this is useful, we wanted the point-in-time & total space efficiency metrics.
I suppose to calculate worst-case point-in-time space usage, we'd need a similar analysis but along the LSN dimension.
In addition to the missing metrics, I suggest moving the analysis code into a sub-module of mod timeline that extends impl Timeline.
E.g., like we did for the compaction code:
impl Timeline { |
Lastly, what about branching? Not covered in this PR.
I suggest the way forward:
- Apply renaming to submodule in this PR, then let's get it merged.
- Another PR to add branching support (just build a temporary LayerMap instance that contains all the layers from all recursive ancestors)
- Another PR to extend the analysis for point-in-time space efficiency.
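As a rough illustration of the branching step, the sketch below gathers the layers of a timeline and all of its recursive ancestors into one flat list, from which a temporary layer map could then be built. The `Layer` and `Timeline` structs and `collect_visible_layers` are hypothetical stand-ins, not the pageserver's types. Filtering ancestor layers by the branch-point LSN is my own assumption and goes beyond the suggestion above, which only says to include all ancestor layers.

```rust
/// Hypothetical stand-in for a layer; only its start LSN matters here.
#[derive(Debug, Clone, PartialEq)]
struct Layer {
    name: &'static str,
    start_lsn: u64,
}

/// Hypothetical timeline: its own layers plus an optional ancestor
/// together with the LSN at which this timeline branched off.
struct Timeline {
    layers: Vec<Layer>,
    ancestor: Option<(Box<Timeline>, u64)>,
}

/// Collects the layers visible from `tl`, walking up the ancestor chain.
/// Assumption: an ancestor layer starting at or above the branch point
/// is not visible from the child and is skipped.
fn collect_visible_layers(tl: &Timeline) -> Vec<Layer> {
    let mut out = Vec::new();
    let mut current = Some(tl);
    // The timeline itself has no cutoff; each branch point tightens it.
    let mut cutoff = u64::MAX;
    while let Some(t) = current {
        out.extend(t.layers.iter().filter(|l| l.start_lsn < cutoff).cloned());
        match &t.ancestor {
            Some((parent, branch_lsn)) => {
                cutoff = cutoff.min(*branch_lsn);
                current = Some(&**parent);
            }
            None => current = None,
        }
    }
    out
}
```

The resulting list could be fed into the same range-by-range analysis that is used for a single timeline.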
Hm, and one more thought: The |
Co-authored-by: Christian Schwarz <christian@neon.tech>
Signed-off-by: Alex Chi Z <chi@neon.tech>
Yep, that makes sense. I will submit a separate pull request for that.
Ready for review again :) Hopefully I've resolved all the concerns. There is still quite a bit of future work needed for this analysis code to become really useful.
I'll continue to work on the quantification efforts, implementing my asks above, while @skyzh will work on #7948 and follow-ups.
Problem
A simple API to collect some statistics after compaction to easily understand the result.
The tool reads the layer map and analyzes it range by range instead of doing single-key operations, which is more efficient than running a benchmark to collect the result. It currently computes two key metrics:
Summary of changes