quantification of compaction algorithms #7770
`test_gc_feedback`: duplicate & adapt to measure / regress-test point-in-time space efficiency (5 tasks)
This week:
problame pushed a commit that referenced this issue on Jun 10, 2024:
A simple API to collect some statistics after compaction, to easily understand the result. The tool reads the layer map and analyzes it range by range instead of doing single-key operations, which is more efficient than running a benchmark to collect the result. It currently computes two key metrics:

* Latest data access efficiency: how many delta layers / image layers the system needs to iterate before returning any key in a key range.
* (Approximate) PiTR efficiency, as in #7770: simply the number of delta files in the range. The reasoning: assuming no image layer is created, PiTR efficiency is simply the cost of collecting records from the delta layers plus the replay time, so the number of delta files (or, in the future, the estimated size of reads) is a simple yet effective way of estimating how much effort the pageserver needs to reconstruct a page.

Signed-off-by: Alex Chi Z <chi@neon.tech>
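A rough Python sketch of the two metrics described in that commit message, against a toy layer-map representation (the real tool operates on the pageserver's layer map in Rust; the `Layer` type and its fields below are hypothetical simplifications, not the actual API):

```python
from dataclasses import dataclass

@dataclass
class Layer:
    kind: str                   # "delta" or "image"
    key_range: tuple[int, int]  # [start, end) key range covered by the layer
    lsn: int                    # image layer: snapshot LSN; delta layer: end LSN

def overlaps(layer: Layer, key_range: tuple[int, int]) -> bool:
    return layer.key_range[0] < key_range[1] and key_range[0] < layer.key_range[1]

def latest_data_access_efficiency(layers: list[Layer], key_range: tuple[int, int]) -> int:
    """How many layers a read of the latest data must visit (newest first)
    before an image layer fully answers keys in this range."""
    visited = 0
    for layer in sorted(layers, key=lambda l: l.lsn, reverse=True):
        if not overlaps(layer, key_range):
            continue
        visited += 1
        if layer.kind == "image":
            break  # an image layer terminates the read path for these keys
    return visited

def approx_pitr_efficiency(layers: list[Layer], key_range: tuple[int, int]) -> int:
    """Approximate PiTR cost: number of delta layers overlapping the range,
    i.e. how many delta files WAL replay would have to consult."""
    return sum(1 for l in layers if l.kind == "delta" and overlaps(l, key_range))
```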
This week, @problame to address his follow-up requests from #7867 (review).
This issue was part of the 2024Q2 compaction work (#8001). In the end, that work expanded into Q3 and we focused solely on bottommost compaction. Bottommost compaction is very deterministic and hence the existing quantification work in …
Child of 2024Q2 compaction work: #8001
This epic tracks the efforts to quantify any compaction algorithm's outcomes.
We had a brainstorming session some time back to come up with an (incomplete) set of potentially useful metrics: https://www.notion.so/neondatabase/Productionize-Tiered-Compaction-eca9b06aa1ae4c62bdf6cf40ab002eb6?pvs=4
Meeting notes / ideas:

- (logical size + WAL in PITR window) = synthetic size, can just use that
- sum(all layer files in index_part.json) => just that (see the sketch below)

Demo test case to adapt / apply the Python helpers to: `test_gc_feedback`
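A minimal sketch of the physical-size idea above, assuming `index_part.json` records per-layer file sizes under a `layer_metadata` map with a `file_size` field (field names may differ between index_part format versions); the (logical size + WAL in PITR window) figure is supplied by the caller:

```python
import json

def physical_size_from_index_part(path: str) -> int:
    """Sum of all layer file sizes recorded in index_part.json."""
    with open(path) as f:
        index_part = json.load(f)
    # Assumption: per-layer sizes live under layer_metadata[<layer name>]["file_size"].
    return sum(meta["file_size"] for meta in index_part.get("layer_metadata", {}).values())

def space_amplification(physical_size: int, logical_plus_pitr_wal: int) -> float:
    """Physical bytes stored per byte of (logical size + WAL in the PITR window),
    i.e. roughly physical size / synthetic size."""
    return physical_size / logical_plus_pitr_wal
```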
Refs