Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compaction causes full reimaging #2948

Open
bojanserafimov opened this issue Nov 28, 2022 · 3 comments
Open

Compaction causes full reimaging #2948

bojanserafimov opened this issue Nov 28, 2022 · 3 comments
Labels
t/bug Issue Type: Bug

Comments

@bojanserafimov
Copy link
Contributor

Steps to reproduce

I haven't locally reproduced it, but I saw it happen in prod 9 times on the same project, odd-brook-545806. I used the timeline visualizer to spot the problem, highlighted in red Screenshot from 2022-11-28 12-06-20

Expected result

Compaction makes L1 layers, which trigger image creation only where needed.

Actual result

Three little L1 layers cause the entire database to get covered in new image layers. As a result, we have 9 full unnecessary copies of the database.

Also this bug complicates perf testing, since reused projects will behave differently than fresh projects.

Environment

prod

Why is this happening?

We take an image when there are at least 3 L1 layers over the latest image. But every compaction creates 2 very long L1 layers that span all the way to the left and right of the keyspace, because there are some nonrel keys in those parts of the keyspace that get updated often.

Proposal for solution

Instead of creating images, include these long and sparse delta layers in the next compaction round. Basically treat them as L0 layers. Redefine the definition of L1 to be "a sufficiently dense layer, regardless of how many times it's been compacted".

@bojanserafimov bojanserafimov added the t/bug Issue Type: Bug label Nov 28, 2022
@knizhnik
Copy link
Contributor

And what do you think about my "partial image layers" approach (#2563) ?
It also doesn't suffer from this problem and doesn't require compaction (reshuffling) at all.

@bojanserafimov
Copy link
Contributor Author

And what do you think about my "partial image layers" approach

Do you have a text explanation that goes with the presentation? I get the concept of "delta with page images", but I'm missing a lot of specifics on changes to heuristics. Do we still take regular images after x deltas, per partition?

@knizhnik
Copy link
Contributor

knizhnik commented Jan 9, 2023

And what do you think about my "partial image layers" approach

Do you have a text explanation that goes with the presentation? I get the concept of "delta with page images", but I'm missing a lot of specifics on changes to heuristics. Do we still take regular images after x deltas, per partition?

It seems to me that I have answered this question somewhere (may be in Slack?).
With partial image layers there is no need to create full image layers frequently, because there are two main reasons for generation of full image layers:

  1. Restrict number of applied deltas by walredo.
  2. Make it possible to garbage collect old layers.
    With partial images walredo is not needed and GC can be performed relatively rare taken in account current PiTR settings (several days).

knizhnik added a commit that referenced this issue Jan 15, 2023
knizhnik added a commit that referenced this issue Feb 9, 2023
knizhnik added a commit that referenced this issue Feb 22, 2023
## Describe your changes

This is yet another attempt to address problem with storage size
ballooning #2948
Previous PR #3348 tries to address this problem by maintaining list of
holes for each layer.
The problem with this approach is that we have to load all layer on
pageserver start.
Lazy loading of layers is not possible any more.

This PR tries to collect information of N largest holes on compaction
time and exclude this holes from produced layers.
It can cause generation of larger number of layers (up to 2 times) and
producing small layers.
But it requires minimal changes in code and doesn't affect storage
format.

For graphical explanation please see thread:
#3597 (comment)

## Issue ticket number and link

#2948
#3348

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t/bug Issue Type: Bug
Projects
None yet
Development

No branches or pull requests

2 participants