
pageserver: latest_gc_cutoff can go backwards after restart #10208

Closed
skyzh opened this issue Dec 19, 2024 · 1 comment · Fixed by #10209
Labels
c/storage/pageserver (Component: storage: pageserver), t/bug (Issue Type: Bug)

Comments

skyzh commented Dec 19, 2024

Still root-causing #10192, but the underlying issue seems to be that gc_cutoff can go backwards:

```
2024-12-17T20:25:23.079668Z  INFO gc_loop{tenant_id=12fd6e6d7a50bf7dd96154ec39b8b7c8 shard_id=0000}:run:gc_timeline{timeline_id=9136e295b2647dae2fc5e2a2abbb1dc6 cutoff=0/E4B96D18}: keeping 000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000E4B706C9-00000000E4B96D19 because it's newer than space_cutoff 0/E4B96D18
2024-12-17T20:40:41.170702Z  INFO compaction_loop{tenant_id=12fd6e6d7a50bf7dd96154ec39b8b7c8 shard_id=0000}:run:scheduled_compact_timeline{timeline_id=9136e295b2647dae2fc5e2a2abbb1dc6}: picked 30 layers for compaction (0 layers need rewriting) with max_layer_lsn=0/E4B96D19 min_layer_lsn=0/14EE9E8 gc_cutoff=0/E4B96D18 lowest_retain_lsn=0/E4B96D18, key_range=000000000000000000000000000000000000..FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF, has_data_below=false
2024-12-17T20:46:56.028412Z  INFO loading tenant configuration from /storage/pageserver/data/tenants/12fd6e6d7a50bf7dd96154ec39b8b7c8/config-v1
2024-12-17T20:47:13.726136Z ERROR synthetic_size_worker: failed to calculate synthetic size for tenant 12fd6e6d7a50bf7dd96154ec39b8b7c8: could not find data for key 010000000000000000000000000000000000 (shard ShardNumber(0)) at LSN 0/E4B839F1, request LSN 0/E4B839F0, ancestor 0/0
```

Looking at the current index_part.json, it still records "latest_gc_cutoff_lsn": "0/E4B706C8".

This means we never persisted latest_gc_cutoff_lsn=0/E4B96D18 to index_part.json. After the pageserver restarts, it loads the stale latest_gc_cutoff_lsn, and tasks like synthetic size calculation then try to read data at LSNs that GC has already removed.

Therefore, a correct implementation of legacy GC / gc-compaction must operate against the persisted latest gc cutoff; in other words, we should upload latest_gc_cutoff to index_part.json before starting GC, i.e., before removing any layer files.
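
For illustration, here is a minimal, self-contained Rust sketch of the ordering this implies. All types and function names are made up for the example (they are not the actual pageserver API); the point is only that the new cutoff must be durable in index_part.json before any layer files below it are deleted.

```rust
// Illustrative model only: all names/types are hypothetical, not the real
// pageserver code. It captures the invariant that the cutoff persisted in
// index_part.json must never lag behind the cutoff GC already acted on.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Lsn(u64);

struct Timeline {
    /// Cutoff currently recorded in the remote index_part.json.
    persisted_gc_cutoff: Lsn,
    /// Cutoff GC is operating with in memory.
    in_memory_gc_cutoff: Lsn,
}

impl Timeline {
    /// Step 1: upload an index-only update carrying the new cutoff and wait
    /// for it to become durable (modeled here as a plain assignment).
    fn persist_gc_cutoff(&mut self, new_cutoff: Lsn) {
        self.persisted_gc_cutoff = self.persisted_gc_cutoff.max(new_cutoff);
    }

    /// Step 2: only after the cutoff is persisted is it safe to remove layer
    /// files below it. Skipping step 1 (what gc-compaction did) reproduces
    /// the bug: a restart loads the stale cutoff from index_part.json.
    fn gc_compact(&mut self, new_cutoff: Lsn) {
        self.persist_gc_cutoff(new_cutoff);
        self.in_memory_gc_cutoff = new_cutoff;
        // ... delete layer files below `new_cutoff` here ...
    }
}

fn main() {
    let mut tl = Timeline {
        persisted_gc_cutoff: Lsn(0xE4B7_06C8),
        in_memory_gc_cutoff: Lsn(0xE4B7_06C8),
    };
    tl.gc_compact(Lsn(0xE4B9_6D18));
    // After a restart, the cutoff read back from index_part.json is never
    // older than the data GC removed.
    assert!(tl.persisted_gc_cutoff >= tl.in_memory_gc_cutoff);
}
```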

skyzh added the c/storage/pageserver and t/bug labels on Dec 19, 2024
skyzh self-assigned this on Dec 19, 2024

skyzh commented Dec 19, 2024

Confirmed that legacy GC schedules an index-only update before removing the files, so this is only a problem with gc-compaction.

github-merge-queue bot pushed a commit that referenced this issue Dec 19, 2024
…10209)

## Problem

close #10208
part of #9114 

## Summary of changes

* Ensure remote `latest_gc_cutoff` is up-to-date before removing any
files for gc-compaction.

Signed-off-by: Alex Chi Z <chi@neon.tech>