refactor: single phase Timeline::load_layer_map #5074

Merged
merged 54 commits into from
Aug 24, 2023

Conversation

koivunej
Member

@koivunej koivunej commented Aug 23, 2023

The current implementation first calls load_layer_map, which loads all local layers, cleans up some files, and leaves the rest of the cleanup to a "second function". When the "second function" is finally called, it does not do the cleanup, and some of the first function's setup can be torn down. The "second function" is actually both reconcile_with_remote and create_remote_layers.

This change makes the code a bit more verbose, but does everything in one phase with the following sub-steps:

  1. scan the timeline directory
  2. delete extra files
  3. reconcile the two sources of layers (directory, index_part)
  4. rename_to_backup future layers, short layers
  5. create the remaining as layers
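
A minimal sketch of sub-steps 3 to 5, to make the reconciliation concrete. This is not the code from this PR: the `Decision` enum, the `decide` and `reconcile` functions, the use of file size as the only comparison, and every rule in `decide` are assumptions made for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical outcome for one layer file name after reconciliation.
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    /// Usable local file: create it as a resident layer.
    UseLocal,
    /// Only listed in index_part: create it as a not-yet-downloaded layer.
    UseRemote,
    /// Local file must not be used as-is: move it aside (rename_to_backup).
    RenameToBackup,
    /// Nothing on disk and not usable: do not create a layer.
    Ignore,
}

/// Decide what to do with one name.
/// `local` and `remote` are the file sizes seen in the directory and in
/// index_part. All rules below are assumptions, not the pageserver's policy.
fn decide(local: Option<u64>, remote: Option<u64>, is_future: bool) -> Decision {
    match (local, remote) {
        // Future layers are never loaded; a local copy is moved aside.
        (Some(_), _) if is_future => Decision::RenameToBackup,
        (None, _) if is_future => Decision::Ignore,
        // Size mismatch ("short layer"): move the local file aside.
        (Some(l), Some(r)) if l != r => Decision::RenameToBackup,
        // Present locally (assumed: local-only files still await upload).
        (Some(_), _) => Decision::UseLocal,
        (None, Some(_)) => Decision::UseRemote,
        (None, None) => Decision::Ignore,
    }
}

/// Reconcile the directory scan (`local`) with index_part (`remote`).
/// Both map layer file name to file size in bytes.
fn reconcile(
    local: &BTreeMap<String, u64>,
    remote: &BTreeMap<String, u64>,
    is_future: impl Fn(&str) -> bool,
) -> BTreeMap<String, Decision> {
    // Every name known to either source is visited exactly once.
    let names: BTreeSet<&String> = local.keys().chain(remote.keys()).collect();
    names
        .into_iter()
        .map(|name| {
            let decision = decide(
                local.get(name).copied(),
                remote.get(name).copied(),
                is_future(name.as_str()),
            );
            (name.clone(), decision)
        })
        .collect()
}
```

Because each name gets exactly one decision in a single pass, there is no later function that has to remember to finish the cleanup, which is the problem the description above attributes to the two-phase approach.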

Needed by #4938.

It was also noticed that this is blocking code in an async fn, so it is now run in a spawn_blocking, which should be healthy for our startup times. Other effects include a hopeful halving of stat calls; extra calls which were not made previously are now made for the future layers.
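
As a rough illustration of the stat-call point, the sketch below scans a directory and reads each entry's metadata once, reusing it for both the file-type check and the size. The function name `scan_timeline_dir` and its return type are hypothetical, and only the standard library is used; this is not the pageserver's scan code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Scan `dir` once and return (file name, size in bytes) for every
/// regular file, sorted by name. Subdirectories are skipped.
fn scan_timeline_dir(dir: &Path) -> io::Result<Vec<(String, u64)>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // One metadata lookup per entry; the result is reused below
        // instead of asking the filesystem again for the size.
        let meta = entry.metadata()?;
        if meta.is_file() {
            let name = entry.file_name().to_string_lossy().into_owned();
            found.push((name, meta.len()));
        }
    }
    found.sort();
    Ok(found)
}
```

Since this function blocks on filesystem calls, an async caller on tokio would typically hand it to `tokio::task::spawn_blocking(move || scan_timeline_dir(&path))` and await the returned handle, rather than calling it directly inside the async fn.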

@koivunej koivunej requested review from a team as code owners August 23, 2023 07:05
@koivunej koivunej requested review from knizhnik and LizardWizzard and removed request for a team August 23, 2023 07:05
@koivunej koivunej changed the title refactor: one phase load layer map refactor: single phase Timeline::load_layer_map Aug 23, 2023
@github-actions

github-actions bot commented Aug 23, 2023

1624 tests run: 1550 passed, 0 failed, 74 skipped (full report)


The comment gets automatically updated with the latest test results
8854f71 at 2023-08-24T12:52:21.627Z :recycle:

@koivunej koivunej removed the request for review from LizardWizzard August 23, 2023 07:37
koivunej and others added 6 commits August 23, 2023 17:50
confusing situation with test_remote_storage_upload_queue_retries
documented on the PR discussion. an image layer was remote only, and it
again took me a while to recollect how it is possible to have future
image layers (it is because we can create them for LSNs that have not
yet been flushed to disk).
Co-authored-by: Christian Schwarz <christian@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
pageserver/src/tenant.rs (outdated, resolved)
@jcsp
Collaborator

jcsp commented Aug 23, 2023

I really like the overall style of this, makes the code substantially clearer

@koivunej
Member Author

Only added the one change after your approvals, not waiting for another round.

@koivunej koivunej enabled auto-merge (squash) August 24, 2023 13:04
@koivunej koivunej merged commit 76aa01c into main Aug 24, 2023
29 checks passed
@koivunej koivunej deleted the one_step_load_layer_map branch August 24, 2023 13:07
koivunej added a commit that referenced this pull request Aug 24, 2023
I've personally forgotten why/how we can have future layers during
reconciliation. Adds `#[cfg(feature = "testing")]` logging when we
upload such an index_part.json, with a cross-reference to where the
cleanup happens.

Latest private slack thread:
https://neondb.slack.com/archives/C033RQ5SPDH/p1692879032573809?thread_ts=1692792276.173979&cid=C033RQ5SPDH

Builds upon #5074. Should have been considered in #4837.
koivunej added a commit that referenced this pull request Oct 26, 2023
…#4938)

Implement a new `struct Layer` abstraction which manages downloadness
internally, requiring no LayerMap locking or rewriting to download or
evict, and providing the property "you have a layer, you can read it".
The new `struct Layer` provides the ability to keep the file resident
via a RAII structure for new layers which still need to be uploaded. The
previous solution solved this with
`RemoteTimelineClient::wait_completion`, which led to bugs like #5639.
Eviction and the final local deletion after garbage collection are done
in the `Drop` of the Arc'd value.

With a single `struct Layer`, the closed open-ended `trait Layer`, `trait
PersistentLayer` and `struct RemoteLayer` are removed, following the
observation that compaction could be simplified by simply not using any
of the traits in between: #4839.

The new `struct Layer` is a prerequisite for removing
`Timeline::layer_removal_cs`, documented in #4745.

Preliminaries: #4936, #4937, #5013, #5014, #5022, #5033, #5044, #5058,
#5059, #5061, #5074, #5103, epic #5172, #5645, #5649. Related split off:
#5057, #5134.
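
To illustrate the "deletion in `Drop` of an Arc'd value" idea from the commit message above, here is a minimal sketch. `LayerFile`, `mark_for_deletion` and the `delete_on_drop` flag are hypothetical names; the real `struct Layer` from #4938 is considerably more involved and is not reproduced here.

```rust
use std::fs;
use std::path::PathBuf;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a layer file that is resident on local disk.
struct LayerFile {
    path: PathBuf,
    /// Set when the file should be removed once nobody uses it anymore.
    delete_on_drop: AtomicBool,
}

impl LayerFile {
    fn new(path: PathBuf) -> Arc<Self> {
        Arc::new(LayerFile {
            path,
            delete_on_drop: AtomicBool::new(false),
        })
    }

    /// Request deletion; the file itself stays until the last Arc is dropped.
    fn mark_for_deletion(&self) {
        self.delete_on_drop.store(true, Ordering::Relaxed);
    }
}

impl Drop for LayerFile {
    fn drop(&mut self) {
        // Runs only when the last Arc clone goes away, so any reader that
        // still holds a clone keeps the file readable until it is done.
        if *self.delete_on_drop.get_mut() {
            // Best effort: a failed removal is ignored in this sketch.
            let _ = fs::remove_file(&self.path);
        }
    }
}
```

The design point is that holders of an `Arc<LayerFile>` never need a lock to check whether the file is still there: as long as they hold a clone, the removal has not run yet.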
Development

Successfully merging this pull request may close these issues.

Timeline::reconcile_with_remote does no cleanup
4 participants