pageserver: enable arbitrary attachments during /re-attach #5173

Open
jcsp opened this issue Sep 1, 2023 · 1 comment
Labels
a/tech_debt Area: related to tech debt c/storage/pageserver Component: storage: pageserver

Comments

@jcsp
Collaborator

jcsp commented Sep 1, 2023

In #5163, the pageserver calls out to the control plane on startup to get a list of tenant attachments + generation numbers.

However, to determine the tenant configuration and which timelines to attach, it relies on local state. In other words, it implicitly relies on the control plane having previously called /attach for the tenants identified in /re-attach.

That limitation exists because we wanted to avoid making the re-attach response much bigger by including all timelines and all tenant configurations, just in case one of them is a new attachment.

Eventually, we would like to remove this limitation. This would require either ensuring that the control plane + pageserver work well with the larger response, or perhaps moving to a more efficient streaming API (e.g. gRPC) rather than HTTP+JSON, so that we are not so concerned about RPC body sizes at scale.
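To make the limitation concrete, here is a minimal sketch of how a pageserver might reconcile a `/re-attach` response against its local state. The function and type names (`plan_reattach`, `ReattachPlan`) and the shape of the response (a mapping from tenant ID to generation) are assumptions for illustration, not the actual pageserver code or wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReattachPlan:
    """Hypothetical result of comparing a /re-attach response with local state."""
    attach: dict[str, int]    # tenant_id -> generation; local state is present
    detach: list[str]         # present locally but missing from the response
    unattachable: list[str]   # in the response but no local config/timelines


def plan_reattach(response: dict[str, int], local_tenants: set[str]) -> ReattachPlan:
    """Split tenants into those we can re-attach, must detach, or cannot attach.

    `response` is assumed to map tenant IDs to generation numbers.
    `local_tenants` is assumed to be the set of tenant IDs found on local disk.
    """
    in_response = set(response)
    attach = {t: g for t, g in response.items() if t in local_tenants}
    detach = sorted(local_tenants - in_response)
    unattachable = sorted(in_response - local_tenants)
    return ReattachPlan(attach=attach, detach=detach, unattachable=unattachable)
```

In this sketch the `unattachable` list is the case this issue is about: the control plane names a tenant, but the node has no tenant configuration or timeline list for it, so it has to wait for an explicit /attach call.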

@jcsp jcsp added c/storage/pageserver Component: storage: pageserver a/tech_debt Area: related to tech debt labels Sep 1, 2023
jcsp added a commit that referenced this issue Sep 6, 2023
)

## Problem

- #5050 

Closes: #5136

## Summary of changes

- A new configuration property `control_plane_api` controls other
functionality in this PR: if it is unset (default) then everything still
works as it does today.
- If `control_plane_api` is set, then on startup we call out to control
plane `/re-attach` endpoint to discover our attachments and their
generations. If an attachment is missing from the response we implicitly
detach the tenant.
- Calls to pageserver `/attach` API may include a `generation`
parameter. If `control_plane_api` is set, then this parameter is
mandatory.
- RemoteTimelineClient's loading of index_part.json is generation-aware,
and will try to load the index_part with the most recent generation <=
its own generation.
- The `neon_local` testing environment now includes a new binary
`attachment_service` which implements the endpoints that the pageserver
requires to operate. This is on by default if running `cargo neon` by
hand. In `test_runner/` tests, it is off by default: existing tests
continue to run in the legacy generation-less mode.
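The generation-aware loading of index_part.json described above amounts to picking the highest generation that does not exceed our own. A minimal sketch of that selection rule follows; the function name `select_index_generation` is hypothetical, and how the available generations are discovered in remote storage is not covered here.

```python
from typing import Iterable, Optional


def select_index_generation(available: Iterable[int], own_generation: int) -> Optional[int]:
    """Return the most recent generation <= own_generation, or None if there is none.

    `available` is assumed to be the generations for which an index_part exists.
    """
    candidates = [g for g in available if g <= own_generation]
    return max(candidates, default=None)
```

For example, with index parts at generations 1, 3 and 5 and an own generation of 4, this picks 3: the index written under generation 5 belongs to a later attachment and is ignored.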

Caveats:
- The re-attachment during startup assumes that we are only re-attaching
tenants that have previously been attached, and not totally new tenants
-- this relies on the control plane's attachment logic to keep retrying
so that we should eventually see the attach API call. That's important
because the `/re-attach` API doesn't tell us which timelines we should
attach -- we still use local disk state for that. Ref:
#5173
- Testing: generations are only enabled for one integration test right
now (test_pageserver_restart), as a smoke test that all the machinery
basically works. Writing fuller tests that stress tenant migration will
come later, and involve extending our test fixtures to deal with
multiple pageservers.
- I'm not in love with "attachment_service" as a name for the neon_local
component, but it's not very important because we can easily rename
these test bits whenever we want.
- Limited observability during re-attach on startup: when I add
generation validation for deletions in a later PR, I want to wrap up the
control plane API calls in some small client class that will expose
metrics for things like errors calling the control plane API, which will
act as a strong red signal that something is not right.
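As a rough illustration of the "small client class" idea in the last caveat, the sketch below wraps control plane calls and counts calls and errors per endpoint. The class name, the `call` method and the use of plain counters are all assumptions for illustration; the eventual implementation and its metrics library may look quite different.

```python
from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")


class ControlPlaneClientMetrics:
    """Hypothetical wrapper that records per-endpoint call and error counts."""

    def __init__(self) -> None:
        self.calls: Counter = Counter()
        self.errors: Counter = Counter()

    def call(self, endpoint: str, fn: Callable[[], T]) -> T:
        """Run `fn`, counting the call and any exception it raises, then re-raise."""
        self.calls[endpoint] += 1
        try:
            return fn()
        except Exception:
            self.errors[endpoint] += 1
            raise
```

A non-zero `errors["re-attach"]` count would be the kind of red signal the caveat describes.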

Co-authored-by: Christian Schwarz <christian@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
@jcsp
Collaborator Author

jcsp commented Mar 28, 2024

Setting up secondaries during re-attach was added in #6941.

The caveats are currently:

  • We can't attach tenants during re-attach that weren't explicitly attached before (because we don't have tenant config for them on the node)
  • Secondary locations get a default configuration during re-attach because we don't persist the secondary location for a tenant if it was in attached mode before restart.
