Epic: safekeeper coordination #543

Closed
10 tasks done
kelvich opened this issue Sep 6, 2021 · 1 comment
Labels
c/storage/safekeeper Component: storage: safekeeper t/Epic Issue type: Epic

@kelvich
Contributor

kelvich commented Sep 6, 2021

Summary & Motivation

In a few scenarios, a safekeeper (SK) needs to coordinate with the other SKs that serve the same tenant:

  1. WAL deletion. An SK needs to know which WAL has been safely replicated before it can delete it. Currently we keep WAL indefinitely.
  2. Deciding who sends WAL to the pageserver. Currently, a crash of the sending safekeeper may lead to a livelock where nobo
  3. Enabling direct SK-to-SK recovery without involving the compute node (to be implemented in Epic: recovery of lagging safekeepers #1012).
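To make the WAL deletion point concrete, here is a minimal sketch of how an SK could compute the LSN below which its WAL is safe to remove. It assumes (this is not stated in the issue) that "safely replicated" means every safekeeper serving the tenant has flushed the WAL and the pageserver has persisted it. `PeerState`, `flush_lsn`, `wal_removal_horizon` and `pageserver_persisted_lsn` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerState:
    """Hypothetical view of one safekeeper, as learned through coordination."""
    sk_id: int
    flush_lsn: int  # end of WAL durably written on that safekeeper


def wal_removal_horizon(peers: list[PeerState], pageserver_persisted_lsn: int) -> int:
    """Return the LSN below which WAL may be deleted.

    Assumption: WAL is safe to delete only once every safekeeper has flushed it
    AND the pageserver has persisted it, so the horizon is the minimum of both.
    """
    if not peers:
        # Without information about peers nothing is provably safe to delete.
        return 0
    min_flush = min(p.flush_lsn for p in peers)
    return min(min_flush, pageserver_persisted_lsn)


peers = [PeerState(1, 0x3000), PeerState(2, 0x2800), PeerState(3, 0x3100)]
print(hex(wal_removal_horizon(peers, 0x2A00)))
```

Taking the minimum is conservative: a single lagging peer (here SK 2) holds back deletion on every node, which is exactly why each SK needs to learn its peers' positions.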

DoD

  • safekeepers can elect a new safekeeper to push WAL to the pageserver
  • the livelocks we currently have are eliminated
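One way to meet both DoD items is a deterministic election that every safekeeper evaluates independently over the same shared state: if all SKs see the same inputs, they agree on a single sender, and a crashed sender drops out once its heartbeat expires. The sketch below assumes (none of this is specified in the issue) that each SK publishes its `commit_lsn` and a heartbeat timestamp, that an SK counts as alive if its heartbeat is at most `ttl` seconds old, and that the most up-to-date live SK wins, with ties broken by the lowest id. `SafekeeperInfo` and `elect_wal_sender` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SafekeeperInfo:
    """Hypothetical state each safekeeper publishes about itself."""
    sk_id: int
    commit_lsn: int
    last_heartbeat: float  # seconds, on the observer's clock


def elect_wal_sender(infos: list[SafekeeperInfo], now: float,
                     ttl: float = 5.0) -> Optional[int]:
    """Pick the safekeeper that should push WAL to the pageserver.

    Only safekeepers with a fresh heartbeat are eligible. Among them, the one
    with the highest commit_lsn wins; ties go to the lowest sk_id so that every
    node evaluating the same input reaches the same answer.
    """
    alive = [i for i in infos if now - i.last_heartbeat <= ttl]
    if not alive:
        return None  # nobody is eligible; retry once fresh state arrives
    best = max(alive, key=lambda i: (i.commit_lsn, -i.sk_id))
    return best.sk_id


infos = [
    SafekeeperInfo(1, 100, 9.0),
    SafekeeperInfo(2, 200, 9.5),
    SafekeeperInfo(3, 200, 9.8),
]
print(elect_wal_sender(infos, now=10.0))
```

Because the rule is a pure function of the shared state, no extra voting round is needed; the trade-off is that nodes with slightly different views of the state may briefly disagree until their views converge.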

A quick follow-up for this epic is #1403.

Implementation plan:

Per neondatabase/rfcs#16, we decided to use a centralized broker (currently etcd) that every storage node can publish to and subscribe to. The rough steps are

@kelvich kelvich added the c/storage/safekeeper Component: storage: safekeeper label Sep 6, 2021
@stepashka stepashka added this to the Technical preview milestone Dec 13, 2021
@stepashka stepashka added t/Epic Issue type: Epic and removed launch blocker labels Dec 13, 2021
@stepashka stepashka changed the title safekeeper: time leases/gossip about pageserver connection Epic: safekeeper gossip implementation Dec 16, 2021
@stepashka stepashka added p/cloud Product: Neon Cloud and removed p/cloud Product: Neon Cloud labels Dec 27, 2021
@stepashka stepashka changed the title Epic: safekeeper gossip implementation Epic: safekeeper coordination Jan 14, 2022
@stepashka
Member

stepashka commented Feb 11, 2022

@arssher, please update the implementation thoughts here based on the recent decision to go with etcd instead of gossip for now, and on #1180.

4 participants