Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: clean up proposals pipeline #116020

Open
pav-kv opened this issue Dec 11, 2023 · 2 comments
Open

kvserver: clean up proposals pipeline #116020

pav-kv opened this issue Dec 11, 2023 · 2 comments
Assignees
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@pav-kv
Copy link
Collaborator

pav-kv commented Dec 11, 2023

Forking off #115020 (comment).

I'm leaning towards a clearer design, like this:

  1. Have a FIFO queue of in-flight proposals. They are added sequentially, and have consecutive "lease indices", so no fancy data structures are needed (except maybe some MPSC support for concurrent writes, like in the proposal buffer).

    • In fact, this possibly can be integrated with the proposals buffer which is a similar queue. The only difference is that we would keep proposals in the queue until they are applied or got behind LAI, not just until they are flushed to raft.
    • Integrating with the proposal buffer though can be hardened by the fact that refreshProposalsLocked wants to push duplicates to it. So we may still need 2 separate queues.
  2. When LAI state changes, wipe a prefix of this queue up to the new LAI (after reporting all the successfully applied commands). Decide on the unapplied / unknown to be applied commands:

    • For proposals that can be reproposed with a newer LAI, do it; push back to the queue correspondingly.
    • For proposals that can't be reproposed (and won't be), report a permanent failure.
    • For proposals that were jumped over (like when there was a snapshot), report an ambiguous failure and don't repropose.
  3. When a replica is destroyed, or lease moves (figure out exactly what those conditions are), wipe any outstanding proposals in the queue and prevent inserts to it. Report ambiguous errors for them.

  4. Not sure if the proposals map is needed in the first place. The above data structure is searchable by LAI in O(1) since LAI is strictly +1 incremental. The map would only be needed if we additionally need to search by the proposal ID, but I don't think we need to. We will use the command ID only for sanity checks to make sure that search in the queue by LAI finds the correct command.

  5. We do need to track multi-proposal proposals though (those that update LAI and repropose). From the above, we have the invariant that an in-flight proposal is always in the queue and always exactly once, so we can store all this information right in the queue. Or we could factor out some bits that are shared across multiple reproposals.

  6. Do not touch the proposals state (and proposals queue) from refreshProposalsLocked at all. All it needs to do is scan the queue, and insert slow proposals to the proposal buffer. This nicely separates concerns. We may replace the queue scanning with a better heap-based approach, so that we don't have to scan the entire queue all the time (as we do now).

Jira issue: CRDB-34385

Epic CRDB-39898

@pav-kv pav-kv added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-replication Relating to Raft, consensus, and coordination. T-kv-replication labels Dec 11, 2023
Copy link

blathers-crl bot commented Dec 11, 2023

cc @cockroachdb/replication

@pav-kv
Copy link
Collaborator Author

pav-kv commented Dec 11, 2023

TLDR:

  • We have two kinds of reproposals, but they are handled at the same "layer". Break this in two.
  • Break proposals buffer into two queues:
    • The first queue assigns LAI and ClosedTimestamp sequentially, has no dups, and is thus searchable by LAI. A proposal stays in this queue until the replica's LAI moves past it, and then it is either reproposed or dropped. This queue can be wiped sequentially, so no proposal "leaks".
    • The second queue is a buffer between this one and raft. It doesn't need to be ordered by LAI for correctness (but needs to in order to make progress), and can have dups. In some sense it's LAI/CT-agnostic. Proposals are added to this queue once, and then periodically re-added if still present in the first queue.
    • Possibly the second queue/buffer isn't needed, because a raft batch can be constructed from the first data structure directly: scan it and send everything to raft except things that were recently sent.
  • Get rid of proposals map, in favor of the first queue which is intiutive and sequential.

@pav-kv pav-kv added the C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. label Dec 11, 2023
@exalate-issue-sync exalate-issue-sync bot added T-kv KV Team and removed T-kv-replication labels Jun 28, 2024
@github-project-automation github-project-automation bot moved this to Incoming in KV Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
No open projects
Status: Incoming
Development

No branches or pull requests

1 participant