Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] Coupling block sync to DAG state #3386

Draft
wants to merge 5 commits into
base: staging
Choose a base branch
from

Conversation

mdelle1
Copy link
Contributor

@mdelle1 mdelle1 commented Aug 28, 2024

Motivation

This PR focuses on coupling block sync to DAG state replication. When a node is syncing via block responses, it will sync its storage and DAG with the certificates contained in the block and attempt to update its ledger. Previously, there were scenarios where a node would commit certificates in its DAG without advancing blocks. Instead, the committal of leader certificates and advancement of blocks during sync should be tightly coupled.

To achieve this, if a node is syncing, it should forgo committing leader certificates on a rolling basis inside update_dag. Instead, the leader certificate should be committed just before the ledger is ready to advance to the next block during sync. To facilitate this, we use a sender channel that communicates the leader certificate to be committed from the Sync module to the BFT.

A previous version of this PR can be found here #3268 .
The differences are that the previous PR does not prevent syncing nodes from committing leader certificates within update_dag, and that certificates inside block responses are added to the DAG all at once when the availability threshold is met. For contrast, in this PR, we maintain the original method of updating the DAG as soon as the certificates in the block response are processed within sync_storage_with_block, but only commit the leader certificate when the ledger is ready to advance to the next block. Updating the DAG as soon as the certificates are processed is necessary to ensure there is no discrepancy between the storage and the DAG state beyond the latest committed block.

To summarize, this PR makes the following changes:

  • Implements sender-receiver channels for committing leader certificates and determining recently committed certificates between Sync and BFT.
  • Prevents syncing nodes from committing leader certificates within update_dag.
  • Instead, commits the leader certificates in the BFT using a sender-receiver channel within sync_storage_with_block just before the ledger advances blocks.

Test Plan

Relevant BFT test cases include the following:
2. BFT-Rebonding
12. Sync-Invalid-Peers-Attack
13. Sync-Far-Behind

Related PRs

#3268

Comment on lines +503 to +511
Ok(is_recently_committed) => {
if !is_recently_committed {
bail!(
"Sync - Failed to advance blocks - leader certificate with author {leader_author} from round {leader_round} was not recently committed.",
);
}
debug!(
"Sync - Leader certificate with author {leader_author} from round {leader_round} was recently committed.",
);
Copy link

@gluax gluax Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a nit but the boolean can be moved into the match statement you get a compile warning if missing a branch and this has better readability IMO.

i.e:

Ok(true) => {
  debug!("...");
}
Ok(false) => {
  bail!("...");
}
// rest of cases

self.spawn(async move {
while let Some(((round, certificate_id), callback)) = rx_is_recently_committed.recv().await {
// Check if the certificate was recently committed.
let is_committed = self_.dag.read().is_recently_committed(round, certificate_id);
Copy link
Collaborator

@ljedrz ljedrz Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like a lot of hassle just to perform this one check (in its own task with a dedicated channel), especially with both the BFT and Dag being clonable 🤔... it does make it async, but do we expect the recent_committed_ids collection to grow large enough to make it blocking?

@alzger
Copy link
Contributor

alzger commented Nov 1, 2024

There are still CIs jobs failing @ljedrz . @vicsn , please let us know how you would like to proceed.

@vicsn
Copy link
Collaborator

vicsn commented Nov 1, 2024

Can this PR be changed to draft state? No need to get it into next release.

@alzger alzger marked this pull request as draft November 1, 2024 19:17
@alzger
Copy link
Contributor

alzger commented Nov 1, 2024

Converted to draft as per your comment @vicsn

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants