PBFT instance-level checkpoints #113

Merged 10 commits on Jul 7, 2022

matejpavlovic
Contributor

If correct nodes stop participating in the SB instance protocol immediately after delivering all batches in a segment,
a minority of nodes that have not yet delivered all batches might get stuck in higher views,
never finding enough support to finish a view change.
The high-level checkpoints (encompassing all segments) are not enough to resolve this problem,
because multiple segments can block each other: each segment has its own majority of nodes that finished it,
but too few nodes have finished all segments.
If, for example, each segment has only one node that fails to complete it, but that node differs from segment to segment,
no high-level checkpoint can be constructed,
as too few nodes will have delivered everything from all the segments.
Thus, local instance-level checkpoints are required, so that nodes can catch up on each segment separately.
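
To make the blocking scenario concrete, here is a minimal Go sketch. It is not taken from this PR: the helper names (`quorum`, `finishedSegment`, `finishedAll`, `exampleState`), the 4-node, 2-segment state, and the quorum size of 2f+1 out of n = 3f+1 nodes are all assumptions made for illustration.

```go
package main

import "fmt"

// quorum returns the assumed PBFT-style quorum size 2f+1 for n = 3f+1 nodes.
func quorum(n int) int {
	f := (n - 1) / 3
	return 2*f + 1
}

// finishedSegment counts the nodes that delivered everything in segment seg.
// delivered[node][segment] is true if that node completed that segment.
func finishedSegment(delivered [][]bool, seg int) int {
	count := 0
	for _, node := range delivered {
		if node[seg] {
			count++
		}
	}
	return count
}

// finishedAll counts the nodes that completed every segment.
func finishedAll(delivered [][]bool) int {
	count := 0
	for _, node := range delivered {
		all := true
		for _, done := range node {
			if !done {
				all = false
				break
			}
		}
		if all {
			count++
		}
	}
	return count
}

// exampleState builds a hypothetical state with 4 nodes and 2 segments,
// where each segment is missed by exactly one node, a different one each time.
func exampleState() [][]bool {
	return [][]bool{
		{false, true}, // node 0 is stuck in segment 0
		{true, false}, // node 1 is stuck in segment 1
		{true, true},  // node 2 finished both segments
		{true, true},  // node 3 finished both segments
	}
}

func main() {
	d := exampleState()
	q := quorum(len(d))
	for seg := range d[0] {
		fmt.Printf("segment %d: %d nodes finished (quorum %d)\n", seg, finishedSegment(d, seg), q)
	}
	fmt.Printf("all segments: %d nodes finished (quorum %d)\n", finishedAll(d), q)
}
```

In this hypothetical state each segment is finished by 3 nodes, which meets the assumed quorum of 3, but only 2 nodes finished both segments, so a checkpoint covering all segments cannot gather a quorum.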

matejpavlovic force-pushed the pbft-local-chkp branch 2 times, most recently from ae98c14 to 6089c25 (June 23, 2022 08:50)
matejpavlovic force-pushed the pbft-local-chkp branch 3 times, most recently from 1147b8b to 01da17e (June 24, 2022 15:23)
matejpavlovic force-pushed the pbft-local-chkp branch 2 times, most recently from e57e26c to 12a6104 (June 27, 2022 09:25)
Signed-off-by: Matej Pavlovic <matopavlovic@gmail.com>
These checkpoints are necessary for liveness.
Each SB instance must be live independently of the other instances.
In particular, even instances that delivered all slots in a segment
must keep running until a global stable checkpoint is reached.
Otherwise the system might get stuck.
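
The stopping rule described in this commit message can be sketched as follows. This is a hedged illustration only: `sbInstance`, `segmentDone`, and `mayStop` are hypothetical names, not the types or functions used in `pkg/iss`.

```go
package main

import "fmt"

// sbInstance is a hypothetical, simplified view of one SB instance.
// It is not the type used in pkg/iss.
type sbInstance struct {
	segmentLen int // number of slots in the segment
	delivered  int // number of slots delivered locally so far
}

// segmentDone reports whether every slot of the segment was delivered locally.
func (i sbInstance) segmentDone() bool {
	return i.delivered >= i.segmentLen
}

// mayStop reports whether the instance may stop participating.
// Local completion (segmentDone) is deliberately not sufficient: the instance
// keeps running until a global stable checkpoint is reached, so that slower
// nodes can still find support in this segment.
func (i sbInstance) mayStop(globalStableCheckpoint bool) bool {
	return globalStableCheckpoint
}

func main() {
	inst := sbInstance{segmentLen: 4, delivered: 4}
	fmt.Println("segment done:", inst.segmentDone())
	fmt.Println("may stop before global stable checkpoint:", inst.mayStop(false))
	fmt.Println("may stop after global stable checkpoint:", inst.mayStop(true))
}
```

The point of the sketch is that `mayStop` ignores local progress: an instance that has delivered all of its slots still answers `false` until the global stable checkpoint exists.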

Signed-off-by: Matej Pavlovic <matopavlovic@gmail.com>
Sending stable checkpoints to nodes that are suspected of having
fallen behind now happens periodically.
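
One way to picture the periodic resend is the sketch below. How the PR actually decides that a node is "suspected of having fallen behind" is not shown in this conversation, so the criterion used here (the last epoch observed from a node is lower than the epoch of the latest stable checkpoint) and the name `laggingNodes` are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// laggingNodes returns, in sorted order, the nodes whose last observed epoch
// is lower than the epoch of the latest stable checkpoint.
// This lag criterion is an assumption made for illustration.
func laggingNodes(lastSeen map[string]uint64, stableEpoch uint64) []string {
	var lagging []string
	for node, epoch := range lastSeen {
		if epoch < stableEpoch {
			lagging = append(lagging, node)
		}
	}
	sort.Strings(lagging)
	return lagging
}

func main() {
	lastSeen := map[string]uint64{"n0": 5, "n1": 3, "n2": 5, "n3": 4}
	// Each loop iteration stands in for one firing of a periodic timer.
	for tick := 0; tick < 2; tick++ {
		for _, node := range laggingNodes(lastSeen, 5) {
			fmt.Printf("tick %d: send stable checkpoint to %s\n", tick, node)
		}
		lastSeen["n1"] = 5 // pretend n1 caught up after receiving the checkpoint
	}
}
```

Re-evaluating the lagging set on every tick means a node that has caught up stops receiving the checkpoint, while a node that is still behind gets it again.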

The number of old epochs to keep is now configurable.
Only epochs that are older than the latest stable checkpoint
minus a configuration parameter are garbage-collected.
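
The garbage-collection rule can be sketched as below. The function name `collectGarbage`, the parameter name `retainedEpochs`, and the reading of "older than" as a strict comparison are assumptions; the actual configuration field in `pkg/iss/config.go` is not shown in this conversation.

```go
package main

import "fmt"

// collectGarbage deletes every epoch strictly older than
// stableEpoch - retainedEpochs and returns how many epochs were removed.
// retainedEpochs is a hypothetical name for the configuration parameter.
func collectGarbage(epochs map[uint64]string, stableEpoch, retainedEpochs uint64) int {
	if stableEpoch < retainedEpochs {
		return 0 // nothing is old enough yet; also avoids unsigned underflow
	}
	threshold := stableEpoch - retainedEpochs
	removed := 0
	for e := range epochs {
		if e < threshold {
			delete(epochs, e) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	epochs := map[uint64]string{0: "e0", 1: "e1", 2: "e2", 3: "e3", 4: "e4", 5: "e5"}
	removed := collectGarbage(epochs, 5, 2)
	fmt.Printf("removed %d epochs, %d remain\n", removed, len(epochs))
}
```

With the latest stable checkpoint in epoch 5 and 2 retained epochs, the threshold is epoch 3, so epochs 0, 1, and 2 are removed and epochs 3, 4, and 5 remain.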

Signed-off-by: Matej Pavlovic <matopavlovic@gmail.com>
Co-authored-by: Sergey Fedorov <serge.fdrv@gmail.com>
matejpavlovic merged commit a10128c into consensus-shipyard:main (Jul 7, 2022)
matejpavlovic deleted the pbft-local-chkp branch (July 7, 2022 14:56)