Crash Recovery #222

Open
matejpavlovic opened this issue Sep 12, 2022 · 0 comments
Labels
project (Big task that can be addressed as a small stand-alone project), Trantor

Background: The Mir Project

The Mir project aims at developing a production-quality implementation of:

  • a general framework for easily implementing distributed protocols
  • a Byzantine fault-tolerant consensus protocol using the framework.

This implementation (in the Go language) will be available as the open-source mir library.
Our aim is integration with the Eudico Filecoin client,
specifically its ordering service.
Being framed as a library, however, Mir's goal is also to serve as a general-purpose high-performance BFT component of other projects.

The first fault-tolerant consensus protocol implemented within mir is a variant of ISS (the next generation of the MirBFT consensus protocol, which appeared at EuroSys '22 as "State Machine Replication Scalability Made Simple"). However, the framework is general enough to facilitate the implementation of other distributed protocols in the future. The framework is highly modular, allowing the developer of a distributed protocol to focus on the protocol logic without having to care about network transport, storage, cryptographic primitives, etc.
At the same time, the high modularity facilitates creating custom protocol implementations tailored to the consumer's needs. Mir hopes to be a building block of the next generation of distributed systems.

The mir library is still in development. Many of its useful features are still not fully implemented, some are not even started, and others are yet to be discovered.

Tasks

This project focuses on one particular aspect of Mir's robustness: support for crash recovery.
It should be easy for a system built on top of mir to survive arbitrarily many benign crashes of individual nodes (especially if the implemented protocol is an asynchronous / partially synchronous one).

To aid the implementation of such crash-tolerant protocols, Mir provides a built-in write-ahead log (WAL) that the system implemented using Mir can use. In a nutshell, any event that occurs during execution can be persisted to stable storage through the WAL. During recovery (e.g., after a restart), the system is fed the sequence of stored events and can initialize its state accordingly.
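The idea of persisting events and feeding them back on restart can be illustrated with a minimal sketch. This is not Mir's WAL or its API: the `Event` type, `FileWAL`, `OpenWAL`, `LoadAll`, and the one-JSON-object-per-line file format are all assumptions made for illustration only.

```go
package main

import (
	"bufio"
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// Event is a hypothetical stand-in for whatever a system built on Mir persists.
type Event struct {
	Seq     uint64 `json:"seq"`
	Payload string `json:"payload"`
}

// FileWAL appends events to a file, one JSON object per line (assumed format).
type FileWAL struct {
	f *os.File
}

// OpenWAL opens the log for appending, creating it if it does not exist.
func OpenWAL(path string) (*FileWAL, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	return &FileWAL{f: f}, nil
}

// Append writes the event and syncs the file before returning,
// so that a crash after Append returns does not lose the event.
func (w *FileWAL) Append(e Event) error {
	line, err := json.Marshal(e)
	if err != nil {
		return err
	}
	if _, err := w.f.Write(append(line, '\n')); err != nil {
		return err
	}
	return w.f.Sync()
}

// Close releases the underlying file.
func (w *FileWAL) Close() error { return w.f.Close() }

// LoadAll returns all stored events in the order they were appended.
// A missing file is treated as an empty log (first start).
// Simplification: a partially written last line is reported as an error,
// and lines longer than the bufio.Scanner default limit are not supported.
func LoadAll(path string) ([]Event, error) {
	f, err := os.Open(path)
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var events []Event
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		var e Event
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			return nil, err
		}
		events = append(events, e)
	}
	return events, sc.Err()
}

// roundTrip persists two events, closes the log (simulating a shutdown),
// and reads the events back as a restarted process would.
func roundTrip() ([]Event, error) {
	dir, err := os.MkdirTemp("", "wal-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "wal.log")

	w, err := OpenWAL(path)
	if err != nil {
		return nil, err
	}
	for _, e := range []Event{{Seq: 1, Payload: "a"}, {Seq: 2, Payload: "b"}} {
		if err := w.Append(e); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return LoadAll(path)
}

func main() {
	events, err := roundTrip()
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	for _, e := range events {
		fmt.Printf("replaying event %d: %s\n", e.Seq, e.Payload)
	}
}
```

Syncing on every append keeps the sketch simple at the cost of throughput; a real log would likely batch writes and handle truncated trailing entries.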

A simple implementation of the WAL already exists in Mir. However, no other modules use it yet. In particular, the state machine replication (SMR) application built using Mir (the application this project focuses on) can neither properly persist its state updates nor re-initialize by reading the WAL.

The tasks this project consists of are as follows.

  • Identify the minimal information that must be persisted for Mir and its protocols to be able to recover to a correct state.
  • Make Mir's SMR protocol suite persist all such critical information at the right times.
  • Implement the recovery: Make the Mir SMR protocol suite load the WAL, if one exists, on startup and initialize its state accordingly.
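As a rough illustration of what "at the right times" and "initialize its state accordingly" could mean, the toy sketch below persists each delivered batch before applying it and rebuilds its state by replaying the log on startup. Everything here is hypothetical (the `Log` interface, `memLog`, `Replica`, and the string batches); an in-memory slice stands in for stable storage, and none of it reflects Mir's actual modules.

```go
package main

import "fmt"

// Log is a hypothetical abstraction of stable storage.
type Log interface {
	Append(entry string) error
	Entries() []string
}

// memLog stands in for stable storage; it outlives a simulated crash
// because the test harness keeps a reference to it.
type memLog struct{ entries []string }

func (l *memLog) Append(e string) error { l.entries = append(l.entries, e); return nil }
func (l *memLog) Entries() []string     { return l.entries }

// Replica is a toy state machine whose state is the ordered list of
// delivered batches.
type Replica struct {
	log       Log
	delivered []string
}

// NewReplica initializes the state by replaying whatever the log contains.
// With an empty log this is a fresh start.
func NewReplica(log Log) *Replica {
	r := &Replica{log: log}
	for _, e := range log.Entries() {
		r.apply(e)
	}
	return r
}

// Deliver persists the batch first and only then applies it, so the
// in-memory state never gets ahead of what recovery can reconstruct.
func (r *Replica) Deliver(batch string) error {
	if err := r.log.Append(batch); err != nil {
		return err
	}
	r.apply(batch)
	return nil
}

// apply updates the volatile state only.
func (r *Replica) apply(batch string) {
	r.delivered = append(r.delivered, batch)
}

func main() {
	stable := &memLog{}

	r := NewReplica(stable)
	for _, b := range []string{"batch-0", "batch-1"} {
		if err := r.Deliver(b); err != nil {
			fmt.Println("deliver failed:", err)
			return
		}
	}

	// Simulated crash: the replica's volatile state is discarded,
	// only the log survives.
	r = NewReplica(stable)
	fmt.Println("state after recovery:", r.delivered)
}
```

Replaying every entry from the beginning is the simplest recovery strategy; identifying the minimal set of events that actually needs persisting, as the first task asks, is what would keep such a log small.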

Expected Outcomes

  • A deployment of a Mir-based SMR application that continues operating after all processes have been simultaneously killed and restarted.

Required Skills

  • Experience programming in Go
  • Basic knowledge of the Git version control system
  • Understanding of distributed systems theory and consensus / TOB / SMR protocols
  • Proficiency in English

Contact Info

matejpavlovic added the project label on Sep 12, 2022