🚧 | Y3: Efficient subnet consensus protocols #9

Closed
jsoares opened this issue Dec 6, 2021 · 34 comments

Comments

@jsoares
Contributor

jsoares commented Dec 6, 2021

Description

Our goal is to design and implement scalable, efficient consensus protocols for subnets (i.e. anything below the top-level Filecoin consensus). This should enable secure, low-latency operation up to ~500 nodes per subnet.

These protocols will be integrated with Eudico, the research clone of the Lotus Filecoin client, supporting a hierarchy of subnets, each running its own instance of a consensus protocol.

To achieve our goal, we develop Mir, a framework for implementing distributed protocols. The first protocol to be implemented is ISS, a multi-leader BFT-style consensus protocol, in conjunction with Narwhal, a state-of-the-art mempool implementation enabling scale-out throughput.

The design should, however, be general enough, and its implementation sufficiently modular, to make it easy to implement different consensus protocols and select any of them for a particular subnet deployment.

Scope

  • Specify mid-tier and leaf-level consensus
  • Implement a modular and scalable (up to the low hundreds of nodes) consensus solution
  • Integrate the solution with the (experimental) Filecoin client Eudico.

Resources

Papers

Talks

soon™️

Demos

2022-04-07, MirBFT

Watch the video

2022-06-02, Crash failures in Mir and its integration with Eudico

Watch the video

2022-08-12, Reproducible Integration Testing in Mir

Watch the video

2022-08-12, Taking Pseudocode To An Implementation With Mir Framework

Watch the video

2022-09-01, Reconfigurable SMR with Mir

Watch the video

2022-09-01, Dynamically adding new nodes to Eudico with Mir

Watch the video

@jsoares jsoares added the Epic label Dec 6, 2021
@jsoares jsoares changed the title 🚧 | Y3: Efficient subnet consensus 🚧 | Y3: Efficient subnet consensus protocols Feb 1, 2022
@jsoares
Contributor Author

jsoares commented Feb 23, 2022

2022-02-21 meeting notes

  • Discussion with Alfonso and Denis around implementation approach. Synchronised on the state of each part of the project. Tendermint almost done. Next upcoming chunk is on the Eudico side (~2 weeks). In parallel, we’ll look into implementing MirBFT stub.
  • We’ll also look into how ISS works and identify the key parts of the ISS protocol that need to be implemented to have a working prototype.
    • Marko: We should come up with roadmap for this
      • Timeline: With testnet in Q4, what can we expect to have ready in time?
        • Having stable interface is the priority.
        • Estimating effort and timeline is next step — aiming to have someone by Monday. Let’s try for monthly milestones and zero-mean error.
  • Narwhal: first focus on the framework, interface and requirements for usage. For now, plan to use default mempool. Alfonso will be supporting on Eudico side. Immediate AIs: share ISS paper with Denis and go through MirBFT.
    • Is it clear that Narwhal can be added on top of Mir-BFT? Is there a unifying architecture?
      • Narwhal and MirBFT are not interchangeable. Narwhal can use MirBFT as a blackbox ordering service. The architecture should be modular, with MirBFT and Narwhal as specific instances.
      • We may lose some specific properties/advantages in favour of generality.
  • How should we do state transfer work in Eudico?
    • Have a snapshot of the state tree. Unclear how efficient it will be. This is synchronised to a specific epoch and consistent across nodes. We can also add reconfiguration messages — these can be part of a block and be part of the total order.

@jsoares
Contributor Author

jsoares commented Feb 28, 2022

2022-02-28

✋ Attendees

📣 Updates

  • Discussing architecture and making a list of to-do items to make MirBFT a consensus protocol for Eudico
  • First milestone defined
    • M1: minimal demoable implementation
      • Eudico client-service interface (~ that for Tendermint) [Denis, 3 wks]
      • Implementation of PBFT subprotocol in ISS [Matej, 3 wks]
      • State garbage collection and transfer [4 wks]
      • Prepare demo [1 day]

🧵 Discussion

  • Did we consider alternatives to PBFT?
    • Alternatives considered but PBFT is the most settled and general. Fastest to implement. Good performance in previous experiments.
    • We can try others in parallel, but the team prefers to start there.
  • Timeline for the overall project
    • End of Q3 realistic for testnet
    • The only critical items after M1 are
      • Reconfiguration
      • Persistent request store
      • Persistent write-ahead log (trickier)
    • Let's try to organise the follow-up milestones
  • Matej + Denis + Sergey sounds like full staffing; shouldn't need any more capacity. Some parts are not highly parallelisable.
  • We may be able to reuse testing framework from Lotus/Eudico instead of spending a lot of dev cycles on it

🎯 Up next

  • Define timeline for M1 (start and end date)
  • Define high-level milestones for the rest of the project
  • Organise meeting on testing tools and invite Marko

@jsoares
Contributor Author

jsoares commented Mar 8, 2022

2022-03-07

✋ Attendees

📣 Updates

🧵 Discussion

  • Applicability of VDFs to PBFT vs. other protocols
  • Internship proposal comments?
  • Robust leader election
    • Want to verify an idea/assumption that PBFT could easily be made robust in this sense.

🎯 Up next

  • Continue implementing PBFT sub-protocol
  • Check alignment of interests with Srivatsan
  • Potentially submit Hyperledger mentorship proposal (deadline on Wed)
  • @jsoares to add milestones to roadmap

@jsoares
Contributor Author

jsoares commented Mar 14, 2022

2022-03-14

✋ Attendees

📣 Updates

  • Good case of PBFT works.
  • Basic mircat demonstrable with PBFT.
  • Camera-ready for ISS paper almost ready.
  • mircat as a subproject.

🧵 Discussion

  • Would a demo video showing basic usage of mirbft (using a sample chat application) be useful? Another video with failures could follow when PBFT view change is implemented.
    • Let's go with short recordings for intermediate progress, and plan a demo on the April demo day.
    • Look into visualisation tooling for traces. Consider a dev shop.
      • Jorge having regrets about us not having joined the hyperledger mentorship programme

🎯 Up next

  • Consolidate documentation and high-level project description for broader audiences.
  • Implement PBFT view change.
  • Focus on the mirbft interface
  • Create a GitHub issue for mirbft Eudico interface

@jsoares
Contributor Author

jsoares commented Mar 22, 2022

2022-03-21

✋ Attendees

📣 Updates

  • Started implementing PBFT view change
  • Consolidated a big part of the documentation for MirBFT
  • Updated some related issues on ConsensusLab GitHub, added a few sub-issues

🧵 Discussion

  • New target date for MirBFT-Eudico interface 3 weeks into April: OK with Denis?
    • Denis: Yes

🎯 Up next

  • Continue working on PBFT view change implementation
  • Maybe demo (if time allows)

@jsoares
Contributor Author

jsoares commented Mar 28, 2022

2022-03-28

✋ Attendees

📣 Updates

  • Suggested project for Sergey: State transfer in MirBFT.
  • More code on PBFT view change.

🧵 Discussion

  • Make #90 ISS with chain quality a discussion?
    • It's now a discussion -- let's document today's conversation and other thoughts as they come along.
  • Usage of VRFs for bucket assignment.
  • Value proposition of Narwhal vs MirBFT
    • And how they go hand-in-hand
    • Should we have a sprint to work out the best direction? Start async discussion; highest priority.
      • Denis can still work on MirBFT integration after itests; meeting on Wednesday
    • Task protocol also at top of mind; simple, co-designed with Narwhal, something to know about when having this discussion
      • Ideas from Hedera/hashgraph?

🎯 Up next

  • PBFT View change.
  • MirBFT usage demo for 7 April.

@matejpavlovic
Contributor

2022-04-11

(Updated by @matejpavlovic 2022-04-12)

✋ Attendees

📣 Updates

  • Narwhal and ISS at EuroSys
  • New discussions around deduplication in consensus protocols.
  • Denis and Sergey starting this week! 🥳

🧵 Discussion

Implementation Framework Name

Currently MirBFT - carries historical baggage. Let's rename to

GoNode

Initial idea was Node.go (like Node.js), which would have fit even better to the abstraction it provides, but Node.js is a registered trademark.
For now the repository lives at https://github.com/matejpavlovic/go-node and will be populated with code soon.

Consensus Protocol

Given recent research results (Narwhal, Tusk, Bullshark, etc.) and discussions,

  1. it seems like we can have 1 out of 2:
    • request deduplication and
    • resistance to a mobile adversary (DoS resistance).
  2. The long-term solution will almost surely involve a DAG.

Thus, it is worth revisiting the current plan for consensus implementation. The options considered:

  • Current path:
    • Strategy:
      • Implement and use GoNode.
      • Within GoNode, implement the ISS protocol (including deduplication).
      • Use a simple mempool approach (hook Eudico's mempool to GoNode using some adapter code) and add Narwhal much later.
    • Pros:
      • Probably the simplest and fastest path to an MVP.
      • @matejpavlovic has experience with ISS.
      • Verified with the ISS PoC.
      • Good latency in the good case
    • Cons:
      • Vulnerability to a mobile adversary that would need to be resolved later.
      • Limited throughput with failures / asynchrony.
      • Discussion on request duplication hints that deduplication inside ISS might be a dead end.
  • DAG-based path (a)
    • Strategy:
      • Implement and use the GoNode framework.
      • Use GoNode, use duplicated ISS (dISS): Boils down to simple multiplexing of PBFT instances.
      • Implement the Narwhal DAG (ideally also with GoNode).
      • Combine dISS and Narwhal.
      • Optimistic Narwhal deduplication (can be postponed).
    • Pros:
      • A more general design, including the DAG as an abstraction from the start.
      • Simplifies ISS implementation (by reducing it to dISS).
      • Readiness for a wider family of DAG-based consensus protocols.
      • Resistance to a mobile adversary (DoS).
      • Probably better throughput under failures/asynchrony.
    • Cons:
      • Probably marginally higher latency than current path.
      • No PoC for exactly this protocol (although Narwhal-HotStuff probably applies).
      • Requires altering current roadmap (but just slightly - moving Narwhal to the front of the queue).
  • DAG-based path (b)
    • Strategy:
      • Like DAG-based path (a), but also exchange dISS for a different protocol (e.g. Tusk or Bullshark)
    • Pros:
      • Potentially better latency than (a)
    • Cons:
      • Comes close to "from-scratch" solution with big investment in new protocols and uncertain result.
  • Everything-from-scratch path
    • Strategy:
      • Deep dive into the latest research results.
      • Redesign everything from scratch without regard to what we already have.
    • Pros:
      • No bias by current state
    • Cons:
      • Big overhead of starting from scratch with a result that might not be better than other two approaches

Conclusion:

  • Use Eudico's gossip-based mempool for obtaining TXs
  • Implement a simple availability layer where each node constructs blocks from the received requests and obtains availability certificates
    • This will be later extended to Narwhal
  • Each node proposes the availability certificates to the consensus layer
  • Implement a simple consensus layer that multiplexes PBFT instances without deduplication (for DoS resistance)
  • Deliver the output of the consensus layer back to Eudico
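The "multiplex PBFT instances" step in the conclusion above can be pictured with a minimal Go sketch. It assumes that global sequence numbers are assigned to instances round-robin (as ISS does with its leaders) and that each instance reports its decisions independently. The names `Multiplexer`, `Owner`, and `Decide` are hypothetical and not part of Mir.

```go
package main

import "fmt"

// Multiplexer merges the outputs of several independent ordering instances
// (for example, one PBFT instance per leader) into a single total order.
// Assumption: global sequence number sn is owned by instance sn % numInstances.
type Multiplexer struct {
	numInstances int
	next         uint64            // next global sequence number to deliver
	pending      map[uint64]string // decided, but not yet deliverable, certificates
	delivered    []string          // totally ordered output handed back to the client
}

func NewMultiplexer(numInstances int) *Multiplexer {
	return &Multiplexer{numInstances: numInstances, pending: make(map[uint64]string)}
}

// Owner returns the instance responsible for a global sequence number.
func (m *Multiplexer) Owner(sn uint64) int {
	return int(sn % uint64(m.numInstances))
}

// Decide records that the owning instance decided cert at sequence number sn,
// then delivers every contiguous decided entry starting from m.next.
func (m *Multiplexer) Decide(instance int, sn uint64, cert string) error {
	if m.Owner(sn) != instance {
		return fmt.Errorf("instance %d does not own sequence number %d", instance, sn)
	}
	if sn < m.next {
		return nil // already delivered; ignore the duplicate
	}
	m.pending[sn] = cert
	for {
		c, ok := m.pending[m.next]
		if !ok {
			return nil // gap: wait for the owner of m.next to decide
		}
		m.delivered = append(m.delivered, c)
		delete(m.pending, m.next)
		m.next++
	}
}

func main() {
	m := NewMultiplexer(3)
	// Instances decide out of order; delivery still follows sequence numbers.
	_ = m.Decide(2, 2, "cert-c")
	_ = m.Decide(0, 0, "cert-a")
	_ = m.Decide(1, 1, "cert-b")
	fmt.Println(m.delivered)
}
```

Note that no deduplication happens here: two instances may order certificates covering the same transactions, which is the trade-off accepted in the conclusion for DoS resistance.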

🎯 Up next

@vukolic

vukolic commented Apr 12, 2022

Would we need "RustNode" as well?

@adlrocha
Contributor

adlrocha commented Apr 12, 2022

From complete ignorance, @matejpavlovic, how does Mir-BFT relate to Node.js? (I am just wary of this kind of generic name 😄 ). It makes SEO and general discoverability harder.

@vukolic

vukolic commented Apr 12, 2022

+1 I am not a huge fan of the name proposal

@dnkolegov

Why Node? Node-go or go-node says that "Node" is implemented in Go, but what Node is, and how does it relate to BFT?

If we are talking about names:

  • uBFT - Universal BFT or any other xBFT, and then ubft-go,...
  • orbitox

@matejpavlovic
Contributor

Proposing the initial draft of the design document for incremental implementation of Eudico's ordering layer. Comments welcome!
https://hackmd.io/@matejpavlovic/ryd53ZP4c

@sergefdrv

initial draft of the design document for incremental implementation of Eudico's ordering layer

I left some comments

@matejpavlovic
Contributor

2022-04-21

✋ Attendees

📣 Updates

  • Wrote design document
    on the architecture and modularization of Eudico's ordering layer.
  • Started documenting Mir architecture decisions with ADRs
  • Updated roadmap on CL GitHub

🧵 Discussion

  • Separate Slack channel for Mir implementation?
    • #mir-impl or #mir-implementation
    • Public if not polluting the Filecoin Slack space, otherwise private
  • We agree on the proposed high-level design
    • This is not the detailed implementation architecture
  • We will start by implementing the first simple variant algorithm-wise and add all necessary algorithm-independent features (reconfiguration, restarts, etc.) before evolving the algorithms

🎯 Up next

  • Continue implementation
  • @matejpavlovic to devise a more concrete implementation architecture

@matejpavlovic
Contributor

2022-04-25

✋ Attendees

📣 Updates

🧵 Discussion

  • Persistence of Mempool
    • TX mempool stays responsible for persisting of payloads
    • Whenever the Availability layer receives a TX hash,
      it can count on the corresponding payload
      to be written in persistent storage
  • Deployment and static configuration of Eudico nodes
    • Take the simplest approach to statically configure the deployment
    • Node IDs don't need to correspond to anything in particular (no need for a node ID to be a wallet address, for example)
  • Mining and "proposers" of blocks - incentives in BFT consensus
    • Introduce mechanism for general "contribution" to the 2nd (MVP) milestone
    • Have a static "dummy" address in the Miner field of generated Eudico blocks.

🎯 Up next

  • Continue implementation
  • Discuss with Marko and Alfonso: Miners in Filecoin, Eudico and Mir. Does the concept of a Miner exist in a Filecoin version with a BFT-type consensus algorithm?

@matejpavlovic
Contributor

2022-05-02

✋ Attendees

📣 Updates

🧵 Discussion

🎯 Up next

@matejpavlovic
Contributor

2022-05-09

✋ Attendees

📣 Updates

🧵 Discussion

  • Mir architecture
    • Try using the current event-based architecture and see if we need to work a lot against it.

🎯 Up next

@matejpavlovic
Contributor

2022-05-16

✋ Attendees

📣 Updates

🧵 Discussion

  • How will we do state transfer?
    • Plans for week of 16/05: grab serialised state and transfer as it is; trivial app example, might not work for Eudico.
    • We'll have checkpoints; in the long run, the state transfer within Mir will likely be the same thing. Then we can transfer overall state in Eudico.
    • Mir is there to do ordering, not deal with full state payload.
  • Y3-M3 plans
    • Will need some input from Alfonso in the next few days
    • Early focus on perf benchmarks
  • Demo on 2 Jun?
    • We can demo Denis' previous work + failure
    • Benchmarks not likely
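To make the "grab serialised state and transfer as it is" plan concrete, here is a minimal Go sketch for a trivial application. `ChatState`, `Snapshot`, and `Restore` are hypothetical names, and `encoding/gob` is just one convenient serialization choice, not necessarily what Mir or Eudico uses. As the notes say, this approach might not carry over to Eudico's full state.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// ChatState is a hypothetical, trivial application state.
type ChatState struct {
	Epoch    uint64   // epoch the snapshot is synchronised to
	Messages []string // all messages delivered so far
}

// Snapshot serializes the whole state as-is.
func (s *ChatState) Snapshot() ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(s); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// Restore rebuilds a state from a snapshot received from another node.
func Restore(data []byte) (*ChatState, error) {
	var s ChatState
	if err := gob.NewDecoder(bytes.NewReader(data)).Decode(&s); err != nil {
		return nil, err
	}
	return &s, nil
}

func main() {
	src := &ChatState{Epoch: 3, Messages: []string{"hello", "world"}}
	data, err := src.Snapshot()
	if err != nil {
		panic(err)
	}
	dst, err := Restore(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(dst.Epoch, dst.Messages)
}
```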

🎯 Up next

@jsoares
Contributor Author

jsoares commented May 23, 2022

2022-05-23

✋ Attendees

📣 Updates

🧵 Discussion

  • @jsoares: Where are we on finalising M2? Is this complete with #68?
    • Yes, will close on merge
  • Refine M3 goal

🎯 Up next

@matejpavlovic
Contributor

2022-05-30

✋ Attendees

📣 Updates

🧵 Discussion

  • Addressing errors/warnings reported by code static analysis (linter, LGTM, GH Security)
  • @matejpavlovic : Dissemination of delivered blocks (Denis' hack) - @adlrocha is it viable?
  • How does PoS relate to BFT? Do we implement any incentive mechanism?
    • PoS is a generalization of BFT, using different voting powers for different nodes.
    • The ordering layer gets this data from the outside (power table),
      no need to figure it out itself.
  • How does HC work? Who creates subnets and how?
    • There is no succinct document describing it yet.
    • With the new FVM update, Filecoin will support creating subnets,
      each of them potentially using (an instance of) the BFT ordering layer.
    • The state of the subnet periodically writes its checkpoint to the parent.
  • @matejpavlovic: Demo day - who shows what? Options:
    • @matejpavlovic shows demo chat app with a node crashing
    • @dnkolegov shows integration of mir into Eudico
      • with a node crashing (if possible)
      • without a node crashing

🎯 Up next

  • @sergefdrv
    • Discuss and finalize Mir testing strategy
    • Start improving testing of Mir
  • @matejpavlovic
    • Polish BFT ordering interface
    • Pseudocode using the interface
    • Sync with @dnkolegov on demo and pick one option from above

@matejpavlovic
Contributor

matejpavlovic commented Jun 20, 2022

2022-06-20

✋ Attendees

No zoom call, just async collaboration on the notes.

📣 Updates

🧵 Discussion

  • Eudico/Lotus libp2p node ID or wallet ID as node/client ID in Mir:
    • Discussion on Slack
    • Besides, using the libp2p ID and its crypto creates some problems from an implementation perspective, because it was not designed and implemented for such cases
    • Suggestion: to use wallet keys as Mir IDs as we do now
      • Approved on Slack by Alfonso and Matej
    • @matejpavlovic: Confirming and approving here as well, will incorporate it (hopefully already tomorrow, Covid permitting) to the ordering layer design document.
    • @matejpavlovic: conclusion:
      • Each node (validator) of a subnet is uniquely identified by a wallet address: the address of the node's corresponding wallet in the parent net. I.e., this wallet address serves as the node ID in the subnet. Since the wallet address is directly derived from a key that the node will use to sign subnet ordering protocol messages, no other public key needs to be associated with the node in the subnet.
      • Each node (validator) ID (wallet address) is associated with a libp2p address used to send messages to the node.
      • (Not implemented yet, but part of the design:) Each node (validator) ID is associated with a weight value expressing how much relative voting power the node has in the subnet
      • (Not implemented yet, but part of the design:) In order for the node (validator) to not have to store the private key associated with the wallet address identifying it, each node (validator) ID is also associated with subnet node public key generated specifically for the purpose of verifying agreement protocol messages. The node (validator) will use the corresponding private key for signing.
      • All of the above is saved in the state of the subnet actor (residing in the parent net's state) in some configuration data structure (its exact format and implementation to be determined). How it changes is irrelevant from the subnet's point of view. The subnet just has to react when it changes and adopt the corresponding configuration.
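Since the exact format of the configuration data structure is still to be determined, the following Go sketch only illustrates what the conclusion above implies. All type and field names (`Validator`, `Membership`, `SubnetKey`, and so on) are hypothetical, and the addresses in `main` are placeholders rather than real wallet or libp2p addresses.

```go
package main

import (
	"errors"
	"fmt"
)

// Validator describes one subnet node as laid out in the conclusion above.
type Validator struct {
	WalletAddr string // node ID: the node's wallet address in the parent net
	NetAddr    string // libp2p address used to send messages to the node
	Weight     uint64 // relative voting power (part of the design, not implemented yet)
	SubnetKey  []byte // optional subnet-specific public key (part of the design, not implemented yet)
}

// Membership is the configuration that would live in the subnet actor's state.
type Membership struct {
	Validators []Validator
}

// Validate performs basic sanity checks a subnet could run before adopting
// a new configuration.
func (m Membership) Validate() error {
	if len(m.Validators) == 0 {
		return errors.New("membership is empty")
	}
	seen := make(map[string]bool)
	for _, v := range m.Validators {
		switch {
		case v.WalletAddr == "":
			return errors.New("validator without wallet address")
		case seen[v.WalletAddr]:
			return fmt.Errorf("duplicate validator ID %s", v.WalletAddr)
		case v.NetAddr == "":
			return fmt.Errorf("validator %s has no network address", v.WalletAddr)
		case v.Weight == 0:
			return fmt.Errorf("validator %s has zero weight", v.WalletAddr)
		}
		seen[v.WalletAddr] = true
	}
	return nil
}

func main() {
	m := Membership{Validators: []Validator{
		{WalletAddr: "t1-example-a", NetAddr: "/ip4/127.0.0.1/tcp/10000", Weight: 1},
		{WalletAddr: "t1-example-b", NetAddr: "/ip4/127.0.0.1/tcp/10001", Weight: 2},
	}}
	fmt.Println(m.Validate())
}
```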
  • @matejpavlovic:
    • @sergefdrv do you want to go ahead with Testify for testing? Why Testify in the end? (Having worked with it is a potential valid reason, if it does the job). What is the main difference to Ginkgo (current patchy tests)?
      • @sergefdrv: Prefer Go native testing and Testify for usability and better integration with Go testing facilities.
        • For example, Testify makes it easy to run specific tests
        • Current Ginkgo tests pollute the output with diagnostic messages. Not the case with Testify and standard Go testing facilities (one can still enable verbose mode when needed)
        • Ginkgo doesn't seem to have native integration with Go's benchmark or fuzz testing facilities
      • @matejpavlovic: Conclusion: Use Testify.
    • @sergefdrv are you considering implementing any model checking? I think we should leave that for much later.
      • @sergefdrv: Would just like to have some randomized tests that enforce different orders of external events and check that the expected properties (safety and liveness) hold in our implementation, rather than full-fledged formal verification of the high-level protocol description
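The randomized tests mentioned above could look roughly like the following Go sketch. It uses a toy vote-counting node (hypothetical, not a Mir module) and a seeded `math/rand` generator so that every permutation is reproducible. The property checked is that the node always decides the same value, whatever the order of the incoming events.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Vote is an external event: node From votes for Value.
type Vote struct {
	From  int
	Value string
}

// Node is a toy state machine that decides a value once it has collected
// threshold votes for it from distinct senders.
type Node struct {
	threshold int
	votes     map[string]map[int]bool
	decided   string
}

// Apply processes one event. Events arriving after a decision are ignored.
func (n *Node) Apply(v Vote) {
	if n.decided != "" {
		return
	}
	if n.votes[v.Value] == nil {
		n.votes[v.Value] = make(map[int]bool)
	}
	n.votes[v.Value][v.From] = true
	if len(n.votes[v.Value]) >= n.threshold {
		n.decided = v.Value
	}
}

// RunPermuted applies the events in an order fully determined by seed and
// returns the node's decision ("" if it never decides).
func RunPermuted(events []Vote, seed int64) string {
	r := rand.New(rand.NewSource(seed))
	shuffled := append([]Vote(nil), events...)
	r.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	n := &Node{threshold: 3, votes: make(map[string]map[int]bool)}
	for _, e := range shuffled {
		n.Apply(e)
	}
	return n.decided
}

func main() {
	// Three correct votes for "v" and one conflicting vote for "w".
	events := []Vote{{0, "v"}, {1, "v"}, {2, "v"}, {3, "w"}}
	for seed := int64(0); seed < 100; seed++ {
		if got := RunPermuted(events, seed); got != "v" {
			fmt.Printf("seed %d violates the property: decided %q\n", seed, got)
			return
		}
	}
	fmt.Println("property held for all tested permutations")
}
```

Reporting the failing seed makes a violation replayable, which is the main benefit of driving the order from a seed rather than from real timers.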

🎯 Up next

  • @dnkolegov:
    • polish Mir's Crypto implementation in Eudico
    • finish final threshold/check period refactoring
    • find out why itests sometimes panic and fail on the CircleCI node
    • start implementing Mir's Transport interface in libp2p
  • @matejpavlovic
    • Finish updating and extending design document, focusing on the sections on reconfiguration and specifying what keys the nodes use and how.
    • Express the design described in the design document in go-like pseudocode (that can be used as a skeleton for actual implementation)
    • Continue writing Mir improvements
    • Review papers for BRAINS'22
  • @sergefdrv
    • Clean up existing Mir test infrastructure
    • Implement logical time in integration tests instead of using real timers and clocks. This will speed up testing and give control over event order so that we can make random permutations there.
  • @xosmig
    • Add some tests for the DSL module.
    • Merge ContextStore and DSL module to the main branch.
    • Start implementing basic building blocks for the future Narwhal implementation.
    • Study the Rust implementation of Narwhal.

@atonkikh

Oh, hi! I'm just QA engineer from another company, I don't mind learning Rust, but can you help with DSL please.
Also, do you pay euros?

@xosmig

xosmig commented Jun 20, 2022

@atonkikh oh, sorry, it's because I also sometimes use atonkikh as my nickname.
Don't worry, I'll handle the DSL thing :)

@matejpavlovic
Contributor

2022-06-27

📣 Updates

🧵 Discussion

  • @xosmig: Do we already have some infrastructure to test individual modules?
    I am thinking how (and if) I should test the BCB implementation
    • @matejpavlovic: To my best knowledge, we don’t have that yet.
      I don’t think testing the BCB implementation should be a priority now though.
      When Sergey is done with the new testing infrastructure, let’s see if and how that can be leveraged.
      @sergefdrv do you have anything in that direction in mind?
    • @sergefdrv: so far, I'm focused on integration testing, not module-level testing.
      I think one can write simple tests for individual modules as usual in Go. I would recommend using Testify.
    • @xosmig Yes, I did that for dsl modules (just calling ApplyEvents directly and checking the output), but BCB probably needs something more similar to integration tests: we need to run several nodes with other modules and the application being mocked, let the nodes talk to each other while manipulating the network, and check the outcome.
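The "call ApplyEvents directly and check the output" style of module-level test can be sketched as follows. The `Event` type, the `EchoModule`, and the slice-based `ApplyEvents` signature are simplified stand-ins invented for this example. Mir's actual module interface and event types differ.

```go
package main

import "fmt"

// Event is a simplified stand-in for a Mir event.
type Event struct {
	Type    string
	Payload string
}

// EchoModule is a hypothetical passive module: for every "request" event it
// emits one "ack" event, and it silently consumes "tick" events.
type EchoModule struct {
	acked int // number of requests acknowledged so far
}

// ApplyEvents processes a list of input events and returns the output events.
func (m *EchoModule) ApplyEvents(in []Event) ([]Event, error) {
	var out []Event
	for _, e := range in {
		switch e.Type {
		case "request":
			m.acked++
			out = append(out, Event{Type: "ack", Payload: e.Payload})
		case "tick":
			// Ticks produce no output in this toy module.
		default:
			return nil, fmt.Errorf("unknown event type %q", e.Type)
		}
	}
	return out, nil
}

func main() {
	m := &EchoModule{}
	out, err := m.ApplyEvents([]Event{{"request", "tx1"}, {"tick", ""}, {"request", "tx2"}})
	fmt.Println(out, err, m.acked)
}
```

A test in this style only needs the standard Go testing facilities (optionally with Testify assertions). As noted above, something like BCB would additionally need several such modules wired together with a mocked network.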

🎯 Up next

  • @matejpavlovic:
    • Finish ISS protocol implementation (mostly message retransmission)
    • Update and extend Eudico integration design documents
      • Specify details about (re-)configuration implementation
      • Update Eudico-Mir interface
  • @sergefdrv
    • Continue cleaning up existing Mir test infrastructure
    • Implement logical time in integration tests instead of using real timers and clocks. This will speed up testing and give control over event order so that we can make random permutations there.
  • @xosmig
    • Basic availability layer implementation.
    • Design draft for dynamic modules, without implementation.
  • @dnkolegov
    • continue working on new Eudico ordering layer
    • implement more efficient FIFO mechanism in the Eudico block assembler
    • libp2p implementation for Mir

@matejpavlovic
Contributor

matejpavlovic commented Jul 5, 2022

2022-07-04

📣 Updates

🧵 Discussion

  • Libp2p transport for Mir:
    • @dnkolegov: @matejpavlovic and I discussed today basic implementation of libp2p transport. Matej's suggestion is to use direct libp2p connections. In essence, that is similar to the current GRPC transport in Mir but with Libp2p. @adlrocha are you fine with that?
    • @matejpavlovic: In essence, the transport layer needs to provide authenticated point-to-point messaging. Using gossipsub is not desired for BFT-style consensus, since 1) not all messages need to be sent to all nodes and 2) gossip latency is too high for most messages in BFT-style protocols.
    • @matejpavlovic: Aggregating from Comments: It seems mostly like a question of performance. Unless there is a really big performance difference, let's stick to one networking stack.
    • @matejpavlovic: I'd say we go for libp2p-based transport and once we have time at a later point, experiment with alternatives if we believe we can get a substantial performance gain.
  • Details on Eudico’s ordering layer implementation
    • @xosmig: Last week, @matejpavlovic and I discussed the architecture of Eudico’s ordering layer and came to the conclusion that, in order to avoid replicating the mempool structure in the availability layer module, the interface between the two should be pull-oriented (the availability layer asks for new blocks) rather than push-oriented (the mempool pushes new blocks/transactions to the availability layer). The main problem with the push-oriented interface was that the mempool could push new transactions at a faster rate than the availability layer could process them. Then we would need to include some buffers and non-trivial strategies for selecting and dropping transactions in the availability layer. On the other hand, a pull-based architecture allows all such logic to be encapsulated in the mempool module.
    • @xosmig: Also, we agreed to merge most of the functionality of batch store into the availability layer due to a high coupling between the functionalities of the two.
      • @matejpavlovic: Will update the design docs accordingly (see "Up Next" below)
    • @xosmig: Now I have a similar concern about the interface between the availability layer (composing and distributing TX batches for ordering, e.g. Narwhal) and the consensus layer: should the consensus layer also request (pull) blocks from the availability layer or should the availability layer push new blocks to the consensus layer?
      • There are some issues with both approaches:
        • Push-based: same as before, availability layer can go too fast or too slow.
        • Pull-based: implemented naively, may add extra latency.
          • It can be addressed if the availability layer works “one block in advance”. However, then there may be an issue with “recency” of the blocks (in case the consensus layer is rather slow).
      • @matejpavlovic:
        1. I think this can be addressed by making the Mir module that is emitting events representing the batches for ordering an ActiveModule. That way, if the consensus module is too slow to process them, the Mir implementation will temporarily stop reading new batches from the availability layer.
          • Comment discussion summary: This is a lower-level mechanism to protect against running out of memory. Let's not use it for protocol-level flow control.
        2. This problem might actually happen to be prevented at a higher level. Given that the system works in epochs (between which it may reconfigure), it might be better to make even the availability layer only produce a finite number of batches per epoch. And since starting new epochs depends on the progress of the consensus protocol, we get an implicit flow control mechanism.
      • @xosmig: a few advantages of the pull-based model (with the optimization of working “one block in advance”):
        • lower coupling, cleaner interfaces (details like “epochs” don't “leak” into the availability layer implementation);
        • I think it may work nicer in combination with a more advanced availability layer like Narwhal because, in case of Narwhal, we probably do want it to go as fast as possible, but sometimes the consensus layer needs to get the latest tip of the dag and commit it (it would be able to do so through the request-response mechanism).
        • The problem of “recency” of blocks in case of slow consensus is less severe, albeit still present. (i.e., the delay between the creation of a block and the time it is proposed is smaller).
  • A small question about Filecoin:
    • @xosmig: does the notion of a "valid transaction" depend on the state of the blockchain? Can / should we start working on a new block before deciding the previous one?
    • @matejpavlovic: While we could, in the future, support some "external validation" (informally discussed few times, last time during Consensus Factory), I would not go as far as requiring "closed-loop" block processing. For now (i.e. for the MVP) let's just consider everything that makes it in the mempool as valid (from the point of view of the ordering layer).
    • @xosmig ok, noted, thanks.
  • Consensus proofs:
    • @sergefdrv: The HC spec mentions: "Subnets can run any consensus algorithm of their choosing, provided it can meet a defined interface, and they can determine the consensus proofs they want to include for light clients." How do we plan to implement the consensus proofs in Mir/ISS?
    • @matejpavlovic: Currently we do not provide an explicit interface for "consensus proofs" (not sure whether and where one is defined for HC. @adlrocha?). But principally, the checkpoint certificates produced by the consensus implementation can serve as proofs, as they are signed by a quorum of the validators.
      • @sergefdrv:
        • It is not quite clear what granularity is required for the consensus proofs. One natural answer would be: per block, another: per subnet checkpoint. @adlrocha, what do you think?
        • @matejpavlovic, if it is per-block then how do we generate such proofs if we reassemble batches into blocks at the end of the ordering process?
      • @matejpavlovic: Indeed, if we wanted such a proof for every block, that could be inconvenient to implement, since Eudico blocks don't necessarily correspond to Mir batches.
        • @sergefdrv: Nor do the HC checkpoints seem to correspond to the ISS checkpoints.
      • @xosmig: it is always possible to implement a lightweight external gadget that collects >2/3 signatures under a statement "the k-th block's hash is 0x...".
        • @sergefdrv: @xosmig, yep, I think it is easier to generate finality proofs outside, since doing so as a part of consensus implementation might be difficult (think about normal case vs. view changes etc.)
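The lightweight external gadget suggested above can be sketched in Go with the standard library's `crypto/ed25519`. The statement encoding, the `Gadget` type, and the `demo` helper are all hypothetical, and ed25519 is chosen only because it ships with Go. The notes do not say which signature scheme would be used, and this sketch counts signers equally rather than by weight.

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// Statement builds the message validators sign: "the k-th block's hash is h".
func Statement(height uint64, blockHash string) []byte {
	return []byte(fmt.Sprintf("block %d has hash %s", height, blockHash))
}

// Gadget collects signatures on one statement, outside the consensus protocol.
type Gadget struct {
	keys map[string]ed25519.PublicKey // known validators
	stmt []byte                       // the statement being certified
	sigs map[string][]byte            // valid signatures collected so far
}

func NewGadget(keys map[string]ed25519.PublicKey, height uint64, blockHash string) *Gadget {
	return &Gadget{keys: keys, stmt: Statement(height, blockHash), sigs: make(map[string][]byte)}
}

// Add stores sig if it is a valid signature by a known validator and reports
// whether strictly more than 2/3 of the validators have signed.
func (g *Gadget) Add(id string, sig []byte) bool {
	if pk, ok := g.keys[id]; ok && ed25519.Verify(pk, g.stmt, sig) {
		g.sigs[id] = sig
	}
	return 3*len(g.sigs) > 2*len(g.keys)
}

// demo creates four validators, lets the first `signers` of them sign, and
// reports whether the gadget obtained a finality proof.
func demo(signers int) bool {
	ids := []string{"a", "b", "c", "d"}
	keys := make(map[string]ed25519.PublicKey)
	privs := make(map[string]ed25519.PrivateKey)
	for _, id := range ids {
		pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
		if err != nil {
			panic(err)
		}
		keys[id], privs[id] = pub, priv
	}
	g := NewGadget(keys, 42, "0xabc")
	g.Add("d", []byte("forged")) // invalid signatures are simply ignored
	done := false
	for _, id := range ids[:signers] {
		done = g.Add(id, ed25519.Sign(privs[id], g.stmt))
	}
	return done
}

func main() {
	fmt.Println(demo(2), demo(3))
}
```

Because the gadget only sees block hashes, it stays independent of how batches are reassembled into blocks and of view changes inside the ordering protocol.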

🎯 Up next

  • @xosmig
    • Basic availability layer implementation.
    • Design draft for dynamic modules, without implementation.
  • @dnkolegov:
    • working on libp2p transport for Mir
  • @matejpavlovic:
    • Update design documents based on latest discussions.
    • Define details and data structures for membership and reconfiguration.
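As a rough illustration of the pull-based interface with the "one block in advance" optimization discussed in this meeting, consider the Go sketch below. `Mempool`, `Availability`, and `RequestBatch` are hypothetical names. The sketch is single-threaded and leaves out certificates, networking, and epochs.

```go
package main

import "fmt"

// Mempool owns all buffering and selection logic; other layers only pull.
type Mempool struct {
	txs []string
}

func (m *Mempool) Add(tx string) { m.txs = append(m.txs, tx) }

// Pull removes and returns up to max transactions.
func (m *Mempool) Pull(max int) []string {
	if max > len(m.txs) {
		max = len(m.txs)
	}
	batch := append([]string(nil), m.txs[:max]...) // copy to avoid aliasing
	m.txs = m.txs[max:]
	return batch
}

// Availability keeps at most one batch prepared ahead of the consensus layer.
type Availability struct {
	mempool   *Mempool
	batchSize int
	prepared  []string // the batch built in advance; nil if there is none
}

// prepare pulls a new batch from the mempool unless one is already waiting.
func (a *Availability) prepare() {
	if a.prepared != nil {
		return
	}
	if b := a.mempool.Pull(a.batchSize); len(b) > 0 {
		a.prepared = b
	}
}

// RequestBatch is called by the consensus layer. It returns the batch that
// was prepared in advance (or a fresh one) and immediately prepares the next.
func (a *Availability) RequestBatch() []string {
	a.prepare()
	b := a.prepared
	a.prepared = nil
	a.prepare()
	return b
}

func main() {
	mp := &Mempool{}
	av := &Availability{mempool: mp, batchSize: 2}
	mp.Add("tx1")
	mp.Add("tx2")
	mp.Add("tx3")
	fmt.Println(av.RequestBatch()) // tx3 is now prepared in advance
	mp.Add("tx4")
	fmt.Println(av.RequestBatch()) // shows the "recency" issue: tx4 is left out
}
```

The second call returns only the batch prepared earlier, even though a newer transaction has arrived since. That is the "recency" cost of working one block ahead when consensus is slow.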

@matejpavlovic
Contributor

2022-07-11

📣 Updates

🧵 Discussion

🎯 Up next

  • @dnkolegov:
    • integrating Eudico with Mir's libp2p transport
    • adopting the libp2p transport in Mir's sample app
    • will discuss the next task within the Mir MVP project with @matejpavlovic by this Friday
  • @sergefdrv
    • Finish implementing logical time in integration tests
  • @matejpavlovic
    • Refine the Y3 roadmap to follow the same format as B3
    • Refine the Reconfiguration and Performance issues into more concrete action items (AIs)
    • Execute some of the above AIs
  • @xosmig
    • Merge the availability layer prototype PR.
    • Add some integration tests for the availability layer.
    • Start adding complexity to the availability layer (message retransmission, persistent storage, etc.)

@matejpavlovic
Contributor

2022-07-25

📣 Updates

🧵 Discussion

  • @dnkolegov:
    • @matejpavlovic suggested creating a new Mir node on each reconfiguration via Eudico's manager (to start with something and to have something for the MVP). I experimented with this idea and found that it does not work well, because we have to stop the network transport and then re-establish all network connections. I was not able to make Mir nodes with the libp2p transport work immediately after stopping the node, but it worked after adding a Sleep call. I suggest improving the idea as follows: instead of calling net.Stop(), net.Start() and net.Connect() on each reconfiguration, let's add new connections to the network transport and remove old connections if necessary. For that, we need to add a function like NewConnections(map[NodeID]NetAddr) to the network transport interface. This function opens connections to the nodes contained in the map without removing the network connections from the previous stage.
      • @matejpavlovic: Yes I agree, let's re-use the existing connections.
        Conclusion: @dnkolegov to implement it and use it in the Eudico integration code.
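
A minimal Go sketch of the proposed behavior, assuming an in-memory stand-in for the real transport: only `NewConnections(map[NodeID]NetAddr)` is taken from the discussion above, while `Transport`, `conn`, `CloseOldConnections`, and `Connected` are hypothetical names used for illustration and do not reflect Mir's actual network interface.

```go
package main

import "fmt"

type NodeID string
type NetAddr string

// conn stands in for a real network connection (e.g. a libp2p stream).
type conn struct {
	addr   NetAddr
	closed bool
}

// Transport keeps one connection per node across reconfigurations.
type Transport struct {
	conns  map[NodeID]*conn
	opened int // total number of connections ever opened
}

func NewTransport() *Transport {
	return &Transport{conns: make(map[NodeID]*conn)}
}

// NewConnections opens connections to the nodes in the map that are not yet
// connected. Existing connections to unchanged addresses are reused, and
// connections to nodes missing from the map are left untouched.
func (t *Transport) NewConnections(nodes map[NodeID]NetAddr) {
	for id, addr := range nodes {
		if c, ok := t.conns[id]; ok {
			if c.addr == addr {
				continue // already connected, nothing to do
			}
			c.closed = true // address changed: replace the connection
		}
		t.conns[id] = &conn{addr: addr}
		t.opened++
	}
}

// CloseOldConnections closes connections to nodes that are no longer part of
// the given membership (hypothetical companion to NewConnections).
func (t *Transport) CloseOldConnections(nodes map[NodeID]NetAddr) {
	for id, c := range t.conns {
		if _, ok := nodes[id]; !ok {
			c.closed = true
			delete(t.conns, id)
		}
	}
}

// Connected reports whether a connection to the node is currently held.
func (t *Transport) Connected(id NodeID) bool {
	_, ok := t.conns[id]
	return ok
}

func main() {
	t := NewTransport()
	t.NewConnections(map[NodeID]NetAddr{"a": "10.0.0.1:1", "b": "10.0.0.2:1"})

	// Reconfiguration: node a leaves, node c joins, node b stays.
	next := map[NodeID]NetAddr{"b": "10.0.0.2:1", "c": "10.0.0.3:1"}
	t.NewConnections(next)
	t.CloseOldConnections(next)

	fmt.Println("connections opened in total:", t.opened)
	fmt.Println("a connected:", t.Connected("a"), "c connected:", t.Connected("c"))
}
```

In this sketch the connection to node b survives the reconfiguration, so the transport never has to be stopped and restarted.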

🎯 Up next

@matejpavlovic
Contributor

2022-08-01

📣 Updates

🧵 Discussion

🎯 Up next

  • @dnkolegov:
    • Implement membership update submission
    • Integrate the Mir availability layer into Eudico
  • @sergefdrv
    • Implement a load generator for the Mir-based orderer
  • @matejpavlovic
    • Integrate availability layer in chat app
    • If time permits, continue making chat app reconfigurable
  • @xosmig
    • DSL modules demo at MoaDD

@matejpavlovic
Contributor

2022-08-08

📣 Updates

🧵 Discussion

  • @dnkolegov: @matejpavlovic @xosmig, is the availability layer ready? Do (will) we have an example (in the chat app) of how to use it properly?
    • @xosmig: There is a separate demo for the availability layer: link. Afaik, it is not yet integrated into ISS (and, hence, into the chat demo).
    • @matejpavlovic: I swapped the priorities of the chat demo reconfiguration and the availability layer, since the reconfiguration seemed more pressing wrt. Denis' work. I'll focus on the availability layer today and tomorrow, so that it is integrated with ISS by tomorrow evening.
  • @xosmig: I am a bit concerned about the current approach to technical debt in the code. We should probably aim at writing somewhat more general, future-proof code. Otherwise, it just ends up being written twice. In most cases, making things general enough the first time is really not a problem. A few examples:
    • My personal big struggle: Mir != ISS (as I am tasked with implementing the first non-ISS protocols in Mir). Unfortunately, I keep seeing (unnecessarily) ISS-specific code being added regularly.
    • strconv.Atoi(nodeID)
    • if tr.Sim != nil {...}
    • if transport, ok := tr.Modules["net"]; ok {...} (guilty, this one is mine)
    • @matejpavlovic: I acknowledge this point and agree we should do better on this. To explain how we got there initially: Historically, Mir started out as an implementation of ISS and has progressively been generalized to be less and less ISS-specific. But admittedly, a few ISS-specific parts of the code still date from the old times. Some parts were just written in an ISS-specific way because it was cognitively simpler at that time, and generalization is lagging (leaving it as debt). I agree this is sub-optimal.
      • My suggestion: Let us be stricter with this from now on and, when reviewing PRs, point out ISS-specific parts and request generalization. And in general, push for only merging PRs that do not add more debt; in the rare cases where they do, it should be justified and agreed upon.
    • @xosmig: I understand why the code was written this way initially. I am more concerned about new code being added "by analogy" with the old code, thus increasing technical debt while we should aim at gradually paying it off.

🎯 Up next

  • @dnkolegov:
    • continue to work on Mir reconfiguration in Eudico
  • @matejpavlovic
    • Availability
    • Reconfiguration (node joining and state transfer)
  • @xosmig
    • Hopefully finally start implementing Narwhal

@matejpavlovic
Contributor

2022-08-15

📣 Updates

🧵 Discussion

  • Summary of sync brainstorming between @matejpavlovic and @xosmig about Mir, Narwhal, and reconfiguration:
    • Narwhal implementation:
      • What persistent storage system to use? BadgerDB was used just as an example in the design document, because it happens to be used by the (inherited) WAL implementation. In principle, @matejpavlovic has no concrete preference for a particular data store. What potentially might be relevant is go-datastore.
      • The implementation will rely on a reliable communication abstraction. For now, the unreliable Net module can be temporarily used as a stub. @matejpavlovic to check out the updated GitHub discussion on reliable message delivery.
    • Narwhal garbage collection:
      • The garbage collection in Narwhal "cuts off" some dependencies. This may become a problem when we add deduplication and censorship resistance. It is not immediately clear whether it's actually necessary: maybe we could garbage-collect only ordered blocks and add "weak references" as in DagRider to make sure every block is eventually ordered? Let's look more closely at DagRider and think more about it.
    • Narwhal reconfiguration:
      • Multiple instances of Narwhal will co-exist, using a similar instantiation and garbage collection mechanism as ISS. The ordering layer (ISS in our case) already drives the advancing of epochs and configurations and the creation of checkpoints. It will simply emit analogous events to the availability module, which will instantiate and garbage-collect Narwhal instances accordingly.
      • As the ordering layer (ISS) is pulling its input from the availability layer (Narwhal), the availability certificates need not carry explicit epoch information. The ordering layer will simply pull the certificate from the corresponding instance of Narwhal.
      • The instances of Narwhal will be represented as nested modules within a main availability module.
    • Mir nested modules:
      • In order to be able to kill old instances of the availability layer module and create new ones during the reconfiguration, we need something like dynamic modules. However, we don't have a clean design for dynamic modules yet. Instead, we settled on a simpler alternative, which can be implemented with a tiny modification to how Mir core processes module ids.
        • TODO(@xosmig): write down the design for submodules either in discussions or in an ADR.
        • TODO(@matejpavlovic): Introduce structured module identifiers to enable support for nested modules.
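
To illustrate the idea of structured module identifiers and nested Narwhal instances, here is a rough Go sketch. All names (`ModuleID.Then`, `Split`, `Availability`, `Instance`, `Core`) and the `/` separator are assumptions made for this example; they are not Mir's actual design, which is still to be written down.

```go
package main

import (
	"fmt"
	"strings"
)

// ModuleID is a path-like identifier such as "availability/3".
type ModuleID string

// Then appends a sub-module component to the identifier.
func (m ModuleID) Then(sub ModuleID) ModuleID { return m + "/" + sub }

// Split returns the first path component and the remainder.
func (m ModuleID) Split() (top, rest ModuleID) {
	t, r, _ := strings.Cut(string(m), "/")
	return ModuleID(t), ModuleID(r)
}

// Instance stands in for one per-epoch Narwhal instance.
type Instance struct {
	epoch  uint64
	events []string
}

// Availability is the main availability module that owns nested instances.
type Availability struct {
	instances map[ModuleID]*Instance
}

func NewAvailability() *Availability {
	return &Availability{instances: make(map[ModuleID]*Instance)}
}

// epochID derives the sub-module identifier of an epoch's instance.
func epochID(epoch uint64) ModuleID { return ModuleID(fmt.Sprintf("%d", epoch)) }

// NewEpoch instantiates a nested instance, driven by the ordering layer.
func (a *Availability) NewEpoch(epoch uint64) {
	a.instances[epochID(epoch)] = &Instance{epoch: epoch}
}

// GarbageCollect removes all instances of epochs older than the given one.
func (a *Availability) GarbageCollect(before uint64) {
	for id, inst := range a.instances {
		if inst.epoch < before {
			delete(a.instances, id)
		}
	}
}

// Deliver hands an event to the nested instance named by sub, if it exists.
func (a *Availability) Deliver(sub ModuleID, event string) bool {
	inst, ok := a.instances[sub]
	if !ok {
		return false
	}
	inst.events = append(inst.events, event)
	return true
}

// Core routes events by the first component of the destination only.
type Core struct {
	modules map[ModuleID]func(sub ModuleID, event string) bool
}

func (c *Core) Route(dest ModuleID, event string) bool {
	top, rest := dest.Split()
	handler, ok := c.modules[top]
	return ok && handler(rest, event)
}

func main() {
	av := NewAvailability()
	core := &Core{modules: map[ModuleID]func(ModuleID, string) bool{"availability": av.Deliver}}
	av.NewEpoch(1)
	av.NewEpoch(2)

	dest := ModuleID("availability").Then(epochID(1))
	fmt.Println(dest, "delivered:", core.Route(dest, "batch"))
	av.GarbageCollect(2) // epoch 1 is checkpointed and no longer needed
	fmt.Println(dest, "delivered:", core.Route(dest, "batch"))
}
```

The core only needs to look at the first component of the identifier, which matches the "tiny modification" mentioned above; creating and garbage-collecting instances stays entirely inside the availability module.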

🎯 Up next

  • @dnkolegov:
    • finish integration of Mir reconfiguration
    • start working on integration of Mir's availability layer into Eudico
  • @xosmig:
    • A simple stubby Narwhal prototype.
  • @matejpavlovic
    • Merge reconfiguration and availability layer support
    • Prepare chat reconfiguration demo

@matejpavlovic
Contributor

matejpavlovic commented Aug 22, 2022

2022-08-22

📣 Updates

🧵 Discussion

  • @xosmig: What do we want from code generation?
    • @xosmig: For example, I can generate a hierarchy of structs completely separate from the ones generated by protobufs, with Mir types (e.g., t.NodeID instead of string) and conversion functions.
      • @matejpavlovic: That would be nice to have.
        • Would that be generated from the .proto definitions?
          • @xosmig: the generator uses the protoc-generated .pb.go files as input.
            This choice was motivated by relative simplicity compared to working directly with .proto
            files since protobufs have their own type system which would need to be translated to go type system.
        • How would the generator know the difference between two different Mir types that map to the same protobuf type?
          E.g., if the protobuf message defines a field of type uint64, how would the generator know that it is, say,
          a sequence number and not an epoch number? Is it possible to add additional annotations in the .proto file?
          • @xosmig: the generator looks at optional annotations on fields in .proto files, e.g.:
            string module = 1 [(mir.type) = "github.com/filecoin-project/mir/pkg/types.ModuleID"];
        • It would also be great to have, for each generated type, a generated constructor function that takes the fields as arguments.
          I would then encourage a convention of never creating an object directly through SomeObjectType{Field: val, ...}
          and always using the constructor. This way, should we decide to add fields to some objects,
          the compiler would help avoiding the default zero values for that field in instantiations of the object
          that one might forget to update.
          • @xosmig: yes, I already implemented the constructor functions.
    • @xosmig: The baseline is that I will just generate constructors and DSL functions.
      • @matejpavlovic: Yes that's a very helpful thing. Does this also mean constructors for event protobufs?
        • I have the code to generate the constructors implemented. We need to decide to which extent we want to use it.
        • @matejpavlovic: I'd say there is no harm in generating a constructor for each event type, even if it ends up not being used.
      • @matejpavlovic: A skeleton implementation of the DSL module would also be nice, with generated handlers for each event type.
        • For this we would need to use something like service description in protobufs.
          This would be nice in the future, but I think goes beyond the intended scope for the first version of code generator.
        • @matejpavlovic: A coarse but simple (and I agree, not necessarily very pretty) way would be
          to just generate a handler for every single event type defined for the package (even for those meant as output).
          The user would then delete those they don't use.
          Although, that might require the user to edit the generated intermediate "event registration" files.
          I guess that could be maybe solved by another annotation, but I agree that this is way out of scope for now.
  • @matejpavlovic: Also in the light of the above, and as @xosmig already suggested at some point,
    we should clean up the protobuf definitions, as now they are quite messy.
    The events.proto and messages.proto are historical artifacts and a big part (if not all) of their content
    should be moved to files associated with the respective modules. I'll soon have a proper look at that and move things around.
    @xosmig if this has any impact on the code generation and you would like me to stick to some particular conventions, pls let me know.
    • @xosmig: I think there is an implicit convention for event and message subtypes
      (e.g., isspb.Event, availabilitypb.Event, isspb.ISSMessage, mscpb.Message).
      • @matejpavlovic: Yes that makes sense. (The isspb.ISSMessage should then be renamed to isspb.Message though.)
    • I also don't want to support oneofs with primitive type options.
      Partly because I am considering the possibility of a future where we move to a different data model with interfaces instead of oneofs.
      • Ok, note taken, that makes sense too. I won't use primitive oneofs and when refactoring, I'll remove the existing ones.
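
As a hand-written illustration of what the generated code discussed above might look like, consider the following Go sketch. `ProposalPb` mimics a protoc-generated struct, and `Proposal`, `NewProposal`, `Pb`, `ProposalFromPb`, and the three type definitions are hypothetical names chosen for this example; the real generator output may differ.

```go
package main

import "fmt"

// Stand-ins for Mir types (hypothetical definitions for this example).
type ModuleID string
type SeqNr uint64
type EpochNr uint64

// ProposalPb mimics a protoc-generated struct with primitive field types.
type ProposalPb struct {
	Module string
	Sn     uint64
	Epoch  uint64
}

// Proposal is the counterpart a generator could emit, using Mir types.
type Proposal struct {
	Module ModuleID
	Sn     SeqNr
	Epoch  EpochNr
}

// NewProposal is the generated constructor. If a field is added to Proposal
// later, the signature changes and every call site fails to compile until it
// is updated, instead of silently using the zero value.
func NewProposal(module ModuleID, sn SeqNr, epoch EpochNr) *Proposal {
	return &Proposal{Module: module, Sn: sn, Epoch: epoch}
}

// Pb converts to the protobuf representation.
func (p *Proposal) Pb() *ProposalPb {
	return &ProposalPb{Module: string(p.Module), Sn: uint64(p.Sn), Epoch: uint64(p.Epoch)}
}

// ProposalFromPb converts from the protobuf representation.
func ProposalFromPb(pb *ProposalPb) *Proposal {
	return NewProposal(ModuleID(pb.Module), SeqNr(pb.Sn), EpochNr(pb.Epoch))
}

func main() {
	sn, epoch := SeqNr(17), EpochNr(3)
	p := NewProposal("iss", sn, epoch)
	// NewProposal("iss", epoch, sn) would not compile: EpochNr is not SeqNr.
	fmt.Printf("%+v\n", *ProposalFromPb(p.Pb()))
}
```

The distinct `SeqNr` and `EpochNr` types are what the field annotations would tell the generator to use; without them, both fields would be plain uint64 and could be swapped without the compiler noticing.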

🎯 Up next

  • @dnkolegov:
    • Implement reconfiguration by removing Mir nodes
    • Integrate the Mir availability layer into Eudico
  • @xosmig:
    • Finish the code generation
    • Narwhal prototype
  • @matejpavlovic
    • Polish reconfiguration
    • Start performance measurements

@matejpavlovic
Contributor

2022-08-29

📣 Updates

🧵 Discussion

🎯 Up next

  • @dnkolegov:
    • Implementing and using Eudico mempool with Mir
    • Using the latest Mir version in Eudico
  • @xosmig:
    • Finish the code generation.
  • @matejpavlovic
    • Create plan for future Mir improvements
    • Continue Mir implementation
    • Prepare team week session on general blockchain SMR

@matejpavlovic
Contributor

2022-09-19

📣 Updates

🎯 Up next

  • @matejpavlovic
    • Y3-M4 roadmap timing and assignment of DRIs
    • Sync roadmaps with BuilderNet
    • Mir preliminary perf evaluation
  • @sergefdrv
  • @xosmig
    • Finish the code generation
  • @dnkolegov:
    • Reestablishing lost network connections
    • implementing client watermark windows for limiting the number of in-flight transactions

@matejpavlovic
Contributor

Y3 has been merged with the B4 project, including the meeting notes.

@jsoares jsoares closed this as completed Nov 21, 2022