Problem Definition

The need for data synchronization exists in different domains of nim-waku:

Store protocol: Currently, store nodes work independently, without synchronizing the state of their persisted messages. As a result, they may end up with different views of the message history. This inconsistency also means that light nodes cannot rely on the completeness of the history provided by any single store node.
We need a mechanism to synchronize the state of store nodes and to enable them to exchange their views and converge to a consistent and complete state. This will add reliability to the overall store protocol service. Moreover, any single full store node becomes a reliable source of message history.
Chat2: A similar synchronization requirement exists in the chat2 application where the message view of clients may diverge due to network delay. Another use-case is the synchronization of two (offline) devices belonging to the same user.
The focus of this post is to present an architecture for data synchronization relying on MVDS.
Using this architecture, a group of independent nodes each with a dynamic set of messages would be able to keep their message view consistent.
In the following modular architecture, the synchronization problem is broken into three parts: MVDS, a Synchronization Orchestration layer, and a Peer Management layer.
MVDS
At a very high level, consider MVDS as a protocol by which two nodes, Alice and Bob, holding message sets A and B respectively, can communicate, synchronize, and both obtain A UNION B. At an abstract level, MVDS:
Receives a list of messages, i.e., A, as input and synchronizes them with the other end of the protocol (ensures A-B is received by Bob)
Outputs the messages fetched from the other end of the protocol, i.e., B-A
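The set semantics above can be sketched in a few lines of Python. This is an illustrative model of the MVDS outcome only (the names `mvds_sync`, `alice_sends`, and `bob_sends` are hypothetical), not the actual wire protocol, which exchanges message identifiers and acknowledgements incrementally:

```python
def mvds_sync(a, b):
    """Model the outcome of an MVDS exchange between Alice (set a) and Bob (set b).

    Alice delivers A - B to Bob; Bob delivers B - A to Alice.
    Both sides converge on A UNION B.
    """
    alice_sends = a - b   # messages Bob is missing
    bob_sends = b - a     # messages Alice is missing
    return a | bob_sends, b | alice_sends

alice_view, bob_view = mvds_sync({"m1", "m2"}, {"m2", "m3"})
assert alice_view == bob_view == {"m1", "m2", "m3"}
```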
Synchronization Orchestration
The orchestration layer is responsible for keeping the node's message state in sync with a (dynamic) set of peers {P_1, ..., P_N}. It does so by periodically synchronizing with those peers via the MVDS protocol.
The node's message state is a queue of messages denoted by Q. The orchestration layer supports:
Adding a peer.
Removing a peer.
Pausing/resuming synchronization with a peer (optional).
Setting a synchronization interval for a peer (interval to be defined).
Keeping all peers synchronized with the latest state of Q.
Updating Q with messages fetched from each connected peer.
De-duplicating messages in Q.
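The operations above can be sketched as a minimal class. This is a hypothetical interface (the name `SyncOrchestrator` and all method names are assumptions, not an existing nim-waku API); timers, the MVDS transport, and error handling are deliberately omitted:

```python
class SyncOrchestrator:
    """Sketch of the Synchronization Orchestration layer: tracks peers,
    keeps a de-duplicated message queue Q, and merges messages fetched
    from each connected peer."""

    def __init__(self):
        self.peers = {}     # peer_id -> {"paused": bool, "interval": seconds}
        self.q = []         # ordered message queue Q
        self._seen = set()  # de-duplication index over Q

    def add_peer(self, peer_id, interval=60):
        self.peers[peer_id] = {"paused": False, "interval": interval}

    def remove_peer(self, peer_id):
        self.peers.pop(peer_id, None)

    def pause_peer(self, peer_id):
        self.peers[peer_id]["paused"] = True

    def resume_peer(self, peer_id):
        self.peers[peer_id]["paused"] = False

    def merge(self, fetched):
        """Update Q with messages fetched from a peer, skipping duplicates."""
        for msg in fetched:
            if msg not in self._seen:
                self._seen.add(msg)
                self.q.append(msg)
```

A periodic driver would then call MVDS against every non-paused peer at its configured interval and feed the result to `merge`.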
Peer/Group Management
The set of peers to synchronize with must be dictated to the synchronization layer; this is the responsibility of the peer management layer. Note that the peer management unit is an independent component that can exist regardless of the other two layers.
This layer must identify the group of peers that the node needs to synchronize with. It is expected to keep track of updates to group membership and to communicate those updates to the Synchronization Orchestration layer, so that synchronization with the affected peers can be added, paused, or resumed accordingly.
The implementation of this layer can be as simple as maintaining a static list of peers, or it may involve a complex group membership protocol.
Example use cases
The following explains how to deploy this architecture in chat2 and the store protocol. Note that the modularity of this architecture gives us design flexibility: we can devise various synchronization topologies with varying communication complexities.
Chat2 application
In a very basic solution, we can imagine the peer management unit to be a static, known list of members. This list is passed to the Synchronization Orchestration layer, which in turn begins to synchronize with them.
FT-Store: Store (full) nodes synchronization
See the problem definition in waku-org/nwaku#561
As a solution, a full store node can pass a list of other full store nodes to the Synchronization Orchestration layer to get in sync with.
One big unknown here is how to find other store nodes and how to make sure all store nodes converge to a consistent message view.
One immediate solution is to assume a static list of full nodes and have each one sync with every other node. This imposes O(N^2) communication complexity, which is not ideal (especially considering that there may be a large number of store nodes).
See below for the sketch of a more efficient solution. Its communication complexity is O(N).
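To make the complexity difference concrete, a quick back-of-the-envelope calculation comparing the two topologies (the mesh degree of 6 below is an illustrative assumption, roughly GossipSub's default target mesh size, not a value fixed by this proposal):

```python
def full_mesh_links(n):
    # every node syncs with every other node: n*(n-1)/2 undirected links, O(N^2)
    return n * (n - 1) // 2

def constant_degree_links(n, d=6):
    # each node keeps ~d connections: about n*d/2 undirected links, O(N)
    return n * d // 2

# For 100 store nodes:
assert full_mesh_links(100) == 4950      # pairwise sync sessions per round
assert constant_degree_links(100) == 300  # sync sessions with a degree-6 graph
```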
Solution Sketch
We need a connected graph of full store nodes, where each node has a constant number of connections and periodically synchronizes with its connections using the Synchronization Orchestration layer. Such a connected graph allows eventual synchronization across all full store nodes.
How do we construct this graph? By leveraging the libp2p GossipSub protocol.
Peer management
We can establish a GossipSub domain for store node synchronization by defining a pubsub topic, e.g., waku/2/store-sync/proto. This would ultimately lead all store nodes to find each other and form a mesh. Each node then extracts the IDs of its direct connections within that mesh and passes them to the Synchronization Orchestration layer to synchronize with.
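Wiring the mesh to the orchestration layer could look like the following sketch. The class name, the callback names (`on_mesh_peer_added` / `on_mesh_peer_removed`), and the `orchestrator` interface are all hypothetical; a real implementation would hook into nim-libp2p's GossipSub graft/prune events, whose API this sketch does not claim to reproduce:

```python
STORE_SYNC_TOPIC = "waku/2/store-sync/proto"

class GossipSubPeerManager:
    """Sketch: derive the sync peer set from the node's direct GossipSub
    mesh connections on the store-sync topic and forward membership
    changes to the orchestration layer. `orchestrator` is any object
    exposing add_peer(peer_id) / remove_peer(peer_id)."""

    def __init__(self, orchestrator):
        self.orchestrator = orchestrator
        self.mesh = set()  # peer IDs currently in our mesh for the topic

    def on_mesh_peer_added(self, peer_id):
        # invoked when GossipSub grafts a peer onto STORE_SYNC_TOPIC
        self.mesh.add(peer_id)
        self.orchestrator.add_peer(peer_id)

    def on_mesh_peer_removed(self, peer_id):
        # invoked when GossipSub prunes a peer from the mesh
        self.mesh.discard(peer_id)
        self.orchestrator.remove_peer(peer_id)
```

Because the mesh degree is bounded by GossipSub's parameters, each node synchronizes with only a constant number of peers, which is what yields the O(N) overall complexity.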
The next steps are:
Implement MVDS as a separate protocol (reusing the old MVDS nim repo, which needs to be cleaned up, revised, debugged, and tested) together with a draft spec (the prior specs may need some updates). Further tasks will be filed as issues.
Develop a basic orchestration layer with static group management (peers set via the command line).
Integrate with wakunode2.
Further down the road, I will focus on GossipSub-based peer management.
Discussed in vacp2p/rfc#384
Originally posted by staheri14 May 27, 2021