Directionally this seems reasonable to me. What do you suggest as next steps? Also pinging @jm-clius @dryajov @cammellos @arnetheduck in case you have any feedback on the above.
I'm going to go ahead and close this discussion as: Moved to vacp2p/research#73. If you feel like this is in error, please create a thread on the Vac forum or a new issue.
Problem Definition
The need for data synchronization exists in different domains of nim-waku:
Store protocol: Currently, store nodes work independently, without synchronizing the state of their persisted messages. As such, they may end up with different views of the historical messages. This inconsistency also means that light nodes cannot rely on the completeness of the history provided by a single store node.
We need a mechanism to synchronize the state of store nodes and to enable them to exchange their views and converge to a consistent and complete state. This will add reliability to the overall store protocol service. Moreover, any single full store node becomes a reliable source of message history.
Chat2: A similar synchronization requirement exists in the chat2 application where the message view of clients may diverge due to network delay. Another use-case is the synchronization of two (offline) devices belonging to the same user.
The focus of this post is to present an architecture for data synchronization relying on MVDS.
Using this architecture, a group of independent nodes each with a dynamic set of messages would be able to keep their message view consistent.
In the following modular architecture, the synchronization problem has been broken into three parts: MVDS, the Synchronization Orchestration layer, and the Peer Management layer.
MVDS
At a very high level, consider MVDS as a protocol by which two nodes, Alice and Bob, holding two sets of messages A and B, respectively, can communicate, synchronize, and obtain A UNION B. At a very abstract level, what MVDS does on Alice's side is: it takes the set of messages A as input and synchronizes it with the other end of the protocol (makes sure A-B is received by Bob), and in return receives the missing messages B-A from Bob.
Synchronization Orchestration
The orchestration layer is responsible for keeping the node's message state in sync with a (dynamic) set of peers {P_1, ..., P_N}. It does so by periodically synchronizing with those peers via the MVDS protocol. The node's message state is a queue of messages denoted by Q. The Orchestration layer supports:
- inserting new messages into Q;
- updating Q with messages fetched from each connected peer;
- deleting messages from Q.
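A minimal sketch of how such an orchestration layer could look (hypothetical Python; peers are modeled as in-process objects standing in for MVDS sessions, and none of these names come from nim-waku):

```python
class SyncOrchestrator:
    """Keeps the local message queue Q consistent with a dynamic peer set."""

    def __init__(self) -> None:
        self.q: list[str] = []  # the node's message queue Q
        # peer id -> peer object (stand-in for an MVDS session)
        self.peers: dict[str, "SyncOrchestrator"] = {}

    def insert(self, msg: str) -> None:
        """Insert a new message into Q (deduplicated)."""
        if msg not in self.q:
            self.q.append(msg)

    def add_peer(self, peer_id: str, peer: "SyncOrchestrator") -> None:
        self.peers[peer_id] = peer

    def sync_round(self) -> None:
        """Called periodically: update Q with messages fetched from each
        connected peer, and push our messages to them (toy MVDS exchange)."""
        for peer in self.peers.values():
            for msg in list(peer.q):
                self.insert(msg)
            for msg in list(self.q):
                peer.insert(msg)


n1, n2 = SyncOrchestrator(), SyncOrchestrator()
n1.insert("hello")
n2.insert("world")
n1.add_peer("n2", n2)
n1.sync_round()
assert sorted(n1.q) == sorted(n2.q) == ["hello", "world"]
```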
Peer/Group Management
The set of peers to be synchronized with must be dictated to the synchronization layer and this is the responsibility of the peer management layer. Note that the peer management unit is an independent component that can exist regardless of the other two layers.
This layer must identify the group of peers that the node needs to synchronize with. It is expected that this layer keeps track of the updates on the group memberships and communicates those updates with the Synchronization Orchestration layer to add/pause/resume synchronization with the peers accordingly.
The implementation of this layer can be as simple as maintaining a static list of peers, or it may involve a complex group-membership protocol.
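At its simplest, the peer-management layer is little more than a list with pause/resume bookkeeping that feeds the layer above. A hypothetical sketch (names are illustrative, not from any Waku codebase):

```python
class StaticPeerManager:
    """Static-list peer management: tracks which peers the node should
    currently synchronize with, supporting pause/resume of individual peers."""

    def __init__(self, peers: list[str]) -> None:
        self.active: set[str] = set(peers)
        self.paused: set[str] = set()

    def pause(self, peer_id: str) -> None:
        if peer_id in self.active:
            self.active.discard(peer_id)
            self.paused.add(peer_id)

    def resume(self, peer_id: str) -> None:
        if peer_id in self.paused:
            self.paused.discard(peer_id)
            self.active.add(peer_id)

    def sync_targets(self) -> set[str]:
        """The set of peers handed to the Synchronization Orchestration layer."""
        return set(self.active)


pm = StaticPeerManager(["p1", "p2", "p3"])
pm.pause("p2")
assert pm.sync_targets() == {"p1", "p3"}
pm.resume("p2")
assert pm.sync_targets() == {"p1", "p2", "p3"}
```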
Example use cases
The following explains how to deploy this architecture in chat2 and the store protocol. Note that the modularity of this architecture gives us design flexibility: we can devise various synchronization topologies with various communication complexities.
Chat2 application
In a very basic solution, we can imagine the peer management unit to be a static and known list of members. This list is passed to the Synchronization Orchestration layer, which in turn begins to synchronize with them.
FT-Store: Store (full) nodes synchronization
See the problem definition in waku-org/nwaku#561
As a solution, a full store node can pass a list of other full store nodes to the Synchronization Orchestration layer to get in sync with.
One big unknown here is how to find other store nodes and how to make sure all the store nodes will converge to a consistent message view.
One immediate solution is to assume a static list of full nodes and make each of them sync with every other node. This imposes O(N^2) communication complexity, which is not ideal (especially considering that there might be a large number of store nodes).
See below for the sketch of a more efficient solution. Its communication complexity is O(N).
Solution Sketch
We need to have a connected graph of store full nodes, where each node has a constant number of connections and periodically synchronizes with its connections using the Synchronization Orchestration layer. Such a connected graph allows eventual synchronization across all the store full nodes.
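As a quick sanity check of this claim, the following simulation (illustrative Python, not nwaku code) places N nodes on a ring, so each node keeps a constant number of connections and the network has only N links in total, i.e. O(N); repeated pairwise syncing along the edges still brings every node to the full message view:

```python
N = 10
views = [{f"msg-{i}"} for i in range(N)]      # each node starts with one unique message
edges = [(i, (i + 1) % N) for i in range(N)]  # ring: degree 2 per node, N links total (O(N))

# Run sync rounds until every view is complete (bounded by N rounds on a ring).
for _ in range(N):
    for i, j in edges:                        # one pairwise MVDS exchange per link
        union = views[i] | views[j]
        views[i], views[j] = set(union), set(union)
    if all(len(v) == N for v in views):
        break

# Eventual synchronization: every node holds the full history.
assert all(v == {f"msg-{k}" for k in range(N)} for v in views)
```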
How to construct this graph? By leveraging the libp2p GossipSub protocol.
Peer management
We can have a GossipSub domain for store node synchronization by defining a pubsub topic, e.g., waku/2/store-sync/proto. This would ultimately lead all the store nodes to find each other and form a mesh. Then, each node extracts the IDs of its direct connections within that mesh and passes them to the Synchronization Orchestration layer to synchronize with.