-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Waku Sync 2.0 #102
Comments
Technical StuffProtocol Parameters
When the recursion is seen as a tree.
N.B. both syncing nodes can have different parameters. Code Architecture
OptimizationPossible optimizations are part of the research paper. I noted those that fit our use case here for completeness. Range encodingWe will delta encode range bounds like Negentropy does and truncate hashes the same way, it reduces bandwidth with minimal downside. Range "Path" reuseWhen computing multiple ranges, if they are sorted, the upper bound may match the lower bound of the next range. In that case, a tree storage can reuse each "path" when computing fingerprints. Exotic PartitioningPartitioning doe not have to split equal ranges. We could design a partitioning scheme that increase the size of ranges for older elements since we could assume that it is unlikely that messages would still be missing after all this time. Subset ChecksWhen receiving a range fingerprint, in case of mismatch, a subset of item might match. With XOR we can in O(n3) find if we have such a subset, avoiding further round-trips at the cost of more computation. |
RoadmapLets project ourselves in the future shall we? Waku Sync 2.0The first step would be to replace 1.0 with the new version. The core protocol should be mostly final at this stage but only impl. basic storage mechanisms and optimizations. A big part of this work will be engineering of the code base as to design the proper abstractions and also finalize protocol design.
Light Protocols ReplacementLight push and filter can optionally be replace by sync. The first thing is to try different storage and range choice strategies for node with limited compute and memory. We may decide to do a Js impl. at this stage if Nim experiments are conclusive. Another possibility is to only augment current light protocols with sync for reliability. Store ReplacementStore could be replaced by sync with proper storage implementation. At this stage ranges would become queries that could include content and/or pusbsub topics or other parameters. This step would require protocol changes and a good amount of engineering efforts to impl. K-Dimensional trees for storage. Relay ReplacementSync can in theory replace GossibSub but replacing a battle-tested and mature protocol is a big task. We would have to study attack vectors, run experiments to measure latency and we may even have to design and impl. a scoring system similar to GossipSub to gain enough performance. |
Thanks for this detailed post! I also like how well the Roadmap lays out what I think would be the steps needed to get as much value from Sync as soon as possible. Some comments/questions:
Could this be a disadvantage when we have multiple diversified sync sessions? In other words, "stateless" seems to me to imply that we dynamically create and destroy state for each sync session - for each sync peer we fingerprint a unique sync range matching each sync request; for each light peer we apply a content topic filter and create + fingerprint a sync range over the filtered messages. We've spoken about this before, but I still wonder if the resulting overhead may be the major unknown to test in a real environment. Perhaps one of the first deliverables could be to build a basic PoC that proves that the overhead due to dynamic "state" management is reasonable.
You mean proving that a specific message hash is kept by the store node? Do we have rough idea how this might work, say if we need a consensus mechanism for messages (not that we necessarily should)? For Prolly trees, proving that you know the root and a path to the root should be enough. Here, however, it seems that ranges are dynamic by definition and it will be difficult to build a structure suitable for consensus?
I would perhaps need more background to understand this point about trees. Do you mean that dynamically computing the fingerprint over every sync range might become too expensive, so we may have to impose a more rigid hierarchical structure in the storage backend, where subranges can be "statically" fingerprinted? In such a model, only the affected leaves' and branches' fingerprints will need updating after an insert. However, if this understanding is correct, this seems very similar to the Prolly Tree structure?
Agree that we'll need at least a basic peer management mechanism here, perhaps descoring misbehaving sync peers according to a simple heuristic. We'll need to limit the number of peers we initiate sync with and that we allow to sync with us to maintain scalability (think of GRAFTs in gossipsub being based on mutual consensus). It would be nice to have Sync behaviour eventually feed into gossipsub scoring, so that also on a Relay level we encourage stronger connectivity to peers that prove themselves to be more useful in terms of message delivery. One more general question: Do you think there are any other major risks/unknowns that could be showstoppers here? To restate one of my concerns above, I'm wondering if it's possible that the overhead for state management could become impractical. This may force us to consider other data structures. If such unknowns exist, could we derisk them by building a PoC that targets these questions first? |
Only the storage has a state, requests only read from storage they do not write but storage must also be updated with new messages. I'll build a POC to explore this area.
A merkle tree could be kept for membership proofs. Either part of storage of separate like an addon. We could even sync Prolly trees if needed.
Similar yes they are both trees but Prollies are probabilistic, leaves have no limit on number of elements. Our tree would simply have fixed size leaves which is easier to manage.
💯
Of course there's a limit on how many range processing a node can do per seconds. It's hard to say what that number is without building something. How far can we optimize a highly concurrent tree structure is the engineering question we need to answer. As for other risks, I can't think of anything else, only unknown unknowns remains. |
With the simplest impl. on my computer and a 100k cache I can;
10M cache
|
Thanks for the detailed write-up! Few questions/points that come to me:
shouldn't this be based on the use-case of the user of sync protocol? what if in some scenarios batching may make this more efficient because latency may not be an issue for the user. this may be more of an implementation detail, but leaving this as a configurable option for users may benefit more use-cases. Will give another read diving into Technical details as well and come back with more questions. |
It can be used this way but are nodes going to accept to send that many messages? I would not want my node to do that except if I syncing with my own nodes. What I'm thinking is this use case might require special trust assumptions.
Good idea, I will measure and make a table.
I'm not sure what other ways we have. I'm open to suggestion.
We could add more options to the transfer protocol such as batching and ACKs. |
i remember initiating a discussion with @rymnc about this topic when we were working on store-sync few months ago. It invovles generating some sort of |
This assumes a flow where we wait for the whole set diff to be known, not a problem just a bit different than what I had in mind where diff found are sent right away. For DOS protection, would it not be better if the requester do the most work? Proof generation is expensive and verification not (usually?), but for batches the proof generator is the sender. We can simply disconnect from nodes that sends invalid proofs as we know they are malicious. I feel like in the case where batching would be better, latency is not a concern (then why batch?) but it is a concern when trying to speed up consistency. |
what do you mean by whole set diff? same logic can be applied to be done in batches as well right once we identify a batch that is being synced proof gets generated for it. maybe i am missing something.
valid point, but here requester is getting protected from being DOS'ed by the sender with which they are syncing. so, this makes sense that proof geenration is costly so that spam is not sent to the requester.
true, but this would be costly in nature to verify all old proofs isn't it? requester will have to store many merkle roots back in time for this to happen.
yep , agreed |
I don't understand. Do you mean generating proofs for messages in a range? In my mind, batches would contain either a sub-set or the whole sets differences. Messages and the proof would be sent to the other node to verify.
I would have to think more deeply about who is more vulnerable in this scenario. On one side, long sync like this are costly to do, but on the other hand you can also get spammed by fake messages. My instinct would be to always have the initiator of any computation pay the highest cost.
Yes that is a cost but with 32 bytes proofs we can store a lot of them before it becomes a problems. |
I can't really measure the Negentropy internals it seams. I'll try to compare them once I have a working POC of 2.0 |
sorry, looks like my sentence is quite jumbled.
agreed, but maybe that would be anyways taken care of by incentivization rather than building into the protocol?
true, it maynot be a large storage and anyways this would be part of some persistent store. |
hi, just my 2c on this ~ if both the peers know each other, then you may not need to use RLN for rate limiting at all, perhaps PoW might be more beneficial. |
Shard SupportTo prevent syncing of sets that contains messages from a shard a node does not support, we need to augment the protocol. How? Some possibilities below.
Keeping the stateless nature of the protocol seams like a good idea. Each payload would have to include shards and content topics. |
Wouldn't it be simpler to simply include the shard "offer" in the initial message with the counterparty simply accepting or rejecting this? In other words - either we sync over everything I store or we don't sync at all. This would eliminate the need for more complex storages for now, but at least ensure that the nodes involve know and consent to what is being synced. |
Yes of course but that's not a solution, just a quick fix. Do you feel we won't need to solve it for a long time? If so then I guess we could do that. |
Waku Sync 2.0
Waku Sync 1.0 wraps the Negentropy (RBSR based) protocol. I believe that designing a new RBSR protocol specifically for Waku would increase it's usefulness and efficiency.
I will describe version 2.0 below. Keep in mind that I will not explain range based set reconciliation in this post.
Why range based set reconciliation?
The flexibility of RBSR in terms of storage technology, stateless syncing sessions and overall simplicity makes RBSR a good fit for Waku. The ability to generate membership proofs could also be added as a separate system if needed.
Prolly trees have the capability of membership proofs (they are merkle trees after all) but would require the implementation of a garbage collector and the tree state cannot be changed while a sync is ongoing. The rigid structure that give Prolly trees their usefulness is also a detriment. Physical storage cannot be adapted to data structure but the opposite is true.
Overview
The ultimate goal is that syncing pair of nodes have the same set of messages. To achieve this we will use 2 protocols concurrently. One of those protocols will be used to transfer messages the other to inform the first as to which messages to transfer.
Reconciliation
This protocol is responsible for set reconciliation and is inspired by Negentropy but with modification because knowing the nature of the data stored (Waku messages) allow us to make informed decisions about fingerprinting function to use, attack vectors and storage data structure. We will also use specific sync ranges to optimize for Waku use cases (light push, filter, relay).
Transfer
This is the simplest of the 2 systems, it's main purpose is to send messages to the other node in the sync pair. It's only input will be message hashes and will rely on storage for message retrieval. To lower latency, no buffer or batching will be done so that messages are integrated into the set of the recipient node as fast as possible. This way if the recipient node is also syncing with other nodes at the same time this new message may also be synced and transferred.
Reconciliation Protocol
Operates on sets of elements that can be totally ordered. In our case timestamp and hash is used as the total order.
Range Choice
Ranges can be chosen based on the specific use case to reduce the number of round-trips in best case scenario. For example, if the context of the sync require fast propagation (Relay), smaller ranges in recent history would be best (RLN epoch). Inversely, for syncing archives choosing large ranges in the oldest part of the set would be preferable.
Various nodes can use any strategy the protocol stays the same.
Fingerprinting Function
The fingerprinting of a range would be the XOR operation applied to every hash in that range. Waku messages are hashed and then RLN verified when received. Reusing this hash and the timestamp included makes the most sense.
Attack Vectors
The sync process can be perturbed by carefully crafting Waku messages that combined with others would lead to range fingerprints identical to another so as to prevent syncing of that range by other nodes. While in theory possible it would be impractical. The first problem is that messages are encrypted, it is not possible to associate users to messages. Second, the very specific case where the sync process skips a range due to an attack has low impact. When syncing at a later moment, ranges would differs and the targeted message would still be synced. Sending messages also has a cost via RLN which act as a deterrent. Finally if an attacker has control of enough nodes they could use far more effective attacks than temporarily disturbing the sync process.
Bidirectionality
Negentropy assumes a client-server model, we don't, only the client know all the differences in their case. We want both nodes to know the differences for each ranges. When receiving a item set we will answer with our own item set and indicate that the sync for that range is done.
Storage
The goal of any storage implementation is to make the range fingerprinting as efficient as possible for that specific use case.
Queues
Light clients should sync very small range as to keep computation and state stored reasonably low. In-memory queues should be enough to hold the state and fingerprints can be computed by simple iteration.
Trees
Tree structures will be needed when linearly iterating to compute fingerprints becomes too inefficient. Instead of O(n) complexity they can compute fingerprints in O(log n) of the data set size. For queries that include content topics K-D trees could also be used. The problem is that inserting and removing elements of the tree also has a cost. We know that insertion will be frequent and deletions can also be in cases where the whole set is not keep but only the latest subset. Balancing these properties present an interesting engineering challenge.
Transfer Protocol
Messages will be sent on a best effort basis without retry, responses or error handling. The assumption is that any message that fail to reach it's destination will be synchronized and sent later. On the receiving end, every messages will be hashed and RLN proofs verified then added to the local storage. Nodes will only accept incoming connections from nodes they are synchronizing with.
Waku message proto buffers will be reused as wire-format.
Connection & Scoring
Who to sync with may not be part of the protocol but is an important detail nonetheless. In theory, choosing random peers may be best but in practice we may want to keep track of latency, online status and amount of messages synced so that the local node can curate a high quality list of peers. Another strategy we could implement is optimistic tit-for-tats (a la BitTorrent), since we can easily count how many missed messages we sent and received and from whom.
The text was updated successfully, but these errors were encountered: