This repository has been archived by the owner on Feb 8, 2023. It is now read-only.
time/roundtrip complexity of replication protocol #309
Yep, the main plan right now is to do (1.) and (5.) below.
for future reference:

A mechanism I used in Mimis is called "relative roots". Almost all directories have a ... ⇔ ../... link. Root directories, however, have a ... ⇔ . This allows consistent forward linking, so my pages have lots of
After spending quite a lot of time studying and thinking about replication protocols,
I see them essentially as remote dataset comparison.
This comparison may take advantage of structure inherent in the data being replicated,
or a helpful structure may be designed in to make remote comparison easy.
For example, scuttlebutt (http://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf) uses an append-only log per peer, which means all that is required for remote comparison is to exchange a vector clock.
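A minimal sketch of that comparison, assuming a per-author log and a vector clock mapping author to highest sequence number seen (names and data layout are illustrative, not scuttlebutt's actual wire format):

```python
# Scuttlebutt-style comparison: because each author's log is append-only,
# one exchange of vector clocks tells each side exactly what to send.

def missing_entries(remote_clock, logs):
    """Return the log entries the remote is missing, given its vector clock."""
    wanted = []
    for author, log in logs.items():
        remote_seq = remote_clock.get(author, 0)
        # send everything past the remote's high-water mark for this author
        wanted.extend(log[remote_seq:])
    return wanted

# usage: one message each way, regardless of how many entries are transferred
logs = {"alice": ["a1", "a2", "a3"], "bob": ["b1"]}
remote_clock = {"alice": 1}  # remote has only alice's first entry
print(missing_entries(remote_clock, logs))  # ['a2', 'a3', 'b1']
```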
Bittorrent has a static set of blocks, known beforehand; this is exchanged as a bitfield.
Scuttlebutt is O(nodes) bandwidth and bittorrent is O(blocks) bandwidth, but both are O(1) in roundtrips. Since round-trips add a massive delay, I think it's very wise to keep the roundtrip complexity low.
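The bittorrent case is even simpler to sketch: with a fixed, known block set, a single bitfield exchange determines what to request (a toy illustration, using Python lists for the bitfields):

```python
# Bittorrent-style comparison: the block set is fixed and known up front,
# so peers exchange one bitfield each and immediately know what to request.
# O(blocks) bandwidth, O(1) roundtrips.

def blocks_to_request(mine, theirs):
    """Indices of blocks the peer has that we lack."""
    return [i for i, (m, t) in enumerate(zip(mine, theirs)) if t and not m]

mine   = [1, 0, 1, 0, 0]
theirs = [1, 1, 0, 0, 1]
print(blocks_to_request(mine, theirs))  # [1, 4]
```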
Ipfs replicates a DAG. This is a somewhat more complex and interesting data structure than scuttlebutt's or bittorrent's. From what is described in the draft paper, in ipfs each node requests a want list - the branches it wants to expand - which is then sent.
This means the roundtrips required vary wildly with the structure of the DAG: in the worst case you have a linked list, which would require a round trip for each item.
In general, I think the roundtrips will be something like O(depth). Considering that the trees are user generated, this could perform very badly in some cases; for example, a git repo has a very linked-list-like structure, so a large repo would sync quite slowly.
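To see why roundtrips scale with depth, here is a toy simulation of want-list replication (the breadth-first walk is my assumption of how expansion proceeds, not ipfs's actual bitswap logic):

```python
# Want-list replication over a DAG: each roundtrip fetches one frontier of
# unknown nodes, so roundtrips grow with graph depth. A linked-list-shaped
# DAG (e.g. a long git history) degenerates to one roundtrip per node.

def sync_dag(root, remote_links, known):
    """Walk the remote DAG breadth-first; return (fetched nodes, roundtrips)."""
    fetched, roundtrips = [], 0
    frontier = [root]
    while frontier:
        roundtrips += 1                       # one request per frontier
        nxt = []
        for node in frontier:
            fetched.append(node)
            known.add(node)
            nxt.extend(c for c in remote_links[node] if c not in known)
        frontier = nxt
    return fetched, roundtrips

# a 4-node chain: every node is its own frontier, so 4 roundtrips
chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
print(sync_dag("a", chain, set())[1])  # 4
```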
Some ideas for improving this, as discussed on irc etc:
When objects are added, the chain points to all of them.
First you could replicate the chain (requested since the last known point),
and then you'll know all the objects, so from then on the dag is flat.
This would require 2 round trips, which is still constant.
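The two-roundtrip chain idea could be sketched like this (the chain-of-object-ids representation is assumed for illustration):

```python
# Chain idea: alongside the DAG, keep an append-only chain recording every
# object as it is added. Roundtrip 1 fetches the chain tail since the last
# known point; the object set is then fully known, so roundtrip 2 fetches
# all missing objects in one flat batch -- constant roundtrips.

def sync_via_chain(remote_chain, last_known, have):
    """Return (new chain entries, objects to fetch), two logical roundtrips."""
    # roundtrip 1: "send me your chain after position `last_known`"
    new_entries = remote_chain[last_known:]
    # roundtrip 2: request every listed object we don't already have, at once
    to_fetch = [obj for obj in new_entries if obj not in have]
    return new_entries, to_fetch

chain = ["o1", "o2", "o3", "o4"]
print(sync_via_chain(chain, 1, {"o3"}))  # (['o2', 'o3', 'o4'], ['o2', 'o4'])
```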
When you request an object you want, the remote sends the object, plus objects that it links to. I notice that the
current tree structure has hashes, names, and sizes. Maybe the sizes could be expanded to
include a branching factor? Then you could eagerly expand smallish nodes that branch a lot.
I think you'd also need to send something to represent the objects you already know,
which could be: I know everything under X, to a depth of 10, but I want things from branch
Y (but don't send me anything under X+10).
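A sketch of the eager-expansion heuristic above. The `size` and `branching` fields per node are assumed metadata extending the hashes/names/sizes the tree entries already carry, and the thresholds are made-up illustrative values:

```python
# Eager expansion: when serving a wanted object, the remote also sends small,
# high-branching children speculatively, saving roundtrips on bushy subtrees
# while avoiding wasted bandwidth on large blobs.

SIZE_LIMIT = 4096     # only eagerly send smallish objects (assumed threshold)
MIN_BRANCHING = 4     # ...that fan out enough to be worth pre-sending

def serve(want, store):
    """Return the wanted object plus any eagerly expanded children."""
    reply = [want]
    for child in store[want]["links"]:
        meta = store[child]
        if meta["size"] <= SIZE_LIMIT and meta["branching"] >= MIN_BRANCHING:
            reply.append(child)               # speculative extra object
    return reply

store = {
    "root": {"links": ["dir", "blob"], "size": 100, "branching": 2},
    "dir":  {"links": [], "size": 512, "branching": 8},      # small+bushy: send
    "blob": {"links": [], "size": 1 << 20, "branching": 0},  # big: wait for ask
}
print(serve("root", store))  # ['root', 'dir']
```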