Bitswap Rounds #417
Changes I'm going to make for the alpha:
@whyrusleeping probably important to make the
How will available bandwidth be determined?
@maybebtc good question...
@jbenet could you elaborate a little more on the
For now will be set by
Sure thing. How about these examples? (JSON is easier):
    # from p1 to p2
    {
      "wantlist": {
        "full": true,
        "entries": [
          { "block": "<hash-a>", "priority": 1},
          { "block": "<hash-b>", "priority": 1},
          { "block": "<hash-c>", "priority": 1},
          { "block": "<hash-d>", "priority": 1}
        ]
      }
    }

    # from p2 to p1
    {
      "blocks": [
        "<data which hashes to <hash-a>>",
        "<data which hashes to <hash-b>>"
      ],
      "wantlist": {
        "full": true,
        "entries": [
          { "block": "<hash-c>", "priority": 100}, # really wants c
          { "block": "<hash-d>", "priority": 1},
          { "block": "<hash-e>", "priority": 1},
          { "block": "<hash-f>", "priority": 1}
        ]
      }
    }

    # from p1 to p2
    {
      "blocks": [
        "<data which hashes to <hash-f>>"
      ],
      "wantlist": {
        "full": false, # note partial. could leave it out (defaults to false).
        "entries": [
          { "block": "<hash-a>", "cancel": true}, # - a
          { "block": "<hash-b>", "cancel": true}, # - b
          { "block": "<hash-g>", "priority": 2} # + g
        ]
      }
    }
Awesome, makes more sense now, thanks!
@jbenet with priorities, the list ordering no longer matters, correct?
@whyrusleeping correct
Would we set it to a value that is effectively inf?
Or just leave it out of the implementation for now, but comment where it would go?
@jbenet how are you envisioning the
@whyrusleeping basically, on receiving the block (or the client somehow canceling things; I don't think we allow that presently, and canceling the context doesn't remove the block from the wantlist, I don't think?)... anyway, on these we should send out updates to all our peers, "canceling" that block from our wantlist. It's effectively an Ack to the sender, but more importantly, it's a very cheap wantlist update to everyone else.
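A minimal Go sketch of that cancel rebroadcast; the types, the peer set, and the send hook are hypothetical stand-ins, not the actual go-ipfs API:

    package bitswap

    // Hypothetical types for illustration only.
    type Entry struct {
        Block    string // block hash
        Priority int
        Cancel   bool
    }

    type peerID string

    type engine struct {
        wantlist map[string]int      // block hash -> priority we want it at
        trading  []peerID            // peers we're currently exchanging data with
        send     func(peerID, Entry) // stand-in for the real network send
    }

    // receivedBlock removes the block from our wantlist and broadcasts a cheap
    // partial-wantlist update ("cancel") to every trading peer.
    func (e *engine) receivedBlock(hash string) {
        delete(e.wantlist, hash)
        cancel := Entry{Block: hash, Cancel: true}
        for _, p := range e.trading {
            e.send(p, cancel) // an ack to the sender, and an update to everyone else
        }
    }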
Okay, so how do we decide on the set of peers to send that update to? We don't really keep track of who has our wantlist or not.
@whyrusleeping? Bitswap should be keeping a set of peers we're currently "trading with". They are the people we rebroadcast wantlists to. (This set can be the swarm connected peers for now, but doesn't need to be later on.)
Ah, the strategy does have a list of peers... you're right. How do we want to go about 'ending' a trading session?
This is really tricky. A session should be open because either
these have different end conditions:
Coming up with a correct "we're done here, close" is non-trivial. Perhaps a muuuuuuch simpler thing to do that may work well in practice is just this:

    type session struct {
        peer     peer.Peer
        deadline time.Time
    }

    func (s *session) shouldRemainOpen() bool {
        return time.Now().Before(s.deadline)
    }

    func (s *session) bumpDeadline(t time.Duration) {
        new := time.Now().Add(t)
        if new.After(s.deadline) {
            s.deadline = new
        }
    }

    // when (a) we open a connection, or (b) get or (c) send a block, bump the deadline.
    // (the first one, (a), could be longer)
    s.bumpDeadline(timeout)

In any case, I think isolating all this and wrapping it with a
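One possible way to use that session type, sketched with hypothetical names: keep a registry of open sessions keyed by peer, and sweep it periodically to drop the ones whose deadline has passed.

    // sessions is a hypothetical registry of open trading sessions, keyed by peer ID.
    // It builds on the session type sketched above.
    type sessions struct {
        open map[string]*session
    }

    // sweep drops every session whose deadline has passed; it could run from a
    // ticker every few seconds, or right before we rebroadcast wantlist updates.
    func (ss *sessions) sweep() {
        for id, s := range ss.open {
            if !s.shouldRemainOpen() {
                delete(ss.open, id)
            }
        }
    }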
I think I have a slightly better idea: we should extract the routing table from the DHT, and let bitswap have more direct access to it. When a peer is bumped from the routing table, close our session with them.
@whyrusleeping that won't work. Not every peer is in the routing table.
But they would be if bitswap had the routing table as well, and we called table.Update(...) for every interaction within bitswap.
We probably want a more generic mechanism for getting this information across the boundary. The routing table isn't an actual source of truth. The RT gets its information from a different layer. #418 |
Incorrect. The routing table has a limited set of slots, corresponding to (roughly) k log2 of the size of the DHT, and prefers long running nodes useful in DHT queries. (once the buckets are full, new nodes will just be dropped). Bitswap should be able to connect to people no matter what the routing table thinks of them. This is a separate system. Assumptions like these drive over schoolkids with their cars.
Yes. Routing information is its own thing, and should make its own conclusions separately. This implies a tighter bond between Routing and Net, not between bitswap and routing. (Bitswap should have no clue at all about
Ran some benchmarks. The system's throughput is hard-limited by the choices of

There seems to be an impedance mismatch between the streaming nature of the bitswap system and the batch models we've been attempting to reify. In machine learning, there's the notion of online learning where, rather than performing large batch updates, the system can incrementally update as new data comes in. And in data processing, there are systems like Apache Storm that are designed to process unbounded streams of data. For many real-time scenarios, Storm is a more appropriate fit than Hadoop/MapReduce. I suspect that in the case of bitswap, a streaming model may be a better match for the referent system.

    for { // worker sends messages as long as there exist messages in Outbox
        select {
        case msg := <-bs.decider.NextMessage: // decider has a PQ/heap of messages to go out
            sender.SendMessage(msg) // sender/network rate limits this operation
        case <-ctx.Done():
            return
        }
    }
Yeah, I think a system like this would work nicely. Do we have the required backpressure to prevent this from clogging the system?
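A sketch of one way to get that backpressure in Go, with hypothetical types rather than the current code: make the outbox a bounded channel, so producers block once the sending worker falls behind.

    package bitswap

    import "context"

    type message struct{ to, payload string } // illustrative only

    type outbox struct {
        ch chan message
    }

    func newOutbox(capacity int) *outbox {
        return &outbox{ch: make(chan message, capacity)}
    }

    // enqueue blocks when the outbox is full, propagating backpressure upstream.
    func (o *outbox) enqueue(ctx context.Context, m message) error {
        select {
        case o.ch <- m:
            return nil
        case <-ctx.Done():
            return ctx.Err()
        }
    }

    // worker drains the outbox; the network send rate-limits the whole pipeline.
    func (o *outbox) worker(ctx context.Context, send func(message)) {
        for {
            select {
            case m := <-o.ch:
                send(m)
            case <-ctx.Done():
                return
            }
        }
    }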
Taking @maybebtc's suggestions into account, I have started brainstorming a better way to decide what bitswap should do at any given moment. I'm planning on building a priority queue of sorts, where higher priority blocks (based on some heuristic) are at the front of the list. This queue will only contain blocks we have available locally, and will be added to whenever we receive a wantlist, or whenever we receive new blocks. I haven't figured out the sorting heuristic yet, as basing it off of the peer's debtRatio would require constant resorting, and basing it off of the wanted block's priority would encourage people to game the system (unless priorities were normalized). But my idea (once a heuristic is determined) is that to select the next item, a random number is chosen on an inversely exponential distribution, such that blocks near the front of the list are more likely to be selected. @jbenet, thoughts?
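A rough sketch of that selection step; the names and the clamping choice are assumptions, not a worked-out design:

    package bitswap

    import (
        "math"
        "math/rand"
    )

    // task is a hypothetical queued "send this block to that peer" item.
    type task struct {
        block string
        peer  string
    }

    // pickIndex draws from an exponential distribution so entries near the front
    // of the (already prioritized) queue are chosen more often, while later
    // entries still get picked occasionally. lambda controls how strongly the
    // front is favored.
    func pickIndex(n int, lambda float64) int {
        x := rand.ExpFloat64() / lambda // exponentially distributed, mean 1/lambda
        i := int(math.Floor(x))
        if i >= n {
            i = n - 1 // clamp into range rather than redrawing, for simplicity
        }
        return i
    }

    // nextTask removes and returns the selected task; the queue is assumed non-empty.
    func nextTask(queue []task, lambda float64) (task, []task) {
        i := pickIndex(len(queue), lambda)
        t := queue[i]
        return t, append(queue[:i], queue[i+1:]...)
    }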
At this stage in the release cycle, I'd pursue the absolute simplest thing that could possibly work. I wouldn't worry about the debt ratio at this stage. Assume nice strategy. From where I sit, it would appear that bitswap MVP priorities are simplicity and throughput. |
Okay, so absolute simplest would just be a heuristic of H(x) = 1. Makes things pretty easy.
This doc describes how the Bitswap Protocol v1 should work.
For now, we assume an understanding of the goals and high level description. In time, this document will hopefully be self-contained.
I describe here the desired state, not necessarily what we'll implement right away.
Elements
Bitswap Concepts
- Peer - other hosts we exchange data with
- Ledger - containing peer-to-peer accounting
- Wantlist - list of desired blocks
- Strategy - pluggable decision making unit

Implementation Details

- Client - the client of bitswap (e.g. ipfs node)
- Message - a bitswap message, outgoing or incoming
- Blockstore - local block storage
- Available Bandwidth - defined by Client
- Round - an iteration of the protocol, within one host
- Allocation - what to send to whom. decided on by Strategy, per Round.

go-ipfs specifics

- net.Service - registered service, which sends + receives messages
- net.Handler - request handler

Message looks like this:
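A rough Go sketch of the message shape, inferred from the JSON examples in the discussion above; the field names and types here are illustrative assumptions, not the actual definitions:

    package bitswap

    // Illustrative only: a guess at the message shape based on the wantlist
    // examples earlier in this thread, not the real go-ipfs types.
    type Message struct {
        Wantlist Wantlist
        Blocks   [][]byte // raw block data; the receiver hashes it to recover the key
    }

    type Wantlist struct {
        Full    bool // true: replaces the peer's previous view; false: apply as a delta
        Entries []Entry
    }

    type Entry struct {
        Block    string // block hash
        Priority int    // normalized against the other entries (see the note below)
        Cancel   bool   // remove this block from the wantlist instead of adding it
    }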
High Level:
- RoundWorker which handles rounds, allocations, and sending
- ClientWorker (need better name) which handles requests from Client

Peers and Client pass information to the workers through channels. Workers modify state, may issue outgoing requests, and send back responses.

The RoundWorker:
- (s *Strategy) GetAllocation(Ledgers, Wantlist, Available Bandwidth, Blockstore) *Allocation
- Allocation
Normally, we would let the data sending define the rounds (this makes it easier to use available bandwidth correctly), but our net system may buffer the msgs out and return before actually sending them (can see about removing this). So we may have to use timed intervals (0.25, 0.5, 1s ?) instead.
Example of the RoundWorker
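A rough sketch of what a timed-interval RoundWorker could look like; only the GetAllocation signature comes from this doc, and everything else (the placeholder types, the state and send hooks) is assumed for illustration:

    package bitswap

    import (
        "context"
        "time"
    )

    // Placeholder stubs standing in for the real concepts listed above.
    type (
        Ledgers    struct{}
        Wantlist   struct{}
        Blockstore struct{}
        Allocation struct{} // what to send to whom, for this round
        Bandwidth  int64
    )

    type Strategy struct{}

    func (s *Strategy) GetAllocation(l Ledgers, w Wantlist, bw Bandwidth, bs Blockstore) *Allocation {
        return &Allocation{} // decision logic elided
    }

    // roundWorker runs one round per tick: ask the Strategy for an Allocation and send it.
    func roundWorker(ctx context.Context, s *Strategy, interval time.Duration,
        state func() (Ledgers, Wantlist, Bandwidth, Blockstore), send func(*Allocation)) {

        ticker := time.NewTicker(interval) // e.g. 0.25-1s, per the note above
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                l, w, bw, bs := state()
                send(s.GetAllocation(l, w, bw, bs))
            case <-ctx.Done():
                return
            }
        }
    }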
The ClientWorker:
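A guess at the ClientWorker's shape, based only on the description above; the request and arrival types are hypothetical:

    package bitswap

    import "context"

    // blockRequest is a hypothetical request from the Client (e.g. the ipfs node).
    type blockRequest struct {
        hash  string
        reply chan []byte // the block's raw data is sent here once it arrives
    }

    // arrivedBlock is a hypothetical notification that a wanted block came in.
    type arrivedBlock struct {
        hash string
        data []byte
    }

    // clientWorker owns the local wantlist: it adds entries for incoming requests
    // and resolves waiting clients when the corresponding blocks show up.
    func clientWorker(ctx context.Context, requests <-chan blockRequest, arrivals <-chan arrivedBlock) {
        wantlist := map[string]int{}          // hash -> priority
        waiting := map[string][]chan []byte{} // hash -> clients waiting on it
        for {
            select {
            case req := <-requests:
                wantlist[req.hash] = 1 // priority 1 for now; could come from the client
                waiting[req.hash] = append(waiting[req.hash], req.reply)
                // (a real implementation would also trigger a wantlist rebroadcast here)
            case blk := <-arrivals:
                delete(wantlist, blk.hash) // and broadcast a cancel, per the discussion above
                for _, ch := range waiting[blk.hash] {
                    ch <- blk.data
                }
                delete(waiting, blk.hash)
            case <-ctx.Done():
                return
            }
        }
    }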
Note that Message.Wantlist.Entry has an int priority. These are normalized, meaning that a priority is really p_i / sum(p_j for all p). (For example, entries with priorities 100, 1, and 1 are effectively 100/102, 1/102, and 1/102.) We need it because sending a single WantlistEntry as an update is not able to convey where the priority is. Also, sorting is not able to express priority differences. Hence this change.

As Peer messages come in:
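A sketch of one way to apply an incoming peer wantlist, reusing the Wantlist and Entry shapes from the message sketch above; this is an assumption about the handling, not the actual implementation:

    // ledger is a hypothetical per-peer record; here it only tracks the peer's wantlist.
    type ledger struct {
        wantlist map[string]int // block hash -> priority, as this peer last told us
    }

    // applyWantlist updates our view of a peer's wantlist from an incoming message.
    // A "full" wantlist replaces the old view; otherwise entries are applied as
    // deltas, with Cancel removing an entry (as in the JSON examples above).
    func (l *ledger) applyWantlist(w Wantlist) {
        if w.Full {
            l.wantlist = make(map[string]int)
        }
        for _, e := range w.Entries {
            if e.Cancel {
                delete(l.wantlist, e.Block)
                continue
            }
            l.wantlist[e.Block] = e.Priority
        }
    }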
As Client requests come in:
(WIP)