
Bitswap Rounds #417

Closed · jbenet opened this issue Dec 8, 2014 · 27 comments

@jbenet (Member) commented Dec 8, 2014

This doc describes how the Bitswap Protocol v1 should work.

For now, we assume an understanding of the goals and high level description. In time, this document will hopefully be self-contained.

I describe here the desired state, not necessarily what we'll implement right away.

Elements

Bitswap Concepts

  • Peer - other hosts we exchange data with
  • Ledger - containing peer-to-peer accounting
  • Wantlist - list of desired blocks
  • Strategy - pluggable decision making unit

Implementation Details

  • Client - the client of bitswap (e.g. ipfs node)
  • Message - a bitswap message, outgoing or incoming
  • Blockstore - local block storage
  • Available Bandwidth - defined by Client
  • Round - an iteration of the protocol, within one host
  • Allocation - what to send to whom. decided on by Strategy, per Round.

go-ipfs specifics

  • net.Service - registered service, which sends + receives messages
  • net.Handler - request handler

Message looks like this:

message Message {

  message Wantlist {

    message Entry {
      optional string block = 1; // the block key
      optional int32 priority = 2; // the priority (normalized). defaults to 1
      optional bool cancel = 3;  // whether this revokes an entry
    }

    repeated Entry entries = 1; // a list of wantlist entries
    optional bool full = 2;     // whether this is the full wantlist. default to false
  }

  optional Wantlist wantlist = 1;
  repeated bytes blocks = 2;
}

High Level:

  • the core bitswap service employs two async workers:
    • the RoundWorker, which handles rounds, allocations, and sending
    • the ClientWorker (needs a better name), which handles requests from the Client
  • all incoming requests from Peers and the Client pass information to the workers through channels. Workers modify state, may issue outgoing requests, and send back responses.

The RoundWorker:

  • maintains each peer's Wantlist
  • iteratively performs rounds:
    • call (s *Strategy) GetAllocation(Ledgers, Wantlist, Available Bandwidth, Blockstore) *Allocation
    • send out blocks according to Allocation
  • (select on Wantlist update channel, and on round time interval?)

Normally, we would let the data sending define the rounds (this makes it easier to use available bandwidth correctly), but our net system may buffer the messages and return before actually sending them (we can look into removing this). So we may have to use timed intervals (0.25, 0.5, 1s?) instead.

Example of the RoundWorker
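A minimal sketch of that loop; the names (peerWantlistUpdates, wantlists.apply, sendBlocks, roundPeriod) are illustrative stand-ins, not actual go-ipfs identifiers:

func (bs *bitswap) roundWorker(ctx context.Context) {
  ticker := time.NewTicker(bs.roundPeriod) // e.g. 0.25-1s, per the note above
  defer ticker.Stop()
  for {
    select {
    case update := <-bs.peerWantlistUpdates:
      bs.wantlists.apply(update) // fold the update into that peer's Wantlist
    case <-ticker.C:
      // one Round: ask the Strategy for an Allocation, then send accordingly
      alloc := bs.strategy.GetAllocation(bs.ledgers, bs.wantlists, bs.bandwidth, bs.blockstore)
      bs.sendBlocks(ctx, alloc)
    case <-ctx.Done():
      return
    }
  }
}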

The ClientWorker:

  • maintains local Wantlist
  • periodically rebroadcasts Wantlist (5-15s?)
  • sends out single Wantlist updates to other nodes
  • (select on local request channel, and on periodic rebroadcast)
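A matching sketch for this worker, again with illustrative names (localWantlistUpdates, broadcastUpdate, broadcastWantlist):

func (bs *bitswap) clientWorker(ctx context.Context) {
  rebroadcast := time.NewTicker(10 * time.Second) // somewhere in the 5-15s range
  defer rebroadcast.Stop()
  for {
    select {
    case update := <-bs.localWantlistUpdates:
      bs.wantlist.apply(update)
      bs.broadcastUpdate(ctx, update) // single-entry update to trading peers
    case <-rebroadcast.C:
      bs.broadcastWantlist(ctx) // the full wantlist, sent with full = true
    case <-ctx.Done():
      return
    }
  }
}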

Note that Message.Wantlist.Entry has an int priority. These are normalized, meaning a priority is really p_i / sum(p_j over all entries). We need this because a single Entry sent as an update could not otherwise convey where it falls in the overall ordering, and sort order alone cannot express the magnitude of priority differences. Hence this change.
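A sketch of the normalization (normalize is a hypothetical helper, not a function in the codebase):

func normalize(priorities map[string]int) map[string]float64 {
  out := make(map[string]float64, len(priorities))
  sum := 0
  for _, p := range priorities {
    sum += p
  }
  if sum == 0 {
    return out // empty wantlist; nothing to normalize
  }
  for key, p := range priorities {
    out[key] = float64(p) / float64(sum) // w_i = p_i / sum(p_j)
  }
  return out
}

// e.g. {a:1, b:1, c:1, d:1} -> each 0.25; {c:100, d:1, e:1, f:1} -> c ~ 0.97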

As Peer messages come in:

  • send peer wantlist updates to RoundWorker (over channel)
  • call HasBlock on blocks

As Client requests come in:

  • send local wantlist updates to ClientWorker (over channel)
  • return a blocks promise (subscribe, send out requests, etc.)

(WIP)

@whyrusleeping (Member) commented

Changes I'm going to make for the alpha:

  • Create a roundWorker loop/thread
  • Change ReceiveMessage to no longer process the wantlist, and instead send it to the roundWorker
  • When a new wantlist comes in for a peer, discard the old wantlist
  • Every RoundPeriod, send a subset of N blocks wanted by peers, where N is determined by available bandwidth / blocksize (see the sketch after this list)
  • more todos as I work this out more
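A sketch of how that N might be computed (blocksPerRound and its parameters are hypothetical):

// blocksPerRound: how many blocks fit in one round's bandwidth budget.
// bandwidth is bytes/sec (set by the Client for now); avgBlockSize is in bytes.
func blocksPerRound(bandwidth int, roundPeriod time.Duration, avgBlockSize int) int {
  budget := float64(bandwidth) * roundPeriod.Seconds()
  return int(budget) / avgBlockSize
}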

@jbenet (Member) commented Dec 8, 2014

@whyrusleeping it's probably important to make the Message change above (into entries); that will seriously cut down bitswap bandwidth use. Wantlists are big.

@btc (Contributor) commented Dec 8, 2014

How will available bandwidth be determined?

@whyrusleeping (Member) commented

@maybebtc good question...

@whyrusleeping (Member) commented

@jbenet could you elaborate a little more on the Message change? I'm not sure I completely understand where you're going with that.

@jbenet (Member) commented Dec 8, 2014

How will available bandwidth be determined?

For now it will be set by the Client. (ipfs can measure bandwidth capacity over time; for now we can punt on this (not give it to the strategy), but leave room for it.)

could you elaborate a little more on the Message change?

Sure thing, how about these examples? (JSON is easier):

  1. Full wantlist, no blocks, all of equal priority. Priorities are normalized, so each is 1/sum = 1/4 = 0.25:
# from p1 to p2
{
  "wantlist": {
    "full": true,
    "entries": [
      { "block": "<hash-a>", "priority": 1},
      { "block": "<hash-b>", "priority": 1},
      { "block": "<hash-c>", "priority": 1},
      { "block": "<hash-d>", "priority": 1}
    ]
  }
}
  2. Say p2 sends <hash-a> and <hash-b> to p1 (and its own wantlist):
# from p2 to p1
{
  "blocks": [
    "<data which hashes to <hash-a>>",
    "<data which hashes to <hash-b>>",
  ],
  "wantlist": {
    "full": true,
    "entries": [
      { "block": "<hash-c>", "priority": 100}, # really wants c
      { "block": "<hash-d>", "priority": 1}
      { "block": "<hash-e>", "priority": 1}
      { "block": "<hash-f>", "priority": 1}
    ]
  }
}
  3. On getting the blocks, p1 sends out the cancels to all its peers (this doesn't have to be an independent message; it could ride on another message going to each peer, but it's fine if it is, because we use buffered streams):
# from p1 to p2
{
  "blocks": [
    "<data which hashes to <hash-f>>",
  ],
  "wantlist": {
    "full": false, # note partial. could leave it out (defaults to false).
    "entries": [
      { "block": "<hash-a>", "cancel": true}, # - a
      { "block": "<hash-b>", "cancel": true}, # - b
      { "block": "<hash-g>", "priority": 2}   # + g
    ]
  }
}
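To make the update semantics concrete, here is a sketch of how a receiver might fold such a message into its stored copy of a peer's wantlist (the types and names are illustrative, not the real implementation):

// applyWantlist merges an incoming Wantlist message into the stored
// per-peer wantlist: full replaces everything, cancel revokes an entry.
type wlEntry struct {
  Block    string
  Priority int
  Cancel   bool
}

func applyWantlist(current map[string]int, entries []wlEntry, full bool) map[string]int {
  if full {
    current = make(map[string]int) // a full wantlist replaces the old one
  }
  for _, e := range entries {
    if e.Cancel {
      delete(current, e.Block) // revoke this entry
    } else {
      current[e.Block] = e.Priority
    }
  }
  return current
}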

@whyrusleeping (Member) commented

Awesome, makes more sense now, thanks!

@whyrusleeping (Member) commented

@jbenet with priorities, the list ordering no longer matters, correct?

@jbenet (Member) commented Dec 8, 2014

@whyrusleeping correct

@btc (Contributor) commented Dec 9, 2014

For now will be set by Client. (ipfs can measure bandwidth capacity over time. for now we can punt on this (not give it to the strategy), but leave room for it).

Would we set it to a value that is effectively infinite?

@jbenet (Member) commented Dec 9, 2014

Would we set to a value that is effectively inf?

or just leave it out of the implementation for now, but comment where it would go?

@whyrusleeping (Member) commented

@jbenet how are you envisioning the Cancel being used? For example, if I receive a block I've been wanting, I should want to cancel that item from my wantlist; what should be done there?

@jbenet (Member) commented Dec 9, 2014

@whyrusleeping basically, on receiving the block (or the client somehow canceling things; I don't think we allow that presently, and cancelling the context doesn't remove the block from the wantlist, I don't think?)... anyway, on these events we should send out updates to all our peers, "canceling" that block from our wantlist. It's effectively an Ack to the sender, but more importantly, it's a very cheap wantlist update to everyone else.
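A sketch of that flow (tradingPeers, send, and the Go message types are illustrative stand-ins):

// onBlockReceived: when a wanted block arrives, drop it from our wantlist
// and broadcast a one-entry cancel update to everyone we're trading with.
func (bs *bitswap) onBlockReceived(ctx context.Context, key string) {
  bs.wantlist.remove(key)
  update := Message{
    Wantlist: Wantlist{
      Entries: []Entry{{Block: key, Cancel: true}}, // full defaults to false
    },
  }
  for _, p := range bs.tradingPeers() { // swarm-connected peers, for now
    bs.send(ctx, p, update)
  }
}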

@whyrusleeping (Member) commented

Okay, so how do we decide on the set of peers to send that update to? We don't really keep track of who has our wantlist.

@jbenet (Member) commented Dec 9, 2014

@whyrusleeping? bitswap should be keeping a set of peers we're currently "trading with"; they are the people we rebroadcast wantlists to. (This set can be the swarm-connected peers for now, but doesn't need to be later on.)

@whyrusleeping (Member) commented

Ah, the strategy does have a list of peers... you're right. How do we want to go about 'ending' a trading session?

@jbenet (Member) commented Dec 9, 2014

This is really tricky.

A session should be open because either

  1. we want something from them, and they may give it to us.
  2. they want something from us, and we may give it to them.
    (note the complications :) )

these have different end conditions:

  1. they want something from us
    • they get it (from us or someone else) and close the connection (we don't do anything)
    • we won't give it to them (hard to ascertain, because they may be sending us things)
  2. we want something from them
    • they gave it to us, we're done, we close (easy)
    • it seems like they won't be giving it to us... (hard). This one basically means we should have a notion of likelihood of success (based on maybe going to fetch blocks they want and sending those blocks to get our debt ratio up), but this gets tricky.

Coming up with a correct "we're done here, close" is non-trivial.

Perhaps a muuuuuuch simpler thing to do that may work well in practice is just this:

type session struct {
  peer     peer.Peer
  deadline time.Time
}

func (s *session) shouldRemainOpen() bool {
  return time.Now().Before(s.deadline)
}

func (s *session) bumpDeadline(t time.Duration) {
  d := time.Now().Add(t)
  if d.After(s.deadline) { // only ever extend the deadline, never shorten it
    s.deadline = d
  }
}

// when (a) we open connection, or (b) get or (c) send a block, bump deadline.
// (the first one (a), could be longer)
s.bumpDeadline(timeout)

In any case, I think isolating all this and wrapping it with a func (s *session) shouldRemainOpen() makes it easier to alter later.

@whyrusleeping (Member) commented

I think I have a slightly better idea: we should extract the routing table from the DHT and let bitswap have more direct access to it. When a peer is bumped from the routing table, close our session with them.

@jbenet (Member) commented Dec 9, 2014

@whyrusleeping that won't work; not every peer is in the routing table.

@whyrusleeping (Member) commented

But they would be if bitswap had the routing table as well, and we called table.Update(...) for every interaction within bitswap.

@btc (Contributor) commented Dec 9, 2014

We probably want a more generic mechanism for getting this information across the boundary.

The routing table isn't an actual source of truth. The RT gets its information from a different layer. #418

@jbenet (Member) commented Dec 9, 2014

But they would be if bitswap had the routing table as well, and we call table.Update(...) for every interaction within bitswap

Incorrect. The routing table has a limited set of slots, corresponding to roughly k·log2(n) where n is the size of the DHT, and it prefers long-running nodes useful in DHT queries (once the buckets are full, new nodes are simply dropped). Bitswap should be able to connect to people no matter what the routing table thinks of them. This is a separate system. Assumptions like these drive over schoolkids with their cars.

call table.Update(...)

We probably want a more generic mechanism for getting this information across the boundary.

Yes. Routing information is its own thing, and should make its own conclusions separately. This implies a tighter bond between Routing and Net, not between bitswap and routing. (bitswap should have no clue at all about table.Update.)

@btc (Contributor) commented Dec 13, 2014

Ran some benchmarks. The system's throughput is hard-limited by the choices of roundPeriod and bandwidthPerRound, and for every fixed choice of these two variables there exists a scenario that doesn't quite fit. Allowing these values to vary/adapt at runtime buys optimality only at a high cost in complexity.

There seems to be an impedance mismatch between the streaming nature of the bitswap system and the batch models we've been attempting to reify.

In machine learning, there's the notion of online learning where, rather than performing large batch updates, the system can incrementally update as new data comes in. And in data processing, there are systems like Apache Storm that are designed to process unbounded streams of data. For many real-time scenarios, Storm is a more appropriate fit than Hadoop/MapReduce.

I suspect in the case of bitswap, a streaming model may be a better match for the referent system.

for { // worker sends messages as long as there exist messages in Outbox
  select {
  case msg := <-bs.decider.NextMessage: // decider has a PQ/heap of messages to go out
    sender.SendMessage(msg) // sender/network rate limits this operation
  case <-ctx.Done():
    return
  }
}
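For what it's worth, a sketch of where the rate limiting and backpressure could come from (rateLimitedSender is a made-up name, not the actual net package): SendMessage blocks on a token bucket, which in turn blocks the worker loop above.

// rateLimitedSender: a token bucket refilled at a fixed rate. SendMessage
// blocks until a token is available, which backpressures the caller.
type rateLimitedSender struct {
  tokens chan struct{}
}

func newRateLimitedSender(msgsPerSec int) *rateLimitedSender {
  s := &rateLimitedSender{tokens: make(chan struct{}, msgsPerSec)}
  go func() { // refill goroutine; a real version would take a ctx to stop it
    ticker := time.NewTicker(time.Second / time.Duration(msgsPerSec))
    defer ticker.Stop()
    for range ticker.C {
      select {
      case s.tokens <- struct{}{}:
      default: // bucket is full; drop the token
      }
    }
  }()
  return s
}

func (s *rateLimitedSender) SendMessage(msg Message) {
  <-s.tokens // blocks here; this is the backpressure
  // ... actually write msg to the wire ...
}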

@whyrusleeping (Member) commented

Yeah, I think a system like this would work nicely. Do we have the required backpressure to prevent this from clogging the system?

@whyrusleeping (Member) commented

Taking @maybebtc's suggestions into account, I have started brainstorming a better way to decide what bitswap should do at any given moment. I'm planning on building a priority queue of sorts, where higher-priority blocks (based on some heuristic) are at the front of the list. This queue will only contain blocks we have available locally, and will be added to whenever we receive a wantlist or new blocks. I haven't figured out the sorting heuristic yet: basing it on a peer's debtRatio would require constant resorting, and basing it on the wanted blocks' priorities would encourage people to game the system (unless priorities were normalized). But my idea (once a heuristic is determined) is that, to select the next item, a random number is chosen from an exponentially decaying distribution, such that blocks near the front of the list are more likely to be selected.
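For example, one way to realize that selection over a priority-sorted queue (pickIndex is a hypothetical helper; lambda tunes how strongly the front is favored):

// pickIndex draws an index into a priority-sorted slice of length n (n > 0)
// with exponentially decaying probability, so the front of the queue is
// favored but the tail still gets served occasionally.
func pickIndex(n int, lambda float64) int {
  i := int(rand.ExpFloat64() / lambda) // exponential draw with mean 1/lambda
  if i >= n {
    i = n - 1 // clamp to the back of the queue
  }
  return i
}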

@jbenet, thoughts?

@btc (Contributor) commented Dec 15, 2014

At this stage in the release cycle, I'd pursue the absolute simplest thing that could possibly work.

I wouldn't worry about the debt ratio at this stage. Assume a nice strategy.

From where I sit, it would appear that bitswap MVP priorities are simplicity and throughput.

@whyrusleeping (Member) commented

Okay, so the absolute simplest would just be a heuristic of H(x) = 1. That makes things pretty easy.

@btc btc assigned btc and unassigned whyrusleeping Dec 17, 2014
@btc btc assigned whyrusleeping and unassigned btc Dec 18, 2014
@jbenet jbenet closed this as completed Jan 5, 2015
@jbenet jbenet removed the status/in-progress In progress label Jan 5, 2015