[Draft] Bitswap Protocol Extensions #186

dirkmc · 2019-08-21T23:42:06Z

This proposal extends the Bitswap Protocol:

a node can ask peers to indicate whether or not they have a block
responding peers inform the requestor of the size of their outstanding queue

In #167 we discuss improvements to Bitswap without changing the protocol

Background

Request Patterns

Request patterns broadly fit into a couple of common use cases.

Single File

A new node wants all the information that many other nodes already have. The new node requests

The block representing the root of a DAG
The blocks representing its children
The blocks representing the children's children, etc

1.           o
           / | \
2.        o  o  o
         /|  |  |\
3.      o o  o  o o

Web Page / Package Manager

For example, a user clicks on a link to <cid>/en/InterestingTopic.

The local node requests:

<cid> (1 block)
<cid>/en (1 block)
<cid>/en/InterestingTopic (1 block)
The blocks that make up the contents of <cid>/en/InterestingTopic (many blocks)

1.          o
           /
2.        o
           \
3.          o
           /|\
4.        o o o

For use cases with high fan-out (eg Wikipedia), typically

a few nodes have all the information
many nodes have
- the root and upper branches of the tree
- sub-trees with some of the content (eg the pages the user has browsed)

         Peer A     Peer B      Peer C
           o          o            o
         / | \       /|\          /|\
        o  o  o     o o o        o o o
       /|\             /|\         |  \
      o o o           o o o        o   o
         /|\            | |\      /|\
        o o o           o o o    o o o

For both of these typical request patterns it is important to fetch the initial blocks quickly as they contain links to blocks at deeper levels in the DAG.

Current Bitswap Implementation

Want list

In the current implementation of Bitswap, the local node sends WANT requests for blocks to its peers. Nodes maintain a "want list" for each peer they are connected to, ie the list of blocks that a peer has asked for. When a node receives a block that it wants, it sends a CANCEL message to all peers it had previously sent a WANT message to.

Discovery

Initially the Session discovers peers that have blocks it is interested in by

broadcasting WANTs to all peers that the local node is connected to
asking a provider interface (eg the DHT) for nodes that provide the block

When the provider interface returns peers the Session adds them to an "unoptimized" peers list.
When peers respond to WANT requests the Session adds them to an "optimized" peers list, ordered by latency (see #166 and #165 for more detail on latency tracking).

Peer Selection

Once some peers have been discovered, subsequent requests are split across the list of peers. The Session maintains a count of how many unique blocks and duplicate blocks it receives (duplicate blocks are those that are received more than once). It uses the unique / duplicate ratio to adjust the split factor, which determines how many peers each WANT request will be sent to (see #167 and #165 for more detail on splitting). The Session adjusts the split factor up and down to try to maintain a balance between receiving the data it wants quickly while trying to minimize duplicates.

For example if there are 8 peers (A - H) and 10 CIDs (0 - 9), and the split factor is 3:

cid0 cid3 cid6 cid9  ->  A D G
cid1 cid4 cid7       ->  B E H
cid2 cid5 cid8       ->  C F

Rate limiting

The Session limits the "live" want list to 32 WANTs (a "live" want is a request that has been sent but a response has not yet been received).

Bitswap Protocol Extensions

The goals are:

Limit duplicate blocks being sent to the same peer
Reduces load on the network and frees up bandwidth
Maximize bandwidth
"Keep peers busy", ie try to ensure that peers always have data to send to us
Limit want requests per-peer (rather than per-session)
Rate limiting per-peer more effectively manages peers' load (see Improving Latency Measurement #166)
Treat wants as a stream
So that they can be processed one-by-one (rather than in groups - see Improve Request Sharding #167)
Move towards a throughput model
(rather than tracking latency - see Improving Latency Measurement #166)

HAVE / DONT_HAVE

In the current implementation the local node sends a WANT message to request a block. However in some cases it's advantageous to know whether a peer has a block but not to request the block data itself. This allows a node to build up knowledge about the distribution of blocks amongst its peers, without requesting a lot of duplicate data.

This proposal extends the WANT message with two flags:

want_have
When the peer receives the block, it sends the local node a HAVE message instead of the block. If the block is small enough it just sends the block anyway.
send_dont_have
If the peer does not have the block, it immediately sends DONT_HAVE.

Note that both these flags can be set (they are not exclusive).

When a node is in the discovery phase, it broadcasts a request to all connected peers. At this stage it is only interested in HAVE (it is not interested in blocks or DONT_HAVE) eg:
WANT CID1 (want_have=true send_dont_have=false) -> <All peers>

When the node starts retrieving groups of blocks, it splits the requests for blocks in the group across peers. The node asks each peer for some of the blocks, and sets want_have=true and send_dont_have=true for the remaining blocks.

For example:

  Wants  Local     Remote Peers
         Peer      A  B  C  D
           o       o  o  o  o

    ▫                                 1. Session.GetBlocks(`<root cid>`)

           ✓  ---> A                  2. Session broadcasts WANT with `want_have` ✓
              ------> B                  flag to all peers
              ---------> C
              ------------> D

              <- ✓ A                  3. Peer A responds with HAVE ✓

           ▫  ---> A                  4. Session requests block ▫ from peer A

              <--- ✓ B                5. Peers B & C respond with HAVE ✓
              <------ ✓ C

              <- ▧ A                  6. Peer A responds with block ▧

  ▫▫▫▫                                7. Session.GetBlocks(`<root children cids>`)
  ▫▫▫▫

        ▫▫▫✤  ---> A                  8. Session splits WANTs across peers, requesting
        ✤✤✤✤                             either
                                         ▫ a block
                                         ✤ `want_have=true send_dont_have=true`
        ✤✤✤▫  ------> B
        ▫▫✤✤

        ✤✤✤✤  ---------> C
        ✤✤▫▫

              <--- A                  9. Peers send response for each WANT with
                   ▧▧✗✓                  ▧ Block
                   ✓✓✗✓                  ✓ HAVE
                                         ✗ DONT_HAVE
              <----- B
                     ✗✓✓▧
                     ▧▧✓✓

              <-------- C
                        ✗✓✓▧
                        ▧▧✓✓

A WANT with want_have=false (a request for a block) supersedes any previous WANT with want_have=true.

Outstanding Queue Size

When the local node sends a request to a peer, the peer's response includes the number of outstanding blocks and their total size in bytes. For example:

The local node requests CIDs 1, 2, 3, 4 from peer A
The response from peer A has
- blocks 1 and 2 (blocks 3 and 4 are too big to fit into the message)
- the number of outstanding blocks (in this case 2: block 3 and block 4)
- the size of outstanding blocks in bytes (size of block 3 + size of block 4)

This allows the local node to choose which peers to send requests to so as to

keep all peers busy (maximize bandwidth)
back off if a peer is overloaded

Implementation Improvements

Discovery

Follow the current model:

broadcast WANT for initial CIDs to all connected peers
search for providers of first CID (using eg DHT)
periodically search for providers of random CID (using eg DHT)

Keep peers busy

As the session discovers peers it moves them into a candidate peer list. The session sends each new want to the candidate peer with the least items in its live want queue. The live want queue size has a per-peer limit that varies according to how much data the peer is waiting to send to the local node (outstanding data):

no outstanding data: increase limit
a lot of outstanding data: decrease limit

Bitswap tries to keep the outstanding data just above zero using the queue size and a moving average of variance in the size of outstanding data.

For example:

                     size   limit
Peer A  [▫▫  ]         2      4
Peer B  [▫▫▫     ]     3      8
Peer C  [▫▫▫▫  ]       4      6

The next WANT would be sent to Peer A as it has the smallest queue size (2 free spaces).

This allows Bitswap to send WANTs to the peers with the highest throughput, while responding to back pressure.

Streaming Wants

The current implementation processes wants in groups (see #167). However when bandwidth is limited we can only send individual wants or small groups of wants, so in this proposal we move towards processing wants individually (as a stream).

Peer Selection

For each want CID, the session selects a peer by:

First try to select a peer that either
- sent the local node a HAVE for the CID
- is a provider for the CID
Otherwise select a peer probabilistically according to the ratio of HAVEs / DONT_HAVEs it has sent to the local node
Otherwise select the peer with the shortest queue

Want Potential

In order to determine how many peers to send a WANT to, each peer / want combination is assigned a probability called the "want potential". Wants are sent in order of want potential, then FIFO.

The session sends wants to multiple peers until the "want potential" is over a threshold. The threshold varies according to the ratio of unique / duplicate blocks received by the session (so as to tradeoff a high chance of receiving a block quickly vs too much duplicate data)

The want potential changes as the local node receives messages from peers.

For example:

CID1 has a want potential of
- 0.8 for peer A
- 0.2 for peer B
- 0.1 for peer C
The threshold value is 0.9

The session sends
WANT CID1 -> Peer A
WANT CID1 -> Peer B
want potential = 0.8 + 0.2 = 1.0
Peer A sends DONT_HAVE CID1
want potential = 1.0 - 0.8 = 0.2
The session sends
WANT CID1 -> Peer C
want potential = 0.2 + 0.1 = 0.3
Peer B sends DONT_HAVE CID1
want potential = 0.3 - 0.2 = 0.1
Peer C sends Block CID1
want potential = ∞

Piggybacking

The session piggybacks informational requests onto WANTS it sends to peers.
For example if there are 5 CIDs waiting to be sent out, and the session sends WANTs for CID1 and CID3 to Peer A, the session will include WANTs with the want_have and send_dont_have flags for CID2, CID4 and CID5:

CID:       12345
Request:   ▫✤▫✤✤  ->  Peer A
Where
  ▫ is a WANT for a block
  ✤ is a WANT with `want_have=true, send_dont_have=true`

The text was updated successfully, but these errors were encountered:

hannahhoward · 2019-08-27T00:50:00Z

Something to consider - how does WantHave/SentDontHave work in the context of the PeerTaskQueue / load balancing on the responder side. Currently the responder does check if it has the block before the load balancing in the PeerTaskQueue occurs (https://github.com/ipfs/go-bitswap/blob/master/decision/engine.go#L284) -- but I wonder if the entire processing should occur before load balancing occurs? i.e. is the a solid DOS attack by just sending a ton of want requests that are want_have=true / send_dont_have=true?

hannahhoward · 2019-08-27T00:54:40Z

Another thing to consider -- ipfs/specs#201

I still would love it if this PR/spec idea got some love. If you're going to be relying on a more detailed conversation to happening between peers, I think you need a way to verify everyone is hearing each other :)

Currently only wantlist deltas are sent, but there is no error correction, so if messages get lost or a peer hangs up, the only way to recover is to wait for a periodic wantlist rebroadcast (every 30 seconds I think -- see https://github.com/ipfs/go-bitswap/blob/master/messagequeue/messagequeue.go#L18).

Seems like the already existing problems with this approach might be amplified by increasing the chattiness of bitswap.

hannahhoward · 2019-08-27T00:57:18Z

Generally, this makes a ton of sense and I think it's a great approach!

Keep in mind there are lots of folks running legacy bitswap versions out there so make sure you don't put all your eggs in the basket of assuming the other peers can support this.

dirkmc · 2019-08-27T13:19:23Z

Those are good points, thanks Hannah.

I agree that we need to take precautions against DoS - maybe a per-client rate limit on the serving node.

With respect to recovering from disconnects, would it be sufficient to resend the whole wantlist on reconnect?

alanshaw · 2019-08-29T10:30:39Z

Please can you augment the README with the background and current implementation information? There's already a good description but diagrams help a lot.

dirkmc · 2019-08-29T12:18:37Z

Agreed, I'm planning to improve the documentation to explain how Bitswap works in detail, as I mostly had to work it out from reading through the code :)

There are some issues about adding better documentation for Bitswap:
ipfs/boxo#109 ipfs/boxo#110 ipfs/boxo#111 #153
I will write docs and close these out once we've settled on an implementation of the protocol extensions discussed in this issue.

This commit extends the bitswap protocol with two additional wantlist properties: * WANT_HAVE/HAVE: Instead of asking for a block, a node can specify that they want to know if any peers "have" the block. * WANT_HAVE_NOT/HAVE_NOT: Instead of waiting for a timeout, a node can explicitly request to be told immediately if their peers don't currently have the given block. Additionally, nodes now tell their peers how much data they have queued to send them when sending messages. This allows peers to better distribute requests, keeping all peers busy but not overloaded. Changes in this PR are described in: #186

dirkmc · 2020-03-04T18:26:07Z

Implemented as part of ipfs/kubo#6782

This commit extends the bitswap protocol with two additional wantlist properties: * WANT_HAVE/HAVE: Instead of asking for a block, a node can specify that they want to know if any peers "have" the block. * WANT_HAVE_NOT/HAVE_NOT: Instead of waiting for a timeout, a node can explicitly request to be told immediately if their peers don't currently have the given block. Additionally, nodes now tell their peers how much data they have queued to send them when sending messages. This allows peers to better distribute requests, keeping all peers busy but not overloaded. Changes in this PR are described in: ipfs/go-bitswap#186 This commit was moved from ipfs/go-bitswap@b3a47bc

This comment has been minimized.

Sign in to view

dirkmc mentioned this issue Aug 26, 2019

Improve bitswap performance ipfs-inactive/package-managers#85

Open

olizilla mentioned this issue Aug 27, 2019

🙌 IPFS Weekly Call + Contributor Office Hours 📡 💫 2019-09-02 ipfs/community#434

Closed

dirkmc mentioned this issue Aug 29, 2019

PoC of Bitswap protocol extensions implementation #189

Merged

5 tasks

aschmahmann mentioned this issue Oct 6, 2019

Question: evaluating message metadata for routing decisions libp2p/go-libp2p-pubsub#206

Open

dirkmc closed this as completed Mar 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Draft] Bitswap Protocol Extensions #186

[Draft] Bitswap Protocol Extensions #186

dirkmc commented Aug 21, 2019 •

edited

Loading

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

hannahhoward commented Aug 27, 2019

hannahhoward commented Aug 27, 2019 •

edited

Loading

hannahhoward commented Aug 27, 2019

dirkmc commented Aug 27, 2019

alanshaw commented Aug 29, 2019

dirkmc commented Aug 29, 2019 •

edited

Loading

dirkmc commented Mar 4, 2020

[Draft] Bitswap Protocol Extensions #186

[Draft] Bitswap Protocol Extensions #186

Comments

dirkmc commented Aug 21, 2019 • edited Loading

Background

Request Patterns

Single File

Web Page / Package Manager

Current Bitswap Implementation

Want list

Discovery

Peer Selection

Rate limiting

Bitswap Protocol Extensions

HAVE / DONT_HAVE

Outstanding Queue Size

Implementation Improvements

Discovery

Keep peers busy

Streaming Wants

Peer Selection

Want Potential

Piggybacking

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

hannahhoward commented Aug 27, 2019

hannahhoward commented Aug 27, 2019 • edited Loading

hannahhoward commented Aug 27, 2019

dirkmc commented Aug 27, 2019

alanshaw commented Aug 29, 2019

dirkmc commented Aug 29, 2019 • edited Loading

dirkmc commented Mar 4, 2020

dirkmc commented Aug 21, 2019 •

edited

Loading

hannahhoward commented Aug 27, 2019 •

edited

Loading

dirkmc commented Aug 29, 2019 •

edited

Loading