Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes/clarifications to bip-330. #899

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
113 changes: 49 additions & 64 deletions bip-0330.mediawiki
Original file line number Diff line number Diff line change
Expand Up @@ -104,24 +104,18 @@ def create_sketch(shortids, capacity):

The [https://github.com/sipa/minisketch/ minisketch] library implements the construction, merging, and decoding of these sketches efficiently.

====Truncated transaction IDs====

For announcing and relaying transaction outside of reconciliation, we need an unambiguous, unsalted way to refer to transactions to deduplicate transaction requests. As we're introducing a new scheme anyway, this is a good opportunity to switch to wtxid-based requests rather than txid-based ones. While using full 256-bit wtxids is possible, this is overkill as they contribute significantly to the total bandwidth as well. Instead, we truncate the wtxid to just their first 128 bits. These are referred to as truncated IDs.

===Intended Protocol Flow===

Set reconciliation primarily consists of the transmission and decoding of a reconciliation set sketch upon request.

[[File:bip-0330/recon_scheme_merged.png|framed|center|Set reconciliation protocol flow]]

====Bisection====
[[File:bip-0330/recon_scheme_merged.png|framed|center|Protocol flow]]

If a node is unable to reconstruct the set difference from the received sketch, the node then makes an additional reconciliation request, similar to the initial one, but this request is applied to only a fraction of possible transactions (e.g., in the range 0x0–0x8). Because of the linearity of sketches, a sketch of a subset of transactions would allow the node to compute a sketch for the remainder, which saves bandwidth.
====Sketch extension====

[[File:bip-0330/bisection.png|framed|300px|center|Bisection]]
If a node is unable to reconstruct the set difference from the received sketch, the node then makes a request for sketch extension. The peer would then send an extension, which is a sketch of a higher capacity (allowing to decode more differences) over the same transactions minus the part which was already sent initially (to save bandwidth). This is possible because extending a sketch is just concatenating new elements to an array.

===New messages===
Several new protocol messages are added: sendrecon, reqreconcil, sketch, reqbisec, reconcildiff, invtx, gettx. This section describes their serialization, contents, and semantics.
Several new protocol messages are added: sendrecon, reqreconcil, sketch, reqsketchext, reconcildiff. This section describes their serialization, contents, and semantics.

In what follows, all integers are serialized in little-endian byte order. Boolean values are encoded as a single byte that must be 0 or 1 exactly. Arrays are serialized with the CompactSize prefix that encodes their length, as is common in other P2P messages.

Expand Down Expand Up @@ -151,7 +145,7 @@ The reqreconcil message initiates a reconciliation round.
|-
| uint16 || set_size || Size of the sender's reconciliation set, used to estimate set difference.
|-
| uint8 || q || Coefficient used to estimate set difference. Multiplied by PRECISION=2^6 and rounded up by the sender and divided by PRECISION by the receiver.
| uint16 || q || Coefficient used to estimate set difference. Multiplied by PRECISION=2^12 and rounded up by the sender and divided by PRECISION by the receiver.
|}

Upon receipt of a "reqreconcil" message, the receiver:
Expand All @@ -162,8 +156,8 @@ It is suggested to use ''c=1'' to avoid sending empty sketches and reduce the ov

Intuitively, ''q'' represents the discrepancy in sets: the closer the sets are, the lower optimal ''q'' is.
As suggested by Erlay, ''q'' should be derived as an optimal ''q'' value for the previous reconciliation with a given peer, once the actual set sizes and set difference are known. Alternatively, ''q=0.1'' should be used as a default value.
For example, if in previous round ''set_size=30'' and ''local_set_size=20'', and the *actual* difference was ''4'', then a node should compute ''q'' as following:
''q=(|30-20| - 1) / (30+20)=0.18''
For example, if in previous round ''set_size=30'' and ''local_set_size=20'', and the *actual* difference was ''12'', then a node should compute ''q'' as following:
''q=(12 - |30-20|) / (30+20)=0.04''
The derivation of ''q'' can be changed according to the version of the protocol.

No new "reqreconcil" message can be sent until a "reconcildiff" message is sent.
Expand All @@ -177,21 +171,29 @@ The sketch message is used to communicate a sketch required to perform set recon
| byte[] || skdata || The sketch of the sender's reconciliation snapshot
|}

Upon receipt of a "sketch" message, a node computes the set difference by combining the receiver sketch with a sketch computed locally for a corresponding reconciliation set. If this is the 2nd time for this round a "sketch" message was received, the bisection approach is used, and by combining the new sketch with the previous one, two difference sketches are obtained, one for the first half and one for the second half of the short id range. The receiving node then tries to decode this sketch (or sketches), and based on the result:
* If decoding fails, a "reconcildiff" message is sent with the failure flag set (success=false). If this was the first "sketch" in the round, a "reqbisec" message may be sent instead.
* If decoding succeeds, a "reconcildiff" message is sent with the truncated IDs of all locally known transactions that appear in the decode result, and the short IDs of the unrecognized ones.
The sketch message may be received in two cases.

The receiver also makes snapshot of their current reconciliation set, and clears the set itself. The snapshot is kept until a "reconcildiff" message is sent by the node.
1. Initial sketch. Upon receipt of a "sketch" message, a node computes the difference sketch by combining the received sketch with a sketch computed locally for a corresponding reconciliation set. The receiving node then tries to decode the difference sketch and based on the result:
* If the decoding failed, the receiving node requests an extension sketch by sending a "reqsketchext" message. Alternatively, the node may terminate the reconciliation right away by sending a "reconcildiff" message is sent with the failure flag set (success=false).
* If the decoding succeeded, a "reconcildiff" message with success=true.
The receiver also makes snapshot of their current reconciliation set, and clears the set itself. The snapshot is kept until a "reconcildiff" message is sent by the node. It is needed to enable sketch extension.

====reqbisec====
The reqbisec message is used to signal that set reconciliation has failed and an extra sketch is needed to find set difference.
2. Sketch extension. By combining the sketch extension with the initially received sketch, an extended sketch is obtained. The receiving node then computes the extended difference sketch by combining the received extended sketch with an extended sketch computed locally over a corresponding reconciliation set snapshot. The receiving node then tries to decode the extended difference sketch and based on the result:
* If the decoding failed, the receiving node terminates the reconciliation right away by sending a "reconcildiff" message is sent with the failure flag set (success=false).
* If the decoding succeeded, a "reconcildiff" message with success=true.

In either cases, a "reconcildiff" with success=false should also be accompanied with announcing all transactions from the reconciliation set (or set snapshot if failed after extension) as a fallback to flooding.
A "reconcildiff" with success=true should *contain* unknown short IDs of the transactions from the decoded difference, corresponding to the transactions missing on the sender's side. Known short IDs from the difference correspond to what the receiver of the message is missing, and they should be announced via an "inv" message.

====reqsketchext====
The reqsketchext message is used by reconciliation initiator to signal that initial set reconciliation has failed and a sketch extension is needed to find set difference.

It has an empty payload.

Upon receipt of a "reqbisec" message, a node responds to it with a "sketch" message, which contains a sketch of a subset of corresponding reconciliation set <b>snapshot</b> (stored when "reqreconcil" message for the current round was processed) (values in range ''[0..(2^31)]'').
Upon receipt of a "reqsketchext" message, a node responds to it with a "sketch" message, which contains a sketch extension: a sketch (of the same transactions sketched initially) of higher capacity without the part sent initially.

====reconcildiff====
The reconcildiff message is used to announce transactions which are found to be missing during set reconciliation on the sender's side.
The reconcildiff message is used by reconciliation initiator to announce transactions which are found to be missing during set reconciliation on the sender's side.

{|class="wikitable"
! Data type !! Name !! Description
Expand All @@ -201,57 +203,36 @@ The reconcildiff message is used to announce transactions which are found to be
| uint32[] || ask_shortids || The short IDs that the sender did not have.
|}

Upon receipt a "reconcildiff" message with ''success=1'', a node sends a "invtx" message for the transactions requested by 32-bit IDs (first vector) containing their 128-bit truncated IDs (with parent transactions occuring before their dependencies), and can request announced transactions (second vector) it does not have via a "gettx" message.
Otherwise if ''success=0'', receiver should request bisection via ''reqbisec'' (if failure happened for the first time).
If failure happened for the second time, receiver should announce the transactions from the reconciliation set via an "invtx" message, excluding the transactions announced from the sender.
Upon receipt a "reconcildiff" message with ''success=1'' (reconciliation success), a node sends an "inv" message for the transactions requested by 32-bit IDs (first vector) containing their wtxids (with parent transactions occuring before their dependencies).
If ''success=0'' (reconciliation failure), receiver should announce the transactions from the reconciliation set via an "inv" message.
In both cases, transactions the sender of the message thinks the receiver is missing are announced via an "inv" message.
The regular "inv" deduplication should apply.

The <b>snapshot</b> of the corresponding reconciliation set is cleared by the sender and the receiver of the message.

The sender should also send their own "invtx" message along with the reconcildiff message to announce transactions which are missing on the receiver's side.

====invtx====
The invtx message is used to announce transactions (both along with reconcildiff message and as a response to the reconcildiff message). It is the truncated ID analogue of "inv" (which cannot be used because it has 256-bit elements).

{|class="wikitable"
! Data type !! Name !! Description
|-
| uint128[] || inv_truncids || The truncated IDs of transactions the sender believes the receiver does not have.
|}

Upon receipt a "invtx" message, a node requests announced transactions it does not have.
The <b>snapshot</b> of the corresponding reconciliation set is cleared by the sender of the message.

====gettx====
The gettx message is used to request transactions by 128-bit truncated IDs. It is the truncated ID analogue of "getdata".

{|class="wikitable"
! Data type !! Name !! Description
|-
| uint128[] || ask_truncids || The truncated IDs of transactions the sender wants the full transaction data for.
|}

Upon receipt a "gettx" message, a node sends "tx" messages for the requested transactions.
The sender should also send their own "inv" message along with the reconcildiff message to announce transactions which are missing on the receiver's side.

==Local state==

This BIP suggests a stateful protocol and it requires storing several variables at every node to operate properly.

====Reconciliation sets====
Every node stores a set of 128-bit truncated IDs for every peer which supports transaction reconciliation, representing the transactions which would have been sent according to the regular flooding protocol.
Every node stores a set of wtxids for every peer which supports transaction reconciliation, representing the transactions which would have been sent according to the regular flooding protocol.
Incoming transactions are added to sets when those transactions are received (if they satisfy the policies such as minimum fee set by a peer).
A reconciliation set is moved to the corresponding set snapshot after the transmission of the initial sketch.

====Reconciliation set snapshot====
After the transmitting of the initial sketch (either sending or receiving of reconcildiff message), every node should store the snapshot of the current reconciliation set, and clear the set.
This is important to make bisection more stable during the reconciliation round (bisection should be applied to the snapshot).
The snapshot is also used to efficiently lookup the transactions requested by short ID.
After transmitting the initial sketch (either sending or receiving of the reconcildiff message), every node should store the snapshot of the current reconciliation set, and clear the set.
This is important to make sketch extension more stable (extension should be computed over the set snapshot). Otherwise, extension would contain transactions received after sending out the initial sketch.
The snapshot is cleared after the end of the reconciliation round (sending or receiving of the reconcildiff message).

====q-coefficient====
The q value should be stored to make efficient difference estimation. It is shared across peers and changed after every reconciliation.
q-coefficient represents the discrepancy in sets: the closer the sets are, the lower optimal ''q'' is.
In future implementations, q could vary across different peers or become static.

====Reconciliation salt====
When negotiating reconciliation support, peers send each other their contribution to the reconciliation salt (see how we construct short IDs above). These salts (or just the resulting salt) should be stored on both sides of the connection.

==Backward compatibility==

Expand All @@ -261,32 +242,36 @@ Clients which do not implement this protocol remain fully compatible after this

==Rationale==

====Why using PinSketch for set reconciliation?====
====Why use PinSketch for set reconciliation?====

PinSketch is more bandwidth efficient than IBLT, especially for the small differences in sets we expect to operate over.
PinSketch is as bandwidth efficient as CPISync, but PinSketch has quadratic decoding complexity, while CPISync have cubic decoding complexity. This makes PinSketch significantly faster.

====Why using 32-bit short transaction IDs?====
====Why use 32-bit short transaction IDs?====

To use Minisketch in practice, transaction IDs should be shortened (ideally, not more than 64 bits per element).
Small number of bits per transaction also allows to save extra bandwidth and make operations over sketches faster.
A small number of bits per transaction also allows saving extra bandwidth and make operations over sketches faster.
According to our estimates, 32 bits provides low collision rate in a non-adversarial model (which is enabled by using independent salts per-link).

====Why using 128-bit short IDs?====
====Why use sketch extensions instead of bisection?====

Bisection is an alternative to sketch extensions, per which a second sketch with the same initial capacity is computed over half of the txID space.
Due to the linearity of sketches, transmitting just this one allows a reconciliation requestor to compute the sketch of the same capacity of another half. Two sketches allow a requestor to reconstruct twice as many differences as was allowed by an initial sketch.

In practice this allows a requestor to amortize the bandwidth overhead of initial reconciliation failure, similarly to extension sketches, making the overhead negligible.

The main benefit of sketch extensions is a much simpler implementation. Implementing bisection is hard (see [https://github.com/naumenkogs/bitcoin/commit/b5c92a41e4cc0599504cf838d20212f1a403e573 implementation]) because, in the end, we have to operate with two sketches and handle scenarios where one sketch decoded and another sketch failed.

To avoid problems caused by the delays in the network, our protocol requires extra round of announcing unsalted transaction IDs. [https://arxiv.org/pdf/1905.10518.pdf Erlay] protocol on top of this work also requires announcing unsalted transaction IDs for flooding.
Both of these measures allow to deduplicate transaction announcements across the peers.
However, using full 256-bit IDs to uniquely identify transactions seems to be an overkill.
128 is the highest power of 2 which provides good enough collision-resistance in an adversarial model, and trivially saves a significant portion of the bandwidth related to these announcements.
It becomes even more difficult if in the future we decide to allow more than one extension/bisection. Bisection in this case have to be recursive (and spawn 4/8/16/... sketches), while for extensions we always end up with one extended sketch.

====Why using bisection instead of extending the sketch?====
Sketch extensions are also more flexible: extending a sketch of capacity 10 with 4 more means just computing a sketch of capacity 14 and sending the extension, while for bisection increasing the capacity to something different than 10*2/10*4/10*8/... is sophisticated implementation-wise.

Unlike extended sketches, bisection does not require operating over sketches of higher order.
This allows to avoid the high computational cost caused by quadratic decoding complexity.
The only advantage of bisection is that it doesn't require computing sketches of higher capacities (exponential cost). We believe that since
the protocol is currently designed to operate in the conditions where sketches usually have at most the capacity of 20, this efficiency is not crucial.

==Implementation==

TODO
https://github.com/bitcoin/bitcoin/pull/18261

==Acknowledgments==

Expand Down
Binary file removed bip-0330/bisection.png
Binary file not shown.
Binary file modified bip-0330/recon_scheme_merged.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.