From 465cd03fb311a28cff1db914af8c69a6e998074c Mon Sep 17 00:00:00 2001
From: Henry de Valence
Date: Mon, 10 Aug 2020 14:56:22 -0700
Subject: [PATCH 1/3] Add async script RFC

---
 book/src/SUMMARY.md                          |   1 +
 .../XXXX-asynchronous-script-verification.md | 231 ++++++++++++++++++
 2 files changed, 232 insertions(+)
 create mode 100644 book/src/dev/rfcs/XXXX-asynchronous-script-verification.md

diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md
index e2e01e3cccd..5d211e355c1 100644
--- a/book/src/SUMMARY.md
+++ b/book/src/SUMMARY.md
@@ -12,6 +12,7 @@
 - [Zebra RFCs](dev/rfcs.md)
   - [RFC Template](dev/rfcs/0000-template.md)
   - [Pipelinable Block Lookup](dev/rfcs/0001-pipelinable-block-lookup.md)
+  - [Asynchronous Script Verification](dev/rfcs/XXXX-asynchronous-script-verification.md)
 - [Diagrams](dev/diagrams.md)
   - [Network Architecture](dev/diagrams/zebra-network.md)
   - [zebra-checkpoints](dev/zebra-checkpoints.md)

diff --git a/book/src/dev/rfcs/XXXX-asynchronous-script-verification.md b/book/src/dev/rfcs/XXXX-asynchronous-script-verification.md
new file mode 100644
index 00000000000..7a5019e840e
--- /dev/null
+++ b/book/src/dev/rfcs/XXXX-asynchronous-script-verification.md
@@ -0,0 +1,231 @@
- Start Date: 2020-08-10
- Design PR: [ZcashFoundation/zebra#0000](https://github.com/ZcashFoundation/zebra/pull/0000)
- Zebra Issue: [ZcashFoundation/zebra#0000](https://github.com/ZcashFoundation/zebra/issues/0000)

# Summary
[summary]: #summary

This RFC describes an architecture for asynchronous script verification and
its interaction with the state layer. This architecture imposes constraints
on the ordering of operations in the state layer.

# Motivation
[motivation]: #motivation

As in the rest of Zebra, we want to express our work as a collection of
work-items with explicit dependencies, then execute these items concurrently
and in parallel on a thread pool.

# Definitions
[definitions]: #definitions

- *UTXO*: unspent transaction output. Transaction outputs are modeled in `zebra-chain` by the [`TransparentOutput`][transout] structure.
- Transaction input: an output of a previous transaction consumed by a later transaction (the one it is an input to). Modeled in `zebra-chain` by the [`TransparentInput`][transin] structure.
- lock script: the script that defines the conditions under which some UTXO can be spent. Stored in the [`TransparentOutput::lock_script`][lock_script] field.
- unlock script: a script satisfying the conditions of the lock script, allowing a UTXO to be spent. Stored in the [`TransparentInput::PrevOut::unlock_script`][unlock_script] field.

[transout]: https://doc.zebra.zfnd.org/zebra_chain/transaction/struct.TransparentOutput.html
[lock_script]: https://doc.zebra.zfnd.org/zebra_chain/transaction/struct.TransparentOutput.html#structfield.lock_script
[transin]: https://doc.zebra.zfnd.org/zebra_chain/transaction/enum.TransparentInput.html
[unlock_script]: https://doc.zebra.zfnd.org/zebra_chain/transaction/enum.TransparentInput.html#variant.PrevOut.field.unlock_script

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Zcash's transparent address system is inherited from Bitcoin. Transactions
spend unspent transaction outputs (UTXOs) from previous transactions. These
UTXOs are encumbered by *locking scripts* that define the conditions under
which they can be spent, e.g., requiring a signature from a certain key.
Transactions wishing to spend UTXOs supply an *unlocking script* that should
satisfy the conditions of the locking script for each input they wish to
spend.

This means that script verification requires access to data about previous
UTXOs, in order to determine the conditions under which those UTXOs can be
spent. In Zebra, we aim to run operations asynchronously and out-of-order to
the greatest extent possible. For instance, we may begin verification of a
block before all of its ancestors have been verified or even downloaded. So
we need to design a mechanism that allows script verification to declare its
data dependencies and execute as soon as all required data is available.

It's not necessary for this mechanism to ensure that the transaction outputs
remain unspent, only to give enough information to perform script
verification. Checking that all transaction inputs are actually unspent is
done later, at the point that their containing block is committed to the
chain.

At a high level, this adds a new request/response pair to the state service:

- `Request::AwaitUtxo(OutPoint)` requests a `TransparentOutput` specified by `OutPoint` from the state layer;
- `Response::Utxo(TransparentOutput)` supplies the requested `TransparentOutput`.

Note that this request is named differently from the other requests,
`AwaitUtxo` rather than `GetUtxo` or similar. This is because the request has
rather different behavior: the request does not complete until the state
service learns about a UTXO matching the request, which could be never. For
instance, if the transaction output was already spent, the service is not
required to return a response. The caller is responsible for using a timeout
layer or some other mechanism to cancel requests that take too long.

This allows a script verifier to asynchronously obtain information about
previous transaction outputs and start verifying scripts as soon as the data
is available. For instance, if we begin parallel download and verification of
500 blocks, we should be able to begin script verification of all scripts
referencing outputs from existing blocks in parallel, and begin verification
of scripts referencing outputs from new blocks as soon as they are committed
to the chain.

Because spending outputs from older blocks is more common than spending
outputs from recent blocks, this should allow a significant amount of
parallelism.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

We add a `Request::AwaitUtxo(OutPoint)` and
`Response::Utxo(TransparentOutput)` to the state protocol. As described
above, the request name is intended to indicate the request's behavior: the
request does not resolve until the state layer learns of a UTXO described by
the request.

To verify scripts, a script verifier requests the relevant UTXOs from the
state service and waits for all of them to resolve, or fails verification
with a timeout error (a sketch of this pattern follows the list below).
Currently, we outsource script verification to `zcash_consensus`, which does
FFI into the same C++ code as `zcashd` uses. **We need to ensure this code is
thread-safe**.

Implementing the state request correctly requires considering two sets of behaviors:

1. behaviors related to the state's external API (a `Buffer`ed `tower::Service`);
2. behaviors related to the state's internal implementation (using `sled`).
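
For concreteness, here is a minimal, hypothetical sketch of the verifier-side
pattern just described: issue one `AwaitUtxo` request per referenced
outpoint, wait for all of them, and give up after a timeout. The
`lookup_utxos` function, the `BoxError` alias, and the `UTXO_LOOKUP_TIMEOUT`
value are illustrative names invented for this sketch, not part of the RFC;
`OutPoint`, `TransparentOutput`, and the `Request`/`Response` variants are
the ones proposed above and are assumed to be in scope, and the service is
assumed to be driven through `tower::ServiceExt`.

```rust
use std::time::Duration;
use tower::{Service, ServiceExt};

type BoxError = Box<dyn std::error::Error + Send + Sync + 'static>;

// Placeholder value; picking the real timeout is listed as an open question.
const UTXO_LOOKUP_TIMEOUT: Duration = Duration::from_secs(60);

/// Fetch the `TransparentOutput` for every outpoint referenced by a
/// transaction's transparent inputs, failing if the state service does not
/// answer in time (e.g., because an outpoint is unknown or already spent).
async fn lookup_utxos<S>(
    mut state: S,
    outpoints: Vec<OutPoint>,
) -> Result<Vec<TransparentOutput>, BoxError>
where
    S: Service<Request, Response = Response, Error = BoxError>,
{
    // Issue one AwaitUtxo request per outpoint; each call returns a future
    // that resolves only once the state service learns about that UTXO.
    let mut pending = Vec::new();
    for outpoint in outpoints {
        let fut = state.ready().await?.call(Request::AwaitUtxo(outpoint));
        pending.push(fut);
    }

    // Wait for all of the lookups, bounded by a timeout, since a request for
    // a spent or nonexistent outpoint may never complete.
    let responses = tokio::time::timeout(
        UTXO_LOOKUP_TIMEOUT,
        futures::future::try_join_all(pending),
    )
    .await??;

    // Unpack the responses; AwaitUtxo requests resolve to Response::Utxo.
    Ok(responses
        .into_iter()
        .map(|rsp| match rsp {
            Response::Utxo(output) => output,
            _ => unreachable!("AwaitUtxo requests resolve to Response::Utxo"),
        })
        .collect())
}
```

Script verification itself can then start as soon as `lookup_utxos` resolves,
without caring whether each output came from the in-memory state, from sled,
or from a block committed after the request was made.
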
Making the distinction between these two sets of behaviors helps us to ensure
we don't accidentally leak "internal" behaviors into "external" behaviors,
which would violate encapsulation and make it more difficult to replace
`sled`.

In the first category, our state is presented to the rest of the application
as a `Buffer`ed `tower::Service`. The `Buffer` wrapper allows shared access
to a service using an actor model, moving the service to be shared into a
worker task and passing messages to it over a multi-producer single-consumer
(mpsc) channel. The worker task recieves messages and makes `Service::call`s.
The `Service::call` method returns a `Future`, and the service is allowed to
decide how much work it wants to do synchronously (in `call`) and how much
work it wants to do asynchronously (in the `Future` it returns).

This means that our external API ensures that the state service sees a
linearized sequence of state requests, although the exact ordering is
unpredictable when there are multiple senders making requests.

In the second category, the Sled API presents itself synchronously, but
database and tree handles are clonable and can be moved between threads. All
that's required to process some request asynchronously is to clone the
appropriate handle, move it into an async block, and make the call as part of
the future. (We might want to use Tokio's blocking API for this, but that's a
side detail.)

Because the state service has exclusive access to the sled database, and the
state service sees a linearized sequence of state requests, we have an easy
way to opt in to asynchronous database access. We can perform sled operations
synchronously in the `Service::call`, waiting for them to complete, and be
sure that all future requests will see the resulting sled state. Or, we can
perform sled operations asynchronously in the future returned by
`Service::call`.

If we perform all *writes* synchronously and allow reads to be either
synchronous or asynchronous, we ensure that writes cannot race each other.
Asynchronous reads are guaranteed to read at least the state present at the
time the request was processed, or a later state.

Now, returning to the UTXO lookup problem, we can map out the possible states
with this restriction in mind. This description assumes that UTXO storage is
split into disjoint sets, one in-memory (e.g., blocks within the reorg limit)
and the other in sled (e.g., blocks older than the reorg limit). The details
of this storage are not important for this design, only that the two sets are
disjoint.

When the state service processes a `Request::AwaitUtxo(OutPoint)` referencing
some UTXO `u`, there are three disjoint possibilities:

1. `u` is already contained in the in-memory block storage;
2. `u` is already contained in the sled UTXO set;
3. `u` is not yet known to the state service.

In case 3, we need to queue `u` and scan all *future* blocks to see whether
they contain `u`. However, if we have a mechanism to queue `u`, we can
perform check 2 asynchronously, because restricting to synchronous writes
means that any async read will return the current or later state. If `u` was
in the sled UTXO set when the request was processed, the only way that an
async read would not return `u` is if the UTXO were spent, in which case the
service is not required to return a response.

This behavior can be encapsulated into a `PendingUtxos`
structure described below.

```rust
// sketch
#[derive(Default, Debug)]
struct PendingUtxos(HashMap<OutPoint, oneshot::Sender<TransparentOutput>>);

impl PendingUtxos {
    // adds the outpoint and returns the (wrapped) rx end of a oneshot;
    // the return value can be converted to `Service::Future`
    pub fn queue(&mut self, outpoint: OutPoint) -> impl Future<Output = Result<Response, BoxedStdError>>;

    // if outpoint is a hashmap key, remove the entry and send output on the channel
    pub fn respond(&mut self, outpoint: OutPoint, output: TransparentOutput);

    // extracts a list of new outputs from the block and checks them against the
    // hashmap keys
    pub fn check_block(&mut self, block: &Block);

    // scans the hashmap and removes any entries with closed senders
    pub fn prune(&mut self);
}
```

The state service should maintain an `Arc<Mutex<PendingUtxos>>`, used as follows:

1. In `Service::call(Request::AwaitUtxo(u))`, the service should:
   - call `PendingUtxos::queue(u)` to get a future `f` to return to the caller;
   - spawn a task that does a sled lookup for `u`, calling `PendingUtxos::respond(u, output)` if present;
   - check the in-memory storage for `u`, calling `PendingUtxos::respond(u, output)` if present;
   - return `f` to the caller (it may already be ready).
   The common case is that `u` references an old UTXO, so spawning the lookup
   task first means that we don't wait to check in-memory storage for `u`
   before starting the sled lookup. (A rough sketch of this flow is included
   after the unresolved questions below.)

2. In `Service::call(Request::CommitBlock(block, ..))`, the service should:
   - call `PendingUtxos::check_block(block.as_ref())`;
   - do any other transactional checks before committing a block as normal.
   Because the `AwaitUtxo` request is informational, there's no need to do
   the transactional checks before matching against pending UTXO requests,
   and doing so upfront potentially notifies other tasks earlier.

3. In `Service::poll_ready()`, the service should call
   `PendingUtxos::prune()` at least *some* of the time. This is required
   because when a consumer uses a timeout layer, the cancelled requests should
   be flushed from the queue to avoid a resource leak. However, doing this on
   every call will result in us spending a bunch of time iterating over the
   hashmap.

# Drawbacks
[drawbacks]: #drawbacks

One drawback of this design is that we may have to wait on a lock. However,
the critical section basically amounts to a hash lookup and a channel send,
so I don't think that we're likely to run into problems with long contended
periods, and it's unlikely that we would get a deadlock.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

High-level design rationale is given inline with the design sketch above. One
low-level option would be to avoid encapsulating behavior in the
`PendingUtxos` struct and just have an
`Arc<Mutex<HashMap<OutPoint, oneshot::Sender<TransparentOutput>>>>`, so that
the lock only protects the hashmap lookup and not sending through the
channel. But I think the current design is cleaner and the cost is probably
not too large.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- We need to pick a timeout for UTXO lookup. This should be long enough to
  account for the fact that we may start verifying blocks before all of their
  ancestors are downloaded.
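
To make the reference-level steps above more concrete, here is a rough,
hypothetical sketch of how the state service's `call` could handle
`Request::AwaitUtxo(u)`, combining `PendingUtxos::queue` with a spawned sled
lookup and a synchronous in-memory check, as in step 1 of the list above.
`StateService`, `lookup_in_sled`, the field names, and the `BoxedStdError`
alias are placeholders invented for this sketch; `OutPoint`,
`TransparentOutput`, `Response`, and `PendingUtxos` are the types discussed
in the RFC and are assumed to be in scope. Error handling and the sled
key/value encoding are deliberately elided.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::sync::{Arc, Mutex};

// Assumed boxed-error alias, matching the `queue` signature in the sketch above.
type BoxedStdError = Box<dyn std::error::Error + Send + Sync + 'static>;

struct StateService {
    /// Pending UTXO requests, shared with tasks that learn about new UTXOs.
    pending_utxos: Arc<Mutex<PendingUtxos>>,
    /// sled tree holding the finalized UTXO set; sled handles are cheap to
    /// clone and can be moved into spawned tasks.
    sled_utxos: sled::Tree,
    /// Stand-in for the in-memory (non-finalized) UTXO storage.
    memory_utxos: HashMap<OutPoint, TransparentOutput>,
}

/// Hypothetical helper: look up an outpoint in the sled UTXO tree. A real
/// version would reuse `zebra-chain` serialization for keys and values;
/// returning `None` keeps this sketch self-contained.
fn lookup_in_sled(_tree: &sled::Tree, _outpoint: &OutPoint) -> Option<TransparentOutput> {
    None
}

impl StateService {
    fn await_utxo(
        &mut self,
        outpoint: OutPoint,
    ) -> impl Future<Output = Result<Response, BoxedStdError>> {
        // Queue the request first, so a UTXO learned at any later point can
        // complete it.
        let f = self.pending_utxos.lock().unwrap().queue(outpoint.clone());

        // Spawn the sled lookup without waiting for it: the common case is an
        // old UTXO that lives on disk. (As the RFC notes, a real implementation
        // might use Tokio's blocking API for the synchronous sled call.)
        let tree = self.sled_utxos.clone();
        let pending = self.pending_utxos.clone();
        let sled_outpoint = outpoint.clone();
        tokio::spawn(async move {
            if let Some(output) = lookup_in_sled(&tree, &sled_outpoint) {
                pending.lock().unwrap().respond(sled_outpoint, output);
            }
        });

        // Also check the in-memory storage synchronously.
        if let Some(output) = self.memory_utxos.get(&outpoint) {
            self.pending_utxos
                .lock()
                .unwrap()
                .respond(outpoint, output.clone());
        }

        // Return the queued future; it may already be ready.
        f
    }
}
```

A `CommitBlock` handler would do the same kind of matching for the new
outputs in the committed block, notifying pending requests before running its
other checks, as described in step 2 above.
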
From 7194e1b5b0e3185b1eb7786f124fea1f09361537 Mon Sep 17 00:00:00 2001
From: Jane Lusby
Date: Tue, 11 Aug 2020 10:22:40 -0700
Subject: [PATCH 2/3] Update book/src/dev/rfcs/XXXX-asynchronous-script-verification.md

Co-authored-by: Deirdre Connolly
---
 book/src/dev/rfcs/XXXX-asynchronous-script-verification.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/book/src/dev/rfcs/XXXX-asynchronous-script-verification.md b/book/src/dev/rfcs/XXXX-asynchronous-script-verification.md
index 7a5019e840e..6fd281c27c4 100644
--- a/book/src/dev/rfcs/XXXX-asynchronous-script-verification.md
+++ b/book/src/dev/rfcs/XXXX-asynchronous-script-verification.md
@@ -106,7 +106,7 @@ In the first category, our state is presented to the rest of the application
 as a `Buffer`ed `tower::Service`. The `Buffer` wrapper allows shared access
 to a service using an actor model, moving the service to be shared into a
 worker task and passing messages to it over a multi-producer single-consumer
-(mpsc) channel. The worker task recieves messages and makes `Service::call`s.
+(mpsc) channel. The worker task receives messages and makes `Service::call`s.
 The `Service::call` method returns a `Future`, and the service is allowed to
 decide how much work it wants to do synchronously (in `call`) and how much
 work it wants to do asynchronously (in the `Future` it returns).

From 2afc2087d7c6f5d15c75f769a2d37ff90c9fd036 Mon Sep 17 00:00:00 2001
From: Henry de Valence
Date: Tue, 11 Aug 2020 11:56:20 -0700
Subject: [PATCH 3/3] Remove check_block

Per discussion where @yaahc suggested that it would be simpler to delete this
function entirely and treat it as an implementation detail.

Co-authored-by: Jane Lusby
---
 book/src/dev/rfcs/XXXX-asynchronous-script-verification.md | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/book/src/dev/rfcs/XXXX-asynchronous-script-verification.md b/book/src/dev/rfcs/XXXX-asynchronous-script-verification.md
index 6fd281c27c4..3f01557b5af 100644
--- a/book/src/dev/rfcs/XXXX-asynchronous-script-verification.md
+++ b/book/src/dev/rfcs/XXXX-asynchronous-script-verification.md
@@ -173,9 +173,6 @@ impl PendingUtxos {
     // if outpoint is a hashmap key, remove the entry and send output on the channel
     pub fn respond(&mut self, outpoint: OutPoint, output: TransparentOutput);
 
-    // extracts a list of new outputs from the block and checks them against the
-    // hashmap keys
-    pub fn check_block(&mut self, block: &Block);
 
     // scans the hashmap and removes any entries with closed senders
     pub fn prune(&mut self);
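
Since the patch series leaves `PendingUtxos` as a sketch (after patch 3, just
`queue`, `respond`, and `prune`), here is one possible implementation, shown
only to illustrate how the pieces fit together. It assumes
`tokio::sync::oneshot` channels, a `BoxedStdError` boxed-error alias matching
the `queue` signature in the sketch, and `OutPoint`, `TransparentOutput`, and
`Response` in scope from the relevant Zebra crates; keeping a single sender
per outpoint is a simplification made here, not a decision made by the RFC.

```rust
use std::collections::HashMap;
use std::future::Future;
use tokio::sync::oneshot;

// Assumed boxed-error alias matching the `queue` signature in the RFC sketch.
type BoxedStdError = Box<dyn std::error::Error + Send + Sync + 'static>;

#[derive(Default, Debug)]
pub struct PendingUtxos(HashMap<OutPoint, oneshot::Sender<TransparentOutput>>);

impl PendingUtxos {
    /// Queue a request for `outpoint`, returning a future that resolves once
    /// some task calls `respond` with the matching output.
    pub fn queue(
        &mut self,
        outpoint: OutPoint,
    ) -> impl Future<Output = Result<Response, BoxedStdError>> {
        let (tx, rx) = oneshot::channel();
        // Keeps only the most recent waiter per outpoint; a real version might
        // store a Vec of senders to support multiple concurrent waiters.
        self.0.insert(outpoint, tx);
        async move {
            // If the sender is dropped (e.g. pruned), this surfaces as an error.
            let output = rx.await?;
            Ok(Response::Utxo(output))
        }
    }

    /// If `outpoint` is pending, remove it and send `output` to the waiter.
    pub fn respond(&mut self, outpoint: OutPoint, output: TransparentOutput) {
        if let Some(tx) = self.0.remove(&outpoint) {
            // The receiver may already be gone (e.g. the caller timed out);
            // ignoring the send error is fine in that case.
            let _ = tx.send(output);
        }
    }

    /// Drop entries whose receivers have gone away, so cancelled or timed-out
    /// requests don't leak hashmap entries.
    pub fn prune(&mut self) {
        self.0.retain(|_, tx| !tx.is_closed());
    }
}
```

Wrapped in an `Arc<Mutex<...>>` as the RFC describes, the critical sections
here are exactly the short hash-lookup-plus-channel-send operations that the
drawbacks section anticipates.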