Add async script RFC #868

Merged
merged 3 commits into from
Aug 21, 2020
1 change: 1 addition & 0 deletions book/src/SUMMARY.md
@@ -12,6 +12,7 @@
- [Zebra RFCs](dev/rfcs.md)
- [RFC Template](dev/rfcs/0000-template.md)
- [Pipelinable Block Lookup](dev/rfcs/0001-pipelinable-block-lookup.md)
- [Asynchronous Script Verification](dev/rfcs/XXXX-asynchronous-script-verification.md)
- [Diagrams](dev/diagrams.md)
- [Network Architecture](dev/diagrams/zebra-network.md)
- [zebra-checkpoints](dev/zebra-checkpoints.md)
228 changes: 228 additions & 0 deletions book/src/dev/rfcs/XXXX-asynchronous-script-verification.md
@@ -0,0 +1,228 @@
- Start Date: 2020-08-10
- Design PR: [ZcashFoundation/zebra#0000](https://github.com/ZcashFoundation/zebra/pull/0000)
- Zebra Issue: [ZcashFoundation/zebra#0000](https://github.com/ZcashFoundation/zebra/issues/0000)

# Summary
[summary]: #summary

This RFC describes an architecture for asynchronous script verification and
its interaction with the state layer. This architecture imposes constraints
on the ordering of operations in the state layer.

# Motivation
[motivation]: #motivation

As in the rest of Zebra, we want to express our work as a collection of
work-items with explicit dependencies, then execute these items concurrently
and in parallel on a thread pool.

# Definitions
[definitions]: #definitions

- *UTXO*: unspent transaction output. Transaction outputs are modeled in `zebra-chain` by the [`TransparentOutput`][transout] structure.
- Transaction input: an output of a previous transaction consumed by a later transaction (the one it is an input to). Modeled in `zebra-chain` by the [`TransparentInput`][transin] structure.
- lock script: the script that defines the conditions under which some UTXO can be spent. Stored in the [`TransparentOutput::lock_script`][lock_script] field.
- unlock script: a script satisfying the conditions of the lock script, allowing a UTXO to be spent. Stored in the [`TransparentInput::PrevOut::unlock_script`][unlock_script] field.

[transout]: https://doc.zebra.zfnd.org/zebra_chain/transaction/struct.TransparentOutput.html
[lock_script]: https://doc.zebra.zfnd.org/zebra_chain/transaction/struct.TransparentOutput.html#structfield.lock_script
[transin]: https://doc.zebra.zfnd.org/zebra_chain/transaction/enum.TransparentInput.html
[unlock_script]: https://doc.zebra.zfnd.org/zebra_chain/transaction/enum.TransparentInput.html#variant.PrevOut.field.unlock_script

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Zcash's transparent address system is inherited from Bitcoin. Transactions
spend unspent transaction outputs (UTXOs) from previous transactions. These
UTXOs are encumbered by *locking scripts* that define the conditions under
which they can be spent, e.g., requiring a signature from a certain key.
Transactions wishing to spend UTXOs supply an *unlocking script* that should
satisfy the conditions of the locking script for each input they wish to
spend.

This means that script verification requires access to data about previous
UTXOs, in order to determine the conditions under which those UTXOs can be
spent. In Zebra, we aim to run operations asynchronously and out-of-order to
the greatest extent possible. For instance, we may begin verification of a
block before all of its ancestors have been verified or even downloaded. So
we need to design a mechanism that allows script verification to declare its
data dependencies and execute as soon as all required data is available.

It's not necessary for this mechanism to ensure that the transaction outputs
remain unspent, only to give enough information to perform script
verification. Checking that each transaction input is actually unspent is
done later, when its containing block is committed to the chain.

At a high level, this adds a new request/response pair to the state service:

- `Request::AwaitUtxo(OutPoint)` requests a `TransparentOutput` specified by `OutPoint` from the state layer;
- `Response::Utxo(TransparentOutput)` supplies the requested `TransparentOutput`.

Note that this request is named differently from the other requests,
`AwaitUtxo` rather than `GetUtxo` or similar. This is because the request has
rather different behavior: the request does not complete until the state
service learns about a UTXO matching the request, which could be never. For
instance, if the transaction output was already spent, the service is not
required to return a response. The caller is responsible for using a timeout
layer or some other mechanism.
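The caller-side timeout can be sketched with the standard library alone. This is an illustration under stated assumptions, not the state service's API: the fields of `OutPoint` and `TransparentOutput`, the `LookupError` type, and the `await_with_timeout` helper are all hypothetical, and a blocking `std::sync::mpsc` channel stands in for the future that the real service would return (where a timeout layer would play this role).

```rust
use std::sync::mpsc;
use std::time::Duration;

/// Hypothetical stand-in for the `zebra-chain` outpoint type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct OutPoint {
    pub hash: [u8; 32],
    pub index: u32,
}

/// Hypothetical stand-in for the `zebra-chain` output type.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct TransparentOutput {
    pub value: u64,
    pub lock_script: Vec<u8>,
}

/// The request/response pair described above.
#[derive(Debug)]
pub enum Request {
    AwaitUtxo(OutPoint),
}

#[derive(Debug, PartialEq, Eq)]
pub enum Response {
    Utxo(TransparentOutput),
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum LookupError {
    TimedOut,
    ServiceGone,
}

/// Waits for the answer to an `AwaitUtxo` request, giving up after `limit`.
/// The request may never resolve, so the caller must bound the wait itself.
pub fn await_with_timeout(
    rx: &mpsc::Receiver<Response>,
    limit: Duration,
) -> Result<Response, LookupError> {
    match rx.recv_timeout(limit) {
        Ok(response) => Ok(response),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(LookupError::TimedOut),
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(LookupError::ServiceGone),
    }
}
```

A script verifier would treat `LookupError::TimedOut` as a verification failure for the transaction that referenced the missing output.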

This allows a script verifier to asynchronously obtain information about
previous transaction outputs and start verifying scripts as soon as the data
is available. For instance, if we begin parallel download and verification of
500 blocks, we should be able to begin script verification of all scripts
referencing outputs from existing blocks in parallel, and begin verification
of scripts referencing outputs from new blocks as soon as they are committed
to the chain.

Because spending outputs from older blocks is more common than spending
outputs from recent blocks, this should allow a significant amount of
parallelism.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

We add a `Request::AwaitUtxo(OutPoint)` and
`Response::Utxo(TransparentOutput)` to the state protocol. As described
above, the request name is intended to indicate the request's behavior: the
request does not resolve until the state layer learns of a UTXO described by
the request.

To verify scripts, a script verifier requests the relevant UTXOs from the
state service and waits for all of them to resolve, or fails verification
with a timeout error. Currently, we outsource script verification to
`zcash_consensus`, which does FFI into the same C++ code as `zcashd` uses.
**We need to ensure this code is thread-safe**.

Implementing the state request correctly requires considering two sets of behaviors:

1. behaviors related to the state's external API (a `Buffer`ed `tower::Service`);
2. behaviors related to the state's internal implementation (using `sled`).

Making this distinction helps us to ensure we don't accidentally leak
"internal" behaviors into "external" behaviors, which would violate
encapsulation and make it more difficult to replace `sled`.

In the first category, our state is presented to the rest of the application
as a `Buffer`ed `tower::Service`. The `Buffer` wrapper allows shared access
to a service using an actor model, moving the service to be shared into a
worker task and passing messages to it over a multi-producer single-consumer
(mpsc) channel. The worker task receives messages and makes `Service::call`s.
The `Service::call` method returns a `Future`, and the service is allowed to
decide how much work it wants to do synchronously (in `call`) and how much
work it wants to do asynchronously (in the `Future` it returns).

This means that our external API ensures that the state service sees a
linearized sequence of state requests, although the exact ordering is
unpredictable when there are multiple senders making requests.
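The linearization property can be illustrated with OS threads and a standard-library channel. This is only an analogy for the actor model that `Buffer` uses, not `tower`'s implementation; `Message`, `spawn_worker`, and `run_clients` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

/// A request plus the channel on which the worker sends its reply.
pub enum Message {
    Add(u64, mpsc::Sender<u64>),
}

/// Moves the state into a worker thread and returns a clonable handle to it.
pub fn spawn_worker() -> mpsc::Sender<Message> {
    let (tx, rx) = mpsc::channel::<Message>();
    thread::spawn(move || {
        // State owned exclusively by the worker, so no lock is needed.
        let mut total = 0u64;
        // Messages from all senders arrive here as one serial sequence.
        for msg in rx {
            match msg {
                Message::Add(n, reply) => {
                    total += n;
                    // The client may have gone away; ignore send failures.
                    let _ = reply.send(total);
                }
            }
        }
    });
    tx
}

/// Runs `clients` concurrent senders and returns the replies, sorted.
pub fn run_clients(clients: u64) -> Vec<u64> {
    let handle = spawn_worker();
    let threads: Vec<_> = (0..clients)
        .map(|_| {
            let handle = handle.clone();
            thread::spawn(move || {
                let (reply_tx, reply_rx) = mpsc::channel();
                handle.send(Message::Add(1, reply_tx)).unwrap();
                reply_rx.recv().unwrap()
            })
        })
        .collect();
    let mut seen: Vec<u64> = threads.into_iter().map(|t| t.join().unwrap()).collect();
    seen.sort();
    seen
}
```

Which client observes which total is unpredictable, but every total from 1 to `clients` is observed exactly once, because the worker applies the requests one at a time.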

In the second category, the `sled` API presents itself synchronously, but
database and tree handles are clonable and can be moved between threads. All
that's required to process some request asynchronously is to clone the
appropriate handle, move it into an async block, and make the call as part of
the future. (We might want to use Tokio's blocking API for this, but that's a
side detail).

Because the state service has exclusive access to the sled database, and the
state service sees a linearized sequence of state requests, we have an easy
way to opt in to asynchronous database access. We can perform sled operations
synchronously in the `Service::call`, waiting for them to complete, and be
sure that all future requests will see the resulting sled state. Or, we can
perform sled operations asynchronously in the future returned by
`Service::call`.

If we perform all *writes* synchronously and allow reads to be either
synchronous or asynchronous, we ensure that writes cannot race each other.
Asynchronous reads are guaranteed to read at least the state present at the
time the request was processed, or a later state.
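A minimal sketch of this rule, assuming a hypothetical `Db` type in which an `Arc<RwLock<HashMap>>` stands in for a clonable sled handle, and a thread's `JoinHandle` stands in for the future returned by `Service::call`. The names `State`, `call_write`, and `call_read` are illustrative only.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a clonable database handle.
#[derive(Clone, Default)]
pub struct Db(Arc<RwLock<HashMap<String, u64>>>);

impl Db {
    pub fn insert(&self, key: &str, value: u64) {
        self.0.write().unwrap().insert(key.to_string(), value);
    }

    pub fn get(&self, key: &str) -> Option<u64> {
        self.0.read().unwrap().get(key).copied()
    }
}

#[derive(Default)]
pub struct State {
    db: Db,
}

impl State {
    /// Write path: the write completes before this method returns, so every
    /// request processed afterwards sees it.
    pub fn call_write(&mut self, key: &str, value: u64) {
        self.db.insert(key, value);
    }

    /// Read path: clone the handle and do the lookup later, off the
    /// service's own call path.
    pub fn call_read(&mut self, key: &str) -> JoinHandle<Option<u64>> {
        let db = self.db.clone();
        let key = key.to_string();
        thread::spawn(move || db.get(&key))
    }
}
```

A deferred read issued after a write always observes that write, and may also observe later ones; in this sketch nothing removes keys, so a key present at request time is always found.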

Now, returning to the UTXO lookup problem, we can map out the possible states
with this restriction in mind. This description assumes that UTXO storage is
split into disjoint sets, one in-memory (e.g., recent blocks still within the
reorg limit) and the other in sled (e.g., older blocks past the reorg limit). The details of
this storage are not important for this design, only that the two sets are
disjoint.

When the state service processes a `Request::AwaitUtxo(OutPoint)` referencing
some UTXO `u`, there are three disjoint possibilities:

1. `u` is already contained in an in-memory block storage;
2. `u` is already contained in the sled UTXO set;
3. `u` is not yet known to the state service.

In case 3, we need to queue `u` and scan all *future* blocks to see whether
they contain `u`. However, if we have a mechanism to queue `u`, we can
perform check 2 asynchronously, because restricting to synchronous writes
means that any async read will return the current or later state. If `u` was
in the sled UTXO set when the request was processed, the only way that an
async read would not return `u` is if the UTXO were spent, in which case the
service is not required to return a response.
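The three cases can be sketched as follows. This is a simplified illustration under several assumptions: `OutPoint` and `Output` are hypothetical tuple and byte-vector stand-ins, a plain `HashMap` named `finalized` stands in for the sled UTXO set, a `std::sync::mpsc` channel stands in for the returned future, and the sled lookup runs inline rather than asynchronously. Spent outputs are not modeled.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};

type OutPoint = (u64, u32); // hypothetical: (transaction id, output index)
type Output = Vec<u8>; // hypothetical: only the lock script bytes

/// Hypothetical block: each transaction is an id plus its outputs.
pub struct Block {
    pub transactions: Vec<(u64, Vec<Output>)>,
}

#[derive(Default)]
pub struct State {
    finalized: HashMap<OutPoint, Output>, // stand-in for the sled UTXO set
    in_memory: HashMap<OutPoint, Output>,
    pending: HashMap<OutPoint, Sender<Output>>,
}

impl State {
    /// Queues `u` first, then answers immediately if it is already known.
    pub fn await_utxo(&mut self, u: OutPoint) -> Receiver<Output> {
        let (tx, rx) = mpsc::channel();
        self.pending.insert(u, tx);
        // Cases 1 and 2: `u` is already in one of the two disjoint stores.
        let known = match self.finalized.get(&u) {
            Some(output) => Some(output.clone()),
            None => self.in_memory.get(&u).cloned(),
        };
        if let Some(output) = known {
            self.respond(u, output);
        }
        // Case 3: `u` stays queued until a later block creates it.
        rx
    }

    fn respond(&mut self, u: OutPoint, output: Output) {
        if let Some(tx) = self.pending.remove(&u) {
            // The caller may have given up; ignore send failures.
            let _ = tx.send(output);
        }
    }

    /// Scans a new block for outputs that pending requests are waiting on.
    pub fn commit_block(&mut self, block: &Block) {
        for (txid, outputs) in &block.transactions {
            for (index, output) in outputs.iter().enumerate() {
                let u = (*txid, index as u32);
                self.respond(u, output.clone());
                self.in_memory.insert(u, output.clone());
            }
        }
    }
}
```

Queueing before looking up is what makes the deferred lookup safe: a block committed between the two steps still finds the pending entry.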

This behavior can be encapsulated into a `PendingUtxos`
structure described below.

```rust
// sketch
#[derive(Default, Debug)]
struct PendingUtxos(HashMap<OutPoint, oneshot::Sender<TransparentOutput>>);

impl PendingUtxos {
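    // adds the outpoint and returns (wrapped) rx end of oneshot
    // return can be converted to `Service::Future`
    pub fn queue(&mut self, outpoint: OutPoint) -> impl Future<Output=Result<Response, ...>>;

    // if outpoint is a hashmap key, remove the entry and send output on the channel
    pub fn respond(&mut self, outpoint: OutPoint, output: TransparentOutput);

    // matches the outputs created by `block` against pending requests,
    // calling `respond` for each match (signature assumed from the
    // `check_block(block.as_ref())` call in the `CommitBlock` handling)
    pub fn check_block(&mut self, block: &Block);

    // scans the hashmap and removes any entries with closed senders
    pub fn prune(&mut self);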
}
```
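For readers who want something that compiles, here is a standard-library approximation of the sketch. It is not the proposed implementation: an `Arc`/`Weak` slot stands in for the `oneshot` channel (the caller holds the `Arc`, and dropping it plays the role of a closed receiver), and the type fields, the `Slot` alias, and `len` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical stand-in for the `zebra-chain` outpoint type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct OutPoint {
    pub hash: [u8; 32],
    pub index: u32,
}

/// Hypothetical stand-in for the `zebra-chain` output type.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct TransparentOutput {
    pub value: u64,
    pub lock_script: Vec<u8>,
}

/// The caller's end of a pending request: `None` until a response arrives.
pub type Slot = Arc<Mutex<Option<TransparentOutput>>>;

#[derive(Default, Debug)]
pub struct PendingUtxos(HashMap<OutPoint, Weak<Mutex<Option<TransparentOutput>>>>);

impl PendingUtxos {
    /// Registers interest in `outpoint` and returns the caller's slot.
    /// As in the map above, a second request for the same outpoint
    /// replaces the first waiter.
    pub fn queue(&mut self, outpoint: OutPoint) -> Slot {
        let slot: Slot = Arc::new(Mutex::new(None));
        self.0.insert(outpoint, Arc::downgrade(&slot));
        slot
    }

    /// If `outpoint` is pending, removes the entry and fills the slot.
    pub fn respond(&mut self, outpoint: OutPoint, output: TransparentOutput) {
        if let Some(waiter) = self.0.remove(&outpoint) {
            // `upgrade` fails if the caller already dropped its slot.
            if let Some(slot) = waiter.upgrade() {
                *slot.lock().unwrap() = Some(output);
            }
        }
    }

    /// Removes entries whose callers have dropped their slots.
    pub fn prune(&mut self) {
        self.0.retain(|_, waiter| waiter.strong_count() > 0);
    }

    /// Number of requests still queued.
    pub fn len(&self) -> usize {
        self.0.len()
    }
}
```

With a real `oneshot` channel the caller would await the receiver instead of inspecting a slot, but the bookkeeping in `respond` and `prune` has the same shape.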

The state service should maintain an `Arc<Mutex<PendingUtxos>>`, used as follows:

1. In `Service::call(Request::AwaitUtxo(u))`, the service should:
   - call `PendingUtxos::queue(u)` to get a future `f` to return to the caller;
   - spawn a task that does a sled lookup for `u`, calling `PendingUtxos::respond(u, output)` if present;
   - check the in-memory storage for `u`, calling `PendingUtxos::respond(u, output)` if present;
   - return `f` to the caller (it may already be ready).

   The common case is that `u` references an old UTXO, so spawning the lookup
   task first means that we don't wait to check in-memory storage for `u`
   before starting the sled lookup.

2. In `Service::call(Request::CommitBlock(block, ..))`, the service should:
   - call `PendingUtxos::check_block(block.as_ref())`;
   - do any other transactional checks before committing a block as normal.

   Because the `AwaitUtxo` request is informational, there's no need to do
   the transactional checks before matching against pending UTXO requests,
   and doing the matching upfront potentially notifies other tasks earlier.

3. In `Service::poll_ready()`, the service should call
   `PendingUtxos::prune()` at least *some* of the time. This is required because
   when a consumer uses a timeout layer, the cancelled requests should be
   flushed from the queue to avoid a resource leak. However, doing this on every
   call will result in us spending a bunch of time iterating over the hashmap.
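One possible way to prune only some of the time is a simple call counter. The RFC does not fix a cadence, so everything here is an assumption: the `Pending` type, the `prune_every` parameter, the `u64` request ids, and the use of a `Weak<()>` token in place of a `oneshot` sender.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Weak};

pub struct Pending {
    /// Key: hypothetical request id. The `Weak` dies when the caller drops
    /// its `Arc` token, which models a cancelled request.
    waiters: HashMap<u64, Weak<()>>,
    polls_since_prune: u32,
    prune_every: u32,
}

impl Pending {
    pub fn new(prune_every: u32) -> Self {
        Pending {
            waiters: HashMap::new(),
            polls_since_prune: 0,
            // Guard against zero so that pruning always happens eventually.
            prune_every: prune_every.max(1),
        }
    }

    pub fn queue(&mut self, id: u64) -> Arc<()> {
        let token = Arc::new(());
        self.waiters.insert(id, Arc::downgrade(&token));
        token
    }

    /// Called on every readiness check; scans the map on every
    /// `prune_every`-th call only.
    pub fn poll_ready(&mut self) {
        self.polls_since_prune += 1;
        if self.polls_since_prune >= self.prune_every {
            self.polls_since_prune = 0;
            self.waiters.retain(|_, waiter| waiter.strong_count() > 0);
        }
    }

    pub fn len(&self) -> usize {
        self.waiters.len()
    }
}
```

A counter bounds the amortized cost of the scan; a time-based or size-based trigger would serve the same purpose.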

# Drawbacks
[drawbacks]: #drawbacks

One drawback of this design is that we may have to wait on a lock. However,
the critical section basically amounts to a hash lookup and a channel send,
so I don't think that we're likely to run into problems with long contended
periods, and it's unlikely that we would get a deadlock.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

High-level design rationale is inline with the design sketch. One low-level
option would be to avoid encapsulating behavior in the `PendingUtxos` and
just have an `Arc<Mutex<HashMap<..>>>`, so that the lock only protects the hashmap
lookup and not sending through the channel. But I think the current design is
cleaner and the cost is probably not too large.
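For comparison, the low-level alternative might look like the following sketch, in which the lock is released before the channel send. The `Waiters` alias, the tuple `OutPoint`, and the byte-vector `Output` are hypothetical, and `std::sync::mpsc` stands in for `oneshot`.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;
use std::sync::Mutex;

type OutPoint = (u64, u32); // hypothetical: (transaction id, output index)
type Output = Vec<u8>; // hypothetical: only the lock script bytes

pub type Waiters = Mutex<HashMap<OutPoint, Sender<Output>>>;

/// Returns `true` if a waiter for `u` existed and received `output`.
pub fn respond(waiters: &Waiters, u: OutPoint, output: Output) -> bool {
    // The guard returned by `lock` is a temporary, so it is dropped at the
    // end of this statement: only the map removal happens under the lock.
    let waiter = waiters.lock().unwrap().remove(&u);
    match waiter {
        // The send runs without the lock held.
        Some(tx) => tx.send(output).is_ok(),
        None => false,
    }
}
```

The cost is that the map and the channel handling are no longer hidden behind one type, which is the encapsulation trade-off described above.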

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- We need to pick a timeout for UTXO lookup. This should be long enough to
account for the fact that we may start verifying blocks before all of their
ancestors are downloaded.