consensus: Add a checkpoint verifier stub #502

Merged: 6 commits into main on Jun 21, 2020

Conversation

teor2345 (Contributor):

This stub only verifies blocks whose hashes are in the checkpoint
list.

It doesn't have tests, doesn't chain child verifications to their
ancestors, and doesn't support checkpoint maximum height queries.

Part of #429.

@teor2345 added the C-design (Category: Software design work) and A-rust (Area: Updates to Rust code) labels on Jun 18, 2020
@teor2345 teor2345 requested a review from a team June 18, 2020 08:59
@teor2345 teor2345 self-assigned this Jun 18, 2020
codecov bot commented on Jun 18, 2020:

Codecov Report

Merging #502 into main will increase coverage by 0.06%.
The diff coverage is 46.21%.


@@            Coverage Diff             @@
##             main     #502      +/-   ##
==========================================
+ Coverage   53.79%   53.85%   +0.06%     
==========================================
  Files          68       78      +10     
  Lines        3324     3784     +460     
==========================================
+ Hits         1788     2038     +250     
- Misses       1536     1746     +210     
Impacted Files Coverage Δ
tower-batch/src/error.rs 0.00% <0.00%> (ø)
tower-batch/src/future.rs 43.47% <0.00%> (ø)
tower-batch/src/layer.rs 0.00% <0.00%> (ø)
zebra-chain/src/lib.rs 50.00% <ø> (ø)
zebra-chain/src/merkle_tree.rs 0.00% <ø> (ø)
zebra-chain/src/note_commitment_tree.rs 0.00% <ø> (ø)
zebra-chain/src/notes/sapling.rs 74.54% <ø> (ø)
zebra-chain/src/notes/sprout.rs 75.00% <ø> (ø)
zebra-chain/src/proofs/bctv14.rs 68.18% <ø> (ø)
zebra-chain/src/proofs/groth16.rs 68.18% <ø> (ø)
... and 44 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f8a336...87352a6.


// `state_service.call` is OK here because we already called
// `state_service.poll_ready` in our `poll_ready`.
let add_block = self.state_service.call(zebra_state::Request::AddBlock {
Contributor:

Is it safe to call AddBlock here prior to doing validation? The future isn't actually awaited until after validation, but this is relying on state_service to not commit any data in the body of call rather than in the future, correct?

Contributor Author:

Yeah, you're right. in_memory commits in the call, but on_disk commits in the Future.

There's a similar issue in BlockVerifier:
https://github.com/ZcashFoundation/zebra/pull/492/files#diff-7ea86ca07b5d02f05830fd0d71b593f1R59

My understanding is that we want to:

  • unconditionally call AddBlock, to avoid filling the buffer with slots reserved by poll_ready()
  • do expensive work in the async block
  • call AddBlock after the expensive work is done, to avoid committing state early

What's the best way to resolve this issue?

I had trouble moving the AddBlock call into the async block last time I tried. But I can try again.

Or I could write some tests that make sure cancelled futures aren't actually committed?
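The in_memory/on_disk distinction above boils down to *when* the side effect runs relative to when the returned future is awaited. A std-only sketch (hypothetical names; closures stand in for futures, a `HashSet` stands in for the state service's committed blocks):

```rust
use std::collections::HashSet;

struct State {
    committed: HashSet<&'static str>,
}

impl State {
    // Like `in_memory`: the commit happens inside `call()` itself, so it is
    // visible even if the returned "future" is never run.
    fn call_eager(&mut self, hash: &'static str) -> impl FnOnce() + '_ {
        self.committed.insert(hash);
        move || {}
    }

    // Like `on_disk`: `call()` only builds the work; the commit happens when
    // the returned closure (our stand-in for a future) is actually run.
    fn call_deferred(&mut self, hash: &'static str) -> impl FnOnce() + '_ {
        move || {
            self.committed.insert(hash);
        }
    }
}

fn main() {
    let mut state = State { committed: HashSet::new() };

    // A dropped (cancelled) deferred "future" commits nothing.
    let fut = state.call_deferred("block1");
    drop(fut);
    assert!(state.committed.is_empty());

    // Running it performs the commit.
    let fut = state.call_deferred("block1");
    fut();
    assert!(state.committed.contains("block1"));

    // The eager version has already committed by the time `call` returns,
    // even though the "future" is dropped without ever running.
    let fut = state.call_eager("block2");
    drop(fut);
    assert!(state.committed.contains("block2"));
}
```

This is why calling `AddBlock` before validation is only safe if the state commits in the future, not in the body of `call()`.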

Contributor:

IIUC the only reason that CheckpointVerifier::call has to call self.state_service.call unconditionally is that it reserved a slot in poll_ready(). But if it didn't reserve a slot, and just returned Poll::Ready(Ok()) in poll_ready, then it could do

let mut state = self.state_service.clone();
async move {
    state.ready_and().await?.call(...)
}

with no problems -- the issue is just with reserving a slot and then not using it.

Contributor Author (teor2345, Jun 18, 2020):

But we call poll_ready() so we can propagate backpressure?

I don't think that state will be exerting any backpressure yet, so that's probably ok. If it turns out we need backpressure, it will probably come from the verifiers themselves. Or from the chain reorganisation service.

Contributor Author:

Fixed in 99af8a8 and 086ea49.

// - if chaining is expensive, move this check to the Future
// - should the state contain a mapping from previous_block_hash to block?
let checkpoint_hash_ref = match checkpoints.get(&block_height) {
    Some(hash) => hash,
Contributor:

tioli: if this were Some(hash) => *hash, then the outer binding could be let checkpoint_hash = ..., skipping the extra binding below.

Contributor:

Alternatively you can just place the reference in the pattern

Suggested change:
-    Some(hash) => hash,
+    Some(&hash) => hash,

Contributor Author:

Fixed in 7876e82.
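The two suggestions above (dereferencing after the match vs. putting the reference in the pattern) are equivalent when the value is `Copy`. A minimal std-only sketch with hypothetical stand-in types for `BlockHeight` and `BlockHeaderHash`:

```rust
use std::collections::HashMap;

fn main() {
    // Hypothetical stand-ins: u32 for BlockHeight, [u8; 4] for BlockHeaderHash.
    let mut checkpoints: HashMap<u32, [u8; 4]> = HashMap::new();
    checkpoints.insert(419_200, *b"hash");

    let block_height = 419_200;

    // Original: bind a reference, then dereference via an extra binding.
    let checkpoint_hash_ref = match checkpoints.get(&block_height) {
        Some(hash) => hash,
        None => panic!("not a checkpoint height"),
    };
    let checkpoint_hash = *checkpoint_hash_ref;

    // Suggested: the reference goes in the pattern, so the arm binds the
    // value directly (this works because the hash type is `Copy`).
    let checkpoint_hash2 = match checkpoints.get(&block_height) {
        Some(&hash) => hash,
        None => panic!("not a checkpoint height"),
    };

    assert_eq!(checkpoint_hash, checkpoint_hash2);
}
```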

/// which only happen in the last few hundred blocks in the chain.
/// (zcashd allows chain reorganizations up to 99 blocks, and prunes
/// orphaned side-chains after 288 blocks.)
checkpoint_list: Option<HashMap<BlockHeight, BlockHeaderHash>>,
Contributor (hdevalence, Jun 18, 2020):

tioli: If the Option is used only to indicate the presence or absence of a checkpoint list below, is there a reason not to store a HashMap directly and use is_empty()?

Contributor Author:

Yes, I also had this thought.

Contributor Author:

Fixed in 7876e82.
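The simplification discussed above, storing the `HashMap` directly and using `is_empty()` in place of an `Option` wrapper, can be sketched with hypothetical placeholder types:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the verifier: an empty map plays the role
// that `None` played before.
struct CheckpointVerifier {
    checkpoint_list: HashMap<u32, [u8; 4]>,
}

impl CheckpointVerifier {
    fn has_checkpoints(&self) -> bool {
        // `is_empty()` replaces matching on `Option<HashMap<..>>`.
        !self.checkpoint_list.is_empty()
    }
}

fn main() {
    let none = CheckpointVerifier {
        checkpoint_list: HashMap::new(),
    };
    assert!(!none.has_checkpoints());

    let mut list = HashMap::new();
    list.insert(0, *b"gene");
    let some = CheckpointVerifier {
        checkpoint_list: list,
    };
    assert!(some.has_checkpoints());
}
```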

Comment on lines 119 to 156
/// Return a checkpoint verification service, using the provided state service.
///
/// The checkpoint verifier holds a state service of type `S`, into which newly
/// verified blocks will be committed. This state is pluggable to allow for
/// testing or instrumentation.
///
/// The returned type is opaque to allow instrumentation or other wrappers, but
/// can be boxed for storage. It is also `Clone` to allow sharing of a
/// verification service.
///
/// This function should be called only once for a particular state service (and
/// the result be shared) rather than constructing multiple verification services
/// backed by the same state layer.
pub fn init<S>(
    state_service: S,
    checkpoint_list: Option<HashMap<BlockHeight, BlockHeaderHash>>,
) -> impl Service<
    Arc<Block>,
    Response = BlockHeaderHash,
    Error = Error,
    Future = impl Future<Output = Result<BlockHeaderHash, Error>>,
> + Send
  + Clone
  + 'static
where
    S: Service<zebra_state::Request, Response = zebra_state::Response, Error = Error>
        + Send
        + 'static,
    S::Future: Send + 'static,
{
    Buffer::new(
        CheckpointVerifier {
            state_service,
            checkpoint_list,
        },
        1,
    )
}
Contributor:

tioli: if we're not planning to expose the checkpoint verifier as part of the public API, rather than hiding it behind an init function that wraps it in a buffer, we could have the rest of the crate use it directly.

Contributor Author:

Thanks for that reminder, I'll stop it being public.

Do we always want each verifier wrapped in a buffer?

Contributor:

I don't think it's necessary to wrap a verifier in a buffer unless we know that we want to share the verifier, since it adds some overhead and extra complexity. We might only need to share a highest-level verifier that we return from the crate's public init() method, and have that verifier own component verifiers like this one directly.

Contributor Author:

Do we want zebra-consensus users to be able to use BlockVerifier directly? Or just HybridVerifier?

We can let HybridVerifier use BlockVerifier directly, but require outside users to init() them both with buffers.

Contributor:

Not sure! We can always leave things public for now and then clean up the API later, moving stuff around is relatively easy. When we wrote zebra_network we didn't get the nice encapsulated API until the very end with a pass to clean up imports.

Contributor Author:

Fixed in ea54df9, but I left init() public for now.

Contributor (hdevalence) left a comment:

I'm a bit worried about whether it's safe to call AddBlock prior to doing validation, but otherwise this is great, all the rest of the comments are TIOLI.

@teor2345 teor2345 mentioned this pull request Jun 18, 2020

// These checks are cheap, so we can do them in the call()

let checkpoints = match &self.checkpoint_list {
Contributor:

TIOLI: You can collapse all of these matches into one by using a tuple:

let (checkpoints, height) = match (&self.checkpoint_list, block.coinbase_height()) {
    (Some(checkpoints), Some(height)) => (checkpoints, height),
    _ => /* error case */,
};
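The tuple-match suggestion above, completed as a self-contained sketch (placeholder types and hypothetical error strings, not the PR's actual code):

```rust
use std::collections::HashMap;

type Error = String;

fn check(
    checkpoint_list: &Option<HashMap<u32, [u8; 4]>>,
    coinbase_height: Option<u32>,
) -> Result<(u32, [u8; 4]), Error> {
    // One tuple match replaces two sequential matches on the two Options.
    let (checkpoints, height) = match (checkpoint_list, coinbase_height) {
        (Some(checkpoints), Some(height)) => (checkpoints, height),
        (None, _) => return Err("checkpoint verifier has no checkpoints".into()),
        (_, None) => return Err("block has no coinbase height".into()),
    };
    checkpoints
        .get(&height)
        .map(|&hash| (height, hash))
        .ok_or_else(|| "height is not a checkpoint height".into())
}

fn main() {
    let mut list = HashMap::new();
    list.insert(0, *b"gene");
    let list = Some(list);
    assert!(check(&list, Some(0)).is_ok());
    assert!(check(&list, None).is_err());
    assert!(check(&None, Some(0)).is_err());
}
```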

Contributor Author:

I'm going to change the height match to <= max_checkpoint_height later. I'm not sure how much that will change the code.

Contributor:

Contributor Author:

Thanks, I'll have a look at that for my next PR, once I've changed the internals of CheckpointVerifier.

teor2345 added 5 commits June 19, 2020 15:16
Use HashMap, without an Option<> wrapper.
Tidy up the matches.

Part of ZcashFoundation#429.
We don't want to call the state's AddBlock until we know the block is
valid. This avoids subtle bugs if the state is modified in the call().

(in_memory currently modifies the state in the call(), on_disk modifies
the state in the async block.)

Also moves verification into the async block.

Part of ZcashFoundation#429.
The checkpoint verifier is not shared, so it doesn't need a buffer.

Part of ZcashFoundation#429.
We avoid calling the state's poll_ready() in the checkpoint verifier's
poll_ready. This change allows us to conditionally make AddBlock
requests, via a buffered state instance.

(Tower::Buffer expects a 1:1 relationship between poll_ready() and
call().)

This change breaks backpressure propagation between the state and
verifier users. That's ok, because we don't expect state changes to be
slow enough to exert backpressure on the network. (Verifiers can still
exert backpressure on the network, if verification is slow.)

Part of ZcashFoundation#429.
@teor2345 teor2345 requested a review from hdevalence June 19, 2020 06:14
teor2345 (Contributor Author):

This PR is ready for review as a stub.

I'll add tests, chaining child verifications to their ancestors, and checkpoint maximum height queries in a future PR.

Comment on lines +77 to +80
let block_height = match block.coinbase_height() {
    Some(height) => height,
    None => return Err("the block does not have a coinbase height".into()),
};
Contributor (yaahc, Jun 19, 2020):

You can use ok_or and ok_or_else to convert an option to a result which then lets you use ? on it

Suggested change:
-let block_height = match block.coinbase_height() {
-    Some(height) => height,
-    None => return Err("the block does not have a coinbase height".into()),
-};
+let block_height = block
+    .coinbase_height()
+    .ok_or("the block does not have a coinbase height")?;
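The `ok_or` conversion suggested above, as a runnable std-only sketch (the `coinbase_height` helper here is a hypothetical stand-in for `Block::coinbase_height()`):

```rust
// `Option::ok_or` turns `None` into an `Err`, so `?` can propagate it.
type Error = Box<dyn std::error::Error>;

// Hypothetical stand-in: the "block" is just a slice, and its first
// element plays the role of the coinbase height.
fn coinbase_height(block: &[u32]) -> Option<u32> {
    block.first().copied()
}

fn block_height(block: &[u32]) -> Result<u32, Error> {
    let height = coinbase_height(block)
        .ok_or("the block does not have a coinbase height")?;
    Ok(height)
}

fn main() {
    assert_eq!(block_height(&[42]).unwrap(), 42);
    assert!(block_height(&[]).is_err());
}
```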

Contributor Author:

Thanks, I'll have a look at that for my next PR, once I've changed the internals of CheckpointVerifier.

// `Tower::Buffer` requires a 1:1 relationship between `poll()`s
// and `call()`s, because it reserves a buffer slot in each
// `call()`.
// TODO(teor): what happens if the await fails?
Contributor:

If ready_and().await resolves to an Err (signaling failure of the state service), the error will be bubbled up by ? and verification of the block will fail with the error from the state service, i.e., block verification won't succeed unless the block can be written back to the state, which I think is what we want?

Contributor:

(That said, I think that this service shouldn't be committing anything to the state, just returning a verification result, so that we can have separation of concerns and factor out the state-commitment behavior from this checkpoint verifier on the one hand and from the full verifier on the other hand).

Contributor Author:

So the buffer slot will be used up, regardless of whether the error is in the buffer or state service?

(I'll factor out the state commitment once we have a HybridVerifier service.)

Contributor Author:

I guess if the error is in the buffer, it won't reserve a slot. And if the error is in the state service, then it will use up the buffer slot.

Contributor (hdevalence) left a comment:

This seems fine to merge as a stub.

One thing I would note, though, is that in the current setup, the CheckpointVerifier tries to move as much work as possible into the response future, so that all of the response futures are as independent from each other as possible. But this is the opposite of the communication/dependency structure we want, where the service queues up a bunch of blocks while it's trying to assemble a chain from one checkpoint to the next, does the work of checking the chain once, and then sends notifications to all of the callers of the queued blocks. So going from here to the functionality we want (checkpoints every 1000 blocks) will require basically a complete restructuring of the CheckpointVerifier internals to change this structure.

teor2345 (Contributor Author):

> One thing I would note, though, is that in the current setup, the CheckpointVerifier tries to move as much work as possible into the response future, so that all of the response futures are as independent from each other as possible. But this is the opposite of the communication/dependency structure we want, where the service queues up a bunch of blocks while it's trying to assemble a chain from one checkpoint to the next, does the work of checking the chain once, and then sends notifications to all of the callers of the queued blocks. So going from here to the functionality we want (checkpoints every 1000 blocks) will require basically a complete restructuring of the CheckpointVerifier internals to change this structure.

I understand, I just wanted something that I could write simple tests against.

@teor2345 teor2345 merged commit d8b48f8 into ZcashFoundation:main Jun 21, 2020
teor2345 (Contributor Author):

I think we'll want a separate service that makes all of the changes to the main chain state. See #516 for details.

(Pruning side-chains can be a separate service, because side-chains don't have any derived state.)

teor2345 added a commit to teor2345/zebra that referenced this pull request Jun 24, 2020
consensus: Add a checkpoint verifier stub

This stub only verifies blocks whose hashes are in the checkpoint
list.

It doesn't have tests, doesn't chain child verifications to their
ancestors, and doesn't support checkpoint maximum height queries.

Part of ZcashFoundation#429.
Labels
A-rust Area: Updates to Rust code C-design Category: Software design work