
[EC-343] Add block header validation during fast sync #380

Merged
merged 6 commits into from
Jan 11, 2018

Conversation

LukasGasior1
Contributor

No description provided.

@@ -584,6 +626,11 @@ object FastSync {
validators: Validators, peerEventBus: ActorRef, etcPeerManager: ActorRef, syncConfig: SyncConfig, scheduler: Scheduler): Props =
Props(new FastSync(fastSyncStateStorage, appStateStorage, blockchain, validators, peerEventBus, etcPeerManager, syncConfig, scheduler))

// validation parameters (see: https://github.com/ethereum/go-ethereum/pull/1889)
Contributor
Even if these values seem to be defined in the referenced PR, wouldn't it be a good idea to put them in application.conf? (hidden from the normal user)

Contributor Author

done
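(For illustration, a sketch of how such values might be loaded from application.conf via Typesafe Config. The key names mirror the config fragment added in this PR, but the case class and loader below are assumptions, not the actual Mantis `SyncConfig` code.)

```scala
// Sketch only: the case class and loader are illustrative, not the
// actual Mantis SyncConfig. Key names match the application.conf
// fragment discussed later in this PR.
import com.typesafe.config.Config

final case class FastSyncValidationParams(k: Int, n: Int, x: Int)

object FastSyncValidationParams {
  def apply(config: Config): FastSyncValidationParams =
    FastSyncValidationParams(
      k = config.getInt("fast-sync-block-validation-k"),
      n = config.getInt("fast-sync-block-validation-n"),
      x = config.getInt("fast-sync-block-validation-x")
    )
}
```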

log.warning(s"Block header validation failed during fast sync at block ${header.number}: $error")
blacklist(peer.id, blacklistDuration, "block header validation failed")

// discard last N blocks
Contributor

I don't quite follow this. If we found that a specific block is invalid, why are the previous N dropped? They might be valid, or not?

Contributor Author

Since we only validate every ~Xth block (+ some randomization) we cannot be sure these are valid blocks.
See ethereum/go-ethereum#1889 for comprehensive explanation. In short:

With this caveat calculated, the fast sync should be modified so that up to the pivoting point - X, only every K=100-th header should be verified (at random), after which all headers up to pivot point + X should be fully verified before starting state database downloading. Note: if a sync fails due to header verification the last N headers must be discarded as they cannot be trusted enough.
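A minimal sketch of that rule (the helper name and signature are assumptions, not the code in this PR): headers more than X before the target are validated with probability roughly 1/K, while the final X headers are always validated.

```scala
import scala.util.Random

// Sketch: decide whether a header should be validated during fast sync.
// Before (target - X): validate roughly one header in K, at random.
// From (target - X) onward: validate every header.
def shouldValidateHeader(blockNumber: BigInt, targetBlock: BigInt,
                         k: Int, x: Int, random: Random): Boolean =
  if (blockNumber >= targetBlock - x) true
  else random.nextInt(k) == 0
```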

Contributor

Should it always be N when N are available?

In other words: forget the beginning of the chain, I'm talking about a situation when we are in the [target - X, target] range. Shouldn't it be:

val numberOfBlocksToDiscard = if (target - current <= X) N + X - (target - current) else N

(maths may need double-checking)

So, N plus the number of blocks already validated in the X phase?

Contributor Author

Why do you think we should drop the extra blocks? For me it doesn't seem to make a difference in this context. (Is there a difference if validation fails at block target/2 or target - 10? I think it's sufficient to drop N in both cases.)

Contributor

Somehow I thought we should revert to a fully validated block, but since N is not divisible by K that is not the case. Anyway, the way you put it - no, I don't think it will make a difference 👍


// discard last N blocks
(header.number to ((header.number - N) max 1) by -1).foreach { n =>
blockchain.getBlockHeaderByNumber(n).foreach { header =>
Contributor

Is this getBlockHeaderByNumber needed if we already have header.hash, and removeBlock doesn't fail (AFAIK) if the hash does not exist?

Contributor Author

Good catch, it's not needed. (I don't expect this piece of code to be called too often, but still.)

Contributor Author

Actually it is needed here. header was shadowed inside the foreach; we only have one header (the one we start with). I changed the name to make that clearer.
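A sketch of the fix (illustrative names; the stand-in types below are assumptions, not the real Mantis `BlockHeader` and `Blockchain`): renaming the inner parameter makes it obvious that the removed headers are the ones fetched by number, not the failed header from the enclosing scope.

```scala
// Minimal stand-ins for the real Mantis types, just for this sketch:
trait BlockHeader { def number: BigInt; def hash: Array[Byte] }
trait Blockchain {
  def getBlockHeaderByNumber(n: BigInt): Option[BlockHeader]
  def removeBlock(hash: Array[Byte]): Unit
}

// Illustrative sketch, not the exact PR code: discard the last N block
// headers starting from the one that failed validation. The lambda
// parameter is named headerToRemove so it no longer shadows `header`.
def discardLastNBlocks(header: BlockHeader, n: Int, blockchain: Blockchain): Unit =
  (header.number to ((header.number - n) max BigInt(1)) by -1).foreach { num =>
    blockchain.getBlockHeaderByNumber(num).foreach { headerToRemove =>
      blockchain.removeBlock(headerToRemove.hash)
    }
  }
```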

@KonradStaniec KonradStaniec self-requested a review January 8, 2018 10:59
Contributor

@KonradStaniec left a comment

Code looks good, but I have one question.

According to:

With this caveat calculated, the fast sync should be modified so that up to the pivoting point - X, only every K=100-th header should be verified (at random), after which all headers up to pivot point + X should be fully verified before starting state database downloading.

Shouldn't we also verify all blocks from targetBlock to targetBlock + X to be sure our targetBlock is safe?

@LukasGasior1
Contributor Author

@KonradStaniec
I'm not sure I follow this part. After reaching targetBlock we switch to regular sync, which validates each block anyway. Also, regarding "should be fully verified before starting state database downloading" - we start state database downloading at the beginning, in parallel with block download, so this doesn't seem to make sense. On the other hand, if we detect the target block is invalid, do we need to download the state again? Choose a different target block? Or just try with a different peer?

@KonradStaniec
Contributor

@LukasGasior1 I see, my misunderstanding comes from the fact that we have a different algorithm than geth.
But there is a valid point in there: redownloading the full state again seems like overkill if the target block is invalid. Is it possible to switch the algorithm to first downloading the blockchain and then the state? (I think I remember that it was switched in the past but I don't remember why.)

@LukasGasior1
Contributor Author

It was because we want the state as soon as possible; otherwise other peers may start pruning it and we end up with a partially downloaded state that we cannot continue downloading (in fact this still may happen). Btw, I think geth also does it this way (i.e. it downloads the state first).

@rtkaczyk rtkaczyk self-requested a review January 8, 2018 15:24
@rtkaczyk
Contributor

rtkaczyk commented Jan 8, 2018

@LukasGasior1

After reaching targetBlock we switch to regular sync, which validates each block anyway.

Right, but if we discover a block was invalid after switching to regular we won't be able to recover, will we?

@rtkaczyk
Contributor

rtkaczyk commented Jan 8, 2018

Wait, now I'm not sure I follow...

@KonradStaniec wrote:

Shouldn't we also verify all blocks from targetBlock to targetBlock+x to be sure our targetblock is safe ?

We should fully validate from targetBlock - X to targetBlock, which we are doing. Or am I missing something?

@LukasGasior1
Contributor Author

@rtkaczyk it comes from this sentence (in geth PR):

With this caveat calculated, the fast sync should be modified so that up to the pivoting point - X, only every K=100-th header should be verified (at random), after which all headers up to pivot point + X should be fully verified before starting state database downloading

According to it, we should also validate pivot + X blocks before starting the state download, which doesn't seem to make sense in our case.

# See: https://github.com/ethereum/go-ethereum/pull/1889
fast-sync-block-validation-k = 100
fast-sync-block-validation-n = 2048
fast-sync-block-validation-x = 50
Contributor

How did you choose this X value? In the PR (white paper) X=24

Contributor

Ahh, so you doubled it because we treat target/pivot differently?

Contributor Author

Yes, I was not sure how to interpret that sentence, but my intuition is that validating pivot - 2X to pivot is the same as validating pivot - X to pivot + X (and validation after pivot happens anyway in regular sync).

@KonradStaniec
Contributor

@LukasGasior1 I am not sure how it's implemented in geth (I will look it up to be sure) and I am still unsure how it should work, but this paragraph makes me a little unsure:

Using this caveat however would mean, that the pivot point can be considered secure only after N headers have been imported after the pivot itself. To prove the pivot safe faster, we stop the "gapped verifications" X headers before the pivot point, and verify every single header onward, including an additional X headers post-pivot before accepting the pivot's state.

As I understand it, our targetBlock and its state should not be considered safe to download unless there are at least X verified blocks after it. So if we accept and download it (this state) and after, let's say, X - 1 blocks some block fails validation, shouldn't we redownload the whole state from some other block?

Also, these are maybe questions for some future PR, as until now we were doing pretty fine without validation at all :)

@rtkaczyk
Contributor

rtkaczyk commented Jan 9, 2018

So if we accept and download it (this state) and after, let's say, X - 1 blocks some block fails validation, shouldn't we redownload the whole state from some other block?

I think we should address that. At the very least we should check whether our state root hash equals the one in the accepted target block, and otherwise declare a failure to sync (and plan something better for the future).

@LukasGasior1
Contributor Author

LukasGasior1 commented Jan 10, 2018

So what we need to do is:

  • validate target block (check if our state hash == its state hash)
  • if this is fine we can be pretty sure our fast sync is valid (as target -X blocks were also validated)
  • then if we fail at block target+1 in regular sync, I think we should just drop this one block (in other words, RegularSync should behave as it currently does)

But if validation fails on the target block (i.e. our state != its state) we declare a failure (and drop the whole blockchain?). Note that if validation fails at something like target - 10 it's sufficient to just drop N blocks (as the gap between blocks we validate is smaller than N).

wdyt @KonradStaniec @rtkaczyk ?

@rtkaczyk
Contributor

sounds good

@LukasGasior1
Contributor Author

Actually it doesn't sound good. validate target block (check if our state hash == its state hash) - there is no such thing as "our state hash". We can only do regular validation on the target block (which we do); the question is, do we need to drop the whole blockchain and declare a failure? Isn't it enough to drop N blocks? (Note that if validation fails at something like target - 10 it's sufficient to just drop N blocks, as the gap between blocks we validate is smaller than N. I think this should also apply to the target block == no need to declare failure(?))

@LukasGasior1
Contributor Author

Let's consider a peer that sends us an invalid target block when we start fast sync:
since we validate every K blocks, he should be able to provide us with valid blocks up to some point (if he can do that up to the target block then it's a valid chain). Let's say validation fails at some block B (which may be as soon as block 1); we go back N blocks, ending up at B - N. Now if validation fails again at this point, we go another N back, and so on, ultimately reaching block 1 (which we may as well reach on the first failure). And I think here's where we're missing some logic: in this case we should restart fast sync and try to choose a different target block. Currently we'll reach block 1 and retry the sync, still keeping the same target.
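The missing logic described above could be sketched as follows (the helper name and error handling are assumptions, not existing code):

```scala
// Sketch: after header validation fails at block `failedBlock`, resume
// header download from failedBlock - N. If that bottoms out at block 1,
// the chain cannot be trusted at all and fast sync should restart with
// a different peer / target block (the logic currently missing).
def handleValidationFailure(failedBlock: BigInt, n: Int): Either[String, BigInt] = {
  val resumeFrom = (failedBlock - n) max BigInt(1)
  if (resumeFrom == BigInt(1)) Left("restart fast sync and choose a new target block")
  else Right(resumeFrom)
}
```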

@rtkaczyk
Contributor

Since we start with downloading state, we first need to obtain the state root hash for the target block - this is what I meant by our state root hash. But this block cannot be validated until we download all preceding blocks, and when we do, it may surface that our state root hash was incorrect.

So rather than restarting fast sync, or redownloading the state, or whatever, we could do an easy/dumb thing and declare a failure.

As a follow-up we can come up with something better. Here's one idea:

  • download state after the blocks
  • upon reaching the target block, determine a new updated target (500 blocks behind the current top of the chain)
  • repeat above until, upon reaching the updated target block, we are in fact 500 blocks behind (+- some margin, say 10) the top of the chain
  • now downloading of the state is the same with regard to pruning in the source nodes, yet the state root hash has been validated
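A rough sketch of that loop (all names are assumptions; `downloadBlocksUpTo` is an illustrative callback, not an existing method):

```scala
// Sketch of the follow-up idea: keep moving the target to
// (top of chain - offset) until block download has caught up to within
// `margin` blocks of the top; only then is the (already validated)
// state root safe to use for state download.
@annotation.tailrec
def catchUpToTarget(
    ourBest: BigInt,
    peersBest: () => BigInt,              // current top of the chain as seen by peers
    downloadBlocksUpTo: BigInt => BigInt, // downloads blocks, returns our new best block
    offset: Int = 500,
    margin: Int = 10
): BigInt = {
  val target = peersBest() - offset
  if (target - ourBest <= margin) target  // close enough: download state for this target
  else catchUpToTarget(downloadBlocksUpTo(target), peersBest, downloadBlocksUpTo, offset, margin)
}
```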

@LukasGasior1
Contributor Author

Ok, it makes sense. If the target fails validation we really should declare a failure. Besides that, I think we should also implement what I mentioned (choose a different target if the current (failed) block is less than N).

@rtkaczyk
Contributor

choose different target if current (failed) block is less than N

Sorry I did not get that. Do you mean choosing different target number, or different target header in the initialSyncState?

@LukasGasior1
Contributor Author

I mean restarting fast sync from scratch, i.e. asking a different peer for a (possibly) different block. But now I'm not sure it'll be needed anymore, since we're going to fail if the target is invalid anyway.

@rtkaczyk
Contributor

rtkaczyk commented Jan 10, 2018

So there are a few ways you can go about restarting:

  1. Redownload the state from the target block that is now validated - but run into the risk of missing node data due to pruning
  2. Choose new target block number - but run into the risk the obtained stateRootHash will be incorrect again, requiring another restart
  3. Continue fast sync block download to a newer target block, and then redownload the state

1. and 2. are not perfect, and 3. is similar to my idea above - if that were implemented we wouldn't need to restart at all.

@LukasGasior1
Contributor Author

  1. would also make it possible to update our target block while fast sync is running fine (no validation errors), so that we don't have to worry about state pruning on the nodes we sync with, but this would require even more changes. For now I'll go with just declaring a failure and create a task for this improvement.

@LukasGasior1 LukasGasior1 merged commit f397e16 into phase/release1_1 Jan 11, 2018
@LukasGasior1 LukasGasior1 deleted the feature/fastSyncPowValidation branch January 11, 2018 11:59