-
Notifications
You must be signed in to change notification settings - Fork 685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
During fast/light sync, check seal on a small subset of headers #1010
Conversation
93abba7
to
c97ea99
Compare
@@ -46,7 +46,7 @@ | |||
], | |||
'test': [ | |||
"hypothesis==3.44.26", | |||
"pytest~=3.2", | |||
"pytest~=3.3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed because we now have some tests that use the caplog
fixture, which is not available in 3.2
evm/chains/base.py
Outdated
if seal_check_frequency == 1: | ||
headers_to_check_seal = all_indices | ||
else: | ||
headers_to_check_seal = random.sample( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went as far as writing up a big comment with a secure version of random.sample
implemented using secrets.choice
. But then I deleted it because it felt a bit overly paranoid/tinfoil-hat. But then I still wanted to raise the issue.
I respect that there is very little likelihood that we're exposing attack surface by using random.sample
here, but given the potential severity of a security exploit in this type of software, how would you feel about replacing this with such a small utility that uses secure sources of randomness. My basic implementation is below:
def samp(pop: Set, size: int) -> Set: # could probably use a generic for the items in the set `Set[G]`
if size >= len(pop):
return pop
rejects = set()
for _ in range(len(pop) - size):
# tuple cast is needed to use `secrets.choice`
rejects.add(secrets.choice(tuple(pop.difference(rejects))))
return pop.difference(rejects)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially wrote this using secrets.choice()
, but changed it to random
because we need to be python3.5 compatible as this is in the evm
package, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct it seems. 👍
evm/chains/base.py
Outdated
if header.parent_hash != parent.hash: | ||
raise ValidationError( | ||
"Invalid header chain; {} has parent {}, but expected {}".format( | ||
header, header.parent_hash, parent.hash)) | ||
vm_class = self.get_vm_class_for_block_number(header.block_number) | ||
vm_class.validate_header(header, parent) | ||
if i in headers_to_check_seal: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
semantic nitpick
When I see inclusion checks I default to wanting the population object to be a set
type. Thoughts on casting headers_to_check_seal
to a set
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense
evm/chains/base.py
Outdated
if i in headers_to_check_seal: | ||
check_seal = True | ||
else: | ||
check_seal = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just embed the call to vm_class.validate_header(header, parent, check_seal=True)
here and get rid of the check_seal
variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, validate_header()
does more than checking the seal, and we still want to perform those for every header -- it's only the seal check that we want to skip sometimes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, duh, now I understand what you mean. Will do that
p2p/constants.py
Outdated
# (https://github.com/ethereum/go-ethereum/pull/1889#issue-47241762), but in order to err on the | ||
# side of caution, we use 192/4 to ensure we check 4 headers out of every batch (of length | ||
# MAX_HEADERS_FETCH). | ||
FAST_SYNC_SEAL_CHECK_FREQUENCY = 192 // 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's clearer to specify the period here, like:
# check one in 4 headers
FAST_SYNC_SEAL_CHECK_PERIOD = 4
Then, we aren't assuming this value is used with 192 headers, either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AIUI, geth checks 1 in 100, and as I mentioned below we're doing 1 in 48. Why would we do 1 in 4 instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I see now that the 'frequency` name was throwing me off. So I think my only issue with this is that it should be renamed to period. Frequency would be 1/period, ~0.02.
Edit: frequency
weakly implies that the checks are cyclical instead of random, and period
implies that even more strongly. Maybe the best solution is to just invert the number and keep it as frequency. Or come up with an alternative name for frequency like chance
or probability
or likelihood
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about seal_check_random_sample_rate
?
sample_size = len(all_indices) // seal_check_random_sample_rate
headers_to_check_seal = set(random.sample(all_indices, sample_size))
evm/chains/base.py
Outdated
headers_to_check_seal = all_indices | ||
else: | ||
headers_to_check_seal = random.sample( | ||
all_indices, len(all_indices) // seal_check_frequency) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it's written as if the seal_check constant is a period. Otherwise, 192 // (192/4)
would only check 4 headers total in a chain of 192.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's what I intended. In geth they check every 100-th header, and we're doing every 48th
evm/vm/base.py
Outdated
@@ -712,7 +713,8 @@ def validate_block(self, block): | |||
) | |||
|
|||
@classmethod | |||
def validate_header(cls, header: BlockHeader, parent_header: BlockHeader) -> None: | |||
def validate_header( | |||
cls, header: BlockHeader, parent_header: BlockHeader, check_seal: bool) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay to default check_seal
to True
, which should remove some of the boilerplate changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the code as choosing too few headers.
I misread the comment about geth. 👍 to the logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 with a rename from frequency.
Other options would be:
- FAST_SYNC_SEAL_CHECK_PERIOD
- FAST_SYNC_SEAL_CHECK_INVERSE_FREQUENCY
- FAST_SYNC_SEAL_CHECK_ONE_IN_N
- FAST_SYNC_SEAL_CHECK_EVERY
- FAST_SYNC_SEAL_CHECK_SKIP (although this is semantically off-by-one)
OR to invert the value to be a frequency (plus the related logic of course), which is easier to name.
Edit: frequency weakly implies that the checks are cyclical instead of random, and period implies that even more strongly. Maybe the best solution is to just invert the number and keep it as frequency. We can drop the cyclical implication entirely with an alternative name like chance
or probability
or likelihood
.
For every header batch, pick a random subset and check the seal only on those.
For every header batch, pick a random subset and check the seal only on those.
Before this change:
Imported 192 headers in 1.33 seconds, new head: #3583663
After this change:
Imported 192 headers in 0.53 seconds, new head: #3585558
Closes: #932