Optimistic Sync tests proposal #28

mkalinin · 2022-07-21T12:22:56Z

Optimistic sync is an extension of consensus specs, but it seems valuable to feature consensus-spec-tests with a few test cases checking CL client's behaviour in complicated scenarios. Hive is the main testing tool that currently does this job. Debugging cross-layer (CL + EL) Hive test is difficult, writing Hive tests employing sophisticated re-org scenarios is even harder while both should be much easier if handled by consensus-spec-tests suite.

As optimistic sync is an opt-in backwards compatible extension, test format should follow the same principle. A subset of tests separated from the others and utilizing fork_choice test format seems like a reasonable approach. Clients that are not supporting optimistic sync (tho there is no such clients to date) may not support this subset at all.

It is proposed to extend fork choice tests format with PayloadStatusV1 responses which EL would return when newPayload or forkchoiceUpdated method is called for a corresponding block. CL clients that are implementing opti sync tests will have to mock EL clients accordingly.

Format extension for `on_block` handler

{
  block: block_{block_root},
  valid: true/false,
  payload_block_hash: block.body.execution_payload.block_hash, # payload hash is convenient for the EL mock
  payload_status: {status: "VALID" | "INVALID" | "SYNCING" | "ACCEPTED", latestValidHash: blockHash | null, validationError: null}
}

Should the PayloadStatus be ever updated in a backwards incompatible way these tests will have to be updated accordingly. Though, this is unlikely to happen outside of a hard fork on CL side, thus, tests may be aligned as well supporting the old status structure for previous hard forks.

Example scenario

B0 <- B1 <- B2 <- B3 
  \
    <- B1' <- B2' <- B3' <- B4'

1. Client imports [B0 ... B3], payload_status: {status: "VALID", ...} for each block in this branch
2. Client reorgs to and optimistically imports [B1' ... B3'], payload_status: {status: "SYNCING", ...} for each of this blocks
3. Client imports B4' with payload_status: {status: "INVALID", latestValidHash: B0.body.execution_payload.block_hash, ...}
4. head == B3

It is also proposed that valid flag for each of B1' ... B4' blocks is set to false indicating that these blocks are invalid allowing this test to result in the same outcome on CI and when run by a client without optimistic sync support. CL clients that do support optimistic sync should omit this flag and rely on payload_status instead.

cc @hwwhww @djrtwo @potuz @ajsutton @tersec @paulhauner

The text was updated successfully, but these errors were encountered:

potuz · 2022-07-21T16:28:26Z

I strongly support the creation of these tests. I think though the format may be forced to be a little more complicated. When importing block A with a payload P, the proposed format is to include the execution status of P. There are typically two calls when importing a block. A call to new_Payload(P) that would be covered by this format. But also there's a call to notify_FockhoiceUpdated. This latter call may be triggered for a payload different than P. It can be for a previously imported payload that has since changed its execution status.

In fact there may be arbitrarily many calls to FCU, since if we are following a branch in a tree with N forks, that turns out to have N-1 branches with INVALID tip and only one VALID, if the LMD weights of the branches is unfortunate, we would have to call FCU N times until we find a viable head.

I thus propose that the format has a map {payloadHash: payload_status} where it's listed the status response that the mocked EL would give for calls to each of the payloads that may be involved. It may even be a good indication of an error if the CL calls the EL with a payload not in the map.

g11tech · 2022-07-21T18:50:04Z

+1 from lodestar

tersec · 2022-07-22T10:49:53Z

This is reasonable, and useful, as long as the tests are strongly distinguished from current consensus-spec tests which are are all

100% deterministic
100% self-contained (each test contains some completely defined starting conditions, some completely defined set of steps to perform, and some defined acceptance conditions)
do not require access to a network or other external resources; all steps required are part of internal state transition function or closely adjacent parts of a CL
depend only one the CL's code, no other software

The CL tests that exist are partly useful because they essentially never (as close to 0% as feasible -- sometimes, there are CI runner infrastructure failures, etc) create false-positive failures. Therefore, when I look at why a commit might have failed CI, I look at the consensus-spec-tests and if they're failing, something is almost certainly wrong with that commit. If a flakier test fails, well, time to look at what happened in more detail. As long as this change doesn't undermine that reasoning, I'm okay with it.

tersec · 2022-07-22T10:52:07Z

By "strongly distinguished": I mean, not just one more superficially similar-looking test in a folder of otherwise-self-contained-deterministic/etc test which has some new flag to treat it differently. These should be put elsewhere in the repo organization so that they can be reasonably run separately without continuously having to sort/filter which are the guaranteed reliable tests and which are the potentially flakier tests.

ajsutton · 2022-07-22T23:26:05Z

100% deterministic
100% self-contained (each test contains some completely defined starting conditions, some completely defined set of steps to perform, and some defined acceptance conditions)
do not require access to a network or other external resources; all steps required are part of internal state transition function or closely adjacent parts of a CL
depend only one the CL's code, no other software

I would have expected these tests to meet these conditions as long as you consider the test runner part of the CL code. The test runner is just being augmented to be able to return additional execution payload statuses - not just VALID or INVALID but also SYNCING and ACCEPTED and it can now return the latestValidHash value when the response is INVALID.

You wouldn't actually be running an EL for these tests.

tersec · 2022-07-23T18:11:07Z

100% deterministic
100% self-contained (each test contains some completely defined starting conditions, some completely defined set of steps to perform, and some defined acceptance conditions)
do not require access to a network or other external resources; all steps required are part of internal state transition function or closely adjacent parts of a CL
depend only one the CL's code, no other software

I would have expected these tests to meet these conditions as long as you consider the test runner part of the CL code. The test runner is just being augmented to be able to return additional execution payload statuses - not just VALID or INVALID but also SYNCING and ACCEPTED and it can now return the latestValidHash value when the response is INVALID.

You wouldn't actually be running an EL for these tests.

Sure, seems reasonable and useful.

potuz · 2022-07-23T19:39:50Z

The test runner is just being augmented to be able to return additional execution payload statuses - not just VALID or INVALID but also SYNCING and ACCEPTED and it can now return the latestValidHash value when the response is INVALID.

I've been wondering if it's not worth also to allow "timeout" to be a response. That's borderline implementation dependent more than spec-domain, but this is also a source of bugs if the CL acts on an optimistic head depending on the status when the EL timesout

hwwhww · 2022-07-26T08:54:10Z

Thank you @mkalinin for the proposal!

It also bothers me that there are no tests cover optimistic sync specs. 😅

payload_status: {status: "VALID" | "INVALID" | "SYNCING" | "ACCEPTED", latestValidHash: blockHash | null, validationError: null}

Is it correct to say, for the existing Bellatrix fork choice tests, clients take all payloads as VALID?
That is, is it okay if I start by adding this to all existing Bellatrix tests:

payload_block_hash: block.body.execution_payload.block_hash,
payload_status: {status: "VALID", latestValidHash: blockHash, validationError: null}

And then write new tests of the invalid/syncing cases?

mkalinin · 2022-07-26T11:19:56Z

Thank you @mkalinin for the proposal!

It also bothers me that there are no tests cover optimistic sync specs. 😅

payload_status: {status: "VALID" | "INVALID" | "SYNCING" | "ACCEPTED", latestValidHash: blockHash | null, validationError: null}

Is it correct to say, for the existing Bellatrix fork choice tests, clients take all payloads as VALID? That is, is it okay if I start by adding this to all existing Bellatrix tests:
payload_block_hash: block.body.execution_payload.block_hash,
payload_status: {status: "VALID", latestValidHash: blockHash, validationError: null}
And then write new tests of the invalid/syncing cases?

I think we should rather leave fork_choice tests unaffected and create separate optimistic_sync test suite allowing for existing tests to run with no changes from the client side

mkalinin · 2022-07-26T11:53:11Z

I thus propose that the format has a map {payloadHash: payload_status} where it's listed the status response that the mocked EL would give for calls to each of the payloads that may be involved. It may even be a good indication of an error if the CL calls the EL with a payload not in the map.

I am not sure if we want to maintain this level of complexity. My gut is that we can get to any desirable state by adding one more block to a branch of a block tree and pass required payload status in it.

Having a specification of method calls and responses for each Engine API method might be valuable, currently we have newPayload and fcU but these set of methods may be updated in the future. Something like the following:

{
  payload_block_hash: block.body.execution_payload.block_hash,
  engine_api_responses: {
    "engine_newPayloadV1": {status: "VALID", latestValidHash: blockHash, validationError: null},
    "engine_forkchoiceUpdatedV1": {payloadStatus: {status: "VALID", latestValidHash: blockHash, validationError: null}, payloadId: null}
  }
}

This format has two properties: i) we can have different payload status returned by newPayload and fcU, e.g. ACCEPTED and VALID respectively ii) allows for simplified EL mocking leveraging full name of methods and fully specified responses

potuz · 2022-07-26T12:44:56Z

I am not sure if we want to maintain this level of complexity. My gut is that we can get to any desirable state by adding one more block to a branch of a block tree and pass required payload status in it.

The kind of situation that worries me about optimistic sync tests is if clients agree on head/justified information after a block is deemed INVALID. If we are syncing optimistically a tree like this, blocks arrived in order and they are all SYNCING

1 <-- 2 <--- 3 <--- 4 <---- 5 <--- 6
        \------7 <----8 <----9
         \      \------10
          \------------11 <--- 12 <---13

The last block is 13, and at this point the EL catches up and replies INVALID, LVH pointing to block 2. What is our Head?
With the information in the current proposal the ONLY thing the CL can do is remove the branch 11 <-- 12 <-- 13. and choose between 6, 9, and 10 to be the new head by normal forkchoice rules.

But the issue here is that in runtime this is not all we do when we import block 13. The right answer to the question "what is our head?" depends on the optimistic status of the remaining blocks. If they are still optimistic the answer would be just as the previous package, but the EL has just catched up, so it could well be that the forkchoice head is 6 but the EL already knows it's INVALID. In fact all descending blocks from two may be INVALID except 10, which should be our head, regardless if it's optimistic or not. The way nodes find is out is by sending repeated FCUs until they find a viable or optimistic branch to follow.

I see several possible bugs here, among them
a) a node does not call repeated FCUs, takes 6 as a head and tries to import a child of that block, but the EL already discarded 6 as INVALID and can't connect.
b) The node does call repeated FCUs but does not update the optimistic status of the new head correctly. Works both ways it finds 10 as head and treats it as SYNCING still (since it was its status before) and therefore does not act on it because it thinks is optimistic, or the worse possible bug is that it treats it as VALID and acts on it when it may be optimistic.
c) The node's algorithm to discard branches leaves the LMD weights in an incorrect status, leading to wrong heads upon multiple removals.

This is why I consider valuable to make sure we agree on behavior in these more complicated situations.

mkalinin · 2022-07-27T07:28:13Z

We may achieve the necessary level of flexibility by adding artificial on_payload_status step:

- {payload_status: {block_hash: '0x...', status: {status: 'VALID', latestValidHash: blockHash, validationError: null}}}

This step will need to precede the corresponding on_block step to let EL mock be ready for handling newPayload and fcU:

- {payload_status: {block_hash: '0x...', status: {status: 'VALID', latestValidHash: blockHash, validationError: null}}}
- {block: block_0x..., valid: true/false}

thoughts? cc @potuz @hwwhww

potuz · 2022-07-27T09:51:22Z

We may achieve the necessary level of flexibility by adding artificial on_payload_status step

This may work if these are interpreted not as current values returned from the EL but rather setting up the mock, so that when on_block step arrives, the EL returns those values. This makes it equivalent to the map in my post I think

mkalinin · 2022-07-27T10:51:25Z

We may achieve the necessary level of flexibility by adding artificial on_payload_status step

This may work if these are interpreted not as current values returned from the EL but rather setting up the mock, so that when on_block step arrives, the EL returns those values. This makes it equivalent to the map in my post I think

Yes, exactly this way. And it makes it easier to implement I guess and keep the format backwards compatible to the fork choice tests format. So, any changes to that format may be easily applied to optimistic sync tests as well

mkalinin · 2022-08-01T14:23:54Z

@hwwhww does the following format make sense to you:

- {payload_status: {block_hash: '0x...', status: {status: 'VALID', latestValidHash: blockHash, validationError: null}}}
- {block: block_0x..., valid: true/false}

hwwhww · 2022-08-02T14:00:42Z

@mkalinin
To avoid payload_status.status, I'd suggest renaming payload_status to payload_info or so. Otherwise, it looks good to me.

mkalinin · 2022-08-02T14:07:00Z

@mkalinin
To avoid payload_status.status, I'd suggest renaming payload_status to payload_info or so. Otherwise, it looks good to me.

I like payload_info more 👍 As we may easily extend it with more data in the future

mkalinin · 2022-09-12T10:15:54Z

Implemented in ethereum/consensus-specs#2965 and ethereum/consensus-specs#2982

mkalinin mentioned this issue Aug 3, 2022

Extend fork_choice test format with on_payload_info ethereum/consensus-specs#2965

Merged

hwwhww mentioned this issue Aug 23, 2022

Add optimistic sync tests ethereum/consensus-specs#2982

Merged

4 tasks

mkalinin closed this as completed Sep 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimistic Sync tests proposal #28

Optimistic Sync tests proposal #28

mkalinin commented Jul 21, 2022

potuz commented Jul 21, 2022

g11tech commented Jul 21, 2022

tersec commented Jul 22, 2022 •

edited

Loading

tersec commented Jul 22, 2022

ajsutton commented Jul 22, 2022

tersec commented Jul 23, 2022

potuz commented Jul 23, 2022

hwwhww commented Jul 26, 2022

mkalinin commented Jul 26, 2022

mkalinin commented Jul 26, 2022

potuz commented Jul 26, 2022

mkalinin commented Jul 27, 2022

potuz commented Jul 27, 2022

mkalinin commented Jul 27, 2022

mkalinin commented Aug 1, 2022

hwwhww commented Aug 2, 2022

mkalinin commented Aug 2, 2022

mkalinin commented Sep 12, 2022

Optimistic Sync tests proposal #28

Optimistic Sync tests proposal #28

Comments

mkalinin commented Jul 21, 2022

Format extension for on_block handler

Example scenario

potuz commented Jul 21, 2022

g11tech commented Jul 21, 2022

tersec commented Jul 22, 2022 • edited Loading

tersec commented Jul 22, 2022

ajsutton commented Jul 22, 2022

tersec commented Jul 23, 2022

potuz commented Jul 23, 2022

hwwhww commented Jul 26, 2022

mkalinin commented Jul 26, 2022

mkalinin commented Jul 26, 2022

potuz commented Jul 26, 2022

mkalinin commented Jul 27, 2022

potuz commented Jul 27, 2022

mkalinin commented Jul 27, 2022

mkalinin commented Aug 1, 2022

hwwhww commented Aug 2, 2022

mkalinin commented Aug 2, 2022

mkalinin commented Sep 12, 2022

Format extension for `on_block` handler

tersec commented Jul 22, 2022 •

edited

Loading