[EFM] Bug: Service Events Emitted outside System Chunk fail verification #6622

Closed · jordanschalm opened this issue Nov 6, 2024 · 5 comments
Labels: Bug, Execution, Cadence Execution Team, Protocol Team, S-BFT

jordanschalm commented Nov 6, 2024
Context

In #5828, we modified the definition of Service Events to allow emission outside the context of the system chunk. This was done to simplify the smart contract code associated with emitting service events.

Bug Description

Verification Nodes make assumptions about service events during verification which no longer hold. In particular, they assume that all service events attached to an ExecutionResult were emitted during the system chunk.

if systemChunk {
	equal, err := result.ServiceEvents.EqualTo(serviceEvents)
	if err != nil {
		return nil, fmt.Errorf("error while comparing service events: %w", err)
	}
	if !equal {
		return nil, chmodels.CFInvalidServiceSystemEventsEmitted(result.ServiceEvents, serviceEvents, chIndex, execResID)
	}
}

This results in verification failing when verifying the system chunk:

{"level":"info","node_role":"verification","node_id":"44696574657220536869726c6579008b30c66ada1bb8c675a58c3d3d491a0d81","engine":"verifier","result_id":"6c9b24437f2acbbb73df8d1890af617edb452a5a5c2ef19fe7834710599ddfcc","chunk_id":"65048fec4cf4e6d9b72d06e35eca6dbb4dc3e304ae9b9b759266bfce092225b0","chunk_index":1,"error":"unknown type of chunk fault is received (type: *chunks.CFInvalidServiceEventsEmitted) : service events differs, got [[]] expected [[{recover %!s(*flow.EpochRecover=&{{21 118800 119899 120899 121899 123799 [0xc022c38d80 0xc022c38de0 0xc022c38e40 0xc022c38ea0 0xc022c38f00 0xc022c38f60 0xc022c38fc0 0xc022c39020] [[[48 215 24 233 154 63 70 22 238 6 120 107 217 128 138 180 160 153 180 231 186 100 196 145 250 52 64 18 189 127 222 172] [65 108 101 120 32 72 101 110 116 115 99 104 101 108 0 0 10 101 60 102 43 200 34 121 122 101 211 57 99 249 137 225]]] [125 178 93 194 178 14 7 68 10 68 173 196 102 133 201 206] 5000 1730965921} {21 [{[142 165 199 58 12 119 153 19 120 78 46 183 158 92 159 237 40 154 91 86 216 169 208 194 43 203 42 95 123 208 203 131 19 18 243 33 20 118 171 139 133 102 78 173 68 195 110 226] [[48 215 24 233 154 63 70 22 238 6 120 107 217 128 138 180 160 153 180 231 186 100 196 145 250 52 64 18 189 127 222 172] [65 108 101 120 32 72 101 110 116 115 99 104 101 108 0 0 10 101 60 102 43 200 34 121 122 101 211 57 99 249 137 225]]}] 0xc020d1f180 [0xc020d1f2c0 0xc020d1f400] map[[65 110 100 114 101 119 32 66 117 114 105 97 110 0 19 76 51 252 76 24 250 225 100 67 93 120 95 9 75 86 140 122]:0 [66 97 115 116 105 97 110 32 77 117 108 108 101 114 0 145 95 102 66 26 81 66 133 146 87 75 179 227 189 103 27 135]:1]}})}]] for chunk 1 with result ID 6c9b24437f2acbbb73df8d1890af617edb452a5a5c2ef19fe7834710599ddfcc","time":"2024-11-06T20:01:31.721586794Z","message":"could not verify chunk"}
  • A service event S was emitted outside the system chunk, and is included in the ExecutionResult
  • The Verification node executes the system chunk and observes no service event
  • The Verification node, while verifying its execution, expects S to have been emitted during the system chunk, and fails verification

Proposed Solution

VNs only process a subset of chunks. Because of that, we need to both:

  1. update the Execution Node to explicitly state which service events were emitted in which chunk
  2. update the Verification Node to:
    • read which service events were emitted in the chunk it is verifying (for all chunks)
    • verify service events for all chunks (not only the system chunk)
    • enforce that a byzantine Execution Node can't inject additional Service Events (e.g. in between the chunks)
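A minimal sketch of the revised per-chunk check, with names and types as illustrative stand-ins for the flow-go verifier rather than its actual API: the systemChunk guard is dropped, and the verifier compares the events it computed for its chunk against the service events the Execution Node committed for that same chunk.

```go
package main

import (
	"fmt"
	"reflect"
)

// verifyChunkServiceEvents is a hypothetical stand-in for the revised check:
// it runs for every chunk, not just the system chunk. `committed` is the
// subset of service events the EN attributed to this chunk; `computed` is
// what the VN observed while re-executing the chunk.
func verifyChunkServiceEvents(committed, computed []string) error {
	if !reflect.DeepEqual(committed, computed) {
		return fmt.Errorf("service events differ: committed %v, computed %v", committed, computed)
	}
	return nil
}

func main() {
	// Matching events pass verification.
	if err := verifyChunkServiceEvents([]string{"EpochRecover"}, []string{"EpochRecover"}); err != nil {
		panic(err)
	}
	// The bug scenario: EN committed an event to this chunk, VN observed none.
	if err := verifyChunkServiceEvents([]string{"EpochRecover"}, nil); err == nil {
		panic("expected mismatch")
	}
	fmt.Println("ok")
}
```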

Solution 1 - Add service event chunk mapping to ExecutionResult data structure

The simplest way to communicate which service event was emitted in which chunk is to add this to the ExecutionResult data structure, which is a part of the Block. We can make one of the following modifications:

  • ExecutionResult.Chunks - We can put a ServiceEventIndices here referencing the outer ExecutionResult.ServiceEvents
  • ExecutionResult.ServiceEvents - Or, we can put a ChunkIndex here referencing where the event was emitted
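As a rough sketch, the two alternatives could look like this (field names are hypothetical, not the actual flow-go definitions):

```go
package main

import "fmt"

// Option A: each chunk carries indices into the outer
// ExecutionResult.ServiceEvents slice.
type ChunkA struct {
	Index               uint64
	ServiceEventIndices []uint32 // references into ExecutionResult.ServiceEvents
}

// Option B: each service event records the index of the chunk
// in which it was emitted.
type ServiceEventB struct {
	Type       string
	ChunkIndex uint64 // chunk in which the event was emitted
}

func main() {
	chunk := ChunkA{Index: 3, ServiceEventIndices: []uint32{0, 1}}
	event := ServiceEventB{Type: "EpochRecover", ChunkIndex: 3}
	fmt.Println(chunk.Index, event.ChunkIndex)
}
```

Either variant makes the chunk-to-event mapping part of the ExecutionResult, but both change the Block data model, which is the drawback discussed below.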

Although logically simpler, this solution requires a change to the Block data model, which is used in both the networking and storage layers. Coordinating these changes in a Protocol HCU would be much more challenging and error-prone.

Solution 2 - Add service event index branch to ChunkBody.EventsCollection Merkle commitment

In this solution, we re-use the existing ChunkBody.EventsCollection field, but extend the underlying Merkle tree to include information about service events.

ChunkBody.EventsCollection Changes

Currently this field is the root hash of the Merkle tree formed by the ordered list of events. We propose adding a branch to the Merkle tree to include the indices of service events. This way the ExecutionResult contains a commitment to which service events are emitted in which chunk.
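A hedged sketch of how the extended commitment could be computed, assuming SHA-256 as a stand-in for flow-go's actual hasher and a simple two-child root (the real tree layout may differ):

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// hashIndices hashes the chunk's service-event indices, forming the new
// right branch of the EventsCollection tree. (Illustrative encoding.)
func hashIndices(indices []uint32) [32]byte {
	buf := make([]byte, 4*len(indices))
	for i, idx := range indices {
		binary.BigEndian.PutUint32(buf[4*i:], idx)
	}
	return sha256.Sum256(buf)
}

// eventsCollectionRoot combines the existing events root (left branch)
// with the service-event-indices hash (right branch) into one commitment.
func eventsCollectionRoot(eventsHash [32]byte, indices []uint32) [32]byte {
	right := hashIndices(indices)
	return sha256.Sum256(append(eventsHash[:], right[:]...))
}

func main() {
	eventsHash := sha256.Sum256([]byte("ordered event list"))
	// A chunk in which service events S2, S3 (indices 1, 2) were emitted:
	root := eventsCollectionRoot(eventsHash, []uint32{1, 2})
	fmt.Printf("%x\n", root[:4])
}
```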

The diagram below shows a scenario where an ExecutionResult has multiple service events, not all emitted in the same chunk. We are looking at one of the chunks, in which service events S2 and S3 (indices 1, 2) were emitted:
(diagram omitted: EventsCollection Merkle tree with the events branch on the left and the ServiceEventIndices branch on the right)

ChunkDataPack Changes

The ChunkDataPack needs to include enough information for Verification Nodes to:

  • verify that service events emitted in their chunk were computed correctly
  • verify that all service events in the ExecutionResult.ServiceEvents field are associated with some chunk (note: this check is order-sensitive)

To accomplish this, we make the following data model changes to ChunkDataPack:

  • Add a new field (ServiceEventIndicesProofs) which is a list with one proof for each chunk. Each element looks like:
    • Merkle proof for the new branch of the EventsCollection Merkle tree. This contains:
      • ServiceEventIndices - the concrete value of the new top-level right branch of the Merkle Tree
      • EventsHash (H_events in the diagram) - the hash of the top-level left branch of the Merkle Tree
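As an illustrative sketch (struct and function names are hypothetical; flow-go's actual hasher and proof format may differ), the proof element and its verification against the committed EventsCollection root could look like:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// ServiceEventIndicesProof is a hypothetical proof element: the chunk's
// service-event indices in the clear (new right branch) plus H_events,
// the hash of the ordinary-events left branch.
type ServiceEventIndicesProof struct {
	ServiceEventIndices []uint32
	EventsHash          [32]byte
}

func hashIndices(indices []uint32) [32]byte {
	buf := make([]byte, 4*len(indices))
	for i, idx := range indices {
		binary.BigEndian.PutUint32(buf[4*i:], idx)
	}
	return sha256.Sum256(buf)
}

// rootFromProof recomputes the EventsCollection commitment from the proof.
func rootFromProof(p ServiceEventIndicesProof) [32]byte {
	right := hashIndices(p.ServiceEventIndices)
	return sha256.Sum256(append(p.EventsHash[:], right[:]...))
}

// verifyProof checks the proof against the committed EventsCollection root.
func verifyProof(p ServiceEventIndicesProof, committed [32]byte) bool {
	return rootFromProof(p) == committed
}

func main() {
	p := ServiceEventIndicesProof{ServiceEventIndices: []uint32{1, 2}}
	committed := rootFromProof(p)
	fmt.Println(verifyProof(p, committed)) // true
	p.ServiceEventIndices = []uint32{1}    // tampered indices no longer match
	fmt.Println(verifyProof(p, committed)) // false
}
```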

Verification Logic Changes

Verification Nodes will determine the list of expected service events for their chunk committed by the Execution Node by using the ServiceEventIndices field from the ChunkDataPack. They will verify that the service events they compute are equal to the expected list.

Additional verification requirements:

  • Verification nodes must validate each Merkle proof in ServiceEventIndicesProofs
  • Verification nodes must validate that the ServiceEventIndicesProofs is internally consistent and consistent with ExecutionResult.ServiceEvents.
    • Concatenating all ServiceEventIndices lists for consecutive chunks should result in exactly the list [0,1,...,n-1], where n=len(ExecutionResult.ServiceEvents)
    • In the typical case, where n=0, all ServiceEventIndices are empty.
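The concatenation rule above can be sketched as a simple check (illustrative, assuming the per-chunk index lists are supplied in chunk order):

```go
package main

import "fmt"

// indicesConsistent checks that concatenating the per-chunk index lists
// yields exactly the sequence 0,1,...,n-1, where n is the number of
// service events in the ExecutionResult. Gaps, duplicates, or reordering
// (e.g. a byzantine EN injecting events between chunks) are rejected.
func indicesConsistent(perChunk [][]uint32, n int) bool {
	next := uint32(0)
	for _, indices := range perChunk {
		for _, idx := range indices {
			if idx != next {
				return false
			}
			next++
		}
	}
	return int(next) == n
}

func main() {
	fmt.Println(indicesConsistent([][]uint32{{0}, {}, {1, 2}}, 3)) // true
	fmt.Println(indicesConsistent([][]uint32{{0}, {2}}, 3))        // false: index 1 missing
	fmt.Println(indicesConsistent(nil, 0))                         // true: typical case n=0
}
```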

Compatibility

  • ChunkDataPack storage (EN)
  • ChunkDataPack network messages (EN/VN)
  • ChunkDataPack construction business logic (EN)
  • Verification business logic (VN)

Solution 3 - Chunk-Indexed generic service events (wip)

The ServiceEvent.Event field is currently interface{}-typed, and the concrete values stored there are instances of EpochSetup, etc. We could replace these concrete values with ChunkIndexed[ServiceEvent], a wrapper which includes a chunk index:

type ChunkIndexed[T SpecificServiceEvent] struct {
  ChunkIndex uint
  Event T
}

This would:

  • not require a data model change to Block
  • be possible to do in a backward-compatible way:
    • when decoding ServiceEvent.Event, if the underlying type is not ChunkIndexed, we infer a ChunkIndex of 0 (covering all service events emitted prior to this change)
  • give VNs direct access to a commitment from the ENs, in the ExecutionResult, stating which service events are emitted in which chunk
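The backward-compatible decoding rule could be sketched as follows (simplified to a non-generic wrapper with `any` for brevity; the real proposal uses ChunkIndexed[T SpecificServiceEvent]):

```go
package main

import "fmt"

// ChunkIndexed wraps a service event with the index of the chunk that
// emitted it. Simplified stand-in for the generic type in the proposal.
type ChunkIndexed struct {
	ChunkIndex uint
	Event      any
}

// decodeServiceEvent applies the backward-compatible rule: if the stored
// value is already a ChunkIndexed wrapper, use it; otherwise it is a
// legacy bare event, and we infer ChunkIndex 0.
func decodeServiceEvent(raw any) ChunkIndexed {
	if ci, ok := raw.(ChunkIndexed); ok {
		return ci
	}
	return ChunkIndexed{ChunkIndex: 0, Event: raw} // legacy event: chunk 0
}

func main() {
	legacy := decodeServiceEvent("EpochSetup") // pre-change encoding
	wrapped := decodeServiceEvent(ChunkIndexed{ChunkIndex: 4, Event: "EpochCommit"})
	fmt.Println(legacy.ChunkIndex, wrapped.ChunkIndex) // 0 4
}
```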
@jordanschalm jordanschalm added Bug Something isn't working Execution Cadence Execution Team Protocol Team: Issues assigned to the Protocol Pillar. S-BFT labels Nov 6, 2024
@jordanschalm jordanschalm self-assigned this Nov 6, 2024
@jordanschalm jordanschalm added this to the EFM-Q2 Follow-on updates milestone Nov 6, 2024
jordanschalm commented:

I think we will want to use service event hashes rather than indices in the EventsCollection proof.

  • We want to support parallel transaction execution in the future
  • The execution engine is already written in a concurrent way, where individual collections are executed independently, without the context of prior collection execution readily accessible.
  • If we use indices, we need to populate the EventsCollection separately from other fields, after all collections are committed
  • If we use hashes, we can independently populate each EventsCollection only based on executing that one collection (potentially even if prior collections have not been executed yet).

jordanschalm commented Nov 14, 2024

Notes from conversation with @zhangchiqing

Generally onboard with design, and agrees that Option 2 is better overall. Pointed out that changing the meaning of EventsCollection would have downstream impact on Rosetta (Coinbase), which would entail additional engineering work to allow Rosetta to validate event hashes, as well as making sure we have buy-in from them to coordinate the change.

Rough overview of additional work this would entail

  • Modify ExecutionData to include ServiceEventIndicesProofs, so ANs have access to this data
    • This includes modifying the calculation of ExecutionResult.ExecutionDataID (though still no data model changes)
    • This adds another data model which would need to be upgraded mid-spork, as part of this upgrade
  • Modify some AN GRPC method to return ServiceEventIndicesProofs for a block
  • Modify Rosetta to use ServiceEventIndicesProofs when verifying that events returned from the AN match the hash committed to in the block

AlexHentschel commented:

I think we will want to use service event hashes rather than indices in the EventsCollection proof.

overall, I have no objections. Only difference between indices and hashes is:

  • I don't think we can assume that service events always have different hashes. In other words, we need to consider the case where two different service events in the same block, possibly adjacent but emitted in different chunks, have the same hash.
  • In principle there is no problem with using hashes. It only means that every Verification Node needs to verify the entire list of hashes. (diagram omitted)
  • If we used indices, a verifier would only need to check that the indices of the service events in its chunk form a contiguous sequence with those of the adjacent chunks. But computationally that is largely irrelevant, and conceptually there is also not much of a difference in terms of complexity.

AlexHentschel commented Nov 15, 2024

I took a look at Option 3, which I like very much:

  • hopefully no changes to Rosetta needed (eliminating a major source of complexity)
  • no change to block data model

In my opinion, Option 3 is the best so far (if we can work it out without discovering other major complexities).

jordanschalm commented:

Addressed by #6744
