op-node: log mgasps across block building/inserting lifecycle #12907

bitwiseguy · 2024-11-12T21:21:37Z

Overview

In order to provide more visibility into the block processing performance of op-node + its execution client, this pr attempts to add logs to surface data for plotting mgasps (Mgas per second) for every block. I figured modifying op-node instead of op-geth would give us more consistent measurements when we measure other execution clients.

This is similar to the log written by op-geth here, however this pr attempts to provide a more holistic view of mgasps by logging duration of the entire block processing lifecycle instead of just the engineApi.NewPayload portion of block processing. The entire lifecycle consists of the following sequential engineApi calls:

engineApi.ForkchoiceUpdate(attrs)
engineApi.GetPayload
engineApi.NewPayload
engineApi.ForkchoiceUpdate

Event-based block building+inserting codepath

BuildStartEvent
- EngDeriver.onBuildStart
  - engine.startPayload
    - engine.ForkchoiceUpdate(attrs)
  - emit ForkchoiceUpdateEvent
  - emit BuildStartedEvent
BuildStartedEvent
- EngDeriver.onBuildStarted (basically a no-op)
  - emit BuildSealEvent
BuildSealEvent
- EngDeriver.onBuildSeal
  - engine.GetPayload
  - log Built new l2 block
    - seal_time (covers GetPayload)
    - build_time (covers ForkchoiceUpdate + GetPayload)
  - emit BuildSealedEvent
BuildSealedEvent
- EngDeriver.onBuildSealed (basically a no-op)
  - emit PayloadProcessEvent
PayloadProcessEvent
- EngDeriver.onPayloadProcess
  - engineFork.NewPayload
  - emit PayloadSuccessEvent
PayloadSuccessEvent
- EngDeriver.onPayloadSuccess
  - emit TryUpdateEngineEvent
TryUpdateEngineEvent
- log Inserted block
  - build_time (ForkchoiceUpdate + GetPayload)
  - insert_time (NewPayload + ForkchoiceUpdate)
  - total_time
  - mgasps

Legacy block inserting codepath

clsync.OnEvent
- ReceivedUnsafePayloadEvent
  - clsync.onUnsafePayload
    - emit engine.ForkchoiceRequestEvent
- engDeriver.ForkchoiceRequestEvent
  * emit ForkchoiceUpdateEvent
- ForkchoiceUpdateEvent
  - clsync.onForkchoiceUpdate
  - emit engine.ProcessUnsafePayloadEvent
- ProcessUnsafePayloadEvent
  - engineController.InsertUnsafePayload
    - engine.NewPayload
    - engine.ForkchoiceUpdate

Questions

Why do I not see any of these logs emitted by op-node? Does that event handler only get triggered by a certain sync type (execution-layer versus consensus-layer)?

Answer: For block processing of P2P blocks as verifier, there is still an older payload processing function, that does not use those events. It’s attached to the engine-controller. I am keen to remove that and to make it consistent with the sequencing / block replication codepath. Added the log for the legacy codepath in this commit: 29ee466

Should we include TryUpdateEngineEvent processing time in total_time value?

Answer: included as part of this commit: 61c1334

Testing

Example logs from op-e2e-http ci tests.

sequencer:

"Time":"2024-11-19T21:11:41.043342365Z","Action":"output","Package":"github.com/ethereum-optimism/optimism/op-e2e/system/altda","Test":"TestBatcherConcurrentAltDARequests","Output":"    events.go:383:              INFO [11-19|20:59:03.284] Inserted new L2 unsafe block             role=sequencer hash=576962..6e3cea number=1 state_root=361c9a..6dae76 timestamp=1,732,049,932 parent=0e4738..f8f825   prev_randao=000000..000000 fee_recipient=0x4200000000000000000000000000000000000011 txs=1 build_time=2.479ms insert_time=2.293ms total_time=4.773ms mgas=0.160 mgasps=33.604\n"}

verifier:

{"Time":"2024-11-19T21:11:42.465707551Z","Action":"output","Package":"github.com/ethereum-optimism/optimism/op-e2e/system/p2p","Test":"TestSystemMockP2P","Output":"    engine_controller.go:405:   INFO [11-19|20:59:03.094] Inserted new L2 unsafe block (synchronous) role=verifier hash=7b7a6f..03988c number=2 newpayload_time=1.893ms fcu2_time=\"449.339µs\" total_time=2.343ms mgas=0.052 mgasps=22.273\n"}

Metadata

Closes https://github.com/ethereum-optimism/platforms-team/issues/446

Alternative Design Considered

I considered an alternative implementation where I created a new performance deriver that subscribed to events and assembled block time metrics instead of threading data fields through events, but I did not like that I would have to periodically do garbage collection on incomplete blocks if they were interrupted or intentionally not part of a full block building/inserting flow.

codecov · 2024-11-12T21:21:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 66.77%. Comparing base (72e67e5) to head (d7906bb).

Additional details and impacted files

@@             Coverage Diff             @@
##           develop   #12907      +/-   ##
===========================================
- Coverage    68.78%   66.77%   -2.02%     
===========================================
  Files           56       56              
  Lines         4665     4665              
===========================================
- Hits          3209     3115      -94     
- Misses        1274     1378     +104     
+ Partials       182      172      -10

Flag	Coverage Δ
cannon-go-tests-32	`61.81% <ø> (-2.02%)`	⬇️
cannon-go-tests-64	`56.38% <ø> (-1.66%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 8 files with indirect coverage changes

op-node/rollup/engine/payload_success.go

op-node/rollup/engine/events.go

op-node/rollup/engine/engine_controller.go

axelKingsley

I am interested in @protolambda 's opinion regarding the pattern for extending Derivers. This wires new content through existing derivers and adds logic. For the scope of the changes, it's fine, but larger changes might warrant a whole new deriver to work on the event data without being attached to existing process.

op-node/rollup/engine/events.go

op-node/rollup/engine/payload_process.go

op-node/rollup/engine/payload_success.go

op-node/rollup/sequencing/sequencer_chaos_test.go

protolambda

Looks ok, just a few nits, then good to merge

op-node/rollup/engine/engine_controller.go

op-node/rollup/engine/events.go

…y-zero

bitwiseguy · 2024-11-20T14:14:45Z

@protolambda I'm confused by this log where role=verifier but it has a build_time (I expected to only see build_time for the active sequencer). Does the TestMixedDepositValidity exercise a unique codepath? Or is build_time sometimes expected for a verifier?

{"Time":"2024-11-19T20:29:37.066059307Z","Action":"output","Package":"github.com/ethereum-optimism/optimism/op-e2e/system/bridge","Test":"TestMixedDepositValidity","Output":"    events.go:372:              INFO [11-19|20:29:37.066] Inserted new L2 unsafe block             role=verifier hash=df651b..8901ca number=120 state_root=75f5ad..374cb1 timestamp=1,732,048,173 parent=26bd33..1bbe52 prev_randao=000000..000000 fee_recipient=0x4200000000000000000000000000000000000011 txs=2 build_time=1.268ms  insert_time=1.489ms  total_time=2.759ms  mgas=0.152 mgasps=55.151\n"}

protolambda · 2024-11-22T16:56:34Z

@bitwiseguy the verifier reproduces blocks from batch data in L1, which turns into block-inputs, and goes through the same block-building code-path, using the engine API and everything.

op-node: log mgasps across block processing lifecycle

d7906bb

bitwiseguy requested a review from protolambda November 13, 2024 03:04

bitwiseguy marked this pull request as ready for review November 13, 2024 03:04

bitwiseguy requested review from a team as code owners November 13, 2024 03:04

bitwiseguy changed the title ~~op-node: log mgasps across block processing lifecycle~~ op-node: log mgasps across event based block processing lifecycle Nov 13, 2024

op-node: add 'import_time' field to block processing log

492a1d2

bitwiseguy commented Nov 13, 2024

View reviewed changes

op-node/rollup/engine/payload_success.go Outdated Show resolved Hide resolved

bitwiseguy added 4 commits November 14, 2024 10:53

op-node: make log message more descriptive

72b0654

op-node: log legacy codepath for InsertUnsafePayload

29ee466

op-node: include final ForkchoiceUpdate in block insertion time

61c1334

op-node: remove unused BuildTime field from events

020d4e3

bitwiseguy commented Nov 18, 2024

View reviewed changes

op-node/rollup/engine/events.go Outdated Show resolved Hide resolved

bitwiseguy commented Nov 18, 2024

View reviewed changes

op-node/rollup/engine/engine_controller.go Outdated Show resolved Hide resolved

bitwiseguy changed the title ~~op-node: log mgasps across event based block processing lifecycle~~ op-node: log mgasps across block building/inserting lifecycle Nov 18, 2024

axelKingsley reviewed Nov 18, 2024

View reviewed changes

op-node/rollup/engine/events.go Outdated Show resolved Hide resolved

op-node/rollup/engine/payload_process.go Show resolved Hide resolved

op-node/rollup/engine/payload_success.go Show resolved Hide resolved

op-node/rollup/sequencing/sequencer_chaos_test.go Show resolved Hide resolved

bitwiseguy added 2 commits November 18, 2024 15:29

op-node: encapsulate log logic in TryUpdateEngineEvent methods

ddc086d

op-node: linter fix

4952ff3

protolambda reviewed Nov 19, 2024

View reviewed changes

op-node/rollup/engine/engine_controller.go Outdated Show resolved Hide resolved

op-node/rollup/engine/events.go Show resolved Hide resolved

bitwiseguy added 4 commits November 19, 2024 11:28

op-node: add comment and adjust sync log wording

026614d

op-node: fix BlockHash, BlockNumber in log

82437de

op-node: seq add BuildStarted to PayloadProcessEvent

30669df

op-node: refactor getBlockProcessingMetrics, protect against divide-b…

b6cf313

…y-zero

Merge branch 'develop' into ss/block-building-logs

9de1808

protolambda approved these changes Nov 22, 2024

View reviewed changes

protolambda added this pull request to the merge queue Nov 22, 2024

Merged via the queue into develop with commit 1c36df3 Nov 22, 2024
45 checks passed

protolambda deleted the ss/block-building-logs branch November 22, 2024 17:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

op-node: log mgasps across block building/inserting lifecycle #12907

op-node: log mgasps across block building/inserting lifecycle #12907

bitwiseguy commented Nov 12, 2024 •

edited

Loading

codecov bot commented Nov 12, 2024 •

edited

Loading

axelKingsley left a comment

protolambda left a comment

bitwiseguy commented Nov 20, 2024

protolambda commented Nov 22, 2024

op-node: log mgasps across block building/inserting lifecycle #12907

op-node: log mgasps across block building/inserting lifecycle #12907

Conversation

bitwiseguy commented Nov 12, 2024 • edited Loading

Overview

Event-based block building+inserting codepath

Legacy block inserting codepath

Questions

Testing

Metadata

Alternative Design Considered

codecov bot commented Nov 12, 2024 • edited Loading

Codecov Report

axelKingsley left a comment

Choose a reason for hiding this comment

protolambda left a comment

Choose a reason for hiding this comment

bitwiseguy commented Nov 20, 2024

protolambda commented Nov 22, 2024

bitwiseguy commented Nov 12, 2024 •

edited

Loading

codecov bot commented Nov 12, 2024 •

edited

Loading