Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

op-node: log mgasps across block building/inserting lifecycle #12907

Merged
merged 13 commits into from
Nov 22, 2024

Conversation

bitwiseguy
Copy link
Contributor

@bitwiseguy bitwiseguy commented Nov 12, 2024

Overview

In order to provide more visibility into the block processing performance of op-node + its execution client, this pr attempts to add logs to surface data for plotting mgasps (Mgas per second) for every block. I figured modifying op-node instead of op-geth would give us more consistent measurements when we measure other execution clients.

This is similar to the log written by op-geth here, however this pr attempts to provide a more holistic view of mgasps by logging duration of the entire block processing lifecycle instead of just the engineApi.NewPayload portion of block processing. The entire lifecycle consists of the following sequential engineApi calls:

  1. engineApi.ForkchoiceUpdate(attrs)
  2. engineApi.GetPayload
  3. engineApi.NewPayload
  4. engineApi.ForkchoiceUpdate

Event-based block building+inserting codepath

  • BuildStartEvent
    • EngDeriver.onBuildStart
      • engine.startPayload
        • engine.ForkchoiceUpdate(attrs)
      • emit ForkchoiceUpdateEvent
      • emit BuildStartedEvent
  • BuildStartedEvent
    • EngDeriver.onBuildStarted (basically a no-op)
      • emit BuildSealEvent
  • BuildSealEvent
    • EngDeriver.onBuildSeal
      • engine.GetPayload
      • log Built new l2 block
        • seal_time (covers GetPayload)
        • build_time (covers ForkchoiceUpdate + GetPayload)
      • emit BuildSealedEvent
  • BuildSealedEvent
    • EngDeriver.onBuildSealed (basically a no-op)
      • emit PayloadProcessEvent
  • PayloadProcessEvent
    • EngDeriver.onPayloadProcess
      • engineFork.NewPayload
      • emit PayloadSuccessEvent
  • PayloadSuccessEvent
    • EngDeriver.onPayloadSuccess
      • emit TryUpdateEngineEvent
  • TryUpdateEngineEvent
    • log Inserted block
      • build_time (ForkchoiceUpdate + GetPayload)
      • insert_time (NewPayload + ForkchoiceUpdate)
      • total_time
      • mgasps

Legacy block inserting codepath

  • clsync.OnEvent
    • ReceivedUnsafePayloadEvent
      • clsync.onUnsafePayload
        • emit engine.ForkchoiceRequestEvent
    • engDeriver.ForkchoiceRequestEvent
      * emit ForkchoiceUpdateEvent
    • ForkchoiceUpdateEvent
      • clsync.onForkchoiceUpdate
      • emit engine.ProcessUnsafePayloadEvent
    • ProcessUnsafePayloadEvent
      • engineController.InsertUnsafePayload
        • engine.NewPayload
        • engine.ForkchoiceUpdate

Questions

  1. Why do I not see any of these logs emitted by op-node? Does that event handler only get triggered by a certain sync type (execution-layer versus consensus-layer)?

Answer: For block processing of P2P blocks as verifier, there is still an older payload processing function, that does not use those events. It’s attached to the engine-controller. I am keen to remove that and to make it consistent with the sequencing / block replication codepath. Added the log for the legacy codepath in this commit: 29ee466

  1. Should we include TryUpdateEngineEvent processing time in total_time value?

Answer: included as part of this commit: 61c1334

Testing

Example logs from op-e2e-http ci tests.

sequencer:

"Time":"2024-11-19T21:11:41.043342365Z","Action":"output","Package":"github.com/ethereum-optimism/optimism/op-e2e/system/altda","Test":"TestBatcherConcurrentAltDARequests","Output":"    events.go:383:              INFO [11-19|20:59:03.284] Inserted new L2 unsafe block             role=sequencer hash=576962..6e3cea number=1 state_root=361c9a..6dae76 timestamp=1,732,049,932 parent=0e4738..f8f825   prev_randao=000000..000000 fee_recipient=0x4200000000000000000000000000000000000011 txs=1 build_time=2.479ms insert_time=2.293ms total_time=4.773ms mgas=0.160 mgasps=33.604\n"}

verifier:

{"Time":"2024-11-19T21:11:42.465707551Z","Action":"output","Package":"github.com/ethereum-optimism/optimism/op-e2e/system/p2p","Test":"TestSystemMockP2P","Output":"    engine_controller.go:405:   INFO [11-19|20:59:03.094] Inserted new L2 unsafe block (synchronous) role=verifier hash=7b7a6f..03988c number=2 newpayload_time=1.893ms fcu2_time=\"449.339µs\" total_time=2.343ms mgas=0.052 mgasps=22.273\n"}

Metadata

Closes https://github.com/ethereum-optimism/platforms-team/issues/446

Alternative Design Considered

I considered an alternative implementation where I created a new performance deriver that subscribed to events and assembled block time metrics instead of threading data fields through events, but I did not like that I would have to periodically do garbage collection on incomplete blocks if they were interrupted or intentionally not part of a full block building/inserting flow.

Copy link

codecov bot commented Nov 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 66.77%. Comparing base (72e67e5) to head (d7906bb).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #12907      +/-   ##
===========================================
- Coverage    68.78%   66.77%   -2.02%     
===========================================
  Files           56       56              
  Lines         4665     4665              
===========================================
- Hits          3209     3115      -94     
- Misses        1274     1378     +104     
+ Partials       182      172      -10     
Flag Coverage Δ
cannon-go-tests-32 61.81% <ø> (-2.02%) ⬇️
cannon-go-tests-64 56.38% <ø> (-1.66%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 8 files with indirect coverage changes

@bitwiseguy bitwiseguy marked this pull request as ready for review November 13, 2024 03:04
@bitwiseguy bitwiseguy requested review from a team as code owners November 13, 2024 03:04
@bitwiseguy bitwiseguy changed the title op-node: log mgasps across block processing lifecycle op-node: log mgasps across event based block processing lifecycle Nov 13, 2024
@bitwiseguy bitwiseguy changed the title op-node: log mgasps across event based block processing lifecycle op-node: log mgasps across block building/inserting lifecycle Nov 18, 2024
Copy link
Contributor

@axelKingsley axelKingsley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am interested in @protolambda 's opinion regarding the pattern for extending Derivers. This wires new content through existing derivers and adds logic. For the scope of the changes, it's fine, but larger changes might warrant a whole new deriver to work on the event data without being attached to existing process.

op-node/rollup/engine/events.go Outdated Show resolved Hide resolved
op-node/rollup/engine/payload_process.go Show resolved Hide resolved
op-node/rollup/engine/payload_success.go Show resolved Hide resolved
Copy link
Contributor

@protolambda protolambda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok, just a few nits, then good to merge

op-node/rollup/engine/engine_controller.go Outdated Show resolved Hide resolved
op-node/rollup/engine/events.go Show resolved Hide resolved
@bitwiseguy
Copy link
Contributor Author

@protolambda I'm confused by this log where role=verifier but it has a build_time (I expected to only see build_time for the active sequencer). Does the TestMixedDepositValidity exercise a unique codepath? Or is build_time sometimes expected for a verifier?

{"Time":"2024-11-19T20:29:37.066059307Z","Action":"output","Package":"github.com/ethereum-optimism/optimism/op-e2e/system/bridge","Test":"TestMixedDepositValidity","Output":"    events.go:372:              INFO [11-19|20:29:37.066] Inserted new L2 unsafe block             role=verifier hash=df651b..8901ca number=120 state_root=75f5ad..374cb1 timestamp=1,732,048,173 parent=26bd33..1bbe52 prev_randao=000000..000000 fee_recipient=0x4200000000000000000000000000000000000011 txs=2 build_time=1.268ms  insert_time=1.489ms  total_time=2.759ms  mgas=0.152 mgasps=55.151\n"}

@protolambda
Copy link
Contributor

@bitwiseguy the verifier reproduces blocks from batch data in L1, which turns into block-inputs, and goes through the same block-building code-path, using the engine API and everything.

@protolambda protolambda added this pull request to the merge queue Nov 22, 2024
Merged via the queue into develop with commit 1c36df3 Nov 22, 2024
45 checks passed
@protolambda protolambda deleted the ss/block-building-logs branch November 22, 2024 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants