-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Merged by Bors] - Improve debugging experience for builder proposals #3725
Conversation
This might be a good one for @realbigsean, since he's familiar with this code? I'm keen to hear your thoughts on the extra checks for builder-only proposals 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great on the whole, will be very useful for trying out some new relays ;)
"relay_block_hash" => ?header.block_hash(), | ||
"local_block_hash" => %local.block_hash(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the long-form for the relay block hash here intentional, as it's most likely to be selected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, I actually intended to make both of these long-form. I figured that long-form would make it easier for us to search third-party services. I find it a little frustrating when we have short-form and then need to cross-reference with something like beaconcha.in.
I've gone and changed these in 26118a4. I ended up changing all roots/hashes in this file to be Display
, hopefully this is not perceived as over-zealous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm onboard with the long hashes! Agree 100% about the annoyances of short hashes.
I think you got the impls switched in 26118a4 though. The Debug
implementation is the long-form one
Related features that might be cool to add to this PR (or a follow-up):
|
Co-authored-by: Michael Sproul <micsproul@gmail.com>
I added a bunch of new metrics in 339698b and updated the description to describe them. Good idea. EDIT: a fixed a bug in those metrics in 55e0b9e. EDIT 2: In 54390be I swapped to tracking the bid in gwei, since we'd be stuck at ~9 ETH when trying to fit wei inside an i64. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good!
Do we have this running on any test infra yet?
Co-authored-by: Michael Sproul <micsproul@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved!
I fixed a sneaky bug here: c6256e1 |
Alrighty, I'm running this on some Prater nodes to good result! Thanks for the review bors r+ |
## Issue Addressed NA ## Proposed Changes This PR sets out to improve the logging/metrics experience when interacting with the builder. Namely, it: - Adds/changes metrics (see "Metrics Changes" section). - Adds new logs which show the duration of requests to the builder/local EL. - Refactors existing logs for consistency and so that the `parent_hash` is include in all relevant logs (we can grep for this field when trying to trace the flow of block production). Additionally, when I was implementing this PR I noticed that we skip some verification of the builder payload in the scenario where the builder return `Ok` but the local EL returns with `Err`. Namely, we were skipping the bid signature and other values like parent hash and prev randao. In this PR I've changed it so we *always* check these values and reject the bid if they're incorrect. With these changes, we'll sometimes choose to skip a proposal rather than propose something invalid -- that's the only side-effect to the changes that I can see. ## Metrics Changes - Changed: `execution_layer_request_times`: - `method = "get_blinded_payload_local"`: time taken to get a payload from a local EE. - `method = "get_blinded_payload_builder"`: time taken to get a blinded payload from a builder. - `method = "post_blinded_payload_builder"`: time taken to get a builder to reveal a payload they've previously supplied us. - `execution_layer_get_payload_outcome` - `outcome = "success"`: we successfully produced a payload from a builder or local EE. - `outcome = "failure"`: we were unable to get a payload from a builder or local EE. - New: `execution_layer_builder_reveal_payload_outcome` - `outcome = "success"`: a builder revealed a payload from a signed, blinded block. - `outcome = "failure"`: the builder did not reveal the payload. - New: `execution_layer_get_payload_source` - `type = "builder"`: we used a payload from a builder to produce a block. - `type = "local"`: we used a payload from a local EE to produce a block. - New: `execution_layer_get_payload_builder_rejections` has a `reason` field to describe why we rejected a payload from a builder. - New: `execution_layer_payload_bids` tracks the bid (in gwei) from the builder or local EE (local EE not yet supported, waiting on EEs to expose the value). Can only record values that fit inside an i64 (roughly 9 million ETH). ## Additional Info NA
@michaelsproul I just got woken up in the middle of the night because a block was not revealed by a builder but was actually broadcast 😅 Perhaps you're right about dropping that log level. |
Let's drop it to |
## Issue Addressed NA ## Proposed Changes In #3725 I introduced a `CRIT` log for unrevealed payloads, against @michaelsproul's [advice](#3725 (comment)). After being woken up in the middle of the night by a block that was not revealed to the BN but *was* revealed to the network, I have capitulated. This PR implements @michaelsproul's suggestion and reduces the severity to `ERRO`. Additionally, I have dropped a `CRIT` to an `ERRO` for when a block is published late. The block in question was indeed published late on the network, however now that we have builders that can slow down block production I don't think the error is "actionable" enough to warrant a `CRIT` for the user. ## Additional Info NA
## Issue Addressed NA ## Proposed Changes This PR sets out to improve the logging/metrics experience when interacting with the builder. Namely, it: - Adds/changes metrics (see "Metrics Changes" section). - Adds new logs which show the duration of requests to the builder/local EL. - Refactors existing logs for consistency and so that the `parent_hash` is include in all relevant logs (we can grep for this field when trying to trace the flow of block production). Additionally, when I was implementing this PR I noticed that we skip some verification of the builder payload in the scenario where the builder return `Ok` but the local EL returns with `Err`. Namely, we were skipping the bid signature and other values like parent hash and prev randao. In this PR I've changed it so we *always* check these values and reject the bid if they're incorrect. With these changes, we'll sometimes choose to skip a proposal rather than propose something invalid -- that's the only side-effect to the changes that I can see. ## Metrics Changes - Changed: `execution_layer_request_times`: - `method = "get_blinded_payload_local"`: time taken to get a payload from a local EE. - `method = "get_blinded_payload_builder"`: time taken to get a blinded payload from a builder. - `method = "post_blinded_payload_builder"`: time taken to get a builder to reveal a payload they've previously supplied us. - `execution_layer_get_payload_outcome` - `outcome = "success"`: we successfully produced a payload from a builder or local EE. - `outcome = "failure"`: we were unable to get a payload from a builder or local EE. - New: `execution_layer_builder_reveal_payload_outcome` - `outcome = "success"`: a builder revealed a payload from a signed, blinded block. - `outcome = "failure"`: the builder did not reveal the payload. - New: `execution_layer_get_payload_source` - `type = "builder"`: we used a payload from a builder to produce a block. - `type = "local"`: we used a payload from a local EE to produce a block. - New: `execution_layer_get_payload_builder_rejections` has a `reason` field to describe why we rejected a payload from a builder. - New: `execution_layer_payload_bids` tracks the bid (in gwei) from the builder or local EE (local EE not yet supported, waiting on EEs to expose the value). Can only record values that fit inside an i64 (roughly 9 million ETH). ## Additional Info NA
## Issue Addressed NA ## Proposed Changes In sigp#3725 I introduced a `CRIT` log for unrevealed payloads, against @michaelsproul's [advice](sigp#3725 (comment)). After being woken up in the middle of the night by a block that was not revealed to the BN but *was* revealed to the network, I have capitulated. This PR implements @michaelsproul's suggestion and reduces the severity to `ERRO`. Additionally, I have dropped a `CRIT` to an `ERRO` for when a block is published late. The block in question was indeed published late on the network, however now that we have builders that can slow down block production I don't think the error is "actionable" enough to warrant a `CRIT` for the user. ## Additional Info NA
## Issue Addressed NA ## Proposed Changes This PR sets out to improve the logging/metrics experience when interacting with the builder. Namely, it: - Adds/changes metrics (see "Metrics Changes" section). - Adds new logs which show the duration of requests to the builder/local EL. - Refactors existing logs for consistency and so that the `parent_hash` is include in all relevant logs (we can grep for this field when trying to trace the flow of block production). Additionally, when I was implementing this PR I noticed that we skip some verification of the builder payload in the scenario where the builder return `Ok` but the local EL returns with `Err`. Namely, we were skipping the bid signature and other values like parent hash and prev randao. In this PR I've changed it so we *always* check these values and reject the bid if they're incorrect. With these changes, we'll sometimes choose to skip a proposal rather than propose something invalid -- that's the only side-effect to the changes that I can see. ## Metrics Changes - Changed: `execution_layer_request_times`: - `method = "get_blinded_payload_local"`: time taken to get a payload from a local EE. - `method = "get_blinded_payload_builder"`: time taken to get a blinded payload from a builder. - `method = "post_blinded_payload_builder"`: time taken to get a builder to reveal a payload they've previously supplied us. - `execution_layer_get_payload_outcome` - `outcome = "success"`: we successfully produced a payload from a builder or local EE. - `outcome = "failure"`: we were unable to get a payload from a builder or local EE. - New: `execution_layer_builder_reveal_payload_outcome` - `outcome = "success"`: a builder revealed a payload from a signed, blinded block. - `outcome = "failure"`: the builder did not reveal the payload. - New: `execution_layer_get_payload_source` - `type = "builder"`: we used a payload from a builder to produce a block. - `type = "local"`: we used a payload from a local EE to produce a block. - New: `execution_layer_get_payload_builder_rejections` has a `reason` field to describe why we rejected a payload from a builder. - New: `execution_layer_payload_bids` tracks the bid (in gwei) from the builder or local EE (local EE not yet supported, waiting on EEs to expose the value). Can only record values that fit inside an i64 (roughly 9 million ETH). ## Additional Info NA
Squashed commit of the following: commit ad08d07 Author: Paul Hauner <paul@paulhauner.com> Date: Mon Dec 5 16:51:06 2022 +1100 Remove crits for late block commit 8e85d62 Author: Paul Hauner <paul@paulhauner.com> Date: Mon Dec 5 16:48:43 2022 +1100 Downgrade log for payload reveal failure commit 84392d6 Author: Michael Sproul <michael@sigmaprime.io> Date: Fri Dec 2 00:07:43 2022 +0000 Delete DB schema migrations for v11 and earlier (sigp#3761) ## Proposed Changes Now that the Gnosis merge is scheduled, all users should have upgraded beyond Lighthouse v3.0.0. Accordingly we can delete schema migrations for versions prior to v3.0.0. ## Additional Info I also deleted the state cache stuff I added in sigp#3714 as it turned out to be useless for the light client proofs due to the one-slot offset. commit 18c9be5 Author: Mac L <mjladson@pm.me> Date: Thu Dec 1 06:03:53 2022 +0000 Add API endpoint to count statuses of all validators (sigp#3756) ## Issue Addressed sigp#3724 ## Proposed Changes Adds an endpoint to quickly count the number of occurances of each status in the validator set. ## Usage ```bash curl -X GET "http://localhost:5052/lighthouse/ui/validator_count" -H "accept: application/json" | jq ``` ```json { "data": { "active_ongoing":479508, "active_exiting":0, "active_slashed":0, "pending_initialized":28, "pending_queued":0, "withdrawal_possible":933, "withdrawal_done":0, "exited_unslashed":0, "exited_slashed":3 } } ``` commit 2211504 Author: Michael Sproul <michael@sigmaprime.io> Date: Wed Nov 30 05:22:58 2022 +0000 Prioritise important parts of block processing (sigp#3696) ## Issue Addressed Closes sigp#2327 ## Proposed Changes This is an extension of some ideas I implemented while working on `tree-states`: - Cache the indexed attestations from blocks in the `ConsensusContext`. Previously we were re-computing them 3-4 times over. - Clean up `import_block` by splitting each part into `import_block_XXX`. - Move some stuff off hot paths, specifically: - Relocate non-essential tasks that were running between receiving the payload verification status and priming the early attester cache. These tasks are moved after the cache priming: - Attestation observation - Validator monitor updates - Slasher updates - Updating the shuffling cache - Fork choice attestation observation now happens at the end of block verification in parallel with payload verification (this seems to save 5-10ms). - Payload verification now happens _before_ advancing the pre-state and writing it to disk! States were previously being written eagerly and adding ~20-30ms in front of verifying the execution payload. State catchup also sometimes takes ~500ms if we get a cache miss and need to rebuild the tree hash cache. The remaining task that's taking substantial time (~20ms) is importing the block to fork choice. I _think_ this is because of pull-tips, and we should be able to optimise it out with a clever total active balance cache in the state (which would be computed in parallel with payload verification). I've decided to leave that for future work though. For now it can be observed via the new `beacon_block_processing_post_exec_pre_attestable_seconds` metric. Co-authored-by: Michael Sproul <micsproul@gmail.com> commit b4f4c0d Author: Divma <divma@protonmail.com> Date: Wed Nov 30 03:21:35 2022 +0000 Ipv6 bootnodes (sigp#3752) ## Issue Addressed our bootnodes as of now support only ipv4. this makes it so that they support ipv6 ## Proposed Changes - Adds code necessary to update the bootnodes to run on dual stack nodes and therefore contact and store ipv6 nodes. - Adds some metrics about connectivity type of stored peers. It might have been nice to see some metrics over the sessions but that feels out of scope right now. ## Additional Info - some code quality improvements sneaked in since the changes seemed small - I think it depends on the OS, but enabling mapped addresses on an ipv6 node without dual stack support enabled could fail silently, making these nodes effectively ipv6 only. In the future I'll probably change this to use two sockets, which should fail loudly commit 3534c85 Author: GeemoCandama <geemo@tutanota.com> Date: Tue Nov 29 08:19:27 2022 +0000 Optimize finalized chain sync by skipping newPayload messages (sigp#3738) ## Issue Addressed sigp#3704 ## Proposed Changes Adds is_syncing_finalized: bool parameter for block verification functions. Sets the payload_verification_status to Optimistic if is_syncing_finalized is true. Uses SyncState in NetworkGlobals in BeaconProcessor to retrieve the syncing status. ## Additional Info I could implement FinalizedSignatureVerifiedBlock if you think it would be nicer. commit a2969ba Author: Paul Hauner <paul@paulhauner.com> Date: Tue Nov 29 05:51:42 2022 +0000 Improve debugging experience for builder proposals (sigp#3725) ## Issue Addressed NA ## Proposed Changes This PR sets out to improve the logging/metrics experience when interacting with the builder. Namely, it: - Adds/changes metrics (see "Metrics Changes" section). - Adds new logs which show the duration of requests to the builder/local EL. - Refactors existing logs for consistency and so that the `parent_hash` is include in all relevant logs (we can grep for this field when trying to trace the flow of block production). Additionally, when I was implementing this PR I noticed that we skip some verification of the builder payload in the scenario where the builder return `Ok` but the local EL returns with `Err`. Namely, we were skipping the bid signature and other values like parent hash and prev randao. In this PR I've changed it so we *always* check these values and reject the bid if they're incorrect. With these changes, we'll sometimes choose to skip a proposal rather than propose something invalid -- that's the only side-effect to the changes that I can see. ## Metrics Changes - Changed: `execution_layer_request_times`: - `method = "get_blinded_payload_local"`: time taken to get a payload from a local EE. - `method = "get_blinded_payload_builder"`: time taken to get a blinded payload from a builder. - `method = "post_blinded_payload_builder"`: time taken to get a builder to reveal a payload they've previously supplied us. - `execution_layer_get_payload_outcome` - `outcome = "success"`: we successfully produced a payload from a builder or local EE. - `outcome = "failure"`: we were unable to get a payload from a builder or local EE. - New: `execution_layer_builder_reveal_payload_outcome` - `outcome = "success"`: a builder revealed a payload from a signed, blinded block. - `outcome = "failure"`: the builder did not reveal the payload. - New: `execution_layer_get_payload_source` - `type = "builder"`: we used a payload from a builder to produce a block. - `type = "local"`: we used a payload from a local EE to produce a block. - New: `execution_layer_get_payload_builder_rejections` has a `reason` field to describe why we rejected a payload from a builder. - New: `execution_layer_payload_bids` tracks the bid (in gwei) from the builder or local EE (local EE not yet supported, waiting on EEs to expose the value). Can only record values that fit inside an i64 (roughly 9 million ETH). ## Additional Info NA commit 99ec9d9 Author: kevinbogner <kevbogner@gmail.com> Date: Mon Nov 28 10:05:43 2022 +0000 Add Run a Node guide (sigp#3681) ## Issue Addressed Related to sigp#3672 ## Proposed Changes - Added a guide to run a node. Mainly, copy and paste from 'Merge Migration' and 'Checkpoint Sync'. - Ranked it high in ToC: - Introduction - Installation - Run a Node - Become a Validator ... - Hid 'Merge Migration' in ToC. ## Additional Info - Should I add/rephrase/delete something? - Now there is some redundancy: - 'Run a node' and 'Checkpoint Sync' contain similar information. - Same for 'Run a node' and 'Become a Validator'. Co-authored-by: kevinbogner <114221396+kevinbogner@users.noreply.github.com> Co-authored-by: Michael Sproul <micsproul@gmail.com> commit 2779017 Author: Age Manning <Age@AgeManning.com> Date: Mon Nov 28 07:36:52 2022 +0000 Gossipsub fast message id change (sigp#3755) For improved consistency, this mixes in the topic into our fast message id for more consistent tracking of messages across topics. commit c881b80 Author: Mac L <mjladson@pm.me> Date: Mon Nov 28 00:22:53 2022 +0000 Add CLI flag for gui requirements (sigp#3731) ## Issue Addressed sigp#3723 ## Proposed Changes Adds a new CLI flag `--gui` which enables all the various flags required for the gui to function properly. Currently enables the `--http` and `--validator-monitor-auto` flags. commit 969ff24 Author: Mac L <mjladson@pm.me> Date: Fri Nov 25 07:57:11 2022 +0000 Add CLI flag to opt in to world-readable log files (sigp#3747) ## Issue Addressed sigp#3732 ## Proposed Changes Add a CLI flag to allow users to opt out of the restrictive permissions of the log files. ## Additional Info This is not recommended for most users. The log files can contain sensitive information such as validator indices, public keys and API tokens (see sigp#2438). However some users using a multi-user setup may find this helpful if they understand the risks involved. commit e9bf7f7 Author: antondlr <anton@delaruelle.net> Date: Fri Nov 25 07:57:10 2022 +0000 remove commas from comma-separated kv pairs (sigp#3737) ## Issue Addressed Logs are in comma separated kv list, but the values sometimes contain commas, which breaks parsing commit d5a2de7 Author: Giulio rebuffo <giulio.rebuffo@gmail.com> Date: Fri Nov 25 05:19:00 2022 +0000 Added LightClientBootstrap V1 (sigp#3711) ## Issue Addressed Partially addresses sigp#3651 ## Proposed Changes Adds server-side support for light_client_bootstrap_v1 topic ## Additional Info This PR, creates each time a bootstrap without using cache, I do not know how necessary a cache is in this case as this topic is not supposed to be called frequently and IMHO we can just prevent abuse by using the limiter, but let me know what you think or if there is any caveat to this, or if it is necessary only for the sake of good practice. Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
NA This PR sets out to improve the logging/metrics experience when interacting with the builder. Namely, it: - Adds/changes metrics (see "Metrics Changes" section). - Adds new logs which show the duration of requests to the builder/local EL. - Refactors existing logs for consistency and so that the `parent_hash` is include in all relevant logs (we can grep for this field when trying to trace the flow of block production). Additionally, when I was implementing this PR I noticed that we skip some verification of the builder payload in the scenario where the builder return `Ok` but the local EL returns with `Err`. Namely, we were skipping the bid signature and other values like parent hash and prev randao. In this PR I've changed it so we *always* check these values and reject the bid if they're incorrect. With these changes, we'll sometimes choose to skip a proposal rather than propose something invalid -- that's the only side-effect to the changes that I can see. - Changed: `execution_layer_request_times`: - `method = "get_blinded_payload_local"`: time taken to get a payload from a local EE. - `method = "get_blinded_payload_builder"`: time taken to get a blinded payload from a builder. - `method = "post_blinded_payload_builder"`: time taken to get a builder to reveal a payload they've previously supplied us. - `execution_layer_get_payload_outcome` - `outcome = "success"`: we successfully produced a payload from a builder or local EE. - `outcome = "failure"`: we were unable to get a payload from a builder or local EE. - New: `execution_layer_builder_reveal_payload_outcome` - `outcome = "success"`: a builder revealed a payload from a signed, blinded block. - `outcome = "failure"`: the builder did not reveal the payload. - New: `execution_layer_get_payload_source` - `type = "builder"`: we used a payload from a builder to produce a block. - `type = "local"`: we used a payload from a local EE to produce a block. - New: `execution_layer_get_payload_builder_rejections` has a `reason` field to describe why we rejected a payload from a builder. - New: `execution_layer_payload_bids` tracks the bid (in gwei) from the builder or local EE (local EE not yet supported, waiting on EEs to expose the value). Can only record values that fit inside an i64 (roughly 9 million ETH). NA
NA In sigp#3725 I introduced a `CRIT` log for unrevealed payloads, against @michaelsproul's [advice](sigp#3725 (comment)). After being woken up in the middle of the night by a block that was not revealed to the BN but *was* revealed to the network, I have capitulated. This PR implements @michaelsproul's suggestion and reduces the severity to `ERRO`. Additionally, I have dropped a `CRIT` to an `ERRO` for when a block is published late. The block in question was indeed published late on the network, however now that we have builders that can slow down block production I don't think the error is "actionable" enough to warrant a `CRIT` for the user. NA
Issue Addressed
NA
Proposed Changes
This PR sets out to improve the logging/metrics experience when interacting with the builder. Namely, it:
parent_hash
is include in all relevant logs (we can grep for this field when trying to trace the flow of block production).Additionally, when I was implementing this PR I noticed that we skip some verification of the builder payload in the scenario where the builder return
Ok
but the local EL returns withErr
. Namely, we were skipping the bid signature and other values like parent hash and prev randao. In this PR I've changed it so we always check these values and reject the bid if they're incorrect. With these changes, we'll sometimes choose to skip a proposal rather than propose something invalid -- that's the only side-effect to the changes that I can see.Metrics Changes
execution_layer_request_times
:method = "get_blinded_payload_local"
: time taken to get a payload from a local EE.method = "get_blinded_payload_builder"
: time taken to get a blinded payload from a builder.method = "post_blinded_payload_builder"
: time taken to get a builder to reveal a payload they've previously supplied us.execution_layer_get_payload_outcome
outcome = "success"
: we successfully produced a payload from a builder or local EE.outcome = "failure"
: we were unable to get a payload from a builder or local EE.execution_layer_builder_reveal_payload_outcome
outcome = "success"
: a builder revealed a payload from a signed, blinded block.outcome = "failure"
: the builder did not reveal the payload.execution_layer_get_payload_source
type = "builder"
: we used a payload from a builder to produce a block.type = "local"
: we used a payload from a local EE to produce a block.execution_layer_get_payload_builder_rejections
has areason
field to describe why we rejected a payload from a builder.execution_layer_payload_bids
tracks the bid (in gwei) from the builder or local EE (local EE not yet supported, waiting on EEs to expose the value). Can only record values that fit inside an i64 (roughly 9 million ETH).Additional Info
NA