Optimise concurrent block production #5368

Merged

Conversation

michaelsproul
Member

Issue Addressed

Closes #5365

Proposed Changes

Speed up concurrent block production by being stricter about when locks are dropped. We were being caught out by Rust's automatic drop logic, which did not release locks promptly when they were obtained in the branches of an if.

Before this change, the block responses were staggered by roughly 300ms each (running blockdreamer with 4 copies of the same node):

Mar 07 06:08:00.111 DEBG Processed HTTP API request, method: GET, path: /eth/v3/validator/blocks/7773640, status: 200 OK, elapsed: 111.176792ms, module: http_api:205
Mar 07 06:08:00.414 DEBG Processed HTTP API request, method: GET, path: /eth/v3/validator/blocks/7773640, status: 200 OK, elapsed: 413.336008ms, module: http_api:205
Mar 07 06:08:00.715 DEBG Processed HTTP API request, method: GET, path: /eth/v3/validator/blocks/7773640, status: 200 OK, elapsed: 714.450081ms, module: http_api:205
Mar 07 06:08:01.027 DEBG Processed HTTP API request, method: GET, path: /eth/v3/validator/blocks/7773640, status: 200 OK, elapsed: 1.026413336s, module: http_api:205

After this change (mainly the block_production_state locking change), they execute properly in parallel:

Mar 07 06:23:48.126 DEBG Processed HTTP API request, method: GET, path: /eth/v3/validator/blocks/7773719, status: 200 OK, elapsed: 124.879958ms, module: http_api:205
Mar 07 06:23:48.437 DEBG Processed HTTP API request, method: GET, path: /eth/v3/validator/blocks/7773719, status: 200 OK, elapsed: 436.259608ms, module: http_api:205
Mar 07 06:23:48.449 DEBG Processed HTTP API request, method: GET, path: /eth/v3/validator/blocks/7773719, status: 200 OK, elapsed: 447.748269ms, module: http_api:205
Mar 07 06:23:48.461 DEBG Processed HTTP API request, method: GET, path: /eth/v3/validator/blocks/7773719, status: 200 OK, elapsed: 459.995933ms, module: http_api:205

The first request is roughly 300ms faster than the subsequent three because it is able to use the single-use block_production_state cache.
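The general pattern behind a late lock release can be sketched as follows. This is a minimal illustration, not the Lighthouse code: it assumes the guard was created as a temporary in an if let scrutinee, `SingleUseCache` and its methods are hypothetical names, and `std::sync::Mutex` stands in for whatever lock type Lighthouse actually uses. Each method reports whether the lock was still held while the branch body ran.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a single-use cache such as `block_production_state`.
struct SingleUseCache {
    state: Mutex<Option<u64>>,
}

impl SingleUseCache {
    fn new(value: u64) -> Self {
        SingleUseCache {
            state: Mutex::new(Some(value)),
        }
    }

    /// Takes the cached value inside an `if let` scrutinee. The `MutexGuard`
    /// is a temporary there, so it is still alive while the body runs.
    /// Returns whether the lock was still held during the "slow work".
    fn take_in_scrutinee(&self) -> bool {
        if let Some(_state) = self.state.lock().unwrap().take() {
            // Stand-in for slow block production work: probe the lock.
            let still_locked = self.state.try_lock().is_err();
            still_locked
        } else {
            false
        }
    }

    /// Takes the cached value in its own `let` statement, so the guard is
    /// dropped at the end of that statement, before the body runs.
    fn take_then_release(&self) -> bool {
        let cached = self.state.lock().unwrap().take();
        if let Some(_state) = cached {
            let still_locked = self.state.try_lock().is_err();
            still_locked
        } else {
            false
        }
    }
}

fn main() {
    // Lock held during the body: prints true.
    println!("scrutinee: {}", SingleUseCache::new(1).take_in_scrutinee());
    // Lock released before the body: prints false.
    println!("let first: {}", SingleUseCache::new(1).take_then_release());
}
```

Binding the result of `take()` to a local first is one way to be strict about the drop point; calling `drop(guard)` on a named guard is another.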

@michaelsproul added the ready-for-review (The code is ready for review), optimization (Something to make Lighthouse run more efficiently.) and HTTP-API labels Mar 7, 2024
@michaelsproul added the dvt (Distributed validator technology e.g. SSV, Obol) label Mar 7, 2024
@michaelsproul
Member Author

michaelsproul commented Mar 7, 2024

Not fully satisfied with the auto-drop explanation. In isolation Rust seems to be smarter than this: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a7689777c1941e424d91b1d1e4728c12

@michaelsproul added the v5.1.0 (Q2 2024) label Mar 8, 2024
@michaelsproul
Member Author

I found out some interesting things about the drop ordering. If the lock guard is borrowed mutably then the compiler will drop it "late". In my first example above we don't see the late drop behaviour because the borrow is immutable.

Copying the structure of Lighthouse's code more closely reproduces the issue:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=3bcdbd11c7aa3e8ea5e9ce750e12e723

This prints:

case 2: World
releasing the lock lock2
releasing the lock lock1

Modifying the first playground to do a mutable borrow only for the first lock forces that lock to be dropped late, while the other lock is dropped early:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a7689777c1941e424d91b1d1e4728c12

releasing the lock lock2
case 2: World
releasing the lock lock1
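The playground contents are not reproduced here. As a separate, minimal sketch of early versus late drops, the following compares a guard created in a plain if condition with one created in an if let scrutinee. Under Rust's temporary-scope rules the former is dropped as soon as the condition has been evaluated, while the latter stays alive through the body. `LoggingGuard`, `lock`, `plain_if_order` and `if_let_order` are hypothetical names, and this is not claimed to match the structure of either playground or of Lighthouse's code.

```rust
use std::cell::RefCell;

/// Hypothetical lock guard that records the moment it is dropped.
struct LoggingGuard<'a> {
    name: &'static str,
    value: Option<&'static str>,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> LoggingGuard<'a> {
    /// Copies the protected value out, as a stand-in for reading through a guard.
    fn value(&self) -> Option<&'static str> {
        self.value
    }
}

impl<'a> Drop for LoggingGuard<'a> {
    fn drop(&mut self) {
        self.log
            .borrow_mut()
            .push(format!("releasing the lock {}", self.name));
    }
}

/// Hypothetical `lock()` that hands out a logging guard.
fn lock<'a>(
    name: &'static str,
    value: Option<&'static str>,
    log: &'a RefCell<Vec<String>>,
) -> LoggingGuard<'a> {
    LoggingGuard { name, value, log }
}

/// Guard created in a plain `if` condition: dropped before the body runs.
fn plain_if_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    if lock("lock1", Some("World"), &log).value().is_some() {
        log.borrow_mut().push("body".to_string());
    }
    log.into_inner()
}

/// Guard created in an `if let` scrutinee: kept alive through the body.
fn if_let_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    if let Some(v) = lock("lock1", Some("World"), &log).value() {
        log.borrow_mut().push(format!("body: {}", v));
    }
    log.into_inner()
}

fn main() {
    // ["releasing the lock lock1", "body"]
    println!("{:?}", plain_if_order());
    // ["body: World", "releasing the lock lock1"]
    println!("{:?}", if_let_order());
}
```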

I think this could be a good Clippy lint!

Member

@jimmygchen jimmygchen left a comment


Nice, interesting find! Looks like a low risk change to me.

@jimmygchen added the ready-for-merge (This PR is ready to merge.) label and removed the ready-for-review (The code is ready for review) label Mar 8, 2024
@jimmygchen
Member

@mergify queue


mergify bot commented Mar 8, 2024

queue

🛑 The pull request has been removed from the queue default

The queue conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

mergify bot added a commit that referenced this pull request Mar 8, 2024
mergify bot added a commit that referenced this pull request Mar 8, 2024
@paulhauner
Member

@mergify requeue


mergify bot commented Mar 8, 2024

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request


mergify bot commented Mar 8, 2024

queue

🛑 The pull request has been removed from the queue default

Pull request #5368 has been dequeued by a dequeue command.


mergify bot added a commit that referenced this pull request Mar 8, 2024
@jimmygchen
Member

@mergify dequeue


mergify bot commented Mar 8, 2024

dequeue

✅ The pull request has been removed from the queue default

@jimmygchen
Member

@mergify requeue


mergify bot commented Mar 8, 2024

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request


mergify bot commented Mar 8, 2024

queue

🛑 The pull request has been removed from the queue default

Pull request #5368 has been dequeued by a dequeue command.


mergify bot added a commit that referenced this pull request Mar 8, 2024
@paulhauner
Member

@mergify dequeue


mergify bot commented Mar 8, 2024

dequeue

✅ The pull request has been removed from the queue default

@paulhauner
Member

@mergify requeue


mergify bot commented Mar 8, 2024

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request


mergify bot commented Mar 8, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at f93844e

mergify bot added a commit that referenced this pull request Mar 8, 2024
@mergify mergify bot merged commit f93844e into sigp:unstable Mar 8, 2024
29 checks passed
@paulhauner paulhauner mentioned this pull request Mar 8, 2024
Labels
dvt (Distributed validator technology e.g. SSV, Obol), HTTP-API, optimization (Something to make Lighthouse run more efficiently.), ready-for-merge (This PR is ready to merge.), v5.1.0 (Q2 2024)

3 participants