Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix duties override bug in VC #5305

Merged
merged 5 commits into from
Mar 4, 2024

Conversation

michaelsproul
Copy link
Member

Issue Addressed

Alternative to:

Proposed Changes

I wasn't fully satisfied with the fix from #5296, as while it works, it's a bit of a bandaid.

This PR attempts to address a bug that I found in how we calculate duties which I think is the root cause. We were previously requesting duties for a single validator to work out whether any updates were required, and then making a request for all validators that we determined needed updating. The problem was we would assume that the duties for the single validator were "new" and relevant, and then use them to overwrite existing duties for that validator, thus removing our knowledge of the subscriptions we'd already sent.

This explains why some users noticed that the expired subscription warnings occurred more often on nodes with validators changing their status from pending to active. These validators would be deemed in need of updating, and would allow the update to go through for the single validator (which is not usually the same validator).

I've also updated the duty update logic to be a bit more defensive about overriding data, logging a warning in cases where we would previously override (which I hope is now unreachable).

Additional Info

This PR could be merged in addition to #5296 if we want to really guarantee (defense in depth) that we don't send bad subscriptions.

@michaelsproul michaelsproul added bug Something isn't working val-client Relates to the validator client binary ready-for-review The code is ready for review labels Feb 26, 2024
// in the initial query. There was previously a bug here where we assumed the duties from the
// initial query were "new" and needed to be inserted into the map, which overrode information
// in the `subscription_slots` and caused expired subscriptions to be sent.
// Seeing as the initial batch is so small, the worst-case bandwidth wastage here is minimal,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't uninitialized_validators.len() == local_pubkeys.len() right after a restart?
So we would end up making 2 requests with indices_to_request.len() == uninitialized_validators.len() in that case.
I think this is okay, but just wanted to confirm that this would be the case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah you are right! I got a bit carried away with the 1 validator case

Will try to fix it up so we do reuse the info from the first query, but don't attempt any overrides

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed that update. I still want to roll the changes from the other PR into this one to handle the startup case. I noticed a bunch of warnings on restart when I deployed this PR.

@michaelsproul michaelsproul added the v5.1.0 Q2 2024 label Feb 27, 2024
@michaelsproul
Copy link
Member Author

This is ready for review. No more WARN Not enough time for a discovery search logs after running this PR on our Goerli validators. The new warning log that I added also hasn't been hit.

Copy link
Member

@pawanjay176 pawanjay176 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!
Really want to get this in soon so that it's less noisy to debug the InsufficientPeers issues

@michaelsproul
Copy link
Member Author

@Mergifyio queue

Copy link

mergify bot commented Mar 4, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at cff6258

mergify bot added a commit that referenced this pull request Mar 4, 2024
@mergify mergify bot merged commit cff6258 into sigp:unstable Mar 4, 2024
29 checks passed
@michaelsproul michaelsproul deleted the vc-past-subscriptions-alt branch March 5, 2024 01:20
pawanjay176 pushed a commit to pawanjay176/lighthouse that referenced this pull request Mar 5, 2024
* Fix duties override bug in VC

* Use initial request efficiently

* Prevent expired subscriptions by construction

* Clean up selection proof logic

* Add test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working ready-for-review The code is ready for review v5.1.0 Q2 2024 val-client Relates to the validator client binary
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants