Increase priority for validator HTTP requests #6292
Agree this is a low risk change. LGTM.
How often does Charon query the BN for validators? My concern is that if there is a large number of inactive validators, it could overwhelm the BN, and making it P0 could make things worse.
We've had performance issues with this endpoint before and changed our poll frequency to once per epoch. Our VC also avoids polling on the first slot of the epoch (change made in #5628).
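To make the polling schedule described above concrete, here is a minimal sketch of "once per epoch, never on the first slot". It is not the Lighthouse VC's actual code: `should_poll` and `poll_slots` are hypothetical names, and choosing the first eligible slot of each epoch is an assumption made for illustration.

```python
from typing import List, Optional

SLOTS_PER_EPOCH = 32  # Ethereum mainnet value


def should_poll(slot: int, last_polled_epoch: Optional[int]) -> bool:
    """Decide whether a validator-info poll should run at `slot`.

    Hypothetical helper: polls at most once per epoch and skips
    the first slot of every epoch.
    """
    epoch = slot // SLOTS_PER_EPOCH
    is_first_slot = slot % SLOTS_PER_EPOCH == 0
    already_polled = last_polled_epoch is not None and last_polled_epoch >= epoch
    return not is_first_slot and not already_polled


def poll_slots(start_slot: int, end_slot: int) -> List[int]:
    """Return the slots in [start_slot, end_slot) at which a poll would run."""
    polled = []
    last_polled_epoch = None
    for slot in range(start_slot, end_slot):
        if should_poll(slot, last_polled_epoch):
            polled.append(slot)
            last_polled_epoch = slot // SLOTS_PER_EPOCH
    return polled
```

Under these assumptions, three epochs (slots 0 to 95) produce three polls instead of 96, which is the difference between the per-slot and per-epoch behaviour discussed here.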
I don't think Charon sets the poll frequency because it is just middleware between the VC and the BN. So Lighthouse VC is in control of the polling schedule and should only be making requests once per epoch as of v5.2.0 (when #5628 was included). I've asked Stakely to confirm which version of Lighthouse VC they were running. If they were running v5.1.3 or earlier then it will have been spamming every slot and making the problem worse.

I also don't think the API performance was too bad here. We quite regularly hit 2s+ response times for P1 requests on Holesky, and this is usually fine for non-critical requests. The problem is that Charon times out after 2s and spews error logs. Because the validators are not active, Lighthouse VC (through Charon) will keep requesting, and even if some of these requests succeed, the error logs can come back the next time a request times out.

So my original description of a "timeout loop" is not quite accurate: Charon is likely just timing out repeatedly. Stakely reported that the issue disappeared completely when running with
@mergify queue
✅ The pull request has been merged automatically at ae83901
* Increase priority for validator HTTP requests
Issue Addressed
Address an issue reported by Stakely during Obol testing (for Lido) whereby `charon` gets stuck in a timeout loop trying to request the details of inactive validators from Lighthouse. The error from Charon is:
Proposed Changes
This PR increases the priority for validator info requests from P1 to P0, until we can do the hard work of overhauling the priority system:
This is a low-risk change, but will require a new LH release before it can be used in production by Obol users. I will try to get Stakely to run it on testnets.
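As a rough illustration of why moving a request from P1 to P0 helps it avoid timeouts, the sketch below models a two-level priority queue in which all queued P0 work is served before any P1 work, with FIFO order inside each level. This is not Lighthouse's beacon processor, which is considerably more involved; `PriorityWorkQueue` and the request names are hypothetical.

```python
import heapq
import itertools

P0, P1 = 0, 1  # lower number = served first (assumed convention for this sketch)


class PriorityWorkQueue:
    """Toy work queue: strict priority order, FIFO within a priority level."""

    def __init__(self) -> None:
        self._heap = []
        # Monotonic counter breaks ties so equal-priority items stay FIFO.
        self._counter = itertools.count()

    def push(self, priority: int, name: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def drain(queue: PriorityWorkQueue) -> list:
    """Pop everything from the queue and return the service order."""
    order = []
    while len(queue) > 0:
        order.append(queue.pop())
    return order
```

For example, if two hypothetical P1 requests are already queued when a validator-info request arrives, enqueueing it at P1 means it is served last, while enqueueing it at P0 means it is served first:

```python
q = PriorityWorkQueue()
q.push(P1, "other_request_a")
q.push(P1, "other_request_b")
q.push(P0, "validator_info")
print(drain(q))  # validator_info is served before the two P1 requests
```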
Additional Info
There is a workaround that achieves a similar effect: `--http-enable-beacon-processor false`. However, this comes with the downside of opening Lighthouse's HTTP API up to accidental DoS. With no limit on the number of concurrent requests, it is easy to overwhelm Lighthouse with requests and cause OOMs and other slowness.
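The trade-off with the workaround can be sketched as bounded versus unbounded concurrency. The example below is only an illustration and does not reflect Lighthouse internals: `run_requests` is a hypothetical helper that simulates request handlers with `asyncio` and reports the peak number in flight, with and without a limit.

```python
import asyncio
from typing import Optional


async def run_requests(num_requests: int, max_concurrent: Optional[int]) -> int:
    """Simulate `num_requests` handlers and return the peak number in flight.

    `max_concurrent=None` models an API with no concurrency limit.
    """
    limiter = asyncio.Semaphore(max_concurrent) if max_concurrent is not None else None
    in_flight = 0
    peak = 0

    async def handle() -> None:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for real request work
        in_flight -= 1

    async def guarded() -> None:
        if limiter is None:
            await handle()
        else:
            async with limiter:  # waits here once the limit is reached
                await handle()

    await asyncio.gather(*(guarded() for _ in range(num_requests)))
    return peak


unbounded_peak = asyncio.run(run_requests(50, None))
bounded_peak = asyncio.run(run_requests(50, 4))
```

Without a limit, every simulated request is in flight at once, so memory use scales with the burst size. With a limit, excess requests wait their turn, which bounds resource use at the cost of added latency for the requests that queue.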