health: expand health check functionality with health package #29056
Conversation
@rjl493456442 failing check related to this PR rectified. The remaining failure on the |
The functions mentioned can be implemented by RPC method, such as |
Hey @learnerLj, thanks for taking the time to review! Could you elaborate a little bit on what you'd like to see? The problem with the RPC is health checks generally do not allow people running load balancers to use POST requests for checks and are limited in scope. It would of course be possible to have a separate application running to proxy GET requests and convert them to POST requests to the RPC but for those running many nodes in load balancers an integrated solution would be useful in our experience. |
Could you please describe how the balancers and geth fit together? From my perspective, a new package for a specific business need can bring unnecessary complexity to geth. Thus, I recommend that you fork it and write your application in |
Ah ok that makes sense and I agree with your recommendations, I will make some updates! Forgive me if I misunderstand your initial question. A balancer is used when you have a pool (or multiple pools) of instances running behind a single url. These are used to distribute traffic across multiple servers to spread the workload and/or link users to their geographically closest services. The health checks automatically add/remove nodes from the pools and route traffic based on the result of health checks to ensure continuity of service. For geth this is most relevant to rpc services and those using geth as the execution client for validators. |
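To make the pool behaviour described above concrete, here is a minimal sketch in Go of how a balancer might decide which nodes stay in rotation. The names `probe` and `filterHealthy` are hypothetical, the two test servers stand in for real nodes, and actual balancers (GCP, AWS, Cloudflare) add intervals, thresholds and retries that are not shown.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probe reports whether a node answers an HTTP GET on url with status 200.
// Any transport error or other status counts as unhealthy.
func probe(client *http.Client, url string) bool {
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// filterHealthy returns the nodes for which check reports true, in order.
// A balancer would route traffic only to this subset.
func filterHealthy(nodes []string, check func(string) bool) []string {
	var healthy []string
	for _, n := range nodes {
		if check(n) {
			healthy = append(healthy, n)
		}
	}
	return healthy
}

func main() {
	// Two stand-in nodes: one passing its check, one failing it.
	up := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer up.Close()
	down := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusInternalServerError)
	}))
	defer down.Close()

	client := &http.Client{Timeout: 2 * time.Second}
	pool := filterHealthy([]string{up.URL, down.URL}, func(u string) bool {
		return probe(client, u)
	})
	fmt.Println("nodes in rotation:", len(pool))
}
```

Because the probe is a plain GET that only looks at the status code, anything the node wants to express about its state has to be folded into 200 versus non-200.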
We like the proposal, but the integration needs to be done somewhat differently, also it would be good to coordinate this endpoint specification a bit better amongst client devs. |
@holiman Sure thing, sounds good. Do you have any advice on how you'd structure it? I had a go at implementing it as per @learnerLj's recommendations but got stuck at integrating a separate app to run via the main process as there doesn't seem to be a precedent for that with the current cmd apps. |
A few ideas, just to do a braindump:
|
Agree that the concatenation of variable names and values does not feel right. Erigon was the most analogous project to Geth with this feature so we used their structure as the basis for this in an attempt to create some uniformity. Having individual headers for each option would make more sense to me. An example of this from Fortinet. Unfortunately returning 200 vs 50X HTTP codes depending on the result is a non-configurable standard for the major services (GCP, Azure, AWS) so it would require a caveat to the rule. In terms of header name formatting this is GCP's guidance on custom header names:
Erigon's implementation does seem to break that final rule. An alternative solution would be to use query strings and pass the values in the url where providing no value returns the current behaviour and adding values triggers the extra checks:
|
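A rough sketch of how the query-string variant could be parsed on the server side, in Go. The parameter names simply reuse the option names from the PR description and are an assumption, as are the `checks` type and the `parseChecks` helper; nothing here reflects an agreed specification.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// checks holds the requested thresholds; a nil pointer means "not requested".
type checks struct {
	Synced           bool
	MinPeerCount     *uint64
	CheckBlock       *uint64
	MaxSecondsBehind *uint64
}

// parseChecks reads the optional checks from a raw query string such as
// "synced=true&min_peer_count=10". An empty query requests no extra checks,
// which keeps the current behaviour.
func parseChecks(rawQuery string) (checks, error) {
	var c checks
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return c, err
	}
	c.Synced = q.Get("synced") == "true"

	fields := []struct {
		name string
		dst  **uint64
	}{
		{"min_peer_count", &c.MinPeerCount},
		{"check_block", &c.CheckBlock},
		{"max_seconds_behind", &c.MaxSecondsBehind},
	}
	for _, f := range fields {
		raw := q.Get(f.name)
		if raw == "" {
			continue // no value supplied: leave this check off
		}
		v, err := strconv.ParseUint(raw, 10, 64)
		if err != nil {
			return c, fmt.Errorf("invalid %s: %w", f.name, err)
		}
		*f.dst = &v
	}
	return c, nil
}

func main() {
	c, err := parseChecks("synced=true&min_peer_count=10")
	if err != nil {
		fmt.Println("bad query:", err)
		return
	}
	fmt.Println(c.Synced, *c.MinPeerCount, c.CheckBlock == nil)
}
```

Compared with concatenating names and values in a header, each value here is an ordinary key/value pair, so no custom header naming rules come into play.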
I agree that this seems like something that would be nice to standardize across EL clients. Maybe worth opening an issue to https://github.com/ethereum/execution-apis and discussing on ACD at some point? |
@lightclient sounds like a plan. Done 👍 |
Can anyone help get this moving in some way? @lightclient the issue raised on the execution-apis has not yet been ACK'd, do you have any advice for us 🙇♂️ |
Problem
Health checks for many infrastructure providers such as GCP, AWS and Cloudflare require an endpoint which accepts an HTTP GET request and returns status 200 if the service is healthy.
The current mechanism to address this was last updated in 2017 (#15496) where an exception was made to return status 200 for an empty HTTP GET request as the rpc only currently accepts POST requests of type `application/json`.
It would be useful to expand this functionality to enable operators to define the state required to be considered healthy. Other execution clients such as Erigon and Nethermind have implemented similar solutions.
Proposed Solution
This PR seeks to add a new package called `health` which extends the functionality of the http server to add a `/health` endpoint when invoked with the `--http.health` flag.
When enabled an operator can set parameters using the `X-GETH-HEALTHCHECK` header.
Available options (these match Erigon for cross compatibility):
- `synced` - will check if the node has completed syncing
- `min_peer_count<count>` - will check that the node has at least `<count>` many peers
- `check_block<block>` - will check that the node is at least ahead of the `<block>` specified
- `max_seconds_behind<seconds>` - will check that the node is no more than `<seconds>` behind from its latest block
The request will return:
When no value is provided for an option that check will not run. The status of the check will be `DISABLED` in the return object.
Example Request
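As a sketch of what such a request could look like from Go: the header name and option names come from the description above, while the node address `http://localhost:8545`, the threshold values, the `newHealthRequest` helper, and the choice to send one header value per option (as Erigon accepts) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newHealthRequest builds a GET request for the /health endpoint and attaches
// one X-GETH-HEALTHCHECK header value per requested option.
func newHealthRequest(baseURL string, options ...string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/health", nil)
	if err != nil {
		return nil, err
	}
	for _, opt := range options {
		req.Header.Add("X-GETH-HEALTHCHECK", opt)
	}
	return req, nil
}

func main() {
	// Assumed address and thresholds, for illustration only.
	req, err := newHealthRequest("http://localhost:8545",
		"synced", "min_peer_count10", "max_seconds_behind60")
	if err != nil {
		fmt.Println("building request:", err)
		return
	}
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("node unreachable:", err)
		return
	}
	defer resp.Body.Close()
	// 200 means every requested check passed; 500 means at least one failed.
	fmt.Println("status:", resp.StatusCode)
}
```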
Alternatively you can send a POST request using the same parameters in JSON:
Example Healthy Response (`Status: 200`)
Example Unhealthy Response (`Status: 500`)
Structure
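For orientation, a minimal standalone sketch of a handler with the behaviour described in this PR. It is not the code in the PR: `nodeState`, `evaluate` and `healthHandler` are hypothetical names, the `HEALTHY` and `UNHEALTHY` status strings are assumed (only `DISABLED` is named above), and the state is a fixed snapshot rather than anything read from a running node.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// nodeState is a hypothetical snapshot of what the node knows about itself.
type nodeState struct {
	Syncing       bool
	PeerCount     uint64
	HeadBlock     uint64
	SecondsBehind uint64
}

// evaluate runs only the checks named in opts. Checks that were not requested,
// or that carry no usable value, stay DISABLED. healthy is false if any
// requested check fails.
func evaluate(s nodeState, opts []string) (map[string]string, bool) {
	result := map[string]string{
		"synced": "DISABLED", "min_peer_count": "DISABLED",
		"check_block": "DISABLED", "max_seconds_behind": "DISABLED",
	}
	healthy := true
	record := func(name string, ok bool) {
		if ok {
			result[name] = "HEALTHY"
			return
		}
		result[name] = "UNHEALTHY"
		healthy = false
	}
	for _, opt := range opts {
		opt = strings.TrimSpace(opt)
		if opt == "synced" {
			record("synced", !s.Syncing)
			continue
		}
		for _, name := range []string{"min_peer_count", "check_block", "max_seconds_behind"} {
			if !strings.HasPrefix(opt, name) {
				continue
			}
			n, err := strconv.ParseUint(strings.TrimPrefix(opt, name), 10, 64)
			if err != nil {
				break // missing or malformed value: leave the check DISABLED
			}
			switch name {
			case "min_peer_count":
				record(name, s.PeerCount >= n)
			case "check_block":
				record(name, s.HeadBlock >= n)
			case "max_seconds_behind":
				record(name, s.SecondsBehind <= n)
			}
		}
	}
	return result, healthy
}

// healthHandler answers 200 when all requested checks pass and 500 otherwise.
func healthHandler(state func() nodeState) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		result, healthy := evaluate(state(), r.Header.Values("X-GETH-HEALTHCHECK"))
		w.Header().Set("Content-Type", "application/json")
		if !healthy {
			w.WriteHeader(http.StatusInternalServerError)
		}
		_ = json.NewEncoder(w).Encode(result)
	})
}

func main() {
	h := healthHandler(func() nodeState {
		return nodeState{PeerCount: 3, HeadBlock: 100, SecondsBehind: 5}
	})
	req := httptest.NewRequest(http.MethodGet, "/health", nil)
	req.Header.Add("X-GETH-HEALTHCHECK", "synced")
	req.Header.Add("X-GETH-HEALTHCHECK", "min_peer_count10")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	// Only 3 peers against a minimum of 10, so this request fails with 500.
	fmt.Println(rec.Code, rec.Body.String())
}
```

Keeping the decision logic in a pure function like `evaluate` makes it testable without a running node; the handler itself only maps the result onto a status code.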
Due to the restriction on the rpc package only accepting POST requests of type `application/json` and issues with circular imports, creating a separate package which can be enabled independently appeared to be the cleanest solution. The package is similar in implementation to the `graphql` package in that it extends the http server, so the new package mirrors the structure of that package.
Further Development
The work on this PR was done by the team at Republic Crypto. We're happy to accept input and apply requests for change where needed.