We recently opened a PR on the Go Ethereum repo to implement a health API that works with the health checks of cloud and load balancer providers to determine whether a node is in a suitable state to handle requests from clients.
Ideally, consensus on a universal standard for such a service would be reached before implementation. @lightclient recommended we open an issue here to get the ball rolling.
How Health Checks Work
Health checks across large providers generally work the same way (GCP, Azure, AWS):
The provider sends a GET request to an endpoint running on the service.
The service responds with either status 200 (healthy) or 500 (unhealthy), or doesn't respond at all (unhealthy).
If the service is unhealthy, traffic heading to that piece of infrastructure is routed elsewhere.
There is some nuance: a small number of providers accept other 2XX/5XX statuses and take different actions for different statuses. However, the 200/500 model is universal.
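The probe behaviour described above can be sketched from the load balancer's side. This is a minimal illustration in Python, not any provider's actual implementation; the function names `is_healthy` and `probe`, the default timeout and whatever URL is passed in are assumptions made for the example.

```python
import urllib.error
import urllib.request
from typing import Optional


def is_healthy(status: Optional[int]) -> bool:
    """Apply the plain 200/500 model; None stands for 'no response'."""
    # Providers that accept other 2XX codes would need a looser rule here.
    return status == 200


def probe(url: str, timeout: float = 2.0) -> Optional[int]:
    """Send one GET request and return the HTTP status, or None if nothing answers."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The service answered, but with an error status such as 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure or timeout: treated as no response.
        return None
```

A caller would combine the two as `is_healthy(probe(url))` and route traffic elsewhere whenever the result is false.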
Limitations
Health checks only allow simple GET requests. This is at odds with the current Ethereum standard, which only allows POST requests of type application/json.
Because only GET requests are allowed, parameters cannot be posted in the JSON-RPC 2.0 format. Users must instead pass them through custom headers or query strings in the URL.
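One way to carry parameters on a GET request is sketched below: look for the value in a custom header first, then fall back to the query string. The `X-Health-` header prefix, the `/health` path, the precedence order and the function name `read_param` are all hypothetical; nothing in this issue prescribes them.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit


def read_param(name: str, url: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return a health-check parameter from a header or the query string."""
    # Hypothetical convention: min_peers -> "X-Health-Min-Peers".
    header_name = "X-Health-" + "-".join(p.capitalize() for p in name.split("_"))
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {k.lower(): v for k, v in headers.items()}
    if header_name.lower() in lowered:
        return lowered[header_name.lower()]
    # Fall back to the query string, e.g. /health?min_peers=5
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else None
```

For example, `read_param("min_peers", "/health?min_peers=5", {})` yields the string `"5"`, and a request that sets neither the header nor the query parameter yields `None`.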
Checks
As a starting point, these are the checks we implemented in our PR (derived from Erigon's solution):
syncing: Check if the node is in the syncing state.
max_seconds_behind: Check that the timestamp of the latest block is no more than max_seconds_behind seconds old.
min_peers: Confirm there are at least min_peers peers connected to the node.
check_block: Confirm the node has synced beyond the height of check_block.
In our implementation we used custom headers to pass the values for each variable. When a variable is not defined, that check will not run.
Upon finishing the checks, the node returns status 200 if all checks passed or 500 if any failed. Additionally, an object containing the result of each check (OK, DISABLED or the error message for the check) is returned.
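The evaluation described above might look roughly like the following. This is a sketch and not the code from the PR: the `NodeState` type, its field names, the error strings and the `run_checks` function are made up for illustration, `syncing` is treated as a simple on/off switch, and "synced beyond" is read as "at or past" the given height.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class NodeState:
    """Hypothetical snapshot of the values the checks need."""
    syncing: bool
    latest_block_number: int
    latest_block_timestamp: int  # Unix seconds
    peer_count: int


def run_checks(
    node: NodeState,
    syncing: bool = False,
    max_seconds_behind: Optional[int] = None,
    min_peers: Optional[int] = None,
    check_block: Optional[int] = None,
    now: Optional[int] = None,
) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status, per-check results); checks left unset stay DISABLED."""
    now = int(time.time()) if now is None else now
    names = ("syncing", "max_seconds_behind", "min_peers", "check_block")
    results = {name: "DISABLED" for name in names}

    if syncing:
        results["syncing"] = "node is syncing" if node.syncing else "OK"
    if max_seconds_behind is not None:
        age = now - node.latest_block_timestamp
        results["max_seconds_behind"] = (
            "OK" if age <= max_seconds_behind else f"latest block is {age}s old"
        )
    if min_peers is not None:
        results["min_peers"] = (
            "OK" if node.peer_count >= min_peers
            else f"only {node.peer_count} peers connected"
        )
    if check_block is not None:
        # Assumption: reaching check_block counts; a strict '>' is the other reading.
        results["check_block"] = (
            "OK" if node.latest_block_number >= check_block
            else f"node is at block {node.latest_block_number}"
        )

    # Any result other than OK or DISABLED is an error message, so the node is unhealthy.
    failed = any(r not in ("OK", "DISABLED") for r in results.values())
    return (500 if failed else 200), results
```

The returned dictionary maps directly onto the result object in the response body, and the integer is the HTTP status the load balancer acts on.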
Further Work
Any assistance in developing this idea into a solid standard and moving it forward would be hugely appreciated. Such an endpoint would be massively useful to us as node operators, and likely to many others. Our team has the resources and willingness to assist however possible.