Update Watchtower to support multiple RPC sources #4621
Comments
I can help with this issue. Here are some questions for clarification:
|
I'll let Will comment on the other items but
Yes |
you're effectively implementing https://en.wikipedia.org/wiki/Triple_modular_redundancy. we want
yes
definitely do not ignore, that would defeat the purpose of the change
the cli config should be modified as follows:
|
@t-nelson Okay, this all makes total sense. Thanks for the detailed description! |
I think ties are fine... the desired behavior in the event of a tie would be dictated by the number of RPCs provided and the value of |
tbh there's probably little gain in accepting more than three rpc sources. anyone who wants to is probably missing the point and shouldn't be using more than one. it's not like we're sending this thing into space for centuries. this would actually simplify the cli config
then genericization of the existing logic is identical |
@t-nelson I like the idea of just allowing |
"never go to sea with two chronometers. take one, or three" |
Shouldn't here be a |
i think the logic is correct as is. threshold zero (no appetite) should alert the first one |
It looks like I'm almost done with the implementation...
|
i think just raise watchtower reliability alert. we can't make any sense of the information we've collected
just reliability
i think having the log messages will be useful when a watchtower reliability alert is raised
i think you're thinking about this backwards. we want threshold endpoints to agree that our node and the cluster are healthy. we raise an alert any time that's not the case. attempting to get agreement on the failure mode will be very difficult to get right |
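To make the "agree on healthy, alert otherwise" rule concrete, here is a minimal sketch in Rust. It is not the Watchtower implementation: `EndpointReport`, `Verdict`, `evaluate`, and the `min_healthy` parameter are hypothetical names invented for illustration, and the sketch does not model the separate watchtower reliability alert discussed above.

```rust
/// Result of polling a single RPC endpoint (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EndpointReport {
    /// The endpoint answered and reported the node and cluster as healthy.
    Healthy,
    /// The endpoint answered but reported a problem.
    Unhealthy,
    /// The endpoint could not be reached or returned an error.
    Unreachable,
}

/// Overall decision after combining all endpoint reports (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    AllClear,
    Alert,
}

/// Counts only the endpoints that agree everything is healthy.
/// Anything short of `min_healthy` agreeing endpoints raises an alert,
/// without trying to reach agreement on what the failure is.
fn evaluate(reports: &[EndpointReport], min_healthy: usize) -> Verdict {
    let healthy = reports
        .iter()
        .filter(|report| **report == EndpointReport::Healthy)
        .count();
    if healthy >= min_healthy {
        Verdict::AllClear
    } else {
        Verdict::Alert
    }
}
```

With three endpoints and `min_healthy` set to 2, one unreachable endpoint is tolerated, while one unhealthy plus one unreachable endpoint raises an alert, since the two failing endpoints never need to agree with each other.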
@t-nelson
|
I updated the pull request accordingly and moved it out of the draft state, although the CI still needs to be approved :) |
Problem
Watchtower monitors cluster health and node delinquency. It does this by polling some RPC endpoints, but the current implementation only uses a single RPC url. That RPC service becomes a single point of failure.
Proposed Solution
Update Watchtower config and logic to support: