Clustering: re-discover and re-join peers on interval in background #4465
Comments
Along with this issue, please add support to the chart for the additional flags along with the clustering flag that was recently added
@braunsonm are you suggesting exposing the rest of the In the meantime, there's an extraArgs field on values.yaml you can use to pass any flags to the
Yea I'm suggesting Can definitely be done manually in the meantime, but ideally this should allow more modes than just statefulset to be deployed. We're specifically looking for daemonset to be supported as we run node exporter with Grafana Agent
Go 1.19 adds Windows support for the native Go network stack, but doesn't include support for resolving DNS short names. Other projects, including Prometheus, have updated their build process to exclude the netgo build tags when producing Windows binaries to work around this behavior. Fixes grafana#4465.
Request
Add a new flag, `--cluster.rejoin-interval`, which specifies how often a node should rediscover peers and rejoin them to address split brain issues.
`--cluster.rejoin-interval` should default to some reasonable value, such as `60s`. When set to `0s`, rediscovery/rejoining is disabled.
This proposal should be paired with #4464 to avoid overwhelming the network on large clusters, as the state push/pull done on join is more expensive than other gossip traffic.
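As a rough illustration of the requested semantics, here is a minimal Go sketch of a background rejoin loop driven by a duration flag. Only the flag name, the `60s` default, and the `0s`-disables behavior come from the request above; `rejoinLoop`, `runWithArgs`, and the no-op rejoin callback are hypothetical names invented for this example and do not reflect the agent's actual code.

```go
package main

import (
	"context"
	"flag"
	"fmt"
	"time"
)

// rejoinLoop calls rejoin once per interval until ctx is cancelled.
// An interval of zero (or less) disables the loop entirely, mirroring
// the proposed behavior of --cluster.rejoin-interval=0s.
func rejoinLoop(ctx context.Context, interval time.Duration, rejoin func() error) {
	if interval <= 0 {
		return
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := rejoin(); err != nil {
				fmt.Println("rejoin failed:", err)
			}
		}
	}
}

// runWithArgs (hypothetical helper) parses args as command-line flags,
// runs the loop for runFor, and returns how many rejoins were attempted.
func runWithArgs(args []string, runFor time.Duration) (int, error) {
	fs := flag.NewFlagSet("agent-sketch", flag.ContinueOnError)
	interval := fs.Duration("cluster.rejoin-interval", 60*time.Second,
		"how often to rediscover and rejoin peers (0s disables)")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}

	ctx, cancel := context.WithTimeout(context.Background(), runFor)
	defer cancel()

	rejoins := 0
	// A real implementation would rediscover peers and join them here.
	rejoinLoop(ctx, *interval, func() error {
		rejoins++
		return nil
	})
	return rejoins, nil
}

func main() {
	// A short interval is used only so the demo finishes quickly.
	n, err := runWithArgs([]string{"--cluster.rejoin-interval=50ms"}, 220*time.Millisecond)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("rejoin attempts:", n)
}
```

The loop runs in the foreground here to keep the sketch small; in a real process it would presumably run in its own goroutine and be stopped through the context on shutdown.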
Use case
Today, for nodes to join a cluster successfully, one of the following must be true:
This is because the set of peers to join is determined once at startup.
In effect, this means that clustering on Kubernetes must only be enabled via a StatefulSet, and `podManagementPolicy` must not be set to Parallel. If either of these conditions is not met, clustering will end up in a split brain state.
To avoid these constraints, nodes should rediscover and rejoin the cluster in the background on some timer to address split brain issues.
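The failure mode can be sketched with a toy model in Go. It assumes that when several nodes start in parallel, none of them is visible to discovery yet, so each one finds no peers at startup. The `node` type, `join`, and `simulate` are hypothetical and only model membership sets; they are not the agent's gossip implementation.

```go
package main

import "fmt"

// node is a toy cluster member that tracks the peers it knows about.
type node struct {
	name    string
	members map[string]bool
}

func newNode(name string) *node {
	return &node{name: name, members: map[string]bool{name: true}}
}

// join models the state push/pull performed on join: afterwards both
// sides hold the union of their membership sets.
func join(a, b *node) {
	for m := range b.members {
		a.members[m] = true
	}
	for m := range a.members {
		b.members[m] = true
	}
}

// simulate starts three nodes in parallel and returns the cluster size
// each node observes. Assumption: at startup, discovery returns no peers
// because no node has registered yet.
func simulate(backgroundRejoin bool) []int {
	var discoverable []*node // what service discovery currently returns
	nodes := []*node{newNode("a"), newNode("b"), newNode("c")}

	// Startup: every node discovers peers exactly once, before any
	// node is discoverable, so nobody joins anybody.
	for _, n := range nodes {
		for _, peer := range discoverable {
			join(n, peer)
		}
	}
	discoverable = nodes // all nodes become visible only after startup

	// Proposed behavior: rediscover and rejoin later, on a timer.
	if backgroundRejoin {
		for _, n := range nodes {
			for _, peer := range discoverable {
				if peer != n {
					join(n, peer)
				}
			}
		}
	}

	sizes := make([]int, len(nodes))
	for i, n := range nodes {
		sizes[i] = len(n.members)
	}
	return sizes
}

func main() {
	fmt.Println("startup-only discovery:", simulate(false)) // [1 1 1]
	fmt.Println("with background rejoin:", simulate(true))  // [3 3 3]
}
```

With startup-only discovery each node ends up in its own single-member cluster, which is the split brain described above; a single later rediscovery pass is enough for the toy model to converge.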