Probe Statefulset Pods until healthy upon scale up #71
Conversation
Looking into linting failures.
Awesome 💯 LGTM. We just need to make CI green, I guess.
level.Info(logger).Log("msg", "caught interrupt")
close(sig)
})
g.Add(run.SignalHandler(context.Background(), os.Interrupt, syscall.SIGTERM))
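For context, here is a minimal, self-contained sketch of the oklog/run pattern this change switches to (assuming oklog/run v1.1+, where SignalHandler is available); it is an illustration, not the controller's actual main function:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"syscall"

	"github.com/oklog/run"
)

func main() {
	var g run.Group
	// SignalHandler returns an (execute, interrupt) pair, so it can be added
	// to the group directly, replacing the hand-rolled signal goroutine above.
	g.Add(run.SignalHandler(context.Background(), os.Interrupt, syscall.SIGTERM))
	if err := g.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "exiting:", err)
	}
}
```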
👍
One question about this: why not, instead of this approach, just look at the endpoints of a Service pointing to the thanos-receive StatefulSet?
Hm. Interesting. That might just work the same, yeah. I feel like this current approach gives us a little bit more control, though. WDYT?
The behavior would be slightly different, especially if you were to also remove the instance from the ConfigMap once it's removed from the endpoints. But it would also respect other things such as failureThreshold, periodSeconds, etc.
I'll think about it a bit more.
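For illustration only, a rough sketch of the endpoints-based alternative discussed above (not part of this PR): read the Endpoints object of the Service in front of the thanos-receive StatefulSet and build the hashring from the ready addresses only. The package, function name, and the recent client-go Get signature are assumptions.

```go
package controller

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// readyReceiveAddresses returns only the addresses whose pods pass their
// readiness probes; NotReadyAddresses are excluded by the Endpoints controller,
// which is how this variant would respect failureThreshold/periodSeconds.
func readyReceiveAddresses(ctx context.Context, client kubernetes.Interface, namespace, service string) ([]string, error) {
	ep, err := client.CoreV1().Endpoints(namespace).Get(ctx, service, metav1.GetOptions{})
	if err != nil {
		return nil, fmt.Errorf("getting endpoints %s/%s: %w", namespace, service, err)
	}
	var addrs []string
	for _, subset := range ep.Subsets {
		for _, addr := range subset.Addresses {
			addrs = append(addrs, addr.IP)
		}
	}
	return addrs, nil
}
```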
Seems like this fails now, because the current master is broken too.
I will find some time for it over the weekend. Is that okay?
For testing this PR (as the current tests run against a fake Kubernetes client), I maxed out cluster resources by scaling up thanos-receive pods (on minikube each thanos-receive pod starts with 512 MB of memory). Under these maxed-out conditions, the thanos-receive pods would not reach the indicated .spec.replicas, and the hashring would need to reflect only the URLs of replicas in Ready status. Here is a screenshot of this test result where the hashring is out of sync with the replicas in Ready status.
I think you're talking about a slightly different problem. It's more of an addition to this existing PR, which we should be able to handle in another PR building on top of this. What do you think?
@@ -502,7 +507,26 @@ func (c *controller) sync() {
	continue
}

// If there's an increase in replicas we poll for the new replicas to be ready
if _, ok := c.replicas[hashring]; ok && c.replicas[hashring] < *sts.Spec.Replicas {
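For readers following along, a minimal sketch of what the readiness poll introduced here (waitForPod in a later hunk) might look like, assuming a recent client-go; the function name, interval, and timeout are assumptions and the real helper in this PR may differ:

```go
package controller

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPodReady polls until the named pod reports the PodReady condition,
// so the hashring is only updated once the new replica can actually serve.
func waitForPodReady(client kubernetes.Interface, namespace, name string) error {
	return wait.Poll(time.Second, 5*time.Minute, func() (bool, error) {
		pod, err := client.CoreV1().Pods(namespace).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			// The pod may not exist yet right after the scale-up; keep polling.
			return false, nil
		}
		for _, cond := range pod.Status.Conditions {
			if cond.Type == corev1.PodReady {
				return cond.Status == corev1.ConditionTrue, nil
			}
		}
		return false, nil
	})
}
```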
For this type of thing to be truly safe, we'll eventually need to implement leader election to avoid having multiple controllers reconciling this state.
I'm not aware that anyone has tried running multiple thanos-receive-controllers yet. If we are going to support that, then yes, that should be improved. :)
Agreed with Matthias. Right now, it's definitely assumed that the controller is only one replica. Even without this change, we would really need to implement some coordination to enable more replicas to have truly predictable results.
While unlikely, the case I worry about more is a rollout/eviction/preemption or some other reason why there may temporarily be two controllers. As I said, eventually this would be good, but it's not necessary in this PR.
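As a follow-up idea only, a sketch of how leader election could eventually be added with client-go's leaderelection package so that only the lease holder reconciles; the lease name, namespace, and timings below are assumptions, not part of this PR:

```go
package controller

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runWithLeaderElection only invokes run() while this replica holds the lease,
// so a second controller (e.g. during a rollout) waits instead of reconciling.
func runWithLeaderElection(ctx context.Context, client kubernetes.Interface, id string, run func(ctx context.Context)) {
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "thanos-receive-controller",
			Namespace: "default",
		},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: run,
			OnStoppedLeading: func() {
				// Lost the lease: stop reconciling (in practice, cancel ctx or exit).
			},
		},
	})
}
```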
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
I'd be happy to go forward with this and take care of other improvements in future PRs.
lgtm 🎉 👍 but would be good to have another pair of eyes
LGTM 🥇
Just curious, this would not work when multiple share the same hashring, right?
if err := c.waitForPod(podName); err != nil {
	level.Warn(c.logger).Log("msg", "failed polling until pod is ready", "pod", podName, "duration", time.Since(start), "err", err)
	continue
Why do we continue here if polling failed? Shouldn't we return the error instead?
This is crucial to not send requests to receivers too early.
In combination with thanos-io/thanos#3845 for quicker reloading of Thanos Receive routes, I was able to achieve >99% availability in errors and latency while scaling up from 3 replicas to 20.