
Probe Statefulset Pods until healthy upon scale up #71

Merged · 1 commit merged into master from scale-up on Mar 26, 2021
Conversation

@metalmatze (Contributor)

This is crucial so that we don't send requests to receivers too early.
In combination with thanos-io/thanos#3845 for quicker reloading of Thanos Receive routes, I was able to achieve >99% availability in errors and latency while scaling up from 3 replicas to 20.

[benchmark results chart: 2-receiver-route-improve-controller]
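For readers coming to this later, here is a minimal, hypothetical sketch of the idea behind this PR, not the controller's actual code: when a hashring's StatefulSet scales up, probe each newly created pod until it reports Ready before regenerating the hashring ConfigMap. The function names, client-go wiring, and the 10s/5m poll timings below are assumptions.

```go
// Hypothetical sketch of "probe new StatefulSet pods until Ready on scale-up".
package receivewait

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForReady polls a single pod until its PodReady condition is True.
func waitForReady(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	return wait.PollUntilContextTimeout(ctx, 10*time.Second, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			pod, err := client.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
			if err != nil {
				return false, nil // treat transient errors as "not ready yet"
			}
			for _, cond := range pod.Status.Conditions {
				if cond.Type == corev1.PodReady {
					return cond.Status == corev1.ConditionTrue, nil
				}
			}
			return false, nil
		})
}

// waitForScaleUp probes every newly created pod of a StatefulSet, e.g.
// thanos-receive-3 through thanos-receive-19 when scaling from 3 to 20,
// and only returns once all of them are Ready (or the timeout is hit).
func waitForScaleUp(ctx context.Context, client kubernetes.Interface, namespace, sts string, oldReplicas, newReplicas int32) error {
	for i := oldReplicas; i < newReplicas; i++ {
		podName := fmt.Sprintf("%s-%d", sts, i)
		if err := waitForReady(ctx, client, namespace, podName); err != nil {
			return fmt.Errorf("pod %s never became ready: %w", podName, err)
		}
	}
	return nil
}
```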

@metalmatze metalmatze requested review from squat and kakkoyun and removed request for squat March 23, 2021 16:02
@metalmatze (Contributor, Author)

Looking into linting failures.

@kakkoyun (Member) left a comment:

Awesome 💯 LGTM. We just need to make CI green, I guess.

level.Info(logger).Log("msg", "caught interrupt")
close(sig)
})
g.Add(run.SignalHandler(context.Background(), os.Interrupt, syscall.SIGTERM))
Member: 👍
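For context, the hunk above swaps a hand-rolled interrupt channel for oklog/run's SignalHandler, which returns an execute/interrupt actor pair for the run.Group. A small self-contained sketch of that pattern follows; the placeholder main loop is an assumption, not the controller's actual actor.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"syscall"

	"github.com/oklog/run"
)

func main() {
	var g run.Group

	// Actor 1: unblock the group when SIGINT or SIGTERM arrives.
	g.Add(run.SignalHandler(context.Background(), os.Interrupt, syscall.SIGTERM))

	// Actor 2: a stand-in for the controller's reconcile loop.
	ctx, cancel := context.WithCancel(context.Background())
	g.Add(func() error {
		<-ctx.Done()
		return nil
	}, func(error) {
		cancel()
	})

	// Run blocks until the first actor returns, then interrupts the rest.
	if err := g.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```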

@jmichalek132 (Contributor)

One question about this: why not just look at the Endpoints of a Service pointing to the thanos-receive StatefulSet instead?

@metalmatze (Contributor, Author) commented Mar 23, 2021

> One question about this: why not just look at the Endpoints of a Service pointing to the thanos-receive StatefulSet instead?

Hm. Interesting. That might just work the same, yeah. I feel like this current approach gives us a little bit more control though. WDYT?

@jmichalek132 (Contributor)

> One question about this: why not just look at the Endpoints of a Service pointing to the thanos-receive StatefulSet instead?
>
> Hm. Interesting. That might just work the same, yeah. I feel like this approach gives us a little bit more control though. WDYT?

The behavior would be slightly different, especially if you were to also remove the instance from the ConfigMap once it's removed from the Endpoints. But it would also respect other settings such as failureThreshold, periodSeconds, etc.
Honestly, I would prefer that, but I'm not sure what consequences it would have on Thanos Receive's behavior.
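To make the alternative concrete, here is a hedged sketch (an assumption, not code from this repo) of reading ready addresses from the Endpoints of a Service in front of the thanos-receive StatefulSet. The Endpoints controller already honours the readiness probe's failureThreshold and periodSeconds, so only Ready pods appear under Addresses. The function and parameter names are illustrative.

```go
// Hypothetical sketch of the Endpoints-based alternative discussed above.
package receiveendpoints

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// readyReceiveAddresses returns the IPs of thanos-receive pods that currently
// pass their readiness probes, as reported in the Service's Endpoints object.
// Pods failing their probe are listed under NotReadyAddresses and are skipped.
func readyReceiveAddresses(ctx context.Context, client kubernetes.Interface, namespace, service string) ([]string, error) {
	eps, err := client.CoreV1().Endpoints(namespace).Get(ctx, service, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	var addrs []string
	for _, subset := range eps.Subsets {
		for _, addr := range subset.Addresses { // Ready addresses only
			addrs = append(addrs, addr.IP)
		}
	}
	return addrs, nil
}
```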

@metalmatze (Contributor, Author)

I'll think about it a bit more.
If you were to hack a bit on this code to make it somewhat work, I'm happy to run it with the benchmark suite again so we get some numbers to compare 👍

@metalmatze (Contributor, Author)

Seems like this fails now because the current master is broken too.

@jmichalek132 (Contributor) commented Mar 23, 2021

I will find some time for it over the weekend. Is that okay?

@spaparaju commented Mar 24, 2021

[screenshot: test result showing the hashring out of sync with the replicas in Ready status]

For testing this PR (the current tests run against a fake Kubernetes client), I maxed out cluster resources by scaling up thanos-receive pods (on minikube, each thanos-receive pod starts with 512 MB of memory). Under these maxed-out conditions, the thanos-receive pods never reach the indicated .spec.replicas, and the Obs. hashring would need to reflect only the URLs of the replicas in Ready status. The screenshot above shows the result of this test, where the hashring is out of sync with the replicas in Ready status.

@metalmatze (Contributor, Author)

I think you're talking about a slightly different problem, more of an addition to this PR, which we should be able to handle in another PR building on top of this one. What do you think?

@@ -502,7 +507,26 @@ func (c *controller) sync() {
continue
}

// If there's an increase in replicas we poll for the new replicas to be ready
if _, ok := c.replicas[hashring]; ok && c.replicas[hashring] < *sts.Spec.Replicas {
Contributor:

For this type of thing to be truly safe, we'll eventually need to implement leader election to avoid multiple controllers reconciling this state at the same time.

Contributor (Author):

I'm not aware that anyone has tried running multiple thanos-receive-controllers yet. If we are going to support that, then yes, that should be improved. :)

Member:

Agreed with Matthias. Right now, it's definitely assumed that the controller is only one replica. Even without this change, we would really need to implement some coordination to enable more replicas to have truly predictable results.

Contributor:

While unlikely, the case I worry about more is a rollout, eviction, preemption, or some other reason why there may temporarily be two controllers. As I said, eventually this would be good, but it's not necessary in this PR.
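Purely to illustrate the leader-election idea discussed in this thread (not part of this PR), here is a sketch using client-go's leaderelection package with a Lease lock, so that only one controller replica reconciles at a time. The lock name, namespace, identity, and timings are assumptions.

```go
// Hypothetical sketch of leader election for the controller.
package leaderelect

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runWithLeaderElection starts the given reconcile loop only on the replica
// that currently holds the Lease, and stops it if leadership is lost.
func runWithLeaderElection(ctx context.Context, client kubernetes.Interface, identity string, run func(context.Context)) {
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "thanos-receive-controller", // illustrative lock name
			Namespace: "thanos",                    // illustrative namespace
		},
		Client: client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: identity, // e.g. the pod name
		},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: run, // start reconciling only once we are the leader
			OnStoppedLeading: func() {
				// Stop reconciling; another replica may take over.
			},
		},
	})
}
```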

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
@metalmatze (Contributor, Author)

I'd be happy to go forward with this and take care of other improvements in future PRs.

@brancz (Contributor) left a comment:

LGTM 🎉 👍 but it would be good to have another pair of eyes.

@kakkoyun (Member) left a comment:

LGTM 🥇

@kakkoyun kakkoyun merged commit 7770963 into master Mar 26, 2021
@metalmatze metalmatze deleted the scale-up branch March 30, 2021 12:02
@christopherzli (Contributor)

Just curious: this would not work when multiple share the same hashring, right?


if err := c.waitForPod(podName); err != nil {
level.Warn(c.logger).Log("msg", "failed polling until pod is ready", "pod", podName, "duration", time.Since(start), "err", err)
continue
Contributor:

Why do we continue here if polling failed? Shouldn't we return the error instead?
