Ready state handling of sidecar #1677

Closed
jabbrwcky opened this issue Oct 23, 2019 · 7 comments

@jabbrwcky
Contributor

jabbrwcky commented Oct 23, 2019

We experienced some bumps upgrading to Thanos 0.8.1 (Docker image).

Looking into the sidecar, I noticed that its ready state is only set during the initial loading of the Prometheus external labels:

err := runutil.Retry(2*time.Second, ctx.Done(), func() error {
    if err := m.UpdateLabels(ctx, logger); err != nil {
        level.Warn(logger).Log(
            "msg", "failed to fetch initial external labels. Is Prometheus running? Retrying",
            "err", err,
        )
        promUp.Set(0)
        statusProber.SetNotReady(err)
        return err
    }
    level.Info(logger).Log(
        "msg", "successfully loaded prometheus external labels",
        "external_labels", m.Labels().String(),
    )
    promUp.Set(1)
    statusProber.SetReady()
    lastHeartbeat.Set(float64(time.Now().UnixNano()) / 1e9)
    return nil
})

The recurring check updates the 'prometheus_up' metric, but not the sidecar's ready state:

return runutil.Repeat(30*time.Second, ctx.Done(), func() error {
    iterCtx, iterCancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer iterCancel()
    if err := m.UpdateLabels(iterCtx, logger); err != nil {
        level.Warn(logger).Log("msg", "heartbeat failed", "err", err)
        promUp.Set(0)
    } else {
        promUp.Set(1)
        lastHeartbeat.Set(float64(time.Now().UnixNano()) / 1e9)
    }
    return nil
})

Is this intentional?

I assume that when Prometheus is considered unhealthy or not ready, the sidecar should report the same.

Please correct me if I am wrong.
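
For illustration, a minimal sketch of what I would have expected the heartbeat loop to do, assuming the same statusProber used during startup is in scope here:

return runutil.Repeat(30*time.Second, ctx.Done(), func() error {
    iterCtx, iterCancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer iterCancel()
    if err := m.UpdateLabels(iterCtx, logger); err != nil {
        level.Warn(logger).Log("msg", "heartbeat failed", "err", err)
        promUp.Set(0)
        // Also flip the readiness probe, not just the metric, so the
        // querier stops treating this sidecar as a usable source.
        statusProber.SetNotReady(err)
    } else {
        promUp.Set(1)
        // Recover readiness once Prometheus responds again.
        statusProber.SetReady()
        lastHeartbeat.Set(float64(time.Now().UnixNano()) / 1e9)
    }
    return nil
})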

@kakkoyun
Member

Hey @jabbrwcky, I have worked on related issues; it looks like I missed this part. I'll have a look at it.

@povilasv
Member

I think this was intentional, see #1395 (comment); we had a discussion about it on Slack: https://cloud-native.slack.com/archives/CL25937SP/p1565945595078500

@bwplotka
Member

Yes, readiness is only set once, during the initial startup. Blips afterwards do not change the probes, as there is not much the sidecar can do about them; how would a process/container restart help in this case? (That is what failing the healthiness probe would trigger.)

Also, I don't see a difference in this logic vs. pre-0.8.1. What exactly would you expect here? (:
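
For context, this readiness/healthiness split maps onto the two probes a deployment wires up. A sketch for a sidecar container in Kubernetes (the port, timings, and /-/healthy and /-/ready paths are the usual defaults, shown here as assumptions):

# livenessProbe failures restart the container; readinessProbe failures
# only remove the pod from Service endpoints while it is not ready.
livenessProbe:
  httpGet:
    path: /-/healthy
    port: 10902
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /-/ready
    port: 10902
  periodSeconds: 5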

@jabbrwcky
Contributor Author

We experienced complete failures of queries on the querier because one Prometheus instance stopped responding to HTTP requests (i/o timeouts) and the querier still included that Prometheus/sidecar as a source.

I would expect that the querier would return a partial result in such a case. Do I understand this correctly?

@bwplotka
Member

That's correct. This behaviour can be controlled by query.partial-response. Do you maybe have it set to false right now?
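
For reference, enabling it on the querier is just the flag; a sketch of an invocation (the store address here is a placeholder, and the Kingpin-style --no-query.partial-response form disables it):

thanos query \
    --http-address=0.0.0.0:10902 \
    --store=prometheus-sidecar:10901 \
    --query.partial-response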

@jabbrwcky
Contributor Author

It should be true. I cannot reproduce it right now because the Prometheus instance in question is currently behaving.

So if query.partial-response should take care of this, I'll verify that we have it set to true in our configs.

@stale

stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 11, 2020
@stale stale bot closed this as completed Jan 18, 2020