Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

queue proxy should distinguish shutting down from not ready yet #8151

Closed
mattmoor opened this issue Jun 1, 2020 · 2 comments · Fixed by #8188
Closed

queue proxy should distinguish shutting down from not ready yet #8151

mattmoor opened this issue Jun 1, 2020 · 2 comments · Fixed by #8188
Assignees
Labels
area/API API objects and controllers area/networking kind/feature Well-understood/specified features, ready for coding.
Milestone

Comments

@mattmoor
Copy link
Member

mattmoor commented Jun 1, 2020

/area API
/area networking

Describe the feature

This came up digging into #8147

It seems like the intent of the high-frequency of probing was to speed shutdowns:

// We want to mark the service as not ready as soon as the
// PreStop handler is called, so we need to check a little
// bit more often than the default. It is a small
// sacrifice for a low rate of 503s.
PeriodSeconds: 1,

However, we don't distinguish between "not yet ready" and shutting down in the health-state management logic:

case h.IsShuttingDown():
sendNotAlive()
case prober != nil && !prober():
sendNotAlive()

I also don't see any matched logic here:

serving/cmd/queue/main.go

Lines 273 to 287 in 67f920c

res, lastErr := httpClient.Do(req)
if lastErr != nil {
// Return nil error for retrying
return false, nil
}
defer res.Body.Close()
success := health.IsHTTPProbeReady(res)
// The check for preferForScaledown() fails readiness faster
// in the presence of the label
if preferScaleDown, err := preferPodForScaledown(env.DownwardAPILabelsPath); err != nil {
fmt.Fprintln(os.Stderr, err)
} else if !success && preferScaleDown {
return false, errors.New("failing probe deliberately for pod scaledown")
}
return success, nil

So if I'm reading things correctly, we will probe until K8s times out even after we've received the shutdown signal.

This seems like an easy way to save 10s during shutdown to potentially offset a change like: #8148

@mattmoor mattmoor added the kind/feature Well-understood/specified features, ready for coding. label Jun 1, 2020
@mattmoor mattmoor added this to the Serving 0.16.x milestone Jun 1, 2020
@knative-prow-robot knative-prow-robot added area/API API objects and controllers area/networking labels Jun 1, 2020
@mattmoor
Copy link
Member Author

mattmoor commented Jun 1, 2020

/kind good-first-issue

@tcnghia
Copy link
Contributor

tcnghia commented Jun 1, 2020

/assign @rafaeltello

rafaeltello added a commit to rafaeltello/serving that referenced this issue Jun 3, 2020
Fixes: knative#8151

* Makes health_state return a different error code (currently 409) when it's shutting down.
* Makes shutting down fail fast (like preferPodForScaleDown).
* Unit tests to validate behavior.
rafaeltello added a commit to rafaeltello/serving that referenced this issue Jun 3, 2020
Fixes: knative#8151

* Makes health_state return a different error code (currently 409) when it's shutting down.
* Makes shutting down fail fast (like preferPodForScaleDown).
* Unit tests to validate behavior.
rafaeltello added a commit to rafaeltello/serving that referenced this issue Jun 4, 2020
Fixes: knative#8151

* Makes health_state return a different error code (currently 409) when it's shutting down.
* Makes shutting down fail fast (like preferPodForScaleDown).
* Unit tests to validate behavior.
rafaeltello added a commit to rafaeltello/serving that referenced this issue Jun 4, 2020
Fixes: knative#8151

* Makes health_state return a different error code (currently 409) when it's shutting down.
* Makes shutting down fail fast (like preferPodForScaleDown).
* Unit tests to validate behavior.
knative-prow-robot pushed a commit that referenced this issue Jun 4, 2020
* Separate shutting down v. not ready in queue proxy.

Fixes: #8151

* Makes health_state return a different error code (currently 409) when it's shutting down.
* Makes shutting down fail fast (like preferPodForScaleDown).
* Unit tests to validate behavior.

* Switching StatusConflict to StatusGone, per PR feedback

* Dropping nil check for IsHTTPProbeReady and IsHTTPProbeShuttingDown.

* Fix comment typo in health_state
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/API API objects and controllers area/networking kind/feature Well-understood/specified features, ready for coding.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants