queue proxy should distinguish shutting down from not ready yet #8151

mattmoor · 2020-06-01T17:48:40Z

/area API
/area networking

Describe the feature

This came up while digging into #8147.

It seems the intent of the high probe frequency was to speed up shutdowns:

serving/pkg/reconciler/revision/resources/queue.go

Lines 151 to 155 in e0b4b30

    
    // We want to mark the service as not ready as soon as the
    // PreStop handler is called, so we need to check a little
    // bit more often than the default.  It is a small
    // sacrifice for a low rate of 503s.
    PeriodSeconds: 1,

However, we don't distinguish between "not yet ready" and "shutting down" in the health-state management logic:

serving/pkg/queue/health/health_state.go

Lines 107 to 110 in e0b4b30

    
    case h.IsShuttingDown():
        sendNotAlive()
    case prober != nil && !prober():
        sendNotAlive()

I also don't see any matching logic here:

serving/cmd/queue/main.go

Lines 273 to 287 in 67f920c

    
    res, lastErr := httpClient.Do(req)
    if lastErr != nil {
        // Return nil error for retrying
        return false, nil
    }
    defer res.Body.Close()
    success := health.IsHTTPProbeReady(res)
    // The check for preferForScaledown() fails readiness faster
    // in the presence of the label
    if preferScaleDown, err := preferPodForScaledown(env.DownwardAPILabelsPath); err != nil {
        fmt.Fprintln(os.Stderr, err)
    } else if !success && preferScaleDown {
        return false, errors.New("failing probe deliberately for pod scaledown")
    }
    return success, nil

So if I'm reading things correctly, we will keep probing until K8s times out, even after we've received the shutdown signal.

This seems like an easy way to save 10s during shutdown, which could offset a change like #8148.

mattmoor · 2020-06-01T17:57:45Z

/kind good-first-issue

tcnghia · 2020-06-01T20:18:32Z

/assign @rafaeltello

Fixes: knative#8151
* Makes health_state return a different error code (currently 409) when it's shutting down.
* Makes shutting down fail fast (like preferPodForScaleDown).
* Unit tests to validate behavior.

* Separate shutting down v. not ready in queue proxy. Fixes: #8151
* Makes health_state return a different error code (currently 409) when it's shutting down.
* Makes shutting down fail fast (like preferPodForScaleDown).
* Unit tests to validate behavior.
* Switching StatusConflict to StatusGone, per PR feedback
* Dropping nil check for IsHTTPProbeReady and IsHTTPProbeShuttingDown.
* Fix comment typo in health_state

mattmoor added the kind/feature Well-understood/specified features, ready for coding. label Jun 1, 2020

mattmoor added this to the Serving 0.16.x milestone Jun 1, 2020

knative-prow-robot added area/API API objects and controllers area/networking labels Jun 1, 2020

knative-prow-robot added the kind/good-first-issue label Jun 1, 2020

knative-prow-robot assigned rafaeltello Jun 1, 2020

rafaeltello mentioned this issue Jun 3, 2020

Separate shutting down v. not ready in queue proxy. #8188

Merged

knative-prow-robot closed this as completed in #8188 Jun 4, 2020
