lifecycle-sidecar at 100% of CPU limit #515

kpurdon · 2020-06-26T02:28:08Z

I'm seeing CPU usage for the lifecycle sidecar sitting right at 100% of its CPU limit. Would it be possible to increase the limit, or to allow configuration of the resource requests and limits for this container?

I think consul-helm/templates/mesh-gateway-deployment.yaml (lines 359 to 365 in e2fb73c):

    resources:
      requests:
        memory: "25Mi"
        cpu: "10m"
      limits:
        memory: "25Mi"
        cpu: "10m"

is the spot that would need to allow for configuration.

lkysow · 2020-06-26T03:05:10Z

Hey Kyle, are you seeing this on your connect pods or on the mesh gateway? We can definitely make this configurable, but because it should be the same for everyone, we might instead just bump it up. Our testing showed it had lots of room, but maybe that depends on workload. Can you also show us your graphs or kubectl top output, please?


kpurdon · 2020-06-26T03:20:23Z

This is on the connect enabled pods, no mesh gateway.

Here is a chart showing the CPU limit utilization for all lifecycle containers for the last hour:

Here is a single top for one of the pods:

POD                      NAME                               CPU(cores)   MEMORY(bytes)
ceweb-7848c69b66-smkxb   consul-connect-lifecycle-sidecar   5m           22Mi
ceweb-7848c69b66-smkxb   consul-connect-envoy-sidecar       3m           18Mi
ceweb-7848c69b66-smkxb   ceweb                              2m           493Mi

Happy to provide any additional metrics that may be useful.
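To compare a `kubectl top` sample like the one above against a container's CPU limit, a small helper along these lines could be used. This is only a sketch: the function name `cpu_pct_of_limit` is hypothetical, the limit is passed in by hand (10m here, taken from the template quoted earlier), and it assumes every CPU value is reported in millicores with an `m` suffix.

```shell
# Print each container's CPU usage as a percentage of a given limit.
# $1: CPU limit in millicores (e.g. 10 for "10m").
# stdin: output in the format of `kubectl top pod --containers`.
# Assumption: CPU values always carry the "m" (millicore) suffix.
cpu_pct_of_limit() {
  limit_m="$1"
  awk -v limit="$limit_m" '
    NR == 1 { next }            # skip the header row
    {
      cpu = $3
      sub(/m$/, "", cpu)        # "5m" -> "5"
      printf "%s %s %d%%\n", $1, $2, (cpu / limit) * 100
    }'
}

# Example usage (pod name taken from the output above):
#   kubectl top pod ceweb-7848c69b66-smkxb --containers | cpu_pct_of_limit 10
```

Note that this applies one limit to every container in the pod, so it is only meaningful for the containers that actually share that limit.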

kpurdon · 2020-06-26T21:36:44Z

Here is the same graph for the last day. There is no real difference in the pods in the top or bottom group, and both groups include pods from each of the services I have connect enabled.

lkysow · 2020-06-26T21:44:45Z

Okay thank you for this information. We're going to look to bump this up and we're also looking at another issue related to resource settings and OOM. If you need a workaround right now you'd need to bump these up yourself:

https://github.com/hashicorp/consul-k8s/blob/9a3a22edbabd4f935e8831f32afd55e816ebdd0b/connect-inject/lifecycle_sidecar.go#L10-L15

and build a custom consul-k8s image. To be clear this is a high priority for us and we're working on a fix as we speak.

lkysow · 2020-07-09T17:28:40Z

Will be addressed by #533

lkysow · 2020-07-09T22:18:55Z

This bugfix is available in 0.23.0.

kpurdon · 2020-07-10T02:51:14Z

Awesome @lkysow ... quick question. Is the same multi-step upgrade process required for using a newer helm chart version if the underlying consul version has not changed?

lkysow · 2020-07-10T18:16:30Z

Hey Kyle, it depends on whether the consul client daemonset pods will end up being restarted by the helm upgrade. The release (you should actually use 0.23.1, there was a TLS bug we just patched) changes the default version of consul-k8s to 0.17.0 in order to get this bugfix.

That shouldn't affect the client daemonset unless you have ACLs enabled. If you do, then consul-k8s is used as an init container in the client daemonset, so bumping the Docker image version will trigger a client daemonset restart. In that case, for a no-downtime upgrade you would need to follow the multi-step upgrade process.

There's no built-in way in helm to see what will be updated, but there is a helm diff plugin: https://github.com/databus23/helm-diff

    helm repo update
    helm diff upgrade <your release name> hashicorp/consul -f scratch/tls.yaml --version 0.23.1

If the output contains a line about the daemonset like

    default, consul-consul, DaemonSet (apps) has changed:

then the client will be updated.
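The check described above could be scripted roughly as follows. This is a sketch, not part of helm or the helm-diff plugin: the function name `daemonset_will_change` is hypothetical, and it assumes the diff output contains the line format quoted above.

```shell
# Exit 0 if helm-diff output on stdin reports a changed DaemonSet, non-zero otherwise.
# Assumption: changed resources appear as "<namespace>, <name>, DaemonSet (apps) has changed:".
daemonset_will_change() {
  grep -q 'DaemonSet (apps) has changed:'
}

# Example usage (RELEASE is a placeholder for your release name):
#   helm repo update
#   helm diff upgrade "$RELEASE" hashicorp/consul -f scratch/tls.yaml --version 0.23.1 \
#     | daemonset_will_change && echo "client daemonset will be updated"
```

The match is on any DaemonSet, so if a release contains more than one, the full diff output still needs to be read to see which one changed.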

lkysow added the area/connect (Related to Connect, e.g. injection) and bug (Something isn't working) labels Jun 26, 2020

lkysow changed the title ~~Configure lifecycle-sidecar resource requests~~ lifecycle-sidecar at 100% of CPU limit Jun 26, 2020

This was referenced Jul 7, 2020

lifecycle-sidecar and connect inject init container have mandatory resource requirements hashicorp/consul-k8s#289

Closed

refactor resource requests and limits for init containers and lifecycle sidecar #532

Closed

lkysow closed this as completed Jul 9, 2020

lkysow mentioned this issue Jul 10, 2020

Update our upgrade docs to detail no-downtime deployment #540

Closed
