This repository has been archived by the owner on Aug 25, 2021. It is now read-only.
The default resource limits for the lifecycle container (50MB RAM, 20m CPU) are so low that the consul service register command takes upwards of 30s in some cases.
I think this is largely down to the huge size of the Consul binary (100+MB) and the lifecycle process calling consul as a child process rather than using the API directly (e.g. hashicorp/consul-k8s#275).
It seems like the binary also never gets into page cache. The way page cache interacts with memory limits on containers seems... complex, to say the least.
This graph shows read iops on 1 node
The first part (consistently <100 iops) has the default resource restrictions.
It appears the lifecycle container is CPU constrained here: the service register calls are taking tens of seconds to complete.
The second part (spiking up to almost 500 iops) is when I increased the CPU limit to 150m.
At this point service register calls were reasonably quick, 1 or 2 seconds. However, because the binary never got into page cache, it had to be read directly from disk every time.
For the last part (zero iops), I left the CPU limit at 150m and increased the memory limit to 256MB.
Service register calls now take <0.5s and the read iops go away completely.
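For anyone wanting to reproduce these timings under different limits, here is a minimal sketch of a timing harness. It is not what the lifecycle container does; the `time_command` helper is a hypothetical name, and the command to measure is supplied by the caller (the fallback used here is just a no-op Python process).

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Each run forks a fresh child process, like a shell-out from a sidecar would.
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Pass the command to measure as arguments; with no arguments,
    # fall back to a no-op Python process as a placeholder.
    argv = sys.argv[1:] or [sys.executable, "-c", "pass"]
    samples = time_command(argv)
    print(
        f"min={min(samples):.3f}s "
        f"median={statistics.median(samples):.3f}s "
        f"max={max(samples):.3f}s"
    )
```

Running it inside the container with the register command as its arguments, once per resource setting, should give numbers comparable to the ones above.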
The ideal fix would probably be to stop calling out to the consul binary at all.
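As a rough illustration of what "using the API directly" could look like, here is a minimal sketch that registers a service with the local agent through Consul's HTTP endpoint `PUT /v1/agent/service/register` instead of forking the binary. The agent address, the helper names, and the service name/port are assumptions for the example; a real setup would likely also need ACL tokens and TLS, and this is not how the lifecycle container is actually implemented.

```python
import json
import urllib.request

# Assumed default address of the local Consul agent's HTTP API.
DEFAULT_AGENT_ADDR = "http://127.0.0.1:8500"


def build_register_request(name, port, service_id=None, agent_addr=DEFAULT_AGENT_ADDR):
    """Build (but do not send) the HTTP request that registers a service."""
    payload = {"Name": name, "Port": port}
    if service_id is not None:
        payload["ID"] = service_id
    return urllib.request.Request(
        url=f"{agent_addr}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def register_service(name, port, service_id=None, agent_addr=DEFAULT_AGENT_ADDR, timeout=5.0):
    """Send the registration to the agent and return the HTTP status code."""
    req = build_register_request(name, port, service_id, agent_addr)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical service; requires a reachable Consul agent.
    print(register_service("web", 8080, service_id="web-1"))
```

A plain HTTP call like this avoids loading a 100+MB executable on every registration, which is the cost described above.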
I imagine having the consul binary mounted as a volume rather than baked into the image doesn't help either, as it's now a unique file on the host node for every instance of the lifecycle container, whereas if it were part of the image, overlayfs would allow sharing of the page cache.