Add automaxprocs package #1443

Closed
wants to merge 1 commit into from

Conversation

@hawwwdi commented Apr 11, 2024

Add the automaxprocs package to automatically set GOMAXPROCS to match the Linux container CPU quota on Kubernetes and prevent CPU throttling.
Performance results are here.

If applied, the automaxprocs package will make source-controller respect the CPU quota on Kubernetes clusters.

Signed-off-by: Hadi Abbasi <hawwwdi@gmail.com>
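
For reference, wiring automaxprocs into a Go binary is typically a single blank import in main.go (a minimal sketch of the library's standard usage; the exact main.go layout in source-controller may differ):

```go
package main

import (
	// The blank import runs the package's init(), which caps GOMAXPROCS at the
	// container's CPU quota (read from cgroups) when the process starts.
	_ "go.uber.org/automaxprocs"
)

func main() {
	// controller setup continues as usual
}
```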
@stefanprodan (Member)

@hawwwdi source-controller is I/O bound rather than CPU bound. Did you observe CPU throttling for source-controller in your clusters?

@hawwwdi (Author) commented Apr 11, 2024

> @hawwwdi source-controller is I/O bound rather than CPU bound. Did you observe CPU throttling for source-controller in your clusters?

Yes, I have set the CPU request and limit to 12, but when the source controller uses nearly 6 cores, throttling occurs. Pod monitoring screenshot:
[screenshot]

@stefanprodan (Member)

Are you using HTTP/S HelmRepos that contain a huge number of charts? That would explain the CPU usage. Can you please post the output of `flux stats -A` here?

@hawwwdi (Author) commented Apr 11, 2024

> Are you using HTTP/S HelmRepos that contain a huge number of charts? That would explain the CPU usage. Can you please post the output of `flux stats -A` here?

Yes, I've stored many of my Helm charts in a Git repository, and perhaps that's causing the high CPU usage. However, it shouldn't be throttled at 6 cores of CPU usage when the limit and request are set to 12 cores. The automaxprocs package should fix that.
For `flux stats -A`, I currently don't have access to my cluster, but I will post it here later.

@souleb (Member) commented Apr 11, 2024

Throttling should only happen on limits, not on requests, which should be guaranteed or your pod would be rescheduled. If you have requested 12 CPU units and they have been guaranteed by the scheduler, automaxprocs would set GOMAXPROCS accordingly, right? Are you sure using automaxprocs would fix this particular case?

@hawwwdi (Author) commented Apr 11, 2024

> Throttling should only happen on limits, not on requests, which should be guaranteed or your pod would be rescheduled. If you have requested 12 CPU units and they have been guaranteed by the scheduler, automaxprocs would set GOMAXPROCS accordingly, right? Are you sure using automaxprocs would fix this particular case?

The automaxprocs package sets GOMAXPROCS according to the CPU limits, not the requests. I've tested the automaxprocs package on my services, which were throttled before reaching the CPU limit. After adding it, the issue was resolved.
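
If it helps to verify this, the package's explicit API can log the value it picks (a sketch using the public go.uber.org/automaxprocs/maxprocs API for illustration; it is not necessarily how this PR wires it in):

```go
package main

import (
	"log"
	"runtime"

	"go.uber.org/automaxprocs/maxprocs"
)

func main() {
	// Set derives GOMAXPROCS from the container's CPU quota (i.e. the limit)
	// and returns an undo function; the logger reports the decision it made.
	undo, err := maxprocs.Set(maxprocs.Logger(log.Printf))
	defer undo()
	if err != nil {
		log.Printf("failed to set GOMAXPROCS: %v", err)
	}
	log.Printf("GOMAXPROCS=%d, node logical CPUs=%d", runtime.GOMAXPROCS(0), runtime.NumCPU())
}
```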

@hawwwdi (Author) commented Apr 11, 2024

> Are you using HTTP/S HelmRepos that contain a huge number of charts? That would explain the CPU usage. Can you please post the output of `flux stats -A` here?

Here is the output of `flux stats -A`:
[screenshot]

@stefanprodan (Member)

I suggest enabling Helm index caching to see if this reduces the resource usage: https://fluxcd.io/flux/installation/configuration/vertical-scaling/#enable-helm-repositories-caching

My guess is that those failing HelmCharts are one of the reasons for the high CPU usage; if they come from Git, then source-controller needs to compile them, and it does so continuously as they keep failing.

@hawwwdi (Author) commented Apr 11, 2024

> I suggest enabling Helm index caching to see if this reduces the resource usage: https://fluxcd.io/flux/installation/configuration/vertical-scaling/#enable-helm-repositories-caching
>
> My guess is that those failing HelmCharts are one of the reasons for the high CPU usage; if they come from Git, then source-controller needs to compile them, and it does so continuously as they keep failing.

Yes, perhaps enabling this feature or moving the Helm charts from Git to a Helm repository will reduce the CPU usage. However, this pull request is not intended to solve the high CPU usage problem; it aims to fix the throttling that occurs before the CPU limit is reached.

@stefanprodan (Member)

I fail to understand how this PR would help you in any way. In your case, you've set 12 CPUs as the limit and you've seen throttling at 6 cores. My understanding from the automaxprocs docs is that it defends against throttling when GOMAXPROCS is set above the limits, but in your case it's the opposite: it never got to use all 12 cores.

@stefanprodan (Member)

By the way, I see no issue with using automaxprocs in Flux, as it should help reduce throttling on nodes with lots of CPU cores when users don't bump the 1-core limit we set by default. Besides source-controller, we should add it to:

  • kustomize-controller
  • helm-controller

Could you please build source-controller from this PR and deploy it on your cluster to see if the throttling stops in your case?

@hawwwdi (Author) commented Apr 11, 2024

> I fail to understand how this PR would help you in any way. In your case, you've set 12 CPUs as the limit and you've seen throttling at 6 cores. My understanding from the automaxprocs docs is that it defends against throttling when GOMAXPROCS is set above the limits, but in your case it's the opposite: it never got to use all 12 cores.

According to the Go docs, the default value of GOMAXPROCS is the number of available logical CPUs. In a Kubernetes environment, that is the number of logical CPUs on the underlying node. In my case, the underlying node has 56 cores, which leads to GOMAXPROCS=56 in the source-controller process, well above the limit of 12 I had set. This causes CPU throttling before the CPU limit is reached.
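
A quick way to confirm this from inside the pod is to compare the runtime's view with the node's CPU count (a minimal sketch; the 56/12 figures above are specific to this cluster):

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Without automaxprocs or a GOMAXPROCS env var, the Go runtime sizes
	// GOMAXPROCS from the node's logical CPUs, not from the container's CPU limit.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) // e.g. 56 on a 56-core node
	fmt.Println("NumCPU:    ", runtime.NumCPU())      // logical CPUs of the node
}
```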

@hawwwdi (Author) commented Apr 11, 2024

> By the way, I see no issue with using automaxprocs in Flux, as it should help reduce throttling on nodes with lots of CPU cores when users don't bump the 1-core limit we set by default. Besides source-controller, we should add it to:
>
>   • kustomize-controller
>   • helm-controller
>
> Could you please build source-controller from this PR and deploy it on your cluster to see if the throttling stops in your case?

Yes, I will report the result here.

@souleb (Member) commented Apr 11, 2024

I also think that this makes sense, thanks @hawwwdi.

We could also use the Downward API, e.g.:

```yaml
env:
  - name: GOMAXPROCS
    valueFrom:
      resourceFieldRef:
        resource: limits.cpu
```

We could save a dependency that way.

@stefanprodan (Member)

@souleb isn't GOMAXPROCS supposed to be set to an integer value? `limits.cpu` is in the format `1000m`.

@stefanprodan (Member) commented Apr 11, 2024

So it looks like Go handles the env var well and converts it: https://blog.howardjohn.info/posts/gomaxprocs/

We could set both GOMAXPROCS and GOMEMLIMIT in the flux2 manifests for all controllers. I will test if it works OK.

@hawwwdi (Author) commented Apr 11, 2024

> So it looks like Go handles the env var well and converts it: https://blog.howardjohn.info/posts/gomaxprocs/
>
> We could set both GOMAXPROCS and GOMEMLIMIT in the flux2 manifests for all controllers. I will test if it works OK.

That's a great approach. I will test this method and share the results here.

@souleb (Member) commented Apr 11, 2024

Yes, it converts it to an int32. automaxprocs does some conversion too, so I believe the behavior is the same.
See https://github.com/golang/go/blob/b6778c5230d554c1ba1a69b104513021467d32b2/src/runtime/proc.go#L825

Edit: I believe that if the conversion fails, it falls back to the number of cores.
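
The fallback is easy to observe with a small program started with different GOMAXPROCS values, e.g. once with GOMAXPROCS=12 and once with GOMAXPROCS=1000m (a hedged sketch of the runtime behavior described above, not part of this PR):

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

func main() {
	// An unparsable GOMAXPROCS value (such as "1000m") is ignored at startup,
	// and the runtime falls back to the number of logical CPUs.
	fmt.Println("GOMAXPROCS env:      ", os.Getenv("GOMAXPROCS"))
	fmt.Println("effective GOMAXPROCS:", runtime.GOMAXPROCS(0)) // 0 queries without changing it
	fmt.Println("logical CPUs:        ", runtime.NumCPU())
}
```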

@stefanprodan (Member) commented Apr 11, 2024

I have added the env vars and I've run the Flux benchmarks a couple of times. My conclusion is that it is safe to add the env vars to the main distribution, as I see no regression in terms of reconciliation speed.

@hawwwdi please open a PR in the flux2 repo and add this patch to manifests/install: https://github.com/fluxcd/flux-benchmark/pull/5/files. It would be great if you could post your test results for source-controller in the new PR.

@hawwwdi (Author) commented Apr 11, 2024

> I have added the env vars and I've run the Flux benchmarks a couple of times. My conclusion is that it is safe to add the env vars to the main distribution, as I see no regression in terms of reconciliation speed.
>
> @hawwwdi please open a PR in the flux2 repo and add this patch to manifests/install: https://github.com/fluxcd/flux-benchmark/pull/5/files. It would be great if you could post your test results for source-controller in the new PR.

Thank you for testing. Okay, I will open a PR in the flux2 repo with the test results.

@stefanprodan (Member)

Superseded by: fluxcd/flux2#4717
