docs: Add some more info to the Scaling Doc #11731

Merged
merged 18 commits into from
Sep 5, 2023
1 change: 1 addition & 0 deletions .spelling
@@ -224,6 +224,7 @@ v3.3
v3.3.
v3.4
v3.4.
v3.5
validator
versioning
webHDFS
33 changes: 29 additions & 4 deletions docs/scaling.md
@@ -12,12 +12,37 @@ As of v3.0, the controller supports having a hot-standby for [High Availability]

## Vertically Scaling

You can scale the controller vertically:
You can scale the controller vertically in these ways:

- If you have many workflows, increase `--workflow-workers` and `--workflow-ttl-workers`.
- Increase both `--qps` and `--burst`.
### Container Resource Requests

You will need to increase the controller's memory and CPU.
If you observe the Controller using all of its requested CPU or memory, increase those requests.

### Adding Goroutines to Increase Concurrency

If you have sufficient CPU, you can take advantage of it with more goroutines:

- If you have many Workflows and you notice they're not being reconciled fast enough, increase `--workflow-workers`.
- If you're using `TTLStrategy` in your Workflows and you notice they're not being deleted fast enough, increase `--workflow-ttl-workers`.
- If you're using `PodGC` in your Workflows and you notice the Pods aren't being deleted fast enough, increase `--pod-cleanup-workers`.

>= v3.5

- If you're using a lot of `CronWorkflows` and they don't seem to be firing on time, increase `--cron-workflow-workers`.

### K8S API Client Side Rate Limiting

The K8S client library rate limits outgoing API requests, and the default limits are fairly low. If you frequently see a message similar to this in the Controller log (issued by the library):

`Waited for 7.090296384s due to client-side throttling, not priority and fairness, request: GET:https://10.100.0.1:443/apis/argoproj.io/v1alpha1/namespaces/argo/workflowtemplates/s2t`

or, for >= v3.5, a warning like this (it could be for any CR, not just `WorkflowTemplate`):

`Waited for 7.090296384s, request:GET:https://10.100.0.1:443/apis/argoproj.io/v1alpha1/namespaces/argo/workflowtemplates/s2t`

then assuming your K8S API Server can handle it:

- Increase both `--qps` and `--burst`. The `--qps` value is the average number of queries per second the K8S client allows. The `--burst` value is the maximum number of queries the client may send in a short burst before it starts enforcing `--qps`, so typically `--burst` > `--qps`.
Member
Burst may not work as expected. See #8576

Contributor Author
So, that's actually a different "Burst". That's for the RateLimiter configured in the workflow-controller-configmap controlling the Pod creation rate.

This setting just gets passed directly into the Kubernetes Client to control the rate of outgoing K8S API requests.


## Sharding