docs: Add some more info to the Scaling Doc (#11731)
Signed-off-by: Julie Vogelman <julievogelman0@gmail.com>
Co-authored-by: Anton Gilgur <4970083+agilgur5@users.noreply.github.com>
juliev0 and Anton Gilgur authored Sep 5, 2023
1 parent c31132d commit 849f09c
Showing 2 changed files with 36 additions and 4 deletions.
1 change: 1 addition & 0 deletions .spelling
@@ -224,6 +224,7 @@ v3.3
v3.3.
v3.4
v3.4.
v3.5
validator
versioning
webHDFS
39 changes: 35 additions & 4 deletions docs/scaling.md
@@ -12,12 +12,43 @@ As of v3.0, the controller supports having a hot-standby for [High Availability]

## Vertically Scaling

You can scale the controller vertically:
You can scale the controller vertically in these ways:

- If you have many workflows, increase `--workflow-workers` and `--workflow-ttl-workers`.
- Increase both `--qps` and `--burst`.
### Container Resource Requests

You will need to increase the controller's memory and CPU.
If you observe the Controller using its total CPU or memory requests, you should increase those.
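
For example, here is a minimal sketch of raising those requests on the `workflow-controller` Deployment (this assumes the default install manifests in the `argo` namespace; the numbers are illustrative, not recommendations):

```yaml
# Illustrative fragment only: merge into your install manifests or a kustomize patch,
# and tune the numbers to the usage you actually observe.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workflow-controller
  namespace: argo
spec:
  template:
    spec:
      containers:
        - name: workflow-controller
          resources:
            requests:
              cpu: "2"      # raise if the Controller is using its full CPU request
              memory: 2Gi   # raise if the Controller is using its full memory request
```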

### Adding Goroutines to Increase Concurrency

If you have sufficient CPU cores, you can take advantage of them with more goroutines (see the sketch after this list):

- If you have many Workflows and you notice they're not being reconciled fast enough, increase `--workflow-workers`.
- If you're using `TTLStrategy` in your Workflows and you notice they're not being deleted fast enough, increase `--workflow-ttl-workers`.
- If you're using `PodGC` in your Workflows and you notice the Pods aren't being deleted fast enough, increase `--pod-cleanup-workers`.

> v3.5 and after
- If you're using a lot of `CronWorkflows` and they don't seem to be firing on time, increase `--cron-workflow-workers`.
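
A sketch of what setting these flags could look like on the Controller's container follows; the flag names are the ones above, but the values are illustrative and depend entirely on your workload:

```yaml
# Illustrative fragment of the workflow-controller Deployment spec.
# Add these flags to the container's existing args rather than replacing them.
spec:
  template:
    spec:
      containers:
        - name: workflow-controller
          args:
            - --workflow-workers=64      # reconcile more Workflows concurrently
            - --workflow-ttl-workers=8   # delete expired (TTLStrategy) Workflows faster
            - --pod-cleanup-workers=16   # delete (PodGC) Pods faster
            - --cron-workflow-workers=16 # v3.5 and after: process CronWorkflows faster
```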

### K8S API Client Side Rate Limiting

The K8S client library rate-limits the requests that the Controller sends to the K8S API Server.

If you frequently see messages similar to this in the Controller log (issued by the library):

```txt
Waited for 7.090296384s due to client-side throttling, not priority and fairness, request: GET:https://10.100.0.1:443/apis/argoproj.io/v1alpha1/namespaces/argo/workflowtemplates/s2t
```

Or, in v3.5 and later, if you see warnings similar to this (it could be for any CR, not just a `WorkflowTemplate`):

```txt
Waited for 7.090296384s, request:GET:https://10.100.0.1:443/apis/argoproj.io/v1alpha1/namespaces/argo/workflowtemplates/s2t
```

Then, if your K8S API Server can handle more requests:

- Increase both the `--qps` and `--burst` arguments for the Controller (see the sketch below). The `qps` value indicates the average number of queries per second allowed by the K8S Client. The `burst` value is the maximum number of queries the Client allows in a short burst before it throttles back down to `qps`, so typically `burst` > `qps`. If not set, the default values are `qps=20` and `burst=30` (as of v3.5; refer to `cmd/workflow-controller/main.go` in case the values change).
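
For example, a sketch of raising these limits (illustrative values; confirm that your K8S API Server has the headroom to absorb the extra requests):

```yaml
# Illustrative fragment of the workflow-controller Deployment spec.
# Add these flags to the container's existing args rather than replacing them.
spec:
  template:
    spec:
      containers:
        - name: workflow-controller
          args:
            - --qps=50    # average K8S API queries per second allowed by the Client
            - --burst=75  # extra queries allowed in short bursts before qps is enforced
```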

## Sharding
