Argo is slow #6265
-
Firing up a pod and container for every step in a workflow is pretty slow. Is there any way to have a bank of dedicated pods and containers fired up and waiting with all of the code, and just have them processing steps as fast as they can?

When I've done this in the past, the workflow engine used a queue to coordinate with horizontally scaled workers. The workers would pull a message off the queue, hydrate the state of the workflow step, crunch the logic, and progress the workflow. We were able to crunch literally thousands of steps per minute like this; an entire workflow could run in under a second. The workers horizontally scaled, so if the queue got backed up it would fire up a few more pods.

But here with Argo, even for the most trivial hello-world workflows, I am seeing it take seconds to fire up a new pod and container for every step. What if I have 20 million rows and each one has to run through a workflow? How long is that going to take? What options do I have to scale Argo to run, say, 20M workflows in under an hour?
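The queue-and-worker pattern described above can be sketched in a few lines. This is a hypothetical illustration (the function names `run_workers` and the in-process `queue.Queue` are stand-ins for a real message broker and worker pods), not anything Argo provides:

```python
import queue
import threading

def run_workers(tasks, handler, num_workers=4):
    """Process tasks with a fixed pool of long-running workers.

    Sketch of the queue-based pattern: workers pull messages, run the
    step logic, and loop back for more -- there is no per-step
    pod/container startup cost.
    """
    q = queue.Queue()
    for task in tasks:
        q.put(task)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return  # queue drained; worker exits
            result = handler(task)
            with lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# "Crunch" 1000 steps with 8 long-lived workers
out = run_workers(range(1000), lambda row: row * 2, num_workers=8)
```

In a real deployment the queue would be something like RabbitMQ or SQS and the workers would be a horizontally scaled Deployment, but the control flow is the same.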
Replies: 2 comments
-
I was just reading a CSV file with 100 lines in it, and I essentially used:

```yaml
withSequence:
  start: 0
  count: 100
```

to process each line in the file. It fired up 100 pods, of which the node could only handle so many at a time, and each one takes about 4 seconds just to create the pod and execute the logic. The logic itself took 100ms to run, but because of all the pod-launching coordination, the whole thing took minutes. It's pretty crazy how slow it is.

What I think I need is a way to fire up N pods, have them be long-running, and pull work from a queue. I don't see how this can scale by executing every step as its own pod. I know this seems like a fundamental design of Argo, but I don't see how it can work; as it is right now it's slow even for the most basic workflows. Can someone talk me down from a ledge? How can I get my entire workflow of, say, 10 steps down to sub-second times?
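One workaround, short of a queue, is to reduce the fan-out: have each pod process a contiguous chunk of rows instead of a single row, so 100 rows need a handful of pod launches rather than 100. A minimal sketch of the per-pod logic (the names `process_chunk` and `handle_row` are hypothetical, and `handle_row` stands in for the real ~100ms per-row work):

```python
import csv
import io

def handle_row(row):
    # Placeholder for the real per-row business logic.
    return ",".join(row).upper()

def process_chunk(csv_text, start, count):
    """Process rows [start, start + count) of a CSV in one invocation.

    Instead of withSequence fanning out one pod per row, each pod is
    handed a (start, count) slice, so the 4-second pod-startup cost is
    amortized over the whole chunk.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    return [handle_row(r) for r in rows[start:start + count]]
```

In the workflow, `withSequence` would then iterate over chunk indices (e.g. `count: 4` with a chunk size of 25) rather than over individual rows.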
-
Perhaps check out https://github.com/argoproj-labs/argo-dataflow, which reuses pods for processing each data item and will scale automatically at runtime based on the HPA or queue length.