Argo is slow #6265
-
Firing up a pod and container for every step in a workflow is pretty slow. Is there any way to have a bank of dedicated pods and containers fired up and waiting with all of the code, and just have them processing steps as fast as they can?

When I've done this in the past, the workflow engine used a queue to coordinate with horizontally scaled workers. The workers would pull a message off the queue, hydrate the state of the workflow step, crunch the logic, and progress the workflow. We were able to crunch literally thousands of steps per minute like this; an entire workflow could run in under a second. The workers horizontally scaled, so if the queue got backed up it would fire up a few more pods.

But here with Argo, even for the most trivial hello-world workflows, I am seeing it take seconds to fire up a new pod and container for every step. What if I have 20 million rows and each one has to run through a workflow? How long is that going to take? What options do I have to scale Argo to run, say, 20M workflows in under an hour?
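The queue-and-worker pattern described above can be sketched in a few lines. This is a hypothetical illustration (the function names `run_workers` and the in-process `queue.Queue` are stand-ins for a real message broker and worker pods), not anything Argo provides:

```python
import queue
import threading

def run_workers(tasks, handler, num_workers=4):
    """Process tasks with a fixed pool of long-running workers.

    Sketch of the queue-based pattern: workers pull messages, run the
    step logic, and loop back for more -- there is no per-step
    pod/container startup cost.
    """
    q = queue.Queue()
    for task in tasks:
        q.put(task)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return  # queue drained; worker exits
            result = handler(task)
            with lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# "Crunch" 1000 steps with 8 long-lived workers
out = run_workers(range(1000), lambda row: row * 2, num_workers=8)
```

In a real deployment the queue would be something like RabbitMQ or SQS and the workers would be a horizontally scaled Deployment, but the control flow is the same.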
Replies: 2 comments
-
I was just reading a CSV file with 100 lines in it, and I essentially used:

```yaml
withSequence:
  start: 0
  count: 100
```

to process each line in the file. It fired up 100 pods, of which the node could only handle so many at a time, and each one takes about 4 seconds just to create the pod and execute the logic. The logic itself took 100ms to run, but because of all the pod-launching coordination, the whole thing took minutes. It's pretty crazy how slow it is.

What I think I need is a way to fire up N pods, have them be long-running, and pull work from a queue. I don't see how this can scale by executing every step as its own pod. I know this seems like a fundamental design of Argo, but I don't see how it can work; as it is right now it's slow even for the most basic workflows. Can someone talk me down from a ledge? How can I get my entire workflow of, say, 10 steps down to sub-second times?
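One workaround, short of a queue, is to reduce the fan-out: have each pod process a contiguous chunk of rows instead of a single row, so 100 rows need a handful of pod launches rather than 100. A minimal sketch of the per-pod logic (the names `process_chunk` and `handle_row` are hypothetical, and `handle_row` stands in for the real ~100ms per-row work):

```python
import csv
import io

def handle_row(row):
    # Placeholder for the real per-row business logic.
    return ",".join(row).upper()

def process_chunk(csv_text, start, count):
    """Process rows [start, start + count) of a CSV in one invocation.

    Instead of withSequence fanning out one pod per row, each pod is
    handed a (start, count) slice, so the 4-second pod-startup cost is
    amortized over the whole chunk.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    return [handle_row(r) for r in rows[start:start + count]]
```

In the workflow, `withSequence` would then iterate over chunk indices (e.g. `count: 4` with a chunk size of 25) rather than over individual rows.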
-
Perhaps check out https://github.com/argoproj-labs/argo-dataflow, which reuses pods for processing each data item and will scale automatically at runtime based on the HPA or queue length.