
@jazev-stripe (Contributor) commented Jul 26, 2024

Summary

This PR exposes an activity_max_tasks_per_second option when constructing a new Temporal::Worker instance. The setting populates the max_tasks_per_second field on PollActivityTaskQueue RPCs to the server, which has the effect of limiting how fast the server hands out activity tasks (both new activities and retries of already-running activities).

This same option is exposed in other SDKs; for example, in Java it is available on WorkerOptions via setMaxTaskQueueActivitiesPerSecond. The doc comment added in this PR was based on the doc comment in the Java SDK. The semantics of the values activity_max_tasks_per_second can take (a non-negative decimal number) and of when it is set on the wire (only if it is > 0) were copied directly from the Java SDK's implementation (although they diverge slightly from the Go SDK).
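
To make those semantics concrete, here is a minimal sketch of the value handling described above (illustrative only, not the gem's actual internals; the helper name is made up): the value must be a non-negative number, and it is only placed on the poll request's metadata when it is greater than zero.

def build_task_queue_metadata(activity_max_tasks_per_second)
  # Reject negative values outright, mirroring the Java SDK's validation.
  if activity_max_tasks_per_second && activity_max_tasks_per_second.negative?
    raise ArgumentError, 'activity_max_tasks_per_second must be a non-negative number'
  end

  # Only send the field on the wire when the value is > 0.
  return nil unless activity_max_tasks_per_second&.positive?

  { max_tasks_per_second: activity_max_tasks_per_second }
end

build_task_queue_metadata(0.5) # => { max_tasks_per_second: 0.5 }
build_task_queue_metadata(0)   # => nil (field omitted from the RPC)
build_task_queue_metadata(nil) # => nil (option not configured)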

Motivation

Allow Ruby workers to limit the rate at which activities are processed, in order to control the impact on downstream systems.

Test plan

Unit tests

The existing unit tests and example unit tests pass, and the new tests I added validate that, at each step, the option makes its way down to the underlying gRPC connection:

bundle exec rspec spec/
cd examples && bundle exec rspec spec/

Manual tests

I also manually validated that the activity task rate limit works as expected. To do so, I modified the examples/bin/worker script to include a rate-limit on the Worker.new call:

worker = Temporal::Worker.new(binary_checksum: `git show HEAD -s --format=%H`.strip, activity_max_tasks_per_second: 0.5)

Then, I ran the AsyncHelloWorldWorkflow workflow with 10 activities and validated that the workflow took around 20-30 seconds to complete (10 activities ÷ 0.5 activities/second = 20 seconds, plus a bit of overhead/inaccuracy):

cd examples
docker-compose up
bin/worker # with special code patch
bin/trigger AsyncHelloWorldWorkflow 10

[screenshot: workflow timeline showing the rate-limited run completing in roughly 20-30 seconds]

Repeating the test without the change to examples/bin/worker (i.e. an unlimited activity task queue) shows that the rate-limit makes a meaningful difference:

[screenshot: workflow timeline for the same run without the rate limit]

@cj-cb merged commit 0d3a8bb into coinbase:master on Jul 30, 2024
@gytisgreitai

@jazev-stripe can you elaborate a bit more on how this should work?

I've launched a single worker:

Temporal::Worker.new(
  activity_max_tasks_per_second: 0.1,
  activity_thread_pool_size: 1,
)

and have one activity + one workflow:

class HelloWorldWorkflow < Temporal::Workflow
  def execute
    HelloWorldActivity.execute!
  end
end

class HelloWorldActivity < Temporal::Activity
  def execute
    Net::HTTP.get_response(URI.parse('http://localhost:8000'))

    "Hello World"
  end
end

Then I launched a simple Python server locally on port 8000 and immediately scheduled 20 workflows with a simple (1..20).each do ... loop. What I'm seeing is a bit strange to me:

::ffff:127.0.0.1 - - [16/Aug/2024 15:23:01] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:23:01] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:23:04] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:23:04] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:23:08] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:23:08] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:23:12] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:23:12] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:23:16] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:23:20] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:23:24] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:23:28] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:23:40] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:24:20] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:25:00] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:25:40] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:26:20] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:27:00] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:27:40] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [16/Aug/2024 15:28:20] "GET / HTTP/1.1" 200 -

I would somehow expect it to run once every 10 seconds, but what I get is two parallel activities every 4 seconds, which then seem to slow down to almost 40 seconds between runs?

@jazev-stripe (Contributor, Author) commented Aug 16, 2024

I would somehow expect it to run once every 10 seconds, but what I get is two parallel activities every 4 seconds, which then seem to slow down to almost 40 seconds between runs?

The rate-limiting is done on the server side, so unfortunately I think this is just a result of the rate-limiting not being very accurate for very small values (and possibly also not accurate at short time scales). I'm not familiar with the server implementation, but if you ask in the Temporal community Slack they would probably be able to give you more details.
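
For what it's worth, here's a quick sanity check against the timestamps in your log (just a throwaway script; the timestamps are copied from your output above). The early ~4-second gaps are followed by ~40-second gaps, which looks consistent with a limiter that smooths the rate over a longer window rather than strictly enforcing one task every 10 seconds:

require 'time'

# Request timestamps copied from the Python server log above (16 Aug 2024).
timestamps = %w[
  15:23:01 15:23:01 15:23:04 15:23:04 15:23:08 15:23:08 15:23:12 15:23:12
  15:23:16 15:23:20 15:23:24 15:23:28 15:23:40 15:24:20 15:25:00 15:25:40
  15:26:20 15:27:00 15:27:40 15:28:20
].map { |t| Time.parse("2024-08-16 #{t}") }

# Gaps between consecutive requests, in seconds.
gaps = timestamps.each_cons(2).map { |a, b| (b - a).to_i }
puts gaps.inspect
# => [0, 3, 0, 4, 0, 4, 0, 4, 4, 4, 4, 12, 40, 40, 40, 40, 40, 40, 40]

puts format('average gap: %.1f s', gaps.sum.to_f / gaps.size)
# => average gap: 16.8 s
# (the configured 0.1 tasks/second implies a 10 s average gap in steady state)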

If I'm right, you should be able to reproduce the same results with any of the official SDKs, since they all use the same mechanism to implement activity task queue rate-limiting (the same protobuf field set when polling for activity tasks).

I also noticed a similar phenomenon when testing locally ("plus a bit of overhead/inaccuracy"):

it took around 20-30 seconds (10 activities ÷ 0.5 activities/second = 20 seconds, plus a bit of overhead/inaccuracy) for the workflow to complete...

One other thing to keep in mind is that the "current" value of the rate limit seems to be stored on the server side in a way that takes on the order of minutes for a new value to propagate. So make sure you haven't run a worker on the same task queue without a rate limit shortly before your test; in other words, run your test with a "clean slate" on that task queue so it isn't affected by anything else (you could even wipe the server state before running the test to be sure).
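
One simple way to get that clean slate is to point both the worker and the trigger at a task queue name that has never been used before, so no previously propagated rate limit applies. A rough sketch (the host/port/namespace values are placeholders for a local dev setup; adapt them to your configuration):

require 'securerandom'

# A fresh task queue name per test run, so no stale server-side rate limit applies.
TASK_QUEUE = "rate-limit-test-#{SecureRandom.hex(4)}"

Temporal.configure do |config|
  config.host       = 'localhost'    # placeholder: local dev server
  config.port       = 7233
  config.namespace  = 'ruby-samples' # placeholder namespace
  config.task_queue = TASK_QUEUE
end

worker = Temporal::Worker.new(
  activity_max_tasks_per_second: 0.1,
  activity_thread_pool_size: 1
)
worker.register_workflow(HelloWorldWorkflow)
worker.register_activity(HelloWorldActivity)
worker.start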
