Support throttling on arbitrary resource names #27

Open
myronmarston opened this issue Aug 13, 2013 · 6 comments

Comments

@myronmarston
Contributor

Currently Qless supports throttling at a per-queue level. We have a need to do throttling on an arbitrarily named resource (in our case, a MySQL host in our shard ring). To prevent our MySQL hosts from getting overloaded, we've set a hard limit of 30 connections for our shard building jobs. We rescue and retry "too many connections" errors, but it would be more efficient if we could set a max concurrency per host, without having to put jobs in a per-host queue.

So...here's an idea for how we could refactor the current concurrency throttling to be more general:

  • Each job can have a set of named throttlable resources. When enqueuing a job you can specify a list of these: queue.put(MyJobClass, { data: 15 }, throttlable_resources: ['foo', 'bar']).
  • The queue name and klass name are implicitly included in the list of throttlable resources, but not actually stored in the redis set qless-core will use for this. (The internal QlessJob#throttlable_resources qless-core API will take care of adding the queue and klass names to this list when things request the throttlable resources).
  • qless-core will maintain a counter for each named resource, indicating the current number of jobs that have that resource in their list of throttlable_resources. In Pop() it will increment the counter for each throttlable resource of the popped job.
  • When a job completes, fails, or times out, it will decrement the counters for each of its throttlable resources.
  • In Pop() it will also check that a potentially popped job's throttlable resources all have available capacity by looking at the counters. If any counter is at its limit, it won't pop that job and will move on to the next job in the queue.
  • We might consider using sets of jids (rather than counters) for each throttlable resource, as a set of jids gives us more information: it tells us exactly which jobs are using that resource. scard can be used to get the count in O(1) time (see the sketch after this list).
  • qless-core would provide a way to set limits on these throttlable resources, potentially using its config API.
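
To make the jid-set variant concrete, here is a minimal Ruby sketch of the pop-time capacity check. The "ql:throttle:" key names, the LIMITS hash, and the helper methods are illustrative assumptions, not existing qless-core structures (the real logic would live in qless-core's Lua scripts):

require "redis"

# Hypothetical per-resource limits (qless-core might store these via its config API).
LIMITS = { "mysql-host-1" => 30 }

def resource_key(name)
  "ql:throttle:#{name}"
end

# Pop() would only hand out a job if every one of its throttlable resources
# still has capacity; scard gives the current usage in O(1).
def can_pop?(redis, throttlable_resources)
  throttlable_resources.all? do |name|
    limit = LIMITS.fetch(name, Float::INFINITY)
    redis.scard(resource_key(name)) < limit
  end
end

# On pop, record the jid against each resource; on complete/fail/timeout,
# remove it again so the capacity is freed.
def track(redis, jid, throttlable_resources)
  throttlable_resources.each { |name| redis.sadd(resource_key(name), jid) }
end

def untrack(redis, jid, throttlable_resources)
  throttlable_resources.each { |name| redis.srem(resource_key(name), jid) }
end

redis = Redis.new
track(redis, "jid-1", ["mysql-host-1"])
puts can_pop?(redis, ["mysql-host-1"])   # => true while fewer than 30 jids hold the resource
untrack(redis, "jid-1", ["mysql-host-1"])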

In our use case, we would use MySQL host names as our throttlable resources. This could supersede the existing per-queue throttling (since the queue name would be an implicit throttlable resource, that use case is covered for free). It would also nicely support per-job-class throttling.

Thoughts, @dlecocq?

/cc @proby

@databus23
Contributor

This just made my day. I like the idea a lot. I think it generalizes the throttling in a very useful way, allowing much finer-grained control over concurrency when jobs use external resources.

@wr0ngway

I like it as well.
+1 to storing the list of jids instead of just a count
Also, you mention a resource is released "when a job completes, fails or times out". Are retries considered a fail, or will the resource be released while a job is waiting for its retry period to pass?

@myronmarston
Contributor Author

+1 to storing the list of jids instead of just a count

Yeah, the more I think about it, the more I like it being a set (not a list) of jids. If we used a counter and had a "counter leak", we wouldn't have the data to troubleshoot which jobs are holding the resource. A jid set gives you the details to inspect all the jobs holding that resource.

Also, you mention a resource is released "When a job completes, fails or times out", are retries considered a fail, or will the resource be released while a job is waiting for its retry period to pass?

I think the job's jid should be in the resource set only while it is in the running state.

@wr0ngway

On Aug 14, 2013, at 11:14 AM, Myron Marston wrote:

I think the job's jid should be in the resource set only while it is in the running state.

Agreed

@stuartcarnie

I'm going to create a fork of qless-core and implement this feature. Any feedback on the proposed implementation welcomed.

Introduce a new class called QlessResource using the following keys

ql:r:[id]-jids
A sorted set of job identifiers requiring the specified resource

ql:r:[id]-pending
A sorted set of job identifiers waiting on the specified resource becoming available. This is used when existing jobs release a resource, to assign it to the next job and move that job to the -locks set

ql:r:[id]-locks
A set of job identifiers which have an active lock for the specified resource

ql:r:[id]
A hash table identifying various properties for this resource. The only key is max, which indicates the maximum usage available
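
For illustration, a small Ruby sketch of setting up and inspecting these keys with raw Redis commands; "mysql-host-1" is a hypothetical resource id, and only the key layout above is taken from the proposal:

require "redis"

redis = Redis.new

# Allow at most 30 concurrent locks on this resource (the ql:r:[id] hash).
redis.hset("ql:r:mysql-host-1", "max", 30)

# Jobs currently holding a lock, and jobs waiting for the resource.
locks   = redis.smembers("ql:r:mysql-host-1-locks")
pending = redis.zrange("ql:r:mysql-host-1-pending", 0, -1)

puts "max:     #{redis.hget('ql:r:mysql-host-1', 'max')}"
puts "locks:   #{locks.size}"
puts "pending: #{pending.size}"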

Extend the put command so that an additional options parameter can be included for specifying an array of resource identifiers. If the resources key is present, call the Qless.resource(id):acquire(jid) API for each resource. The acquire API will add an entry to either the ql:r:[id]-locks or ql:r:[id]-pending set, depending on availability. The pending set will be sorted based on priority to ensure correct FIFO ordering.
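
A rough Ruby sketch of what the acquire step could look like under this scheme; the actual implementation would be a Lua script running atomically inside qless-core, so the method signature here is an assumption:

# Grant a lock if the resource is under its max, otherwise park the jid in
# the pending sorted set, scored by priority, so the highest-priority
# waiting job is granted the lock first.
def acquire(redis, resource_id, jid, priority = 0)
  max   = redis.hget("ql:r:#{resource_id}", "max").to_i
  locks = "ql:r:#{resource_id}-locks"

  if redis.scard(locks) < max
    redis.sadd(locks, jid)
    true                                            # lock granted
  else
    redis.zadd("ql:r:#{resource_id}-pending", priority, jid)
    false                                           # must wait for the resource
  end
end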

Extend the pop command so that invalidated locks release their resources, and so that scheduled or recurring jobs that require resources acquire them before being added to the work queue; otherwise they are added to the ql:r:[id]-pending set.

Extend the complete command so that, if the completing job has active resource locks, it releases them, assigning them to the next pending job and moving that job to the ql:q:[name]-work set once it has successfully acquired all of its required resources.

Extend the fail and retry commands so they release active resource locks when a job transitions out of the running state, and enqueue pending jobs that are waiting for those resources.
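
And a companion sketch of the release step shared by complete, fail, and retry, including promotion of the next pending job; again this is Ruby standing in for the atomic Lua logic, and the "all required resources acquired" check before moving the promoted job to ql:q:[name]-work is omitted:

# Drop the finished job's lock, then hand the freed lock to the
# highest-priority pending jid, if any.
def release(redis, resource_id, jid)
  locks   = "ql:r:#{resource_id}-locks"
  pending = "ql:r:#{resource_id}-pending"

  redis.srem(locks, jid)

  next_jid, _priority = redis.zrevrange(pending, 0, 0, with_scores: true).first
  return nil unless next_jid

  redis.zrem(pending, next_jid)
  redis.sadd(locks, next_jid)
  next_jid                                          # jid now holding the lock
end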

The patch will serve as documentation of the implementation details, but I'll be conscious of keeping things as efficient as possible.

@wr0ngway

Was there ever a PR for this? I would love to see it added to master, so let me know if there is something I can do to help.
