Add bulk enqueue functionality #790
Conversation
@julik thanks for opening this PR. You must have been reading my mind because I was just thinking: "hmm. my Batch PR has stalled a minute, maybe I should extract the Bulk API out of that". 🧠 Could you take a look at that implementation and see if it would meet your needs: https://github.com/bensheldon/good_job/pull/712/files#diff-660985340da42f7f83f1a5e03b05f511aa5a1a7221c07be3cbf05c150060dca5 Here's what I'm thinking:
Hey @bensheldon no worries - it is a spike after all.
Yep! I was thinking you could copy-paste the implementation from that PR. I agree with you that the Rails interface is subject to change, but I think the spirit of it (hopefully) is pretty solid. So the steps I'm imagining are:
I'll try that once I have time. Meanwhile we'll field-test the implementation that is already in this PR and see what the improvements will be ;-)
@julik sounds good 👍 do you mind if I push up some changes to this branch? I can also make my own branch that builds on this one for discussion. No worries either way. I really appreciate the collaboration!
Sure, go ahead. Do you want me to add you as a collaborator on my fork? Might save some rebasing later.
A couple of other things:
OK - did some manipulations which should set us off in the right direction, and you are on the collaborator list 🚀
@julik I just pushed up a bunch of changes; I think I may have removed some of the instrumentation you just added this morning. That wasn't meant as a rejection of it, just expedient to get some of the more major changes I wanted to see in this. I think we can get inline-enqueueing working; it will just involve creating a transaction within which to take the advisory lock. I can work on that.
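A minimal sketch of that idea, assuming Postgres: a transaction-scoped advisory lock (`pg_advisory_xact_lock`) is released automatically on commit or rollback. The lock key and the bulk-insert call here are illustrative, not the PR's actual code:

```ruby
LOCK_KEY = 1_234_567 # illustrative advisory lock key
execution_attributes = [] # in reality, attribute hashes built from the buffered ActiveJob instances

ActiveRecord::Base.transaction do
  # Blocks until the lock is acquired; released automatically when
  # the surrounding transaction commits or rolls back.
  ActiveRecord::Base.connection.execute("SELECT pg_advisory_xact_lock(#{LOCK_KEY})")

  # Hypothetical single-statement bulk insert of the buffered jobs.
  GoodJob::Execution.insert_all(execution_attributes)
end
```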
Lovely, thanks for the changes! My worry with the advisory lock was that if we … I'll stop tinkering for now, sorry - this was tasty work.
Give a shout when I can continue, and whether you want a review.
@julik yes! This is fun. I'm still working on it. I think I feel OK with … I still do need to work on the Adapter side:

This has definitely surfaced for me a lot of messiness in the Adapter and Execution boundaries. Stuff to work through for me :D
What do we want to do with enqueue concurrency? It seems we would need to do a SELECT for all the concurrency keys so as not to INSERT jobs that would be overly concurrent.
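To illustrate the question, a hypothetical sketch of that pre-flight SELECT (the `good_job_concurrency_key` reader comes from GoodJob's concurrency extension; the fixed limit is a placeholder for per-job configuration):

```ruby
limit = 1 # placeholder; real code would read each job's configured enqueue limit

# One SELECT to count existing executions per concurrency key...
keys = active_jobs.filter_map { |job| job.try(:good_job_concurrency_key) }
existing_counts = GoodJob::Execution.where(concurrency_key: keys)
                                    .group(:concurrency_key)
                                    .count

# ...then drop buffered jobs that would become overly concurrent.
active_jobs = active_jobs.reject do |job|
  key = job.try(:good_job_concurrency_key)
  key && existing_counts.fetch(key, 0) >= limit
end
```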
```ruby
thread_mattr_accessor :current_buffer

# @return [nil, Array<ActiveJob::Base>] The ActiveJob instances that have been buffered; nil if no active buffer
def self.capture(active_jobs = nil, queue_adapter: nil)
```
Under which conditions will the `active_jobs` be nil? And under which conditions will the `queue_adapter` be nil?
```ruby
active_jobs_by_queue_adapter.each do |adapter, jobs|
  jobs = jobs.reject(&:provider_job_id) # Do not re-enqueue already enqueued jobs

  if adapter.respond_to?(:enqueue_all)
```
If the `Bulk` API is used, the adapter involved will always be GoodJob's - and if we have `Bulk` then we support `enqueue_all` by definition. Is there a need to copy the check from ActiveJob here, which they designed in so that multiple adapters can be used in the same app?
```ruby
  end
end

def active_jobs_by_queue_adapter
```
As I mentioned above - all adapters here will be GoodJob, just potentially with different settings/different DBs?
```ruby
  buffer.enqueue
  buffer.active_jobs
elsif current_buffer.present?
  current_buffer.enqueue
```
`#enqueue` and `#active_jobs` seem to always be used together - maybe return them from `enqueue` and reset the buffer?
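For example, one shape this suggestion could take (a sketch, not the PR's actual code):

```ruby
# enqueue persists the buffered jobs, returns them, and resets the
# buffer so callers no longer need a separate #active_jobs call.
def enqueue
  enqueued_jobs = active_jobs
  # ...perform the bulk INSERT for enqueued_jobs here...
  @values = []
  enqueued_jobs
end
```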
```ruby
  @values = []
end

def add(active_jobs, queue_adapter: nil)
```
Does the `queue_adapter` need to be overridable via a parameter?
I'm planning to work on the Bulk interface this week.
Also, I think this needs to cover the … Edit: oops, I already did this: https://github.com/bensheldon/good_job/pull/790/files#diff-660985340da42f7f83f1a5e03b05f511aa5a1a7221c07be3cbf05c150060dca5R75
@julik thank you for getting this started and working through it with me! 🚀
Glad to see it got released! Sorry I didn't stick around till the end - some other stuff took priority.
In a number of situations we've seen large numbers of jobs generated from one task. The most typical example is sending out notifications or emails to groups of users of an application. When hundreds or thousands of jobs need to be enqueued, doing one `INSERT` per job can become expensive: if one `INSERT` takes 1ms, doing 1K inserts takes a second. ActiveJob does not support bulk enqueue by itself, but we can actually use thread locals to intercept the enqueued job records and delay persisting them until the end of the block.

Note that I've picked "bulk" as the name for this to avoid conflating it with "batches", which are in progress. It serves a different purpose: rapidly inserting large numbers of jobs and nothing else.
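As a rough illustration of that thread-local interception (the module and method names here are simplified stand-ins, not the actual implementation):

```ruby
require "active_support/core_ext/module/attribute_accessors_per_thread"

module Bulk # simplified stand-in for the real class
  thread_mattr_accessor :current_buffer

  # Collect jobs enqueued inside the block instead of INSERTing them
  # one at a time, then persist the whole buffer in one statement.
  def self.capture
    self.current_buffer = []
    yield
    current_buffer # a real implementation would bulk-INSERT these here
  ensure
    self.current_buffer = nil
  end
end
```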
The usage is straightforward:
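A usage sketch based on the API discussed above (the job class and `users` collection are placeholders):

```ruby
# Jobs enqueued inside the block are buffered and INSERTed together
# when the block exits, instead of one INSERT per job.
GoodJob::Bulk.enqueue do
  users.each { |user| NotificationJob.perform_later(user) }
end
```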