Support batch record collections #402
I think that's unfortunate. Like you said, records and batches both go through |
@etiennebarrie I can put up a PR that accepts |
    end

    def process(batch_of_posts)
      Post.where(id: batch_of_posts.map(&:id)).update_all(
If we get the batches, we shouldn't need the call to `where`, shouldn't we be able to call `update_all` directly on the batch?
Unfortunately (again), Job Iteration yields batches of records as arrays:
https://github.com/Shopify/job-iteration/blob/75657194442c9ebde3873d5702029fd3a7743b8a/lib/job-iteration/active_record_enumerator.rb#L29
https://github.com/Shopify/job-iteration/blob/75657194442c9ebde3873d5702029fd3a7743b8a/lib/job-iteration/active_record_cursor.rb#L67-L69
This is something that would maybe be nice to fix upstream, but is a breaking change to JobIteration's API (see the `BatchesJob` example in their README: https://github.com/Shopify/job-iteration/tree/75657194442c9ebde3873d5702029fd3a7743b8a#getting-started)
Although there's no reason we couldn't turn this back into a Relation / Batch Enumerator again on our end, so it probably makes sense for us to do that (despite being annoying to be moving this back and forth between a BatchEnumerator / Array / Relation 😛 )
I think that's much better than introducing a class-level method because we're actually changing the items to be processed, and not just something about them. Like in the example task, we're now getting a [...]

A DSL would be fine if it were just changing the size of the batches that load the records, while still passing the records one by one to the `process` method, e.g.:

    module Maintenance
      class UpdatePostsInBatchesTask < MaintenanceTasks::Task
        batch_size 50 # here adding or removing this line doesn't change anything about the rest of the code
        # we load 50 records at a time, but still feed them one at a time to `process`

        def collection
          Post.all
        end

        def count
          collection.count
        end

        def process(post)
          post.update(content: "New content added on #{Time.now.utc}")
        end
      end
    end

In our proposition here, adding a call to the [...] Whereas if we choose whether to batch inside the [...]
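To make the distinction in this comment concrete, here is a minimal plain-Ruby sketch of the "batch size only affects loading" behaviour. `RecordAtATimeRunner` is a hypothetical stand-in written for illustration, not part of maintenance_tasks or Job Iteration, and an in-memory array plays the role of the collection:

```ruby
# Hypothetical runner, for illustration only: it loads records in slices
# of `batch_size`, but still hands them to the caller one at a time.
class RecordAtATimeRunner
  attr_reader :loads

  def initialize(collection, batch_size:)
    @collection = collection
    @batch_size = batch_size
    @loads = 0 # number of slices "loaded" so far
  end

  def each_record
    return enum_for(:each_record) unless block_given?

    @collection.each_slice(@batch_size) do |slice|
      @loads += 1
      slice.each { |record| yield record }
    end
  end
end

# Changing batch_size changes how many loads happen, not what the block sees.
runner = RecordAtATimeRunner.new((1..120).to_a, batch_size: 50)
processed = []
runner.each_record { |record| processed << record }
```

With this shape, the block (the equivalent of `process`) receives the same single records whatever the batch size is, which is why a class-level setting fits that case and not the case where `process` receives a whole batch.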
@etiennebarrie okay fair, that's a sensible argument to me 👍 I'll make the changes

Closing in favour of #409
Closes: #401

This PR allows users to specify batch collections via the following API:

What are you trying to accomplish?

`#process` will accept a batch of records instead of a single record

Approach Taken

- `in_batches` when defining their batch Task in order to denote it as a batch task
- `@batch_size`, defining the batch size for the Task
- `#in_batches?` and its `#batch_size` at the instance level
- `#active_record_on_batches` method with the Task's batch size when building the enumerator in `TaskJob#build_enumerator`
- `Ticker#tick` to take an increment, defaulting to 1.
- `TaskJob#each_iteration`, I've tweaked the call to `#tick` to use `input.size`. This makes it so that the tick count is still based on the number of records, rather than the number of batches. I felt that having progress in terms of records processed made more sense than the number of batches processed. Does this make sense?

What about using `ActiveRecord::Batches::BatchEnumerator`?

- `EnumeratorBuilder#build_active_record_enumerator_on_records` because it explicitly requires an `ActiveRecord::Relation`
- `#active_record_on_batches`
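The progress-counting change described under "Approach Taken" (a tick that takes an increment defaulting to 1, called with `input.size` for batches) could look roughly like the sketch below. `SketchTicker` and `run_iterations` are hypothetical names made up for this example. The gem's actual `Ticker` and `TaskJob#each_iteration` do more than this and are not reproduced here.

```ruby
# Hypothetical, simplified ticker: it only accumulates a count.
class SketchTicker
  attr_reader :ticks

  def initialize
    @ticks = 0
  end

  # increment defaults to 1, so record-at-a-time callers are unaffected
  def tick(increment = 1)
    @ticks += increment
  end
end

# Hypothetical iteration loop: a batch (Array) advances progress by the
# number of records it holds, a single record advances it by 1.
def run_iterations(inputs, ticker)
  inputs.each do |input|
    ticker.tick(input.is_a?(Array) ? input.size : 1)
  end
end

ticker = SketchTicker.new
run_iterations([[:a, :b, :c], [:d, :e]], ticker) # two batches, five records
```

Counting by `input.size` keeps the progress figure comparable between batch and non-batch tasks over the same collection, which is the motivation given in the description.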