Process payloads in batches #93

Open
grantr opened this issue Oct 3, 2014 · 6 comments

Collaborator

grantr commented Oct 3, 2014

Some jobs would benefit from being able to process a bunch of payloads at once. Specifically, a job that writes documents to a data store could see a dramatic performance increase by sending 50 or 100 documents per request instead of 1. Since it may not be possible for the producer to enqueue entire batches in a single payload, the queue framework needs to handle popping multiple payloads and collecting them into a batch job.

This could work well with batch pop support in the backend, but doesn't require it. Even if the backend only supports popping one payload at a time, a job might still want a batch of payloads.
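
As a rough illustration of that last point, here is a minimal sketch of collecting a batch from a backend that only pops one payload at a time. SinglePopBackend and pop_batch are hypothetical names made up for this example; they are not part of Qu.

```ruby
# Hypothetical stand-in for a backend that can only pop one payload at a time.
class SinglePopBackend
  def initialize(payloads)
    @payloads = payloads.dup
  end

  # Returns the next payload, or nil when the queue is empty.
  def pop
    @payloads.shift
  end
end

# Pops up to max_size payloads, stopping early if the queue runs dry.
def pop_batch(backend, max_size)
  batch = []
  while batch.size < max_size
    payload = backend.pop
    break if payload.nil?
    batch << payload
  end
  batch
end

backend = SinglePopBackend.new(%w[doc1 doc2 doc3])
first_batch = pop_batch(backend, 2)
second_batch = pop_batch(backend, 2)
```

A real implementation would probably also want a timeout so a slow queue does not hold a partial batch forever; that is left out here.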

Collaborator

mauricio commented Oct 3, 2014

well -> #82

Collaborator Author

grantr commented Oct 3, 2014

Ah ha! I will take a look @mauricio

Collaborator

mauricio commented Oct 3, 2014

👍

Collaborator Author

grantr commented Oct 3, 2014

@mauricio Here's what I would do differently:

  • Implement batch push separately.
  • Give the backend control over how many payloads to pop. The backend may want to implement adaptive batch sizes, for example. Batch-aware backends return a BatchPayload object that contains multiple payloads. The rest of Qu can remain blissfully unaware of batches (for now).
  • Add a BatchJob class specifically for handling multiple payloads.

BatchPayload is an array wrapper that knows how to dispatch multiple payloads. For standard Qu::Job classes it processes each payload one at a time. Payloads for BatchJob classes are bundled up and sent to the job as an array. The job implements each to iterate through the payload list.

class BulkWrite < Qu::BatchJob
  batch_size 50

  def perform
    set_up_batch_write
    each do |arg1, arg2|
      do_something_with_payload(arg1, arg2)
    end
    complete_batch_write
  end
end
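
For illustration only, the base class behind a job like this might look roughly as follows. This is a sketch under assumptions: Sketch::BatchJob, BulkWriteSketch, and the written reader are hypothetical names invented here, and nothing below is an existing Qu API.

```ruby
module Sketch
  # Hypothetical base class for batch jobs; not part of Qu.
  class BatchJob
    include Enumerable

    # Class-level macro: with an argument it sets the batch size,
    # without one it returns the current value.
    def self.batch_size(size = nil)
      @batch_size = size if size
      @batch_size
    end

    # payloads is an array of argument lists, one per original payload.
    def initialize(payloads)
      @payloads = payloads
    end

    # Yields each payload's argument list to the block.
    def each(&block)
      @payloads.each(&block)
    end
  end
end

# Hypothetical job that records what it would have written.
class BulkWriteSketch < Sketch::BatchJob
  batch_size 50

  attr_reader :written

  def perform
    @written = []
    each do |id, body|
      @written << "#{id}:#{body}"
    end
    @written.size
  end
end

job = BulkWriteSketch.new([[1, "a"], [2, "b"]])
count = job.perform
```

Keeping batch_size on the class lets a backend read the job's preferred size before it decides how many payloads to pop.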

Collaborator

mauricio commented Oct 3, 2014

@grantr that would simplify the implementation a lot.

Another problem is, how do you know if a queue produces batch jobs or not?

This was one of the main complications in my implementation: having to pull payloads and push them back to the queue when they were "single" jobs instead of batch jobs. My use case at the time was a single-use queue, so I didn't have to care about this much, but if you're running on top of a general queue this could give you trouble, since clients mix batch and non-batch jobs in the same place.

Why would you implement batch pushes separately?

It seems like a very simple feature to have, given a backend that supports it: you just push many messages at once instead of one at a time.

Collaborator Author

grantr commented Oct 3, 2014

> how do you know if a queue produces batch jobs or not?

IMO this is the backend's responsibility. The backend can decide whether it will pull payloads in bulk from the queue service. If it does, the BatchPayload it creates may contain payloads for multiple jobs. When the BatchPayload is processed, it can look at each job to see whether it accepts batches or not. If so, it groups all the like payloads into a single job and performs once. If not, it performs each job individually.

This decouples batch pop from batch process, and keeps the perform logic in the *Payload classes. There's never a need to return jobs to the queue, because the fallback is to perform all payloads in sequence as if they were not batched.
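
A minimal sketch of that dispatch logic, assuming each job class exposes a hypothetical accepts_batches? hook. All names here (Payload, PlainJob, BatchedJob, BatchPayloadSketch, LOG) are invented for the example and are not Qu's API.

```ruby
# Shared log so the example can show what was performed and how.
LOG = []

# Hypothetical payload: the job class plus the arguments for one job.
Payload = Struct.new(:klass, :args)

# Hypothetical standard job: performs one payload at a time.
class PlainJob
  def self.accepts_batches?
    false
  end

  def initialize(args)
    @args = args
  end

  def perform
    LOG << [:single, @args]
  end
end

# Hypothetical batch-aware job: receives every payload's args at once.
class BatchedJob
  def self.accepts_batches?
    true
  end

  def initialize(args_list)
    @args_list = args_list
  end

  def perform
    LOG << [:batch, @args_list]
  end
end

# Array wrapper that knows how to dispatch multiple payloads.
class BatchPayloadSketch
  def initialize(payloads)
    @payloads = payloads
  end

  def perform
    # Group like payloads by job class, then either perform once with the
    # whole group or fall back to performing each payload in sequence.
    @payloads.group_by(&:klass).each do |klass, group|
      if klass.accepts_batches?
        klass.new(group.map(&:args)).perform
      else
        group.each { |payload| klass.new(payload.args).perform }
      end
    end
  end
end

BatchPayloadSketch.new([
  Payload.new(PlainJob, ["a"]),
  Payload.new(BatchedJob, ["x"]),
  Payload.new(PlainJob, ["b"]),
  Payload.new(BatchedJob, ["y"])
]).perform
```

Note that grouping by class changes the relative order of payloads across job classes, which may or may not matter for a given queue.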

Batch push is separate because it doesn't have anything to do with batch processing (IMHO). Consumers don't need to know if producers have batch push, and producers don't need to know if consumers have batch pop.
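
To make the independence concrete, here is a small sketch with a hypothetical in-memory queue. InMemoryQueue, push, push_many, and pop are names assumed for the example, not Qu's backend interface.

```ruby
# Hypothetical in-memory queue used only to illustrate the decoupling.
class InMemoryQueue
  def initialize
    @items = []
  end

  # Single push: enqueue one payload.
  def push(payload)
    @items << payload
    self
  end

  # Batch push: enqueue many payloads in one call. Each payload is still
  # stored individually, so consumers cannot tell how it was produced.
  def push_many(payloads)
    @items.concat(payloads)
    self
  end

  # Consumers pop one payload at a time regardless of how it was pushed.
  def pop
    @items.shift
  end
end

single = InMemoryQueue.new
single.push("a").push("b")

batched = InMemoryQueue.new
batched.push_many(%w[a b])

from_single = [single.pop, single.pop]
from_batched = [batched.pop, batched.pop]
```

Both queues hand the consumer the same payloads in the same order, which is why batch push can be designed and shipped on its own.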

@grantr grantr mentioned this issue Oct 7, 2014