
Can't batch.enqueue the callback after retrying a job within the batch #822

Closed
v2kovac opened this issue Feb 5, 2023 · 7 comments · Fixed by #824

Comments

v2kovac commented Feb 5, 2023

Awesome work with the batches release!

I'm happy to solve this problem myself; I'm just wondering if you can point me to the place in the codebase where `batch.enqueue` doesn't re-initiate the batch callback after you retry a single job and then call `batch.enqueue` again.

e.g.:

- `batch.enqueue` works fine when all jobs are done; I can make the callback happen over and over.
- `job_inside_batch.retry_job` + `batch.enqueue` doesn't work; it thinks I'm in the case where the callback has already been enqueued.

Why would I do this? Sometimes I want to retry all the failed jobs on a custom basis and still have the callback happen again, via `batch.enqueue`, once the retries are done. Think of it as a special case of `batch.add` where I'm retrying existing jobs instead of adding new ones.

It seems like the callback enqueue isn't actually checking whether a callback has already been enqueued, but rather checking the status of the jobs. I'd like to put up a PR modifying that. Let me know if this wasn't enough context. Thanks again!

v2kovac commented Feb 5, 2023

OK, I found the spot: `_continue_discard_or_finish`. I'll look into it.

I was under the impression, for some reason, that adding jobs or calling `enqueue` again re-initiated the callback, but that isn't happening now when I try it. I'll look into this and maybe put up a PR if I think it's interesting enough.

v2kovac closed this as completed Feb 5, 2023
bensheldon commented
@v2kovac super happy to have you trying Batches out!

If I'm understanding your example, job_inside_batch.retry_job + batch.enqueue should trigger the job callback again I think. If it doesn't, that seems like a bug.

If you wanted to try opening a PR, even if it simply has a failing test-case, that would be really helpful 🙏🏻

bensheldon reopened this Feb 5, 2023

v2kovac commented Feb 5, 2023

OK, I figured out the real problem here, though I'm not sure yet why it's happening. It all has to do with whether `finished_at` is set on the underlying `batch_record`.

steps:

```ruby
batch.enqueue { TestJob.perform_later }  # 1st call triggers the callback
batch.enqueue { TestJob.perform_later }  # 2nd call does not (finished_at didn't update to nil on the stale reference to batch_record)
batch.reload; batch.enqueue { TestJob.perform_later }  # callback runs! finished_at was updated to nil
```

I'm trying to figure out why the stale record doesn't still just get updated to nil.

v2kovac commented Feb 5, 2023

OK, duh: the stale record already has `finished_at = nil` in memory, so setting nil to nil doesn't register as a change, and ActiveRecord skips the update.
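This nil-over-nil behavior can be reproduced with a minimal simulation of ActiveRecord-style dirty tracking (plain Ruby, no Rails; the `Record` class is a stand-in for illustration, not GoodJob code):

```ruby
# Minimal stand-in for ActiveModel::Dirty-style change tracking: a change is
# only recorded when the assigned value differs from the current one.
class Record
  attr_reader :finished_at

  def initialize(finished_at)
    @finished_at = finished_at
    @changed_attributes = {}
  end

  def finished_at=(value)
    # Only note a change when the value actually differs.
    @changed_attributes["finished_at"] = @finished_at unless value == @finished_at
    @finished_at = value
  end

  def changed?
    !@changed_attributes.empty?
  end
end

stale = Record.new(nil)  # in-memory copy never saw finished_at get set
stale.finished_at = nil  # nil -> nil: no change recorded
stale.changed?           # => false, so a save would issue no UPDATE
```

Assigning a genuinely different value (for example, a timestamp over nil) would mark the record changed, which is why the first `enqueue` behaves as expected.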

bensheldon commented
@v2kovac right! I can make a fix for that. I think it's as simple as adding a `finished_at_will_change!`. I don't think it needs to take a lock on the batch 🤔
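A plain-Ruby sketch of what forcing dirty tracking would do (the `StaleBatchRecord` class is a mock for illustration; in Rails, the `finished_at_will_change!` helper is generated by `ActiveModel::Dirty` and forces the attribute to be treated as changed):

```ruby
# Mock of a record whose finished_at is already nil in memory, with a
# finished_at_will_change! helper mirroring ActiveModel::Dirty's
# attribute_will_change!: mark the attribute dirty even when the assigned
# value will equal the current one.
class StaleBatchRecord
  attr_reader :finished_at

  def initialize(finished_at = nil)
    @finished_at = finished_at
    @changes = {}
  end

  def finished_at_will_change!
    @changes["finished_at"] = @finished_at  # record a change unconditionally
  end

  def finished_at=(value)
    @changes["finished_at"] = @finished_at unless value == @finished_at
    @finished_at = value
  end

  def changed?
    !@changes.empty?
  end
end

record = StaleBatchRecord.new(nil)  # stale copy: finished_at already nil
record.finished_at_will_change!     # force the change to be recorded
record.finished_at = nil
record.changed?                     # => true, so a save would now issue the UPDATE
```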


v2kovac commented Feb 6, 2023

Yeah, I'm not sure how it would handle this: the previous "final" job is finishing up; it checks `!finished_at && enqueued_at && jobs.where(finished_at: nil).count.zero?` in `_continue_discard_or_finish` and gets true. But then `enqueue` is called and adds all the new jobs, with `finished_at` already nil. The previous "final" job then updates `finished_at` right after, and the callback is never initiated for the new set of jobs.

I'm not sure if this is already addressed by the advisory lock; maybe it blocks job adds until `_continue_discard_or_finish` is done.
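The interleaving described above can be replayed deterministically (plain Ruby, with hashes standing in for GoodJob's batch and job records; the steps are numbered to match the narration):

```ruby
# Batch state just before the race: one finished job, batch not yet finished.
batch = { finished_at: nil, enqueued_at: Time.now,
          jobs: [{ finished_at: Time.now }] }

# Step 1: the final job evaluates the _continue_discard_or_finish guard -> true.
guard = batch[:finished_at].nil? &&
        batch[:enqueued_at] &&
        batch[:jobs].none? { |j| j[:finished_at].nil? }

# Step 2: before the job acts on that result, enqueue inserts a new job.
# finished_at is already nil, so nothing about the guard's inputs changes.
batch[:jobs] << { finished_at: nil }

# Step 3: the job acts on the now-stale guard and marks the batch finished,
# even though an unfinished job exists. The new job's callback never fires.
batch[:finished_at] = Time.now if guard
```

After step 3, `batch[:finished_at]` is set while an unfinished job remains in `batch[:jobs]`, which is exactly the lost-callback state described above.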

FYI, the solution to the topline problem for me is just setting `finished_at` to nil on the batch manually, and then retrying all the failed jobs. I only ever retry when the batch is finished. Exactly what I needed!

bensheldon commented
It's really helpful to have you think through this with me! Thank you 🙌🏻

I think I also need to reset `enqueued_at = nil` when `enqueue` is called multiple times. If I remember right, my hope was that `enqueued_at` would be more of a reference for when the batch was first enqueued, but you've helped me realize it's actually part of the logic for determining whether the set of jobs has been fully inserted/enqueued. So it will also need to be set to nil before enqueuing the jobs (as if it were a freshly initialized batch), and then set after the jobs are added/enqueued/inserted.
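A self-contained sketch of that flow (plain Ruby; `SketchBatch` and its `finishable?` guard are stand-ins for GoodJob's internals, not the actual implementation): clear both timestamps before inserting jobs, and set `enqueued_at` only once every job is in, so the "fully enqueued" check can't pass mid-insert.

```ruby
# Stand-in batch: enqueued_at doubles as a "jobs fully inserted" flag.
class SketchBatch
  attr_reader :enqueued_at, :finished_at, :jobs

  def initialize
    @jobs = []
  end

  def enqueue(new_jobs)
    @enqueued_at = nil       # as if freshly initialized: blocks premature finish
    @finished_at = nil       # re-arm the callback for the expanded set of jobs
    @jobs.concat(new_jobs)
    @enqueued_at = Time.now  # all jobs inserted; finishing may proceed again
  end

  # Stand-in for the guard quoted earlier in the thread.
  def finishable?
    finished_at.nil? && !enqueued_at.nil? && jobs.all? { |j| j[:done] }
  end
end

batch = SketchBatch.new
batch.enqueue([{ done: true }])
batch.finishable?                 # => true: first set of jobs is complete
batch.enqueue([{ done: false }])
batch.finishable?                 # => false until the new job finishes
```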

Hopefully I have my thinking correct in #824 by now 😀
