
Can't batch.enqueue the callback after retrying a job within the batch #822

Closed
v2kovac opened this issue Feb 5, 2023 · 7 comments · Fixed by #824

Comments

v2kovac commented Feb 5, 2023

Awesome work with the batches release!

I'm happy to solve this problem myself; I'm just wondering if you can point me to the place in the codebase where `batch.enqueue` doesn't re-initiate the batch callback after you retry a single job and then call `batch.enqueue` again.

e.g.:

- `batch.enqueue` works fine when all jobs are done; I can make the callback happen over and over.
- `job_inside_batch.retry_job` + `batch.enqueue` doesn't work; it thinks I'm in the case where the callback has already been enqueued.

Why would I do this? Sometimes I want to retry all the failed jobs on a custom basis and still have the callback happen again, via `batch.enqueue`, once the retries are done. Think of it as a special case of `batch.add` where I'm retrying existing jobs instead of adding new ones.

It seems like the callback enqueue isn't actually checking whether a callback has already been enqueued, but rather checking the status of the jobs. I'd like to put up a PR modifying that. Let me know if this wasn't enough context. Thanks again!

v2kovac commented Feb 5, 2023

OK, I found the spot: `_continue_discard_or_finish`. I'll look into it.

I was under the impression, for some reason, that adding jobs or calling `enqueue` again re-initiated the callback, but that isn't happening now when I try it. I'll look into this and maybe put up a PR if I think it's interesting enough.

v2kovac closed this as completed Feb 5, 2023
bensheldon commented
@v2kovac super happy to have you trying Batches out!

If I'm understanding your example, job_inside_batch.retry_job + batch.enqueue should trigger the job callback again I think. If it doesn't, that seems like a bug.

If you wanted to try opening a PR, even if it simply has a failing test-case, that would be really helpful 🙏🏻

bensheldon reopened this Feb 5, 2023

v2kovac commented Feb 5, 2023

OK, I figured out the real problem here, though I'm not sure yet why it's happening. It all has to do with whether `finished_at` is set on the underlying `batch_record`.

steps:

```ruby
batch.enqueue { TestJob.perform_later }  # 1st call triggers the callback
batch.enqueue { TestJob.perform_later }  # 2nd call does not (finished_at didn't update to nil on the stale reference to batch_record)
batch.reload; batch.enqueue { TestJob.perform_later }  # callback runs! finished_at was updated to nil
```

I'm trying to figure out why the stale record doesn't still just get updated to nil.

v2kovac commented Feb 5, 2023

OK, duh: the stale record already has `finished_at = nil` in memory, so setting nil to nil doesn't register as a change, and ActiveRecord skips the update.
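This nil-over-nil behavior can be reproduced with a minimal simulation of ActiveRecord-style dirty tracking (plain Ruby, no Rails; the `Record` class is a stand-in for illustration, not GoodJob code):

```ruby
# Minimal stand-in for ActiveModel::Dirty-style change tracking: a change is
# only recorded when the assigned value differs from the current one.
class Record
  attr_reader :finished_at

  def initialize(finished_at)
    @finished_at = finished_at
    @changed_attributes = {}
  end

  def finished_at=(value)
    # Only note a change when the value actually differs.
    @changed_attributes["finished_at"] = @finished_at unless value == @finished_at
    @finished_at = value
  end

  def changed?
    !@changed_attributes.empty?
  end
end

stale = Record.new(nil)  # in-memory copy never saw finished_at get set
stale.finished_at = nil  # nil -> nil: no change recorded
stale.changed?           # => false, so a save would issue no UPDATE
```

Assigning a genuinely different value (for example, a timestamp over nil) would mark the record changed, which is why the first `enqueue` behaves as expected.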

bensheldon commented
@v2kovac right! I can make a fix for that. I think it's as simple as adding a `finished_at_will_change!`. I don't think it needs to take a lock on the batch 🤔
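A plain-Ruby sketch of what forcing dirty tracking would do (the `StaleBatchRecord` class is a mock for illustration; in Rails, the `finished_at_will_change!` helper is generated by `ActiveModel::Dirty` and forces the attribute to be treated as changed):

```ruby
# Mock of a record whose finished_at is already nil in memory, with a
# finished_at_will_change! helper mirroring ActiveModel::Dirty's
# attribute_will_change!: mark the attribute dirty even when the assigned
# value will equal the current one.
class StaleBatchRecord
  attr_reader :finished_at

  def initialize(finished_at = nil)
    @finished_at = finished_at
    @changes = {}
  end

  def finished_at_will_change!
    @changes["finished_at"] = @finished_at  # record a change unconditionally
  end

  def finished_at=(value)
    @changes["finished_at"] = @finished_at unless value == @finished_at
    @finished_at = value
  end

  def changed?
    !@changes.empty?
  end
end

record = StaleBatchRecord.new(nil)  # stale copy: finished_at already nil
record.finished_at_will_change!     # force the change to be recorded
record.finished_at = nil
record.changed?                     # => true, so a save would now issue the UPDATE
```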


v2kovac commented Feb 6, 2023

Yeah, I'm not sure how it would handle this: the previous "final" job is finishing up; it checks `!finished_at && enqueued_at && jobs.where(finished_at: nil).count.zero?` in `_continue_discard_or_finish` and gets true. But then `enqueue` is called and adds all the new jobs, with `finished_at` already nil. The previous "final" job then updates `finished_at` right after, and the callback is never initiated for the new set of jobs.

I'm not sure if this is already addressed by the advisory lock; maybe it blocks job adds until `_continue_discard_or_finish` is done.
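The interleaving described above can be replayed deterministically (plain Ruby, with hashes standing in for GoodJob's batch and job records; the steps are numbered to match the narration):

```ruby
# Batch state just before the race: one finished job, batch not yet finished.
batch = { finished_at: nil, enqueued_at: Time.now,
          jobs: [{ finished_at: Time.now }] }

# Step 1: the final job evaluates the _continue_discard_or_finish guard -> true.
guard = batch[:finished_at].nil? &&
        batch[:enqueued_at] &&
        batch[:jobs].none? { |j| j[:finished_at].nil? }

# Step 2: before the job acts on that result, enqueue inserts a new job.
# finished_at is already nil, so nothing about the guard's inputs changes.
batch[:jobs] << { finished_at: nil }

# Step 3: the job acts on the now-stale guard and marks the batch finished,
# even though an unfinished job exists. The new job's callback never fires.
batch[:finished_at] = Time.now if guard
```

After step 3, `batch[:finished_at]` is set while an unfinished job remains in `batch[:jobs]`, which is exactly the lost-callback state described above.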

FYI, the solution to the topline problem for me is just setting `finished_at` to nil on the batch manually, and then retrying all the failed jobs. I only ever retry when the batch is finished. Exactly what I needed!

bensheldon commented
It's really helpful to have you think through this with me! Thank you 🙌🏻

I think I also need to reset `enqueued_at = nil` when `enqueue` is called multiple times. If I remember right, my hope was that `enqueued_at` would be more of a reference for when the batch was first enqueued, but you've helped me realize it's actually part of the logic for determining whether the set of jobs has been fully inserted/enqueued. So it will also need to be set to nil before enqueuing the jobs (as if it were a freshly initialized batch), and then set after the jobs are added/enqueued/inserted.
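A self-contained sketch of that flow (plain Ruby; `SketchBatch` and its `finishable?` guard are stand-ins for GoodJob's internals, not the actual implementation): clear both timestamps before inserting jobs, and set `enqueued_at` only once every job is in, so the "fully enqueued" check can't pass mid-insert.

```ruby
# Stand-in batch: enqueued_at doubles as a "jobs fully inserted" flag.
class SketchBatch
  attr_reader :enqueued_at, :finished_at, :jobs

  def initialize
    @jobs = []
  end

  def enqueue(new_jobs)
    @enqueued_at = nil       # as if freshly initialized: blocks premature finish
    @finished_at = nil       # re-arm the callback for the expanded set of jobs
    @jobs.concat(new_jobs)
    @enqueued_at = Time.now  # all jobs inserted; finishing may proceed again
  end

  # Stand-in for the guard quoted earlier in the thread.
  def finishable?
    finished_at.nil? && !enqueued_at.nil? && jobs.all? { |j| j[:done] }
  end
end

batch = SketchBatch.new
batch.enqueue([{ done: true }])
batch.finishable?                 # => true: first set of jobs is complete
batch.enqueue([{ done: false }])
batch.finishable?                 # => false until the new job finishes
```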

Hopefully I have my thinking correct in #824 by now 😀
