race condition for batches callback #832
@v2kovac could you share how you're enqueuing them? Also, I see "Aurora" and "MySQL" which might be the cause. GoodJob is only compatible with Postgres. I'm surprised it works at all, but the Advisory Locks might not function the same. |
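To illustrate why a working lock matters here, below is a minimal, self-contained simulation of the "check whether the batch is finished, then enqueue the callback" step. It is only a sketch under stated assumptions: `SimulatedBatch`, `finish_job`, and `run_batch` are hypothetical names, a Ruby `Mutex` stands in for a database advisory lock, and none of this is GoodJob's actual implementation.

```ruby
# Hypothetical stand-in for a batch whose jobs finish concurrently.
# A Mutex plays the role of an advisory lock; this is NOT GoodJob code.
class SimulatedBatch
  attr_reader :callback_count

  def initialize(job_count, use_lock:)
    @finished = Array.new(job_count, false)
    @use_lock = use_lock
    @lock = Mutex.new
    @count_lock = Mutex.new
    @callback_enqueued = false
    @callback_count = 0
  end

  # Called by each worker thread when its job completes.
  def finish_job(index)
    @finished[index] = true
    if @use_lock
      @lock.synchronize { enqueue_callback_if_finished }
    else
      enqueue_callback_if_finished
    end
  end

  private

  # Check-then-act: without a lock, several threads can pass both
  # checks before any of them records that the callback was enqueued.
  def enqueue_callback_if_finished
    return unless @finished.all?
    return if @callback_enqueued

    sleep 0.01 # widen the window between the check and the write
    @callback_enqueued = true
    @count_lock.synchronize { @callback_count += 1 }
  end
end

# Finish all jobs at roughly the same moment and report how many
# times the callback was "enqueued".
def run_batch(job_count, use_lock:)
  batch = SimulatedBatch.new(job_count, use_lock: use_lock)
  threads = job_count.times.map do |i|
    Thread.new do
      sleep 0.05
      batch.finish_job(i)
    end
  end
  threads.each(&:join)
  batch.callback_count
end

puts "with lock:    #{run_batch(20, use_lock: true)}"  # always 1
puts "without lock: #{run_batch(20, use_lock: false)}" # at least 1, often more; varies per run
```

With the lock held around the check and the write, only one thread can ever enqueue the callback. If the lock does not behave as expected, the same code path can enqueue it once per job that finishes in the same window.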
Aurora PostgreSQL, sorry. Also, it's an on_finish callback, and I guess it's just being called a ton of times all at the same time. I think this is a real race condition somehow happening; it's very consistent with the setup here, btw. I'm still on 3.10.0, enqueuing like so:
|
Btw, I just ran a massive batch (1 hour long, 15k+ jobs, at full throttle) and the callback only got called once. This might just be because of the sleep here, something weird. |
hmm yeah i had it happen again for a real job this time:
enqueued the callback 70 times, the only thing i can think of is sometimes i get these errors:
We switched off serverless Aurora to r6g.larges because of this error, but the multiple-callback issue still just happened. It doesn't happen every time, though; it's pretty rare so far, but enough to be concerning. |
Hmm. Have you upgraded to 3.10.1 or later? I think this PR might help because it has the |
i'll upgrade to latest and stress test again |
looks good! haven't had it happen after a dozen stress tests |
this still looks good; I now consider it solved, thanks |
@v2kovac Yay! Thanks for working through this with me 🙌🏻 |
i have a TestJob
I enqueue about 20 of these at once and get almost as many callback notifications in Slack 10 seconds later (the callback pings Slack). It happens with a small number of jobs too. This is only in production, where we've got 2 processes with 25 max_threads each, running Aurora in a multiple-database setup with MySQL. Might be a config issue on our part; I need to look into the devops side more.