Queued tasks (OrmQ) are not always acknowledged #545
Comments
We have fairly frequent deploys; we'll first check what the numbers look like if we don't deploy for a day. We suspect that might be the culprit.
Cool to see you guys use it this much. I have no idea what it could be just from looking at this. Do you have some kind of replicating database cluster?

Probably unrelated, but I would highly recommend using at least Redis as the broker. The ORM one was really only added as a convenience for development, at the request of users.
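For reference, switching a cluster from the ORM broker to Redis is a settings-level change; a minimal sketch, assuming a local Redis instance (the name, worker count, and timeouts below are illustrative, not taken from this thread):

```python
# settings.py: minimal sketch of pointing django-q at Redis instead of the
# ORM broker. Host/port/db are placeholders for a local Redis instance and
# the other values are illustrative, not taken from this thread.
Q_CLUSTER = {
    "name": "myproject",
    "workers": 4,
    "timeout": 600,   # seconds a task may run before it is killed
    "retry": 660,     # should be larger than timeout
    "redis": {
        "host": "127.0.0.1",
        "port": 6379,
        "db": 0,
    },
}
```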
Thanks, we'll look into that. I came up with a fix for the `SAVE_LIMIT` issue:

```python
with db.transaction.atomic():
    last = Success.objects.select_for_update().last()
    if task["success"] and 0 < Conf.SAVE_LIMIT <= Success.objects.count():
        last.delete()
```

Related to #225. Comments? Should I make a PR? @Koed00
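A sketch of how that fix might sit inside django-q's result saving, assuming the pruning happens in something like `save_task()` in `django_q/monitor.py` (the helper name and placement below are assumptions, not the actual upstream code):

```python
# Sketch only: a simplified stand-in for the pruning step in django-q's
# save_task() (django_q/monitor.py), with the proposed locking applied.
# Function name and placement are assumptions, not the actual upstream code.
from django.db import transaction  # equivalent to db.transaction above

from django_q.conf import Conf
from django_q.models import Success


def prune_success_rows(task):
    # Lock the row that will be pruned inside a transaction, so concurrent
    # monitor processes serialize here instead of both reading a stale
    # count() and leaving the table above SAVE_LIMIT.
    with transaction.atomic():
        last = Success.objects.select_for_update().last()
        if task["success"] and 0 < Conf.SAVE_LIMIT <= Success.objects.count():
            last.delete()
```

The key point is that `select_for_update()` only takes effect inside `atomic()`; the row lock is what stops two monitor processes from both deciding the table is exactly at the limit and only one of them pruning.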
Update to the issue with tasks getting stuck: I created a

Lowest available memory (%) for one of the clusters was around ~2%, which is pretty low. One of those clusters also has 15 minutes of uptime, while the others have about 6 hours (since the last deployment). We also have one specific task that gets stuck in the queue more often than the others; during its execution it reads a big file, so it uses a lot of memory compared to the other tasks.

This seems very promising. I'll test locally whether I can recreate this issue by running Django Q out of memory 🙂 If that's the case, I'll probably have to lower
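One way to sanity-check the memory hypothesis locally is to watch the resident set size of the cluster and its worker processes while the suspect task runs. A rough sketch using psutil (how you find the qcluster PID is up to you, e.g. from `ps` or the cluster logs):

```python
# Rough helper for watching a django-q cluster's memory while a heavy task
# runs. Pass the PID of the qcluster process; its workers are child processes.
import sys

import psutil


def cluster_memory_mb(cluster_pid):
    """Return RSS in megabytes for the cluster process and each worker."""
    cluster = psutil.Process(cluster_pid)
    usage = {cluster.pid: cluster.memory_info().rss / 1024 ** 2}
    for worker in cluster.children(recursive=True):
        usage[worker.pid] = worker.memory_info().rss / 1024 ** 2
    return usage


if __name__ == "__main__":
    for pid, mb in cluster_memory_mb(int(sys.argv[1])).items():
        print(f"{pid}: {mb:.1f} MB")
```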
@kennyhei Great to see you made so much progress. If you want to make PRs for this, that would be cool.
@Koed00 Created two PRs, one for
We haven't encountered any problems after lowering the
@kennyhei I already merged the limit fix, but need a bit more time to review the qmemory PR. Thanks for the work; I'm sure you helped out a bunch of other people.
Here's our current config (we are using Django 2.2.16):
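The exact settings are not shown here; as a stand-in, a `Q_CLUSTER` along these lines would be consistent with the details in this issue (ORM broker, 4 workers per cluster, `SAVE_LIMIT` of 10000, a timeout above the 200-400 second task runtimes), with all other values purely illustrative:

```python
# Illustrative only: a Q_CLUSTER consistent with the details in this issue
# (ORM broker, 4 workers per cluster, SAVE_LIMIT of 10000). The timeout and
# retry values are placeholders, not the reporter's actual settings.
Q_CLUSTER = {
    "name": "myproject",
    "workers": 4,
    "orm": "default",     # use the Django ORM (OrmQ) as the broker
    "save_limit": 10000,
    "timeout": 600,       # must comfortably exceed the 200-400 s tasks
    "retry": 660,         # should be larger than timeout
}
```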
We have 3 clusters and 12 workers in total. QInfo:
![Screenshot 2021-04-29 at 14 25 21](https://user-images.githubusercontent.com/1525463/116543683-d55d1300-a8f6-11eb-9cd6-0cdece2fbed1.png)
QMonitor:
![Screenshot 2021-04-28 at 17 57 30](https://user-images.githubusercontent.com/1525463/116544719-291c2c00-a8f8-11eb-9d75-73a12a2769fb.png)
Sometimes queued tasks are not acknowledged and the relevant Task instance (from the OrmQ payload) does not exist. This seems to happen mostly with tasks that have a long execution time (200-400 seconds). `TIMEOUT` should be big enough, and we get no errors from the workers (we are using Sentry for error reporting). Any ideas?

Oh, and even though `SAVE_LIMIT` is set to 10000, the limit doesn't always hold, i.e. it seems to sometimes ignore this part (see the snippet reconstructed below). As you can see from `qinfo`, at the moment there are 10485 successful tasks in the database.
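The check being referred to is presumably the `SAVE_LIMIT` pruning step in django-q's result saving, the same lines the transaction-wrapped fix earlier in this thread targets; reconstructed from that fix rather than copied from the source, it looks roughly like this:

```python
# Presumed shape of the pruning step in django_q/monitor.py's save_task():
# once the Success table reaches SAVE_LIMIT, delete one row. With several
# monitor processes saving results concurrently and no locking, both can
# pass the count() check before either deletes, so the table can drift past
# the limit, which would match the 10485 rows reported above.
if task["success"] and 0 < Conf.SAVE_LIMIT <= Success.objects.count():
    Success.objects.last().delete()
```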