
Getting "MySQL server has gone away" on new tasks after idling #76

Closed
LightJockey opened this issue Sep 28, 2015 · 9 comments

@LightJockey

Hello,

I've been trying Django Q for a new project as an alternative to Celery, and so far I love it, but I keep getting a "MySQL server has gone away" error on every new task my cluster receives after it has idled for at least 8 hours, which matches the wait_timeout of my MySQL server. To confirm it was related to wait_timeout, I lowered it to a minute and the same error kept happening until I restarted the cluster.

Is this behavior intended? Shouldn't Django or Django Q handle this and at least try to re-establish a connection?

I tried setting Django's CONN_MAX_AGE in my settings.py both to 0 and to a higher value that is still lower than MySQL's wait_timeout, but no luck.
After a bit of googling I found https://code.djangoproject.com/ticket/21597#comment:29, which recommends calling connection.close() so that Django reconnects on the next query. That works inside my task, meaning it can change stuff on one of my models and save it, but the task itself isn't getting saved and doesn't appear under successful tasks.
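
For reference, this is roughly what that workaround looks like inside the task; the app, model, and field names below are placeholders, not my actual code:

from django.db import connection
from myapp.models import MyModel  # placeholder app and model

def my_task(pk):
    # close the connection MySQL may have already dropped server-side,
    # so Django opens a fresh one on the next query
    connection.close()
    obj = MyModel.objects.get(pk=pk)
    obj.processed = True  # placeholder field
    obj.save()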

Is there any workaround other than periodically restarting the cluster or increasing wait_timeout to insane values?

My Django Q settings are vanilla, using Redis as the broker. Here's the cluster log:

08:56:23 [Q] INFO Q Cluster-26132 starting.
08:56:23 [Q] INFO Process-1:3 ready for work at 26139
08:56:23 [Q] INFO Process-1:5 monitoring at 26141
08:56:23 [Q] INFO Process-1 guarding cluster at 26138
08:56:23 [Q] INFO Process-1:6 pushing tasks at 26142
08:56:23 [Q] INFO Process-1:4 ready for work at 26140
08:56:23 [Q] INFO Q Cluster-26132 running.

08:57:46 [Q] INFO Process-1:3 processing [ink-bakerloo-michigan-finch]
08:57:47 [Q] INFO Processed [ink-bakerloo-michigan-finch]

08:59:19 [Q] INFO Process-1:4 processing [rugby-five-skylark-lake]
08:59:19 [Q] ERROR (2006, 'MySQL server has gone away')
08:59:19 [Q] INFO Processed [rugby-five-skylark-lake]

09:01:24 [Q] INFO Process-1:3 processing [uniform-one-hydrogen-december]
09:01:25 [Q] ERROR (2006, 'MySQL server has gone away')
09:01:25 [Q] INFO Processed [uniform-one-hydrogen-december]
@Koed00 (Owner) commented Sep 28, 2015

I just read through that ticket. I'll see if I can somehow check the db connection.
I assume you're not seeing the saved results in the database?

@LightJockey (Author)

Thank you for the quick response. Yes, there are no results in the database as soon as that error pops up.

@Koed00 (Owner) commented Sep 28, 2015

I can check for stale connections and reset them on every save, but that might affect performance a bit.
So first I want to see whether it's enough to have the scheduler's ping reset stale connections every minute or so.
I've added that to the dev branch just now. Are you able to run the dev branch for a while to see if it solves the problem?

If not we'll have to resort to testing the db connection on every save or closing it like the guys over at Django seem to prefer.
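
The check itself basically boils down to Django's own stale-connection helper, roughly like this (a sketch of the idea, not the exact dev branch code):

from django.db import close_old_connections

# discard connections that have errored or outlived CONN_MAX_AGE,
# so the next ORM query transparently opens a fresh one
close_old_connections()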

@Koed00 (Owner) commented Sep 28, 2015

Hmm. I just realized this probably won't work when the scheduler and saver run in different threads. I need more coffee.

@LightJockey (Author)

Yeah, no change whatsoever running the dev branch; I tried several times.

Koed00 added a commit that referenced this issue Sep 28, 2015
Adds a check for old connections in both the workers and the monitor, every DB_TIMEOUT seconds
@Koed00 (Owner) commented Sep 28, 2015

I really shouldn't be coding on a Monday morning before coffee.

Meanwhile I've made a version that checks connections in both the worker and the saver every x seconds.
The interval can be controlled by setting db_timeout in the configuration dict; it defaults to 60 seconds.
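
In your settings that would look roughly like this, assuming the usual Q_CLUSTER dict (the broker details are placeholders):

# settings.py
Q_CLUSTER = {
    'name': 'myproject',  # placeholder project name
    'workers': 2,
    'redis': {'host': '127.0.0.1', 'port': 6379, 'db': 0},
    'db_timeout': 60,  # seconds between stale-connection checks
}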

Koed00 added a commit that referenced this issue Sep 28, 2015
It turns out that checking stale connections on a timer takes between one and two times as long as just always checking them. Always checking also has the benefit of catching timeouts that happen between timer loops.
@Koed00 (Owner) commented Sep 28, 2015

The coffee helped. I did some performance testing and it turns out that the timed checking of connections can actually take up to two times longer than just checking for stale connections before every transaction. Always checking also avoids missing any timeouts that happen between the timer loops. So I removed the db_timeout setting and added a check before every db transaction.
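
In rough outline, the workers and the monitor now do something like this around every save (a sketch, not the literal commit; save_result and task are placeholders):

from django.db import close_old_connections

def save_result(task):
    # reset stale or broken connections right before every write,
    # instead of only once every db_timeout seconds
    close_old_connections()
    task.save()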
Let me know if this latest commit solves it for you.

@LightJockey (Author)

Hey don't worry, it's Monday for everyone :D
Anyway, many thanks! I can confirm it's working as it should now, and it's a good thing you got rid of the timed checks since performance is even better this way :)

@Koed00 (Owner) commented Sep 28, 2015

Ok cool. I'll probably do a release at the end of the day.

Koed00 added a commit that referenced this issue Sep 30, 2015
Adds a check for old connections in both the workers and the monitor, every DB_TIMEOUT seconds
Koed00 added a commit that referenced this issue Sep 30, 2015
It turns out that checking stale connections on a timer takes between one and two times as long as just always checking them. Always checking also has the benefit of catching timeouts that happen between timer loops.