
Getting "MySQL server has gone away" on new tasks after idling #76

Closed
LightJockey opened this issue Sep 28, 2015 · 9 comments

@LightJockey

Hello,

I've been trying Django Q for a new project as an alternative to Celery, and so far I love it, but I keep getting a "MySQL server has gone away" error on every new task my cluster receives after it has idled for at least 8 hours, which matches the wait_timeout of my MySQL server. To confirm it was related to wait_timeout, I lowered it to a minute and the same error kept happening until I restarted the cluster.

Is this behavior intended? Shouldn't Django or Django Q handle this and at least try to re-establish a connection?

I tried setting Django's CONN_MAX_AGE in my settings.py both to 0 and to a higher value that is still lower than MySQL's wait_timeout, but no luck.
After a bit of googling I found https://code.djangoproject.com/ticket/21597#comment:29, which recommends calling connection.close() so that Django reconnects on the next query. That works inside my task, meaning it can change stuff on one of my models and save it, but the task itself isn't getting saved and doesn't appear under successful tasks.
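
For reference, this is roughly what that workaround looks like inside the task; the app, model, and field names below are placeholders, not my actual code:

from django.db import connection
from myapp.models import MyModel  # placeholder app and model

def my_task(pk):
    # close the connection MySQL may have already dropped server-side,
    # so Django opens a fresh one on the next query
    connection.close()
    obj = MyModel.objects.get(pk=pk)
    obj.processed = True  # placeholder field
    obj.save()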

Is there any workaround other than periodically restarting the cluster or increasing wait_timeout to insane values?

My Django Q settings are vanilla, using Redis as the broker. Here's the cluster log:

08:56:23 [Q] INFO Q Cluster-26132 starting.
08:56:23 [Q] INFO Process-1:3 ready for work at 26139
08:56:23 [Q] INFO Process-1:5 monitoring at 26141
08:56:23 [Q] INFO Process-1 guarding cluster at 26138
08:56:23 [Q] INFO Process-1:6 pushing tasks at 26142
08:56:23 [Q] INFO Process-1:4 ready for work at 26140
08:56:23 [Q] INFO Q Cluster-26132 running.

08:57:46 [Q] INFO Process-1:3 processing [ink-bakerloo-michigan-finch]
08:57:47 [Q] INFO Processed [ink-bakerloo-michigan-finch]

08:59:19 [Q] INFO Process-1:4 processing [rugby-five-skylark-lake]
08:59:19 [Q] ERROR (2006, 'MySQL server has gone away')
08:59:19 [Q] INFO Processed [rugby-five-skylark-lake]

09:01:24 [Q] INFO Process-1:3 processing [uniform-one-hydrogen-december]
09:01:25 [Q] ERROR (2006, 'MySQL server has gone away')
09:01:25 [Q] INFO Processed [uniform-one-hydrogen-december]
@Koed00 (Owner) commented Sep 28, 2015

I just read through that ticket. I'll see if I can somehow check the db connection.
I assume you're not seeing the saved results in the database?

@LightJockey (Author)

Thank you for the quick response. Yes, there are no results in the database as soon as that error pops up.

@Koed00 (Owner) commented Sep 28, 2015

I can check for stale connections and reset them on every save, but that might affect performance a bit.
So first I want to see whether it's enough to have the scheduler's ping reset stale connections every minute or so.
I've added that to the dev branch just now. Are you able to run the dev branch for a while to see if it solves the problem?

If not we'll have to resort to testing the db connection on every save or closing it like the guys over at Django seem to prefer.
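
The check itself basically boils down to Django's own stale-connection helper, roughly like this (a sketch of the idea, not the exact dev branch code):

from django.db import close_old_connections

# discard connections that have errored or outlived CONN_MAX_AGE,
# so the next ORM query transparently opens a fresh one
close_old_connections()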

@Koed00 (Owner) commented Sep 28, 2015

Hmm. I just realized this probably won't work when the scheduler and saver run in different threads. I need more coffee.

@LightJockey (Author)

Yeah, no change whatsoever running the dev branch; I tried several times.

Koed00 added a commit that referenced this issue Sep 28, 2015
Adds a check for old connections in both the workers and the monitor, every DB_TIMEOUT seconds
@Koed00 (Owner) commented Sep 28, 2015

I really shouldn't be coding on a Monday morning before coffee.

Meanwhile I've made a version that checks connections in both the worker and the saver every x seconds.
The interval can be controlled by setting db_timeout in the configuration dict; it defaults to 60 seconds.
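
In your settings that would look roughly like this, assuming the usual Q_CLUSTER dict (the broker details are placeholders):

# settings.py
Q_CLUSTER = {
    'name': 'myproject',  # placeholder project name
    'workers': 2,
    'redis': {'host': '127.0.0.1', 'port': 6379, 'db': 0},
    'db_timeout': 60,  # seconds between stale-connection checks
}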

Koed00 added a commit that referenced this issue Sep 28, 2015
It turns out that checking stale connections on a timer takes between one and two times as long as just always checking them. Always checking also has the benefit of catching timeouts that happen between timer loops.
@Koed00 (Owner) commented Sep 28, 2015

The coffee helped. I did some performance testing and it turns out that the timed checking of connections can actually take up to two times longer than just checking for stale connections before every transaction. Always checking also avoids missing any timeouts that happen between the timer loops. So I removed the db_timeout setting and added a check before every db transaction.
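
In rough outline, the workers and the monitor now do something like this around every save (a sketch, not the literal commit; save_result and task are placeholders):

from django.db import close_old_connections

def save_result(task):
    # reset stale or broken connections right before every write,
    # instead of only once every db_timeout seconds
    close_old_connections()
    task.save()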
Let me know if this latest commit solves it for you.

@LightJockey (Author)

Hey don't worry, it's Monday for everyone :D
Anyway, many thanks! I can confirm it's working as it should now, and it's a good thing you got rid of the timed checks since performance is even better this way :)

@Koed00 (Owner) commented Sep 28, 2015

Ok cool. I'll probably do a release at the end of the day.

Koed00 added a commit that referenced this issue Sep 30, 2015
Adds a check for old connections in both the workers and the monitor, every DB_TIMEOUT seconds
Koed00 added a commit that referenced this issue Sep 30, 2015
It turns out that checking stale connections on a timer takes between one and two times as long as just always checking them. Always checking also has the benefit of catching timeouts that happen between timer loops.