Cluster dies when started with Postgres #79

Closed
aaronkurtz opened this issue Sep 29, 2015 · 8 comments

Comments

@aaronkurtz

Using Python 2.7.9, Django 1.8.4, Postgres 9.4.4, django-q 0.7.5 with the 'orm' broker.

  18:17:21 [Q] INFO Q Cluster-9399 starting.
  18:17:21 [Q] INFO Process-1:1 ready for work at 9404
  18:17:21 [Q] INFO Process-1:2 ready for work at 9405
  18:17:21 [Q] INFO Process-1:3 ready for work at 9406
  18:17:21 [Q] INFO Process-1:4 ready for work at 9407
  18:17:21 [Q] INFO Process-1 guarding cluster at 9402
  18:17:21 [Q] INFO Process-1:5 monitoring at 9408
  18:17:21 [Q] INFO Process-1:6 pushing tasks at 9409
  18:17:21 [Q] INFO Q Cluster-9399 running.
  18:17:21 [Q] ERROR SSL error: decryption failed or bad record mac

Disabling SSL in the database options gives:

  18:24:07 [Q] INFO Q Cluster-9827 running.
  18:24:07 [Q] ERROR server closed the connection unexpectedly
      This probably means the server terminated abnormally
      before or while processing the request.

django_q is in INSTALLED_APPS, the DB migrated without a problem, and I don't see any output from the devserver - I'm not sure where to go from here.

Koed00 (Owner) commented Sep 29, 2015

I use almost the same setup for tests, but with PostgreSQL 9.3.5. To be sure, I just ran some tests on that and it was working fine with or without a configured cache, so we'll need to dig deeper to replicate this.

Can you test whether the same thing happens when you downgrade to version 0.7.4?
The latest version added some db connection checks, and it may well be that those are the problem.

Also if I could have a look at your Q_CLUSTER, DATABASES and CACHES settings, I might be able to reproduce it. Just x-out any sensitive information.

@aaronkurtz (Author)

Downgraded to 0.7.4:

  00:21:22 [Q] ERROR SSL connection has been closed unexpectedly

or non-SSL:

  00:24:12 [Q] ERROR server closed the connection unexpectedly
      This probably means the server terminated abnormally
      before or while processing the request.

  DATABASES = {
      'default': {
          'ENGINE': 'django.db.backends.postgresql_psycopg2',
          'NAME': 'XXX',
          'USER': 'XXX',
          'PASSWORD': 'XXX',
          'HOST': 'localhost',
          'PORT': '5432',
          # 'OPTIONS': {
          #     'sslmode': 'disable',
          # },
      },
  }

  CACHES = {
      'default': {
          'BACKEND': 'django.core.cache.backends.db.DatabaseCache',
          'LOCATION': 'cache_table'
      }
  }

  Q_CLUSTER = {
      'orm': 'default'
  }

Things I've checked:

  • Nothing shows up in postgresql-9.4-main.log when qcluster is run
  • Deliberately bad database auth fails in a different way:

      00:37:02 [Q] ERROR FATAL: password authentication failed for user "XXX"

  • Deliberately bad cache settings fail in a different way:

      00:37:33 [Q] ERROR relation "BADcache_table" does not exist
      LINE 1: SELECT cache_key, value, expires FROM "BADcache_table" ...

  • Same with stopping the postgres server:

      00:41:02 [Q] ERROR could not connect to server: Connection refused

Koed00 (Owner) commented Sep 29, 2015

Ok. So we can write off the changes in 0.7.5.
I will replicate your settings locally and see if I can find what's happening. I doubt it's something specific to Django Q, but I'd like to get to the bottom of this.

Koed00 (Owner) commented Sep 29, 2015

Of course, as always happens when I actually type 'it is not specific to Django Q', it turns out it is.
I've managed to replicate the problem; it's definitely the cache that is causing this.
I'll let you know what I find out.

Koed00 added a commit that referenced this issue Sep 29, 2015
The sentinel saves a statistic to the cache provider just before forking the workers. If this cache provider is a database backend, the connection stays open, gets forked, and crashes. The solution is to close any db connections before forking.
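
For illustration only, a minimal sketch of that failure mode, assuming a configured Django project using the database cache backend from this issue and fork-based multiprocessing (run from something like manage.py shell; the cache key is made up and this is not django-q code):

  # The parent touches the database cache, which opens a Postgres connection,
  # then forks. Parent and child now share the same underlying socket, and the
  # next query from either side can fail with "SSL error: decryption failed or
  # bad record mac" or "server closed the connection unexpectedly".
  import multiprocessing

  from django.core.cache import cache


  def child():
      # Reuses the connection inherited from the parent instead of opening its own.
      cache.get('q_cluster_stat')


  cache.set('q_cluster_stat', 'starting', 60)  # opens the parent's db connection
  p = multiprocessing.Process(target=child)
  p.start()
  p.join()
  cache.get('q_cluster_stat')  # parent keeps using the now-shared connection
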
Koed00 (Owner) commented Sep 29, 2015

Let me know if the dev branch works for you. This seems to solve it for me.

The sentinel saves some statistics to the cache just before it spawns the workers and the other processes. When the cache uses the database backend, this db connection is still open and gets forked along with them. That's not a problem for other cache backends, but Postgres will invalidate the connection. The simple solution is to close any db connections before forking.
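
As a rough sketch of that pattern (hypothetical worker function and cache key, not the actual django-q internals):

  import multiprocessing

  from django import db
  from django.core.cache import cache


  def worker():
      # The first query in the child now opens a fresh connection of its own.
      cache.get('q_cluster_stat')


  def start_workers(n):
      # An earlier query (e.g. saving a stat to the database cache) left an
      # open connection that the fork would otherwise copy into every child.
      db.connections.close_all()  # Django >= 1.8; closes every configured alias
      procs = [multiprocessing.Process(target=worker) for _ in range(n)]
      for p in procs:
          p.start()
      for p in procs:
          p.join()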

The reason I never saw this happen on my performance test rig is that I use pgbouncer to pool the connections for the flood tests.

@aaronkurtz (Author)

Yes, the dev branch works properly. Thanks for taking a look at the problem.

Koed00 (Owner) commented Sep 29, 2015

No problem. Thanks for making the effort to report it.
I usually do a release at the end of my day, which is in about 7 hours.

Koed00 added a commit that referenced this issue Sep 29, 2015
#79  close django db connection before fork
Koed00 closed this as completed Sep 29, 2015
Koed00 added a commit that referenced this issue Sep 30, 2015
@marcelolima

@Koed00 I'm getting the same error with the new version, 1.1.0 (it was working fine with 1.0.2):

psycopg2.OperationalError: SSL error: decryption failed or bad record mac
