Random Worker Timeouts when starting Redash #2453

Closed
NoelzeN opened this issue Apr 11, 2018 · 3 comments

Comments

@NoelzeN

NoelzeN commented Apr 11, 2018

I have an issue with my Redash setup. I start Redash with the following command:
/opt/redash/current/bin/run gunicorn -b 127.0.0.1:5000 --name redash -w 4 --max-requests 1000 --log-level debug redash.wsgi:app
Sometimes it works, and sometimes I get seemingly random "WORKER TIMEOUT" errors:

[2018-04-11 11:33:11 +0000] [19694] [INFO] Starting gunicorn 19.7.1
[2018-04-11 11:33:11 +0000] [19694] [DEBUG] Arbiter booted
[2018-04-11 11:33:11 +0000] [19694] [INFO] Listening at: http://127.0.0.1:5000 (19694)
[2018-04-11 11:33:11 +0000] [19694] [INFO] Using worker: sync
[2018-04-11 11:33:11 +0000] [19702] [INFO] Booting worker with pid: 19702
[2018-04-11 11:33:11 +0000] [19707] [INFO] Booting worker with pid: 19707
[2018-04-11 11:33:11 +0000] [19712] [INFO] Booting worker with pid: 19712
[2018-04-11 11:33:11 +0000] [19717] [INFO] Booting worker with pid: 19717
[2018-04-11 11:33:11 +0000] [19694] [DEBUG] 4 workers
[2018-04-11 11:33:41 +0000] [19694] [CRITICAL] WORKER TIMEOUT (pid:19712)
[2018-04-11 11:33:41 +0000] [19694] [CRITICAL] WORKER TIMEOUT (pid:19707)
[2018-04-11 11:33:41 +0000] [19694] [CRITICAL] WORKER TIMEOUT (pid:19717)
[2018-04-11 11:33:41 +0000] [19694] [CRITICAL] WORKER TIMEOUT (pid:19702)
[2018-04-11 11:33:41 +0000] [19707] [INFO] Worker exiting (pid: 19707)
[2018-04-11 11:33:41 +0000] [19712] [INFO] Worker exiting (pid: 19712)
[2018-04-11 11:33:41 +0000] [19717] [INFO] Worker exiting (pid: 19717)
[2018-04-11 11:33:41 +0000] [19702] [INFO] Worker exiting (pid: 19702)
[2018-04-11 11:33:42 +0000] [19726] [INFO] Booting worker with pid: 19726
[2018-04-11 11:33:42 +0000] [19694] [DEBUG] 1 workers
[2018-04-11 11:33:42 +0000] [19727] [INFO] Booting worker with pid: 19727
[2018-04-11 11:33:42 +0000] [19736] [INFO] Booting worker with pid: 19736
[2018-04-11 11:33:42 +0000] [19737] [INFO] Booting worker with pid: 19737
[2018-04-11 11:33:42 +0000] [19694] [DEBUG] 4 workers

And so on...
This happens in roughly 90% of my attempts to start Redash, but occasionally startup succeeds, so I don't know where to start looking. Any hints on where I should look to debug this?
Thanks,
Nils

@deecay
Contributor

deecay commented Apr 11, 2018

Hi

Does this patch help?

--- a/redash/worker.py
+++ b/redash/worker.py
@@ -11,6 +11,9 @@ from celery.signals import worker_process_init
 from redash import __version__, create_app, settings
 from redash.metrics import celery as celery_metrics

+from celery.concurrency import asynpool
+asynpool.PROC_ALIVE_TIMEOUT = 60.0
+

@NoelzeN
Author

NoelzeN commented Apr 11, 2018

Unfortunately, this didn't solve the issue. The patch applied, but the workers still time out:

patch < patchfile
can't find file to patch at input line 3
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
|/redash/worker.py
|+++ b/redash/worker.py
File to patch: redash/worker.py
patching file redash/worker.py
Hunk #1 succeeded at 58 with fuzz 2 (offset 47 lines).

diff redash/worker.py redash/worker.py.nilsbak
61,63d60
< from celery.concurrency import asynpool
< asynpool.PROC_ALIVE_TIMEOUT = 60.0
<

[2018-04-11 12:03:16 +0000] [20117] [INFO] Starting gunicorn 19.7.1
[2018-04-11 12:03:16 +0000] [20117] [DEBUG] Arbiter booted
[2018-04-11 12:03:16 +0000] [20117] [INFO] Listening at: http://127.0.0.1:5000 (20117)
[2018-04-11 12:03:16 +0000] [20117] [INFO] Using worker: sync
[2018-04-11 12:03:16 +0000] [20125] [INFO] Booting worker with pid: 20125
[2018-04-11 12:03:16 +0000] [20128] [INFO] Booting worker with pid: 20128
[2018-04-11 12:03:17 +0000] [20135] [INFO] Booting worker with pid: 20135
[2018-04-11 12:03:17 +0000] [20140] [INFO] Booting worker with pid: 20140
[2018-04-11 12:03:17 +0000] [20117] [DEBUG] 4 workers
[2018-04-11 12:03:47 +0000] [20117] [CRITICAL] WORKER TIMEOUT (pid:20128)
[2018-04-11 12:03:47 +0000] [20117] [CRITICAL] WORKER TIMEOUT (pid:20140)
[2018-04-11 12:03:47 +0000] [20117] [CRITICAL] WORKER TIMEOUT (pid:20125)
[2018-04-11 12:03:47 +0000] [20117] [CRITICAL] WORKER TIMEOUT (pid:20135)
[2018-04-11 12:03:47 +0000] [20140] [INFO] Worker exiting (pid: 20140)
[2018-04-11 12:03:47 +0000] [20128] [INFO] Worker exiting (pid: 20128)
[2018-04-11 12:03:47 +0000] [20125] [INFO] Worker exiting (pid: 20125)
[2018-04-11 12:03:47 +0000] [20135] [INFO] Worker exiting (pid: 20135)

As I understand it, the patch should set some timeout to 60 seconds, yet the error already appears after 30 seconds...
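The 30-second gap can be read straight off the log above. A minimal sketch, assuming the bracketed timestamp format shown in the gunicorn output; `parse_log_time` is a hypothetical helper name, not part of gunicorn or Redash:

```python
from datetime import datetime

# Timestamp layout as it appears in the gunicorn log lines above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def parse_log_time(line: str) -> datetime:
    """Extract the timestamp from the first [...] group of a log line."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


# Two lines copied from the log above, both for worker pid 20140.
boot_line = "[2018-04-11 12:03:17 +0000] [20140] [INFO] Booting worker with pid: 20140"
timeout_line = "[2018-04-11 12:03:47 +0000] [20117] [CRITICAL] WORKER TIMEOUT (pid:20140)"

# Seconds between the worker booting and the arbiter declaring it timed out.
gap_seconds = (parse_log_time(timeout_line) - parse_log_time(boot_line)).total_seconds()
print(gap_seconds)  # 30.0
```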

@arikfr
Member

arikfr commented Apr 11, 2018

The timeout comes from gunicorn, not Celery, which is why the patch had no effect.
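For reference, gunicorn's worker timeout is controlled by its `timeout` setting (the `--timeout` command-line flag), which defaults to 30 seconds and matches the gap in the logs above. A minimal sketch of a gunicorn config file that mirrors the start command from the original report; the file name `gunicorn.conf.py` and the value 120 are assumptions for illustration, not a recommended fix:

```python
# gunicorn.conf.py (hypothetical file name; gunicorn config files are plain Python)

# Settings mirroring the flags in the original start command.
bind = "127.0.0.1:5000"   # -b 127.0.0.1:5000
proc_name = "redash"      # --name redash
workers = 4               # -w 4
max_requests = 1000       # --max-requests 1000
loglevel = "debug"        # --log-level debug

# Assumption: 120 is an arbitrary illustrative value. A worker that stays
# silent for longer than this many seconds is killed and restarted by the
# gunicorn arbiter, which then logs WORKER TIMEOUT.
timeout = 120
```

Such a file could be loaded with `gunicorn -c gunicorn.conf.py redash.wsgi:app`. Raising the timeout would only give the workers more time; it would not explain why they are silent for 30 seconds at startup.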

But this looks like a setup issue rather than an issue with Redash, so it's a better fit for our forum. When creating a post there, it would be great if you could include some more details about your setup to help debug this:

  • Redash version you're running.
  • How you set it up.
  • The kind of server (RAM/CPU) you're using.

And anything else that might be relevant.

Thanks.

arikfr closed this as completed Apr 11, 2018