Random Worker Timeouts when starting Redash #2453

NoelzeN · 2018-04-11T09:41:20Z

I have an issue with my Redash Setup. When I try to start Redash using the command
/opt/redash/current/bin/run gunicorn -b 127.0.0.1:5000 --name redash -w 4 --max-requests 1000 --log-level debug redash.wsgi:app
Sometimes it works, but sometimes I randomly get "WORKER TIMEOUT" errors:

[2018-04-11 11:33:11 +0000] [19694] [INFO] Starting gunicorn 19.7.1
[2018-04-11 11:33:11 +0000] [19694] [DEBUG] Arbiter booted
[2018-04-11 11:33:11 +0000] [19694] [INFO] Listening at: http://127.0.0.1:5000 (19694)
[2018-04-11 11:33:11 +0000] [19694] [INFO] Using worker: sync
[2018-04-11 11:33:11 +0000] [19702] [INFO] Booting worker with pid: 19702
[2018-04-11 11:33:11 +0000] [19707] [INFO] Booting worker with pid: 19707
[2018-04-11 11:33:11 +0000] [19712] [INFO] Booting worker with pid: 19712
[2018-04-11 11:33:11 +0000] [19717] [INFO] Booting worker with pid: 19717
[2018-04-11 11:33:11 +0000] [19694] [DEBUG] 4 workers
[2018-04-11 11:33:41 +0000] [19694] [CRITICAL] WORKER TIMEOUT (pid:19712)
[2018-04-11 11:33:41 +0000] [19694] [CRITICAL] WORKER TIMEOUT (pid:19707)
[2018-04-11 11:33:41 +0000] [19694] [CRITICAL] WORKER TIMEOUT (pid:19717)
[2018-04-11 11:33:41 +0000] [19694] [CRITICAL] WORKER TIMEOUT (pid:19702)
[2018-04-11 11:33:41 +0000] [19707] [INFO] Worker exiting (pid: 19707)
[2018-04-11 11:33:41 +0000] [19712] [INFO] Worker exiting (pid: 19712)
[2018-04-11 11:33:41 +0000] [19717] [INFO] Worker exiting (pid: 19717)
[2018-04-11 11:33:41 +0000] [19702] [INFO] Worker exiting (pid: 19702)
[2018-04-11 11:33:42 +0000] [19726] [INFO] Booting worker with pid: 19726
[2018-04-11 11:33:42 +0000] [19694] [DEBUG] 1 workers
[2018-04-11 11:33:42 +0000] [19727] [INFO] Booting worker with pid: 19727
[2018-04-11 11:33:42 +0000] [19736] [INFO] Booting worker with pid: 19736
[2018-04-11 11:33:42 +0000] [19737] [INFO] Booting worker with pid: 19737
[2018-04-11 11:33:42 +0000] [19694] [DEBUG] 4 workers

And so on...
I'd say the issue occurs in about 90% of the cases when I try to start Redash, but for some reason it sometimes works, so I don't know where to start looking. Any hints on where I should look to debug this?
Thanks,
Nils


deecay · 2018-04-11T09:49:33Z

Hi

Does this patch help?

--- a/redash/worker.py
+++ b/redash/worker.py
@@ -11,6 +11,9 @@ from celery.signals import worker_process_init
 from redash import __version__, create_app, settings
 from redash.metrics import celery as celery_metrics

+from celery.concurrency import asynpool
+asynpool.PROC_ALIVE_TIMEOUT = 60.0
+

NoelzeN · 2018-04-11T10:07:01Z

Unfortunately this didn't solve the issue:

patch < patchfile
can't find file to patch at input line 3
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
|/redash/worker.py
|+++ b/redash/worker.py
File to patch: redash/worker.py
patching file redash/worker.py
Hunk #1 succeeded at 58 with fuzz 2 (offset 47 lines).

diff redash/worker.py redash/worker.py.nilsbak
61,63d60
< from celery.concurrency import asynpool
< asynpool.PROC_ALIVE_TIMEOUT = 60.0
<

[2018-04-11 12:03:16 +0000] [20117] [INFO] Starting gunicorn 19.7.1
[2018-04-11 12:03:16 +0000] [20117] [DEBUG] Arbiter booted
[2018-04-11 12:03:16 +0000] [20117] [INFO] Listening at: http://127.0.0.1:5000 (20117)
[2018-04-11 12:03:16 +0000] [20117] [INFO] Using worker: sync
[2018-04-11 12:03:16 +0000] [20125] [INFO] Booting worker with pid: 20125
[2018-04-11 12:03:16 +0000] [20128] [INFO] Booting worker with pid: 20128
[2018-04-11 12:03:17 +0000] [20135] [INFO] Booting worker with pid: 20135
[2018-04-11 12:03:17 +0000] [20140] [INFO] Booting worker with pid: 20140
[2018-04-11 12:03:17 +0000] [20117] [DEBUG] 4 workers
[2018-04-11 12:03:47 +0000] [20117] [CRITICAL] WORKER TIMEOUT (pid:20128)
[2018-04-11 12:03:47 +0000] [20117] [CRITICAL] WORKER TIMEOUT (pid:20140)
[2018-04-11 12:03:47 +0000] [20117] [CRITICAL] WORKER TIMEOUT (pid:20125)
[2018-04-11 12:03:47 +0000] [20117] [CRITICAL] WORKER TIMEOUT (pid:20135)
[2018-04-11 12:03:47 +0000] [20140] [INFO] Worker exiting (pid: 20140)
[2018-04-11 12:03:47 +0000] [20128] [INFO] Worker exiting (pid: 20128)
[2018-04-11 12:03:47 +0000] [20125] [INFO] Worker exiting (pid: 20125)
[2018-04-11 12:03:47 +0000] [20135] [INFO] Worker exiting (pid: 20135)

My understanding is that this sets a timeout of 60 seconds, yet the error appears after only 30 seconds...
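The 30-second gap can be read directly from the first log: every worker booted at 11:33:11 and timed out at 11:33:41. The sketch below, which assumes the exact log format shown in this thread, pairs each "Booting worker" line with the matching "WORKER TIMEOUT" line and computes the delay per pid. The helper name `seconds_until_timeout` is hypothetical.

```python
import re
from datetime import datetime

# Lines copied from the first log in this thread.
LOG = """\
[2018-04-11 11:33:11 +0000] [19702] [INFO] Booting worker with pid: 19702
[2018-04-11 11:33:11 +0000] [19707] [INFO] Booting worker with pid: 19707
[2018-04-11 11:33:11 +0000] [19712] [INFO] Booting worker with pid: 19712
[2018-04-11 11:33:11 +0000] [19717] [INFO] Booting worker with pid: 19717
[2018-04-11 11:33:41 +0000] [19694] [CRITICAL] WORKER TIMEOUT (pid:19712)
[2018-04-11 11:33:41 +0000] [19694] [CRITICAL] WORKER TIMEOUT (pid:19707)
[2018-04-11 11:33:41 +0000] [19694] [CRITICAL] WORKER TIMEOUT (pid:19717)
[2018-04-11 11:33:41 +0000] [19694] [CRITICAL] WORKER TIMEOUT (pid:19702)
"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S"
# Assumed format: "[date time +zone] [pid] [LEVEL] message"
BOOT_RE = re.compile(r"\[(?P<ts>[\d-]+ [\d:]+) [+-]\d{4}\] \[\d+\] \[INFO\] "
                     r"Booting worker with pid: (?P<pid>\d+)")
TIMEOUT_RE = re.compile(r"\[(?P<ts>[\d-]+ [\d:]+) [+-]\d{4}\] \[\d+\] \[CRITICAL\] "
                        r"WORKER TIMEOUT \(pid:(?P<pid>\d+)\)")


def seconds_until_timeout(log_text):
    """Map each timed-out worker pid to the seconds between its boot and its timeout."""
    booted = {}
    delays = {}
    for line in log_text.splitlines():
        m = BOOT_RE.match(line)
        if m:
            booted[m.group("pid")] = datetime.strptime(m.group("ts"), TS_FORMAT)
            continue
        m = TIMEOUT_RE.match(line)
        if m and m.group("pid") in booted:
            timed_out = datetime.strptime(m.group("ts"), TS_FORMAT)
            delays[int(m.group("pid"))] = (timed_out - booted[m.group("pid")]).total_seconds()
    return delays


print(sorted(set(seconds_until_timeout(LOG).values())))  # [30.0]
```

All four workers hit the limit exactly 30 seconds after booting, which points at a fixed timeout rather than a crash at a random point during startup.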

arikfr · 2018-04-11T10:10:25Z

The timeout comes from gunicorn, not Celery (gunicorn's default worker timeout is 30 seconds), which is why the patch had no effect.
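Since the limit is gunicorn's own worker timeout (the `--timeout` setting, 30 seconds by default), one possible diagnostic, not suggested in this thread, is to raise it temporarily: if the workers then come up, they were only slow to boot; if they still time out, something is blocking them. A minimal sketch as a gunicorn config file, which gunicorn reads as plain Python; the file name `gunicorn_conf.py` is hypothetical, and the other values simply mirror the flags in the original command.

```python
# gunicorn_conf.py (hypothetical file name)
# Each setting mirrors a flag from the original gunicorn command line.
bind = "127.0.0.1:5000"   # -b 127.0.0.1:5000
proc_name = "redash"      # --name redash
workers = 4               # -w 4
max_requests = 1000       # --max-requests 1000
loglevel = "debug"        # --log-level debug

# --timeout: seconds a silent worker may run before the arbiter kills it.
# gunicorn's default is 30, which matches the gap seen in the logs.
timeout = 60
```

It would be loaded with `/opt/redash/current/bin/run gunicorn -c gunicorn_conf.py redash.wsgi:app`; adding `--timeout 60` to the original command line has the same effect.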

But this looks like a setup issue rather than an issue with Redash, so it's a better fit for our forum. When creating a post there, it would be great if you could include more details about your setup to help debug this:

The Redash version you're running.
How you set it up.
The kind of server (RAM/CPU) you're using.

And anything else that might be relevant.
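The requested details can be gathered in one go. The sketch below is an assumption-laden helper (the name `setup_details` is hypothetical): it reads `redash.__version__`, the same attribute imported in the patch above, and uses the Linux/Unix `os.sysconf` page counts to estimate RAM, so the RAM value may be `None` on other platforms.

```python
import multiprocessing
import os
import platform


def setup_details():
    """Collect Redash version, Python/OS info, CPU count and total RAM (hypothetical helper)."""
    try:
        from redash import __version__ as redash_version
    except Exception:  # not installed, or settings/env not available in this shell
        redash_version = "unknown (redash not importable here)"

    ram_bytes = None
    if hasattr(os, "sysconf"):
        try:
            # Total physical memory = page size * number of physical pages (Unix only).
            ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        except (ValueError, OSError):
            ram_bytes = None

    return {
        "redash_version": redash_version,
        "python": platform.python_version(),
        "os": platform.platform(),
        "cpus": multiprocessing.cpu_count(),
        "ram_mb": ram_bytes // (1024 * 1024) if ram_bytes else None,
    }


if __name__ == "__main__":
    for key, value in sorted(setup_details().items()):
        print("%s: %s" % (key, value))
```

Running it with the same interpreter that serves Redash (for example through `/opt/redash/current/bin/run`) makes the reported version match the running installation.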

Thanks.

arikfr closed this as completed Apr 11, 2018
