Closed
Description
Apache Airflow version
main (development)
What happened
The Celery worker appears to be stuck in an unresponsive state: messages for queued task instances are not consumed from Redis.
Redis restarted while the worker kept running. The worker logged that it had lost the connection, and it reconnected once Redis came back online:
[2022-09-19 23:58:00,794: ERROR/MainProcess] consumer: Cannot connect to redis://:**@accurate-axis-9558-redis:6379/0: Error 111 connecting to accurate-axis-9558-redis:6379. Connection refused..
Trying again in 2.00 seconds... (1/100)
[2022-09-19 23:58:03,802: ERROR/MainProcess] consumer: Cannot connect to redis://:**@accurate-axis-9558-redis:6379/0: Error 111 connecting to accurate-axis-9558-redis:6379. Connection refused..
Trying again in 4.00 seconds... (2/100)
[2022-09-19 23:58:08,830: ERROR/MainProcess] consumer: Cannot connect to redis://:**@accurate-axis-9558-redis:6379/0: Error 111 connecting to accurate-axis-9558-redis:6379. Connection refused..
Trying again in 6.00 seconds... (3/100)
[2022-09-19 23:58:15,866: ERROR/MainProcess] consumer: Cannot connect to redis://:**@accurate-axis-9558-redis:6379/0: Error 111 connecting to accurate-axis-9558-redis:6379. Connection refused..
Trying again in 8.00 seconds... (4/100)
[2022-09-19 23:58:24,890: ERROR/MainProcess] consumer: Cannot connect to redis://:**@accurate-axis-9558-redis:6379/0: Error 111 connecting to accurate-axis-9558-redis:6379. Connection refused..
Trying again in 10.00 seconds... (5/100)
[2022-09-19 23:58:34,907: INFO/MainProcess] Connected to redis://:**@accurate-axis-9558-redis:6379/0
[2022-09-19 23:58:34,915: INFO/MainProcess] mingle: searching for neighbors
[2022-09-19 23:58:35,923: INFO/MainProcess] mingle: all alone
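The retry waits in the log above (2, 4, 6, 8 and 10 seconds) follow a linear schedule. The Python sketch below reconstructs that schedule and compares it with the log timestamps. The helper names `retry_waits` and `gaps` are hypothetical. The start and step values are read off the log, while the 30 s cap is an assumption borrowed from kombu's default `interval_max` and is not shown in the log.

```python
from datetime import datetime


def retry_waits(attempts, start=2.0, step=2.0, cap=30.0):
    """Wait (seconds) before each retry: linear growth, capped.

    start/step match the log; cap=30.0 is an ASSUMPTION (kombu's default
    interval_max), not something this log demonstrates.
    """
    return [min(start + step * i, cap) for i in range(attempts)]


# Timestamps of the five "Cannot connect" errors followed by the "Connected" line.
LOG_TIMES = [
    "23:58:00,794",
    "23:58:03,802",
    "23:58:08,830",
    "23:58:15,866",
    "23:58:24,890",
    "23:58:34,907",
]


def parse(ts):
    # Celery log timestamps use a comma before the milliseconds.
    return datetime.strptime(ts, "%H:%M:%S,%f")


def gaps(times):
    """Seconds elapsed between consecutive log lines, rounded to ms."""
    parsed = [parse(t) for t in times]
    return [round((b - a).total_seconds(), 3) for a, b in zip(parsed, parsed[1:])]


if __name__ == "__main__":
    waits = retry_waits(5)
    for wait, gap in zip(waits, gaps(LOG_TIMES)):
        # Whatever part of the gap is not the planned sleep was spent
        # on the connection attempt itself.
        print(f"wait={wait:5.2f}s  observed={gap:6.3f}s  attempt={gap - wait:6.3f}s")
    print(f"total planned wait: {sum(waits):.1f}s")
```

Comparing the two shows each failed attempt adding roughly one second on top of the planned wait, while the final, successful attempt returns almost immediately. The outage itself therefore lasted only about half a minute; the problem reported here is the worker's behaviour after it reconnects.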
What you think should happen instead
After Redis comes back online and the worker reconnects, the worker should consume the messages and execute the queued task instances.
How to reproduce
- Delete the existing Redis pod; the worker can no longer connect to Redis
- Redis restarts, and the worker reconnects as expected
- The worker does not consume new messages (queued task instances)
Operating System
N/A
Versions of Apache Airflow Providers
No response
Deployment
Astronomer
Deployment details
No response
Anything else
There was a GitHub Discussion about this behaviour earlier this year.
This did not seem to be an issue with an earlier version of Celery (4.4.7).
The currently installed version is celery==5.2.7, and Redis is at version 6.2.6.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct