Investigate no comms in docker #1113
base: master
Conversation
This function is now called before flower.start() to make sure we have a working broker, so that starting Flower will set it up correctly. This is a first attempt at solving the issue of Flower booting up in a broken state. There is a potential race condition between the moment we check the connection and the moment we call flower.start(), although it is unlikely. I will ponder on that...
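A minimal sketch of what such a pre-start guard could look like. The names `wait_for_broker` and the retry parameters here are assumptions for illustration, not the PR's exact implementation; in the real code the `connect` callable would wrap kombu's `Connection.ensure_connection()`:

```python
import time


def wait_for_broker(connect, max_retries=10, interval=1.0):
    """Return True once connect() succeeds, False if max_retries is exceeded.

    `connect` is any zero-argument callable that raises on failure,
    e.g. a wrapper around kombu's Connection.ensure_connection().
    """
    for attempt in range(1, max_retries + 1):
        try:
            connect()
            return True
        except Exception:  # the real code would narrow this to broker errors
            if attempt == max_retries:
                return False
            time.sleep(interval)
```

Flower's boot sequence would then do something like `if not wait_for_broker(...): sys.exit(1)` before calling `flower.start()` — accepting the small window in which the broker could still drop afterwards.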
As of Celery 5, the command-line invocation no longer supports specifying Celery args after the flower command. Also add depends_on for flower.
We no longer have `flower` as a separate command, and this entrypoint was incorrect. We also need to be able to fully override the arguments for Celery and Flower via command-line arguments when using the image.
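For illustration, Celery 5 requires Celery-level options (app, broker) to come before the subcommand, so the old `celery flower --broker=...` form no longer works. A hedged docker-compose sketch of the new invocation — service names and the broker URL are assumptions, not taken from this repository:

```yaml
services:
  flower:
    image: mher/flower
    # Celery 5: celery-level options come BEFORE the `flower` subcommand.
    command: celery --broker=redis://redis:6379/0 flower
    depends_on:
      - redis
  redis:
    image: redis
```

Because the Dockerfile no longer sets an ENTRYPOINT, the `command:` line can be overridden wholesale to pass any broker or Flower options.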
@@ -48,7 +50,12 @@ def sigterm_handler(signal, frame):
        sys.exit(0)

    signal.signal(signal.SIGTERM, sigterm_handler)

    if not is_broker_connected(celery_app=app):
Here, in theory, the connection is checked but can drop before we call flower.start().
The chances are super small though.
I will think about a better way of doing this, but for now it is good enough.
I looked at the kombu code:

    def ensure_connection(self, *args, **kwargs):
        """Public interface of _ensure_connection for retro-compatibility.

        Returns kombu.Connection instance.
        """
        self._ensure_connection(*args, **kwargs)
        return self

and it follows a similar pattern (check the connection and then return a value), so I think we should be fine with checking the connection and then calling flower.start().
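The same check-then-return pattern can be demonstrated with a plain-Python stand-in. This is an illustrative mock, not kombu itself; `FakeConnection` and its attributes are invented for the example:

```python
class FakeConnection:
    """Stand-in for kombu.Connection, illustrating the ensure_connection pattern."""

    def __init__(self, healthy=True):
        self.healthy = healthy
        self.attempts = 0

    def _ensure_connection(self, max_retries=None):
        # Perform the actual connectivity check; raise if the broker is down.
        self.attempts += 1
        if not self.healthy:
            raise ConnectionError("broker unreachable")

    def ensure_connection(self, *args, **kwargs):
        # Mirrors kombu: run the check, then return the connection itself,
        # so callers can keep using the same object afterwards.
        self._ensure_connection(*args, **kwargs)
        return self
```

A caller can thus do `conn = conn.ensure_connection(max_retries=3)` and then proceed to `flower.start()`, accepting the (small) window in which the broker could still drop in between.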
    conn.ensure_connection(errback=_error_handler, max_retries=max_retries)
    logger.info(f'Established connection to broker: {broker_url}. Starting Flower...')
    is_connected = True
except OperationalError as exc:
I need to find out whether kombu can raise other errors and add them here.
    logger.error(f'Cannot connect to broker: {broker_url}. Error: {exc}. {next_step}')

try:
    conn.ensure_connection(errback=_error_handler, max_retries=max_retries)
Taken from the way Celery does it, so all the credit goes to them :): https://github.com/celery/celery/blob/master/celery/worker/consumer/consumer.py#L435
Kombu code for reference: https://docs.celeryproject.org/projects/kombu/en/stable/_modules/kombu/connection.html#Connection.ensure_connection
We should start adding these to this file as they are merged - I had a hard time remembering what I have done over the last few months.
@mher I have made a tiny change inspired by other usage of Celery connections - when checking for the presence of the broker at startup of Flower, we can reuse a connection from the pool of connections on the Celery app.

This whole set of changes is now active in our production and staging Kubernetes clusters and working fine. Please let me know if you are happy with this PR :). I was asked for more Prometheus metrics, plus I see a fix needed for the queuing-time one, so I will open another PR for that.

I also wanted to ask here if you would be OK with adding some project-wide standards in another PR:
Finally, if you are aware of anything that needs fixing, please let me know and I will try to help.
…into investigate-no-comms-in-docker (# Conflicts: # CHANGES)
@Tomasz-Kluczkowski I'm not @mher, but I guess nobody is against what you mentioned these days, and I would even go further and add:
If you're considering black, I would go and do it from day 0 with a line length of 120 to avoid later noise.
@Tomasz-Kluczkowski can you move the Docker updates into a separate pull request?
Of course, good idea!
Add a guard in Flower's boot sequence to wait for the broker. This prevents a situation where Flower boots in an incorrect state and is never able to communicate with the broker, even if it becomes available later on.
I do appreciate the fact that I did not solve the real issue of Flower not being able to call the broker's URL despite the fact that it is callable, but:
A smarter solution (for which I am afraid I do not have time) would be to understand why Flower/Tornado cannot call a simple URL using the async HTTP client after Flower has booted without the broker. I tried but lost my patience :). Tornado is not something I deal with in my work life, but I am open to enlightenment :)
I also changed the Dockerfile and removed ENTRYPOINT, since we no longer have `flower` as a standalone command, and we also need to be able to fully override the command that Flower's image starts with, to be able to specify the broker and so on. Small fixes to some docs where I noticed the old way of doing things was still used.
@mher
I have tested this code locally, and also by replacing the image that the Dockerfile uses with an image based on this branch in my fork of Flower and running docker-compose up with various options - I used RabbitMQ and Redis, and checked booting Flower both with them running and without - the new code makes Flower wait until the broker is running, or exit if max_retries is exceeded.
Image showing the new behaviour:
Related to issue: #1112