Option to not send a `connection_close` on SIGTERM #170

mruell · 2024-08-22T10:40:46Z

Hi there!

The way we use AMQProxy, we depend on it being able to perform a rolling restart (in a kubernetes cluster).
For that we send a SIGTERM to the AMQProxy.

This commit changed the SIGTERM behaviour though: 98c0bc7 (The else in L81 moved in amqproxy.cr)

Now the shutdown is not graceful anymore, since it does not only wait for all clients to disconnect voluntary, but sends a connection close message proactively to the clients.

According to the AMQP reference, that means:

After sending this method, any received methods except Close and Close-OK MUST be discarded. The response to receiving a Close after sending Close must be to send Close-Ok.

Since this is the official reference, it is also implemented like this in most AMQP client libraries and one can't ignore a connection close message. (See for example in the most popular PHP AMQP client library)

That means one is not able to e.g. finish processing the current message before shutting down, but instead it's possible, that a worker is forced to shutdown while processing a message. This can lead to messages being processed multiple times.

As a quick fix (since we build from source anyway) we use this patch on the file amqproxy/src/amqproxy.cr

--- amqproxy_original.cr	2024-08-16 16:28:59
+++ amqproxy.cr	2024-08-16 16:36:12
@@ -90,14 +90,17 @@
       if first_shutdown
         first_shutdown = false
         server.stop_accepting_clients
-        server.disconnect_clients
         if @term_timeout >= 0
           spawn do
             sleep @term_timeout
+            server.disconnect_clients
+            sleep 5
             abort "Exiting with #{server.client_connections} client connections still open"
           end
         end
       else
+        server.disconnect_clients
+        sleep 5
         abort "Exiting with #{server.client_connections} client connections still open"
       end
     end

The solutions that come to my mind would be one of these:

The changes above to be merged (which would be a breaking change though)
Like other servers do, there could be multiple signals - e.g. a SIGTERM could send the connection close, while maybe SIGINT could be really graceful by not sending the connection close
A CLI option, to change the SIGTERM behaviour of whether to send a connection close or not

Would be happy to hear what you think about this :)

Thanks a lot in advance and best,
Marvin

The text was updated successfully, but these errors were encountered:

carlhoerberg · 2024-08-24T23:54:28Z

Ok, thank you for your feedback! It makes sense. We will look at this.

orlandothoeny · 2024-08-27T15:41:21Z

We experience the same issue.
As mentioned in #112 (comment) & #112 (comment)

spuun · 2024-09-27T06:28:27Z

@orlandothoeny @mruell would #175 be a good solution for you?

orlandothoeny · 2024-09-27T08:22:58Z

@spuun Yes that looks like it would to the job

mruell · 2024-09-27T12:45:30Z

Hi @orlandothoeny

that solution looks like it will work!

The way I understand the changes, there will be a new option, to wait after not accepting new connections anymore until we forcefully close them by sending a close message.

That's pretty much the same logic as the terminationGracePeriodSeconds property for kubernetes pods.
We use 4 hours for that.

So if we set the AMQProxy grace period just a little lower than the pod grace period, the workers should have enough time to finish their work and reconnect to a not-terminating pod (they are reconnecting every 15 minutes in-between processing messages, to reconnect to a server accepting new connections).

The only downside is, that the worker will not know, that the AMQProxy tries to terminate (without additional monitors), so we have to always reconnect every 15 minutes, just in case. But looks to me that the AMQP protocol just has no way of "softly" notifying a worker/consumer that it wants to shutdown.

But the reconnects do not really impact the performance in any meaningful way, so this is the best way to do it I think.

Thanks for that and best,
Marvin

intxcc · 2024-09-27T14:49:46Z

sorry, last comment was supposed to mention @spuun

spuun linked a pull request Sep 25, 2024 that will close this issue

Add client close timeout on term #175

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Option to not send a `connection_close` on SIGTERM #170

Option to not send a `connection_close` on SIGTERM #170

mruell commented Aug 22, 2024 •

edited

Loading

carlhoerberg commented Aug 24, 2024

orlandothoeny commented Aug 27, 2024

spuun commented Sep 27, 2024

orlandothoeny commented Sep 27, 2024

mruell commented Sep 27, 2024 •

edited

Loading

intxcc commented Sep 27, 2024

Option to not send a connection_close on SIGTERM #170

Option to not send a connection_close on SIGTERM #170

Comments

mruell commented Aug 22, 2024 • edited Loading

carlhoerberg commented Aug 24, 2024

orlandothoeny commented Aug 27, 2024

spuun commented Sep 27, 2024

orlandothoeny commented Sep 27, 2024

mruell commented Sep 27, 2024 • edited Loading

intxcc commented Sep 27, 2024

Option to not send a `connection_close` on SIGTERM #170

Option to not send a `connection_close` on SIGTERM #170

mruell commented Aug 22, 2024 •

edited

Loading

mruell commented Sep 27, 2024 •

edited

Loading