
Feature: Termination Timeout option #112

Closed
orlandothoeny opened this issue Mar 24, 2023 · 9 comments · Fixed by #147
Comments

@orlandothoeny
Contributor

A configuration option that adds a configurable wait time before the proxy shuts down would be useful in Kubernetes environments.

Use Case

Our concrete case:

Kubernetes Pod

  • Supervisor container
  • Ensures that a PHP RabbitMQ consumer command is always running; the PHP command exits after x seconds, then Supervisor restarts it
  • AMQProxy sidecar container
    • Used to connect to CloudAMQP RabbitMQ instance

Problem
A new application version gets deployed. The old Pod is stopped and a new one is started.
When the old Pod is stopped, all containers of that Pod receive a SIGTERM signal:

  • AMQProxy sidecar container stops immediately
  • Supervisor waits until PHP consumer command exits. The command might still need a RabbitMQ connection to finish processing its current batch of messages. It raises errors when trying to interact with the RabbitMQ server because the proxy already stopped.

This could be avoided if the AMQProxy sidecar container waited long enough for the Supervisor container to stop first.

Current solution/workaround
Using a preStop hook for the AMQProxy sidecar container

lifecycle:
  preStop:
    exec:
      command: ["sleep", "30"]
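One caveat with this workaround: Kubernetes counts the time spent in the preStop hook against the Pod's terminationGracePeriodSeconds (30 seconds by default), so the grace period should be raised above the sleep. A sketch of the relevant Pod spec fragment (container name is illustrative):

```yaml
spec:
  terminationGracePeriodSeconds: 60  # must exceed the preStop sleep below
  containers:
    - name: amqproxy
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "30"]
```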

Feature

  • Configuration option: Termination timeout, in seconds
    • ENV variable + CLI option
  • When the proxy receives a SIGTERM signal and the option is configured, it waits for x seconds before stopping
    • Current connections are not closed
    • New connections can still be opened

--

Google's CloudSQL Proxy has something like this:

$ docker run gcr.io/cloudsql-docker/gce-proxy /cloud_sql_proxy -help
...
-term_timeout
    When set, the proxy will wait for existing connections to close before
    terminating. Any connections that haven't closed after the timeout will be
    dropped
@carlhoerberg
Member

Are you sure about "New connections can still be opened"? Right now the behavior is to close the listening socket at the first TERM, send Connection::Close frame to all client connections and wait indefinitely for them to close the sockets. On the second TERM all client sockets are forcefully closed.

@orlandothoeny
Contributor Author

Hi @carlhoerberg,

For our particular sidecar case, it would have been the desired behaviour, since the proxy instance has no clients other than the "main" container inside the Kubernetes Pod that consumes RabbitMQ messages. When the Pod gets terminated, so is the "main" container. After it is dead, nobody opens new proxy connections, and the proxy can stop gracefully without having to forcefully terminate any open connections.

However, I can see that this might not be desirable in a setup where a proxy is used by multiple independent clients, for example in a separate Kubernetes Deployment (that's what we switched to from our initial sidecar setup, mainly because of #108 though). In that case the proxy would still accept new connections and kill them when the second TERM is received. To avoid this, the proxy would have to provide a dedicated health check endpoint that can be used for its Kubernetes readiness probe. (Although I'm not 100% sure if Kubernetes stops routing traffic to a Pod that is being terminated anyway.)

@orlandothoeny
Contributor Author

Just checked the Kubernetes Service behavior when a Pod gets terminated. As far as I can see, it won't receive new traffic anymore. Regardless of the readiness probe status: https://kubernetes.io/docs/tutorials/services/pods-and-endpoint-termination-flow/

@carlhoerberg
Member

Ok, so we might as well stop the listener too at the first TERM; it won't affect k8s users but makes the behavior more logical for others, right?

@carlhoerberg
Member

btw, #108 is fixed, sorry for the long delay, should have been fixed way earlier...

@orlandothoeny
Contributor Author

orlandothoeny commented Feb 22, 2024

Ok, so we might as well stop the listener too at the first TERM; it won't affect k8s users but makes the behavior more logical for others, right?

You mean stop accepting new connections? I guess it depends on the topology:

  • Sidecar: There you could still run into the case where the consumer container tries to open a new connection if the timing between the consumer stopping to consume and the proxy stopping to accept connections is a bit off. Another possible case: while the consumer finishes processing its current message, the code tries to publish a RabbitMQ message
  • Separate Deployment with a Service: There I think it should not matter, since the Service stops routing traffic on termination. Not sure whether there might be timing issues though (maybe the Service takes a few ms to stop routing traffic while the proxy has already stopped accepting new connections)

@orlandothoeny
Contributor Author

orlandothoeny commented Mar 11, 2024

Just tested a little and managed to get AMQProxy shutdown(0, 0) errors from our consumers while deploying a new version of our Kubernetes AMQProxy Deployment (rolling update).

So this would be a nice feature for the case where AMQProxy is deployed as a separate component (e.g. Kubernetes Deployment).

@orlandothoeny
Contributor Author

@carlhoerberg Just upgraded to the newest version (2.0.1) and tested the shutdown behavior. I configured the new --term-timeout option (set to 5 mins).

In our case, we still receive errors while the proxy is terminating, since - as far as I understand - it immediately closes all client connections when it receives the TERM signal.
Thus, the connection is taken away from the client right away, with the message AMQProxy shutdown(0, 0) (https://github.com/cloudamqp/amqproxy/blob/main/src/amqproxy/client.cr#L195-L200).
If I understand the code correctly, this is done by this line: https://github.com/cloudamqp/amqproxy/pull/147/files#diff-60939064684e0ee6ef504d4b72e12d6cf0febd05f0426f4189b799c522816399R93

Would there be negative consequences when the client connections are only closed after the timeout has passed (after the sleep https://github.com/cloudamqp/amqproxy/blob/main/src/amqproxy.cr#L96)?

@orlandothoeny
Contributor Author

A nice-to-have feature would be for the proxy to check whether all clients have closed their connections on their own, and to terminate early in that case - e.g. by looping over all connections (until term-timeout is reached) and checking whether each one is still open, or something like that.

All connections might already be gone before the --term-timeout value is reached, making the proxy wait unnecessarily before exiting.
