Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing graceful shutdown #105

Merged
merged 21 commits into from
Aug 8, 2022
Merged

Conversation

zainkabani
Copy link
Contributor

@zainkabani zainkabani commented Aug 2, 2022

This PR attempts to implement a graceful shutdown mechanism for pgcat like pgbouncer.

  • Initiated on SIGINT signal.
  • Waits for clients to finish work then exits
  • Respects transactions and allows them to complete

src/main.rs Outdated
_ = signal::ctrl_c() => (),
_ = term_signal.recv() => (),
};
// initiate graceful shutdown sequence on sig int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// initiate graceful shutdown sequence on sig int
// Initiate graceful shutdown sequence on SIGINT

@zainkabani
Copy link
Contributor Author

Current issue is that a client can be connected to pgcat and not be sending any messages and pgcat will be waiting for this client to complete it's work. Need some ideas to help solve this

let mut term_signal = unix_signal(SignalKind::terminate()).unwrap();

tokio::select! {
_ = signal::ctrl_c() => (),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why you removed the ctrl-c and the SIGTERM handler? SIGTERM is commonly used in Docker containers to stop a container.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pgbouncer interprets sigterm and sigint differently. since the sigint is graceful shutdown we want to catch and handle that. for sigterm, which is immediate shutdown we don't need to do anything special

SIGINT
Safe shutdown. Same as issuing PAUSE and SHUTDOWN on the console.
SIGTERM
Immediate shutdown. Same as issuing SHUTDOWN on the console.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about the reasoning here. In Kuberentes SIGTERM is meant to initiate a graceful shutdown. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/

src/main.rs Outdated
stream.recv().await;
client::SHUTTING_DOWN.store(true, Ordering::Relaxed);
warn!("Got SIGINT, waiting for client connection drain now");
wg.wait();
Copy link
Contributor

@levkk levkk Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if just save tokio::task::spawn handles in a Vec in main.rs and then waiting for them all to exit task.await is best?

Cargo.toml Outdated
@@ -31,3 +31,4 @@ base64 = "0.13"
stringprep = "0.1"
tokio-rustls = "0.23"
rustls-pemfile = "1"
wg = "0.1.0"
Copy link
Contributor

@levkk levkk Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on this, it doesn't look like a well-known crate. Are we sure we need it? I think tokio should give us the tools we need to do this task.

@levkk
Copy link
Contributor

levkk commented Aug 2, 2022

Current issue is that a client can be connected to pgcat and not be sending any messages and pgcat will be waiting for this client to complete it's work. Need some ideas to help solve this

Interesting problem. What about using select! while listening for client messages, and if we receive a shutdown signal, we close the connection and exit the loop?

Tokio docs have some ideas for us: https://tokio.rs/tokio/topics/shutdown

@zainkabani
Copy link
Contributor Author

Used channel transmitter shutdown logic found here: https://tokio.rs/tokio/topics/shutdown

Cargo.toml Outdated
@@ -30,4 +30,4 @@ sha2 = "0.10"
base64 = "0.13"
stringprep = "0.1"
tokio-rustls = "0.23"
rustls-pemfile = "1"
rustls-pemfile = "1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a terminal new line

src/client.rs Show resolved Hide resolved
src/main.rs Outdated
// Client connection loop.
tokio::task::spawn(async move {
// Creates event subscriber for shutdown event, this is dropped when shutdown event is broadcast
let mut listener_rx = shutdown_event_tx_clone.subscribe();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename to shutdown_event_rx because it can be confused with TCP listener by the reader.

src/main.rs Outdated
// Closes transmitter
drop(shutdown_event_tx);

loop {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you looping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be two events that the receiver gets, the 1st is the event sent by shutdown_event_tx.send(()).unwrap(); and then the second event would be when all senders are closed and there is an error on the receive channel. made it a loop so that the timeout could be applied to all the operations.

cleanup_conn(conn, cur)


def test_shutdown_logic():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use bash for these kind of tests, why Python?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're operating on the same postgres connection and doing things in between queries and transactions it felt easiest to do it in python.

Copy link
Contributor

@levkk levkk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@levkk levkk merged commit 3719c22 into postgresml:main Aug 8, 2022
let mut bytes = BytesMut::with_capacity(5);
let mut bytes = BytesMut::with_capacity(
mem::size_of::<u8>() + mem::size_of::<i32>() + mem::size_of::<u8>(),
);

bytes.put_u8(b'Z');
bytes.put_i32(5);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zainkabani I think you may want to update this 5 to also be mem::size_of::<u8>() + mem::size_of::<i32>() + mem::size_of::<u8>()

@zainkabani zainkabani deleted the graceful-shutdown branch August 9, 2022 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants