
Adds handler to set socket option in rumqttd. #777

Closed
wants to merge 2 commits

Conversation

nathansizemore
Contributor

This is useful for configuring lower-level options on the socket, such as SO_KEEPALIVE, TCP_KEEPCNT, and various socket-level flags.

A use case we have: when a client is connected and the network drops on the client end, rumqttd will not show the client as disconnected until the OS-level TCP keep-alive settings kick in, which default to 2 hours on Linux. This change provides a hook so that connections needing specific TCP settings can have them applied.

This adds extra configuration options, which help reduce zombie connections.

ref #775
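
For context, here is a minimal sketch (not the code in this PR) of the kind of per-connection tuning such a handler could apply, assuming the socket2 crate (with its `all` feature) on Linux; `tune_keepalive` is just an illustrative name:

```rust
use std::time::Duration;
use socket2::{SockRef, TcpKeepalive};

// Enable SO_KEEPALIVE on an accepted connection and shorten the probe
// schedule so a silently dead peer is noticed in roughly 30s + 3 * 5s,
// instead of the ~2 hour Linux default.
fn tune_keepalive(stream: &std::net::TcpStream) -> std::io::Result<()> {
    let sock = SockRef::from(stream);
    let keepalive = TcpKeepalive::new()
        .with_time(Duration::from_secs(30))     // TCP_KEEPIDLE
        .with_interval(Duration::from_secs(5))  // TCP_KEEPINTVL
        .with_retries(3);                       // TCP_KEEPCNT
    sock.set_tcp_keepalive(&keepalive)
}
```

The same SockRef approach should also work with tokio's TcpStream in recent versions, since it exposes the underlying fd.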

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

  • Formatted with cargo fmt
  • Make an entry to CHANGELOG.md if it's relevant to the users of the library. If it's not relevant mention why.

@swanandx
Member

swanandx commented Jan 4, 2024

Hey, thanks for the PR!

This is useful for configuring lower-level options on the socket, such as SO_KEEPALIVE, TCP_KEEPCNT, and various socket-level flags.

std's TcpStream, as well as tokio's (which we use), doesn't have a fn to set SO_KEEPALIVE :( (check the discussion: rust-lang/rust#69774)

if we want to set those flags, we would need to pull in an additional dependency like socket2 or use unsafe libc as you did in #775

I think configurations like this, which are for fine tuning, are better set at the OS level, e.g. using sysctl on Linux, instead of in the broker itself. wdyt?
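
For reference, a small Linux-only sketch (an illustration, not part of this PR) of reading the machine-wide defaults that the sysctl route would tune; the procfs paths below mirror the net.ipv4.* sysctls:

```rust
use std::fs;

// Print the OS-wide TCP keep-alive defaults that `sysctl net.ipv4.*` controls.
fn main() -> std::io::Result<()> {
    for name in [
        "tcp_keepalive_time",   // seconds of idle time before the first probe
        "tcp_keepalive_intvl",  // seconds between unanswered probes
        "tcp_keepalive_probes", // unanswered probes before the connection is dropped
    ] {
        let value = fs::read_to_string(format!("/proc/sys/net/ipv4/{name}"))?;
        println!("{name} = {}", value.trim());
    }
    Ok(())
}
```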

cc: @henil

@nathansizemore
Contributor Author

nathansizemore commented Jan 4, 2024

np, I'm sure you'll be getting more PRs from us when various needs come up :)

I think configurations like this, which are for fine tuning, are better set at the OS level, e.g. using sysctl on Linux, instead of in the broker itself. wdyt?

This approach only works when the machine runs nothing but the broker. It breaks once the machine is used for other network operations. Say the broker is running on a machine with OS socket settings tuned for MQTT clients over TCP. If that machine also makes HTTP calls to an auth service, or opens a database connection, those sockets inherit the OS-level settings too, even though you only wanted them applied to the MQTT connections.

Ideally, this becomes even more configurable in the future by passing in the client id, but I haven't found a way to do that without more changes, so for now this is a first step that gets us closer to where we need to be.

We have several client types that connect, but one particular model (identified by client id) is prone to network failure. This allows us to detect more quickly when the connection has dropped.

if we want to set those flags, we would need to pull in an additional dependency like socket2 or use unsafe libc as you did in #775

Yes, one drawback of this is that, as a user of rumqttd, I now need to pull in tokio if I want to use this method, because the broker uses tokio and tokio has its own socket impl.

If you need to set these options, you are probably already familiar with doing so in C, so grabbing the file descriptor and pulling in libc isn't that crazy a thing to do with Rust and sockets. If you want to do this sort of thing in Rust, you've had to do it this way since the language was released.

Ideally, tokio wouldn't need to be exposed at all. But for that, rumqttd would need its own socket type that wraps things, or this method would need to take an impl AsRawFd, which more or less forces the end user to pull in libc and rules out using the tokio methods that already exist. I chose to expose tokio rather than write it with AsRawFd because I assumed more people are familiar with that, but I can adjust it so that the method exposes something that implements AsRawFd, so that no specific socket type leaks?
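
A hedged sketch of what that AsRawFd-based variant could look like on Linux; the helper name and signature are assumptions for illustration, not what this PR implements, and it assumes a libc dependency:

```rust
use std::os::unix::io::AsRawFd;

// Toggle SO_KEEPALIVE on any socket type that exposes its raw fd, without
// tying the API to tokio's or std's TcpStream.
fn set_so_keepalive(sock: &impl AsRawFd, enabled: bool) -> std::io::Result<()> {
    let optval: libc::c_int = if enabled { 1 } else { 0 };
    let ret = unsafe {
        libc::setsockopt(
            sock.as_raw_fd(),
            libc::SOL_SOCKET,
            libc::SO_KEEPALIVE,
            &optval as *const libc::c_int as *const libc::c_void,
            std::mem::size_of::<libc::c_int>() as libc::socklen_t,
        )
    };
    if ret == 0 {
        Ok(())
    } else {
        Err(std::io::Error::last_os_error())
    }
}
```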

@swanandx
Member

swanandx commented Jan 4, 2024

We have several client types that connect, but one particular model (identified by client id) is prone to network failure. This allows us to detect more quickly when the connection has dropped.

Interesting 🤔

A use case we have: when a client is connected and the network drops on the client end, rumqttd will not show the client as disconnected until the OS-level TCP keep-alive settings kick in,

can you elaborate more on what you mean by the OS keep-alive kicking in? for me, I get this log as soon as I close the connection, which should imply that we know the client is disconnected, right?

 ERROR rumqttd::server::broker: Disconnected!!, error: Network(Io(Custom { kind: ConnectionAborted, error: "connection closed by peer" })) 

@swanandx
Member

swanandx commented Jan 4, 2024

I'm sure you'll be getting more PRs from us when various needs come up :)

That's awesome 🚀 we will try our best to keep up with those needs :)

@nathansizemore
Contributor Author

can you elaborate more on what you mean by the OS keep-alive kicking in? for me, I get this log as soon as I close the connection, which should imply that we know the client is disconnected, right?

 ERROR rumqttd::server::broker: Disconnected!!, error: Network(Io(Custom { kind: ConnectionAborted, error: "connection closed by peer" })) 

Sure, let's go through the following examples...

MQTT Graceful Disconnect

  1. Client sends mqtt connect packet.
  2. Client is connected.
  3. Client sends mqtt disconnect packet.
  4. Broker reports disconnected.

TCP Graceful Disconnect

  1. Client sends mqtt connect packet.
  2. Client is connected.
  3. Client force-closes the socket, causing the OS to send a TCP FIN packet.
  4. Broker reports disconnected.

Non-Graceful Disconnect

  1. Client sends mqtt connect packet.
  2. Client is connected.
  3. Router blows up, device resets, power is lost at facility, someone unplugs the ethernet cable, etc...
  4. No mqtt close is sent, no TCP FIN is sent.
  5. Broker thinks client is still alive.

In the non-graceful scenario, detecting that the TCP socket is no longer active requires some settings. It is a combination of several factors, TCP keep-alive (SO_KEEPALIVE) being the top-most option, from which the others take effect once it kicks in. This enables keep-alives at the TCP level, managed by the network stack in the OS. You can think of them as MQTT pings, but at the TCP level. By default, these are set very high at the OS level, so that one general configuration can work for everything, and each application can set its own values for the sockets it cares about, for its own logic.

You can test that the non-graceful case is not reported by the broker by starting up a broker locally, downloading any MQTT client on a mobile device, connecting, then disabling the network connection on the device. The broker sits there until the TCP keep-alive settings report the disconnect, or until pings from the broker to the client start to error, if they are enabled.
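
To put numbers on that, a back-of-the-envelope sketch using the usual Linux defaults (7200s idle, 75s probe interval, 9 probes); the tuned values in the second print are just example numbers:

```rust
// Worst-case time before a silently dead peer produces a socket error.
const TCP_KEEPALIVE_TIME_SECS: u64 = 7_200; // idle time before the first probe
const TCP_KEEPALIVE_INTVL_SECS: u64 = 75;   // gap between unanswered probes
const TCP_KEEPALIVE_PROBES: u64 = 9;        // probes before the socket is closed

fn main() {
    let default_worst_case =
        TCP_KEEPALIVE_TIME_SECS + TCP_KEEPALIVE_INTVL_SECS * TCP_KEEPALIVE_PROBES;
    // 7875 seconds, i.e. a bit over two hours before the broker sees an error.
    println!("default worst-case detection: {default_worst_case}s");

    // With per-socket values of 30s idle, 5s interval, 3 probes, the same
    // failure is surfaced in about 45 seconds instead.
    println!("tuned worst-case detection: {}s", 30 + 5 * 3);
}
```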

@swanandx
Member

swanandx commented Jan 4, 2024

oh understood! thanks for such a detailed explanation 💯

You can think of them as MQTT pings, but at the TCP level

just curious, if we set the MQTT keep-alive value to something low, then this should work as well, right? It would know when the client isn't sending pings (though it would increase the number of pings overall 👀)

@nathansizemore
Contributor Author

just curious, if we set the MQTT keep-alive value to something low, then this should work as well, right? It would know when the client isn't sending pings (though it would increase the number of pings overall 👀)

Right, this also works. I believe right now that is set from something in the client connect packet? (Not super versed in mqtt). Or a spot in the config where this could be forced instead of reading from whatever the client sends, if that is indeed how it currently works.

@swanandx
Member

swanandx commented Jan 4, 2024

I believe right now that is set from something in the client connect packet?

Yup, the keep-alive time is set while connecting, in the CONNECT packet.

If the broker doesn't get any packet from the client within 1.5 times the keep-alive time, it assumes the client is disconnected.
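
As a client-side illustration of that rule, a minimal sketch assuming a recent rumqttc where set_keep_alive takes a Duration; with a 10-second keep-alive, the 1.5x rule means a silent client is assumed dead after roughly 15 seconds:

```rust
use std::time::Duration;
use rumqttc::{Client, MqttOptions, QoS};

fn main() {
    // The keep-alive value is advertised to the broker in the CONNECT packet.
    let mut options = MqttOptions::new("example-client", "localhost", 1883);
    options.set_keep_alive(Duration::from_secs(10));

    let (client, mut connection) = Client::new(options, 10);
    client.subscribe("hello/world", QoS::AtMostOnce).unwrap();

    // Driving the event loop makes the client send PINGREQs as needed to
    // honor the keep-alive it advertised.
    for notification in connection.iter() {
        println!("{notification:?}");
    }
}
```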

a spot in the config where this could be forced instead of reading from whatever the client sends

If I'm not wrong, currently there isn't a way to force keepalive via config😅

@nathansizemore
Contributor Author

If I'm not wrong, currently there isn't a way to force keepalive via config😅

Hence, this PR :)

@swanandx
Member

swanandx commented Jan 5, 2024

Hence, this PR :)

I was talking about the MQTT keep-alive; this PR is for the low-level socket SO_KEEPALIVE, and the two are quite different 😅

@nathansizemore
Contributor Author

nathansizemore commented Jan 5, 2024

Hence, this PR :)

I was talking about the MQTT keep-alive; this PR is for the low-level socket SO_KEEPALIVE, and the two are quite different 😅

Right. I was just trying to say this PR exists because we need a mechanism, configurable on a per-connection basis, that allows us to detect faster when a non-graceful shutdown has occurred. Both adjusting the socket-level flags and forcing the broker to override the MQTT ping time solve that issue. Having the ability to set lower-level socket flags is less important atm than being able to quickly detect shutdowns, to us at least.

@nathansizemore
Contributor Author

Closing in favor of a different approach which will be PR'd at a later date.
