
Adds handler to set socket option in rumqttd. #777

Closed
wants to merge 2 commits

Conversation

nathansizemore
Contributor

This is useful for configuring lower-level options on the socket, such as SO_KEEPALIVE, TCP_KEEPCNT, and various socket-level flags.

A use case we have: when a client is connected and the network drops on the client end, rumqttd will not show the client as disconnected until the OS-level TCP keep-alive settings kick in, which default to 2 hours on Linux. This change provides a hook so that connections needing specific TCP settings can have them applied.

This adds extra configuration options, which help reduce zombie connections.

ref #775
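
For context, here is a minimal sketch (not the code in this PR) of the kind of per-connection tuning such a handler could apply, assuming the socket2 crate (with its `all` feature) on Linux; `tune_keepalive` is just an illustrative name:

```rust
use std::time::Duration;
use socket2::{SockRef, TcpKeepalive};

// Enable SO_KEEPALIVE on an accepted connection and shorten the probe
// schedule so a silently dead peer is noticed in roughly 30s + 3 * 5s,
// instead of the ~2 hour Linux default.
fn tune_keepalive(stream: &std::net::TcpStream) -> std::io::Result<()> {
    let sock = SockRef::from(stream);
    let keepalive = TcpKeepalive::new()
        .with_time(Duration::from_secs(30))     // TCP_KEEPIDLE
        .with_interval(Duration::from_secs(5))  // TCP_KEEPINTVL
        .with_retries(3);                       // TCP_KEEPCNT
    sock.set_tcp_keepalive(&keepalive)
}
```

The same SockRef approach should also work with tokio's TcpStream in recent versions, since it exposes the underlying fd.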

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

  • Formatted with cargo fmt
  • Make an entry to CHANGELOG.md if it's relevant to the users of the library. If it's not relevant mention why.

@swanandx
Member

swanandx commented Jan 4, 2024

Hey, thanks for the PR!

This is useful for configuring lower-level options on the socket, such as SO_KEEPALIVE, TCP_KEEPCNT, and various socket-level flags.

std's TcpStream, as well as tokio's (which we use), doesn't have a fn to set SO_KEEPALIVE :( (check the discussion: rust-lang/rust#69774)

if we want to set those flags, we would need to pull in an additional dependency like socket2 or use unsafe libc as you did in #775

I think configurations like this, which are for fine tuning, are better set at the OS level, e.g. using sysctl on Linux, instead of in the broker itself. wdyt?
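
For reference, a small Linux-only sketch (an illustration, not part of this PR) of reading the machine-wide defaults that the sysctl route would tune; the procfs paths below mirror the net.ipv4.* sysctls:

```rust
use std::fs;

// Print the OS-wide TCP keep-alive defaults that `sysctl net.ipv4.*` controls.
fn main() -> std::io::Result<()> {
    for name in [
        "tcp_keepalive_time",   // seconds of idle time before the first probe
        "tcp_keepalive_intvl",  // seconds between unanswered probes
        "tcp_keepalive_probes", // unanswered probes before the connection is dropped
    ] {
        let value = fs::read_to_string(format!("/proc/sys/net/ipv4/{name}"))?;
        println!("{name} = {}", value.trim());
    }
    Ok(())
}
```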

cc: @henil

@nathansizemore
Contributor Author

nathansizemore commented Jan 4, 2024

np, I'm sure you'll be getting more PRs from us when various needs come up :)

I think configurations like this, which are for fine tuning, are better set at the OS level, e.g. using sysctl on Linux, instead of in the broker itself. wdyt?

This approach only works when the machine runs nothing but the broker. It breaks once the machine is used for other network operations. Say the broker is running on a machine with OS socket settings tuned for MQTT clients over TCP. If that machine also makes HTTP calls to an auth service, or opens a database connection, those sockets inherit the OS-level settings too, even though you only wanted them applied to the MQTT connections.

Ideally, this becomes even more configurable in the future by passing in the client id, but I haven't found a way to do that without more changes, so for now this is a first step that gets us closer to where we need to be.

We have several client types that connect, but one particular model (identified by client id) is prone to network failure. This allows us to detect more quickly when the connection has dropped.

if we want to set those flags, we would need to pull in an additional dependency like socket2 or use unsafe libc as you did in #775

Yes, one drawback of this is that, as a user of rumqttd, I now need to pull in tokio if I want to use this method, because the broker uses tokio and tokio has its own socket impl.

If you need to set these options, you are probably already familiar with doing so in C, so grabbing the file descriptor and pulling in libc isn't that crazy a thing to do with Rust and sockets. If you want to do this sort of thing in Rust, you've had to do it this way since the language was released.

Ideally, tokio wouldn't need to be exposed at all. But for that, rumqttd would need its own socket type that wraps things, or this method would need to take an impl AsRawFd, which more or less forces the end user to pull in libc and rules out using the tokio methods that already exist. I chose to expose tokio rather than write it with AsRawFd because I assumed more people are familiar with that, but I can adjust it so that the method exposes something that implements AsRawFd, so that no specific socket type leaks?
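
A hedged sketch of what that AsRawFd-based variant could look like on Linux; the helper name and signature are assumptions for illustration, not what this PR implements, and it assumes a libc dependency:

```rust
use std::os::unix::io::AsRawFd;

// Toggle SO_KEEPALIVE on any socket type that exposes its raw fd, without
// tying the API to tokio's or std's TcpStream.
fn set_so_keepalive(sock: &impl AsRawFd, enabled: bool) -> std::io::Result<()> {
    let optval: libc::c_int = if enabled { 1 } else { 0 };
    let ret = unsafe {
        libc::setsockopt(
            sock.as_raw_fd(),
            libc::SOL_SOCKET,
            libc::SO_KEEPALIVE,
            &optval as *const libc::c_int as *const libc::c_void,
            std::mem::size_of::<libc::c_int>() as libc::socklen_t,
        )
    };
    if ret == 0 {
        Ok(())
    } else {
        Err(std::io::Error::last_os_error())
    }
}
```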

@swanandx
Member

swanandx commented Jan 4, 2024

We have several client types that connect, but one particular model (identified by client id) is prone to network failure. This allows us to detect more quickly when the connection has dropped.

Interesting 🤔

A use case we have: when a client is connected and the network drops on the client end, rumqttd will not show the client as disconnected until the OS-level TCP keep-alive settings kick in,

can you elaborate more on what you mean by the OS keep-alive kicking in? for me, I get this log as soon as I close the connection, which should imply that we know the client is disconnected, right?

 ERROR rumqttd::server::broker: Disconnected!!, error: Network(Io(Custom { kind: ConnectionAborted, error: "connection closed by peer" })) 

@swanandx
Member

swanandx commented Jan 4, 2024

I'm sure you'll be getting more PRs from us when various needs come up :)

That's awesome 🚀 we will try our best to keep up with those needs :)

@nathansizemore
Contributor Author

can you elaborate more on what you mean by the OS keep-alive kicking in? for me, I get this log as soon as I close the connection, which should imply that we know the client is disconnected, right?

 ERROR rumqttd::server::broker: Disconnected!!, error: Network(Io(Custom { kind: ConnectionAborted, error: "connection closed by peer" })) 

Sure, let's go through the following examples...

MQTT Graceful Disconnect

  1. Client sends mqtt connect packet.
  2. Client is connected.
  3. Client sends mqtt disconnect packet.
  4. Broker reports disconnected.

TCP Graceful Disconnect

  1. Client sends mqtt connect packet.
  2. Client is connected.
  3. Client force-closes the socket, causing the OS to send a TCP FIN packet.
  4. Broker reports disconnected.

Non-Graceful Disconnect

  1. Client sends mqtt connect packet.
  2. Client is connected.
  3. Router blows up, device resets, power is lost at facility, someone unplugs the ethernet cable, etc...
  4. No mqtt close is sent, no TCP FIN is sent.
  5. Broker thinks client is still alive.

In the non-graceful scenario, detecting that the TCP socket is no longer active requires some settings. It is a combination of several factors, TCP keep-alive (SO_KEEPALIVE) being the top-most option, from which the others take effect once it kicks in. This enables keep-alives at the TCP level, managed by the network stack in the OS. You can think of them as MQTT pings, but at the TCP level. By default, these are set very high at the OS level, so that one general configuration can work for everything, and each application can set its own values for the sockets it cares about, for its own logic.

You can test that the non-graceful case is not reported by the broker by starting up a broker locally, downloading any MQTT client on a mobile device, connecting, then disabling the network connection on the device. The broker sits there until the TCP keep-alive settings report the disconnect, or until pings from the broker to the client start to error, if they are enabled.
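
To put numbers on that, a back-of-the-envelope sketch using the usual Linux defaults (7200s idle, 75s probe interval, 9 probes); the tuned values in the second print are just example numbers:

```rust
// Worst-case time before a silently dead peer produces a socket error.
const TCP_KEEPALIVE_TIME_SECS: u64 = 7_200; // idle time before the first probe
const TCP_KEEPALIVE_INTVL_SECS: u64 = 75;   // gap between unanswered probes
const TCP_KEEPALIVE_PROBES: u64 = 9;        // probes before the socket is closed

fn main() {
    let default_worst_case =
        TCP_KEEPALIVE_TIME_SECS + TCP_KEEPALIVE_INTVL_SECS * TCP_KEEPALIVE_PROBES;
    // 7875 seconds, i.e. a bit over two hours before the broker sees an error.
    println!("default worst-case detection: {default_worst_case}s");

    // With per-socket values of 30s idle, 5s interval, 3 probes, the same
    // failure is surfaced in about 45 seconds instead.
    println!("tuned worst-case detection: {}s", 30 + 5 * 3);
}
```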

@swanandx
Member

swanandx commented Jan 4, 2024

oh understood! thanks for such a detailed explanation 💯

You can think of them as MQTT pings, but at the TCP level

just curious, if we set the MQTT keep-alive value to something low, then this should work as well, right? It would know when the client isn't sending pings (though it would increase the number of pings overall 👀)

@nathansizemore
Contributor Author

just curious, if we set the MQTT keep-alive value to something low, then this should work as well, right? It would know when the client isn't sending pings (though it would increase the number of pings overall 👀)

Right, this also works. I believe right now that is set from something in the client connect packet? (Not super versed in mqtt). Or a spot in the config where this could be forced instead of reading from whatever the client sends, if that is indeed how it currently works.

@swanandx
Member

swanandx commented Jan 4, 2024

I believe right now that is set from something in the client connect packet?

Yup, the keep-alive time is set while connecting, in the CONNECT packet.

If the broker doesn't get any packet from the client within 1.5 times the keep-alive time, it assumes the client is disconnected.
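
As a client-side illustration of that rule, a minimal sketch assuming a recent rumqttc where set_keep_alive takes a Duration; with a 10-second keep-alive, the 1.5x rule means a silent client is assumed dead after roughly 15 seconds:

```rust
use std::time::Duration;
use rumqttc::{Client, MqttOptions, QoS};

fn main() {
    // The keep-alive value is advertised to the broker in the CONNECT packet.
    let mut options = MqttOptions::new("example-client", "localhost", 1883);
    options.set_keep_alive(Duration::from_secs(10));

    let (client, mut connection) = Client::new(options, 10);
    client.subscribe("hello/world", QoS::AtMostOnce).unwrap();

    // Driving the event loop makes the client send PINGREQs as needed to
    // honor the keep-alive it advertised.
    for notification in connection.iter() {
        println!("{notification:?}");
    }
}
```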

a spot in the config where this could be forced instead of reading from whatever the client sends

If I'm not wrong, currently there isn't a way to force keepalive via config😅

@nathansizemore
Contributor Author

If I'm not wrong, currently there isn't a way to force keepalive via config😅

Hence, this PR :)

@swanandx
Member

swanandx commented Jan 5, 2024

Hence, this PR :)

I was talking about the MQTT keep-alive; this PR is for the low-level socket SO_KEEPALIVE, and the two are quite different 😅

@nathansizemore
Contributor Author

nathansizemore commented Jan 5, 2024

Hence, this PR :)

I was talking about the MQTT keep-alive; this PR is for the low-level socket SO_KEEPALIVE, and the two are quite different 😅

Right. I was just trying to say this PR exists because we need a mechanism, configurable on a per-connection basis, that allows us to detect faster when a non-graceful shutdown has occurred. Both adjusting the socket-level flags and forcing the broker to override the MQTT ping time solve that issue. Having the ability to set lower-level socket flags is less important atm than being able to quickly detect shutdowns, to us at least.

@nathansizemore
Contributor Author

Closing in favor of a different approach which will be PR'd at a later date.
