Not able to know the connection failure reason in certain situation while using MQTT client with auto reconnect option #1148

ashish-250 · 2021-08-16T07:42:06Z

Scenario
We attempt to connect to the server using MQTTAsync_connect with the auto-reconnect option of MQTTAsync_connectOptions enabled.
We get a successful connection after a couple of retries. The onFailure callback set in MQTTAsync_connectOptions is called once, we log the reason for the failure, and then the connected callback set with the MQTTAsync_setConnected API is called. Any future disconnections should be handled properly because we have configured auto-reconnect.
Issue
Suppose we disconnect after 12 hours and the connectionLost callback set with MQTTAsync_setCallbacks is called. As per the docs, the cause for the lost connection is always set to null, so we do not know why we lost the connection. The client then attempts to reconnect and calls the connected callback if it succeeds. But if the connection attempt now fails for a new, different reason, there is no way to learn that reason outside of the paho source code, because paho does not call the onFailure and onSuccess callbacks more than once (see #972). Someone may want to log the new failure reason for debugging purposes. The onFailure callback also supplies MQTTAsync_failureData with a response code, which can help differentiate between fatal and recoverable connection failures. We do not get that information either, so we cannot base decisions on it.
We also cannot initiate a new MQTTAsync_connect with a fresh set of MQTTAsync_connectOptions, because the reconnect logic puts a connect command on the paho queue and our manual attempt gets MQTTASYNC_COMMAND_IGNORED.

I looked for an alternative API for setting the onFailure callback, but had no luck.

Is there anything you can suggest for getting the onFailure callback when the client disconnects after some time and then fails to reconnect?

icraggs · 2021-08-17T12:10:09Z

There is always the option of reconnecting entirely by using the connectionLost callback and then calling connect when you are ready and with the options you choose.

Once you are in a series of reconnect attempts, the updateConnectOptions callback can be used to modify username and password.

If you want to try an alternative host/port to connect to, then you can use the serverURIs connect option.

You should only get COMMAND_IGNORED while there is a connect command currently on the command queue, which should be a pretty short amount of time compared to the times in between reconnect attempts. Still, calling connect while in reconnect mode was not something I'd planned for or tested so I'm not sure what the outcome would be.

You don't say what type of alternative way of connecting you would be trying if you did have the failure information.

ashish-250 · 2021-08-17T12:53:35Z

> There is always the option of reconnecting entirely by using the connectionLost callback and then calling connect when you are ready and with the options you choose.

That is what we were using earlier. We were calling connect again with a fresh set of connectOptions using the connectionLost callback.

> Still, calling connect while in reconnect mode was not something I'd planned for or tested so I'm not sure what the outcome would be.

As per our experience with v1.3.9, if we call connect while in reconnect mode, either connect command can end up on the queue first, and the other one is ignored and gets the COMMAND_IGNORED return code. Older versions also ignored the second command; the only difference was that they returned the SUCCESS return code. The return code is not the problem. The issue is that the order is not deterministic. If our manual connect call from the connectionLost callback reaches the queue first with the fresh connectOptions, we get the onFailure callback because it has not been nullified. But if the automatic reconnect's connect command reaches the queue first, we do not get the onFailure callback. There was no such issue with v1.3.1: irrespective of which command reached the queue first, we always got an onFailure callback.

> You don't say what type of alternative way of connecting you would be trying if you did have the failure information.

We decide from the response code whether to wait for a successful connection or exit the program, depending on whether the failure is a transient error (e.g. server down) or a fatal one.
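
A minimal sketch of that decision, written as a classifier over the response code. Nothing here is part of paho: the function name `classify_connect_failure`, the `failure_kind` enum, and the choice of which codes count as fatal are assumptions for illustration. It also assumes an MQTT 3.1.1 connection, where positive codes are CONNACK return codes and negative codes are client-side errors.

```c
/* Hypothetical classification type; not a paho type. */
typedef enum { FAILURE_TRANSIENT, FAILURE_FATAL } failure_kind;

/* Assumption: positive codes are MQTT 3.1.1 CONNACK return codes (1..5),
 * negative codes are client-side errors such as socket failures. */
failure_kind classify_connect_failure(int code)
{
    switch (code) {
    case 1: /* unacceptable protocol version */
    case 2: /* client identifier rejected */
    case 4: /* bad user name or password */
    case 5: /* not authorized */
        return FAILURE_FATAL;
    case 3: /* server unavailable */
        return FAILURE_TRANSIENT;
    default:
        /* Everything else is treated as retryable in this sketch;
         * an application may well choose a stricter policy. */
        return FAILURE_TRANSIENT;
    }
}
```

An onFailure handler could pass the `code` field of `MQTTAsync_failureData` to a function like this and exit only on `FAILURE_FATAL`.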

> Once you are in a series of reconnect attempts, the updateConnectOptions callback can be used to modify username and password.

I am not sure whether the callbacks can be set again on the connectOptions in the same way as username and password. I will test it on my end. I am also not sure it would be thread-safe, as the reconnect logic may keep running in the background, which could lead to undefined behavior.

I feel it would be better to get the onFailure callback on every connection failure and let the application decide how many of those callbacks to act on, instead of getting it only once from paho. The failure reason could change between two consecutive failures. What do you think?
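
To illustrate what "control it in your own code" could look like, here is a small sketch of an application-side filter that reports a failure only when its code differs from the previous one. It assumes a per-attempt failure callback exists, which is the behavior being requested and not what paho provides in the versions discussed. The `failure_filter` type and both functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical application-side state; not a paho type. */
typedef struct {
    bool has_last;      /* true once at least one failure was seen */
    int last_code;      /* code of the most recent failure */
    unsigned repeats;   /* consecutive occurrences of last_code */
} failure_filter;

/* Returns true when the failure should be reported: either it is the
 * first failure seen, or its code differs from the previous failure. */
bool failure_filter_should_report(failure_filter *f, int code)
{
    assert(f != NULL);
    if (f->has_last && f->last_code == code) {
        f->repeats++;
        return false;
    }
    f->has_last = true;
    f->last_code = code;
    f->repeats = 1;
    return true;
}

/* Call after a successful (re)connection so the next failure is reported. */
void failure_filter_reset(failure_filter *f)
{
    assert(f != NULL);
    f->has_last = false;
    f->repeats = 0;
}
```

With a filter like this, the library could deliver every failure and the application would still log each distinct reason only once per outage.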

icraggs · 2021-08-18T22:34:21Z

That sounds to me like a use case where you don't want to use automatic reconnect.

The change to return COMMAND_IGNORED if there was already a connect command on the queue was implemented to fix another issue that was raised. You could repeat the connect call after a small interval, once the other one has been taken off the queue, but this still sounds like a conflict with automatic reconnect that you wouldn't want.
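
The "repeat the connect call after a small interval" idea can be sketched as a bounded retry loop. This is only an illustration under stated assumptions: `connect_fn` is a hypothetical wrapper the application would write around MQTTAsync_connect, `sleep_fn` is a hypothetical delay hook, and the `RC_*` values are local stand-ins (real code should use the macros from MQTTAsync.h). It does not remove the conflict with automatic reconnect described above.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for MQTTASYNC_SUCCESS / MQTTASYNC_COMMAND_IGNORED.
 * The numeric values are assumptions; use the MQTTAsync.h macros in real code. */
#define RC_SUCCESS 0
#define RC_COMMAND_IGNORED (-18)

typedef int (*connect_fn)(void *ctx);   /* hypothetical wrapper around MQTTAsync_connect */
typedef void (*sleep_fn)(unsigned ms);  /* hypothetical delay hook, may be NULL */

/* Calls do_connect until it returns something other than COMMAND_IGNORED,
 * waiting interval_ms between attempts, for at most max_attempts calls.
 * Returns the last return code seen. */
int connect_with_retry(connect_fn do_connect, void *ctx, sleep_fn wait,
                       unsigned interval_ms, int max_attempts)
{
    assert(do_connect != NULL);
    int rc = RC_COMMAND_IGNORED;
    for (int i = 0; i < max_attempts; i++) {
        rc = do_connect(ctx);
        if (rc != RC_COMMAND_IGNORED)
            return rc;
        if (wait != NULL && i + 1 < max_attempts)
            wait(interval_ms);
    }
    return rc;
}

/* Test double: reports COMMAND_IGNORED until the counter in ctx reaches zero. */
int fake_connect(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return RC_COMMAND_IGNORED;
    }
    return RC_SUCCESS;
}
```

Note that a non-ignored return code only means the command was accepted onto the queue; the outcome of the connection attempt itself still arrives through the callbacks.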

icraggs mentioned this issue May 22, 2023

No way to receive failure data when reconnecting #1360

Open

icraggs added the enhancement label May 22, 2023

Ivan-Bolshakov mentioned this issue Jan 22, 2024

Add Connection error callback to fix issue 1148 #1442

Open
