Description
I'm occasionally seeing AWSIoTMQTTClient.connect() hang indefinitely. This seems to be the same issue as reported in #40, but no proper resolution was found there.
Logs when this happens:
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,819 - AWSIoTPythonSDK.core.protocol.internal.clients - DEBUG - Initializing MQTT layer...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,824 - AWSIoTPythonSDK.core.protocol.internal.clients - DEBUG - Registering internal event callbacks to MQTT layer...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,825 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - MqttCore initialized
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,825 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Client id: 020000035
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,826 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Protocol version: MQTTv3.1.1
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,827 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Authentication type: TLSv1.2 certificate based Mutual Auth.
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,828 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring endpoint...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,828 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring certificates...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,830 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring offline requests queueing: max queue size: 0
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,834 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring offline requests queue draining interval: 0.500000 sec
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,836 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring connect/disconnect time out: 10.000000 sec
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,837 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring MQTT operation time out: 30.000000 sec
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,838 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Performing sync connect...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,839 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Performing async connect...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,839 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Keep-alive: 600.000000 sec
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,846 - AWSIoTPythonSDK.core.protocol.internal.workers - DEBUG - Event consuming thread started
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,847 - AWSIoTPythonSDK.core.protocol.mqtt_core - DEBUG - Passing in general notification callbacks to internal client...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,848 - AWSIoTPythonSDK.core.protocol.internal.clients - DEBUG - Filling in fixed event callbacks: CONNACK, DISCONNECT, MESSAGE
Comparing this to the logs when everything works, it looks as though it's hanging before the "Starting network I/O thread..." log message is printed from clients.py.
This leads me to believe it's probably hanging on one of the Lock.acquire calls in either reconnect or connect_async.
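To illustrate the suspected failure mode, here is a minimal sketch of a bounded lock acquisition that raises instead of blocking forever. It is not code from the SDK: `acquire_or_raise` and `LockTimeoutError` are hypothetical names, and the `timeout` argument to `Lock.acquire` is assumed to be available (Python 3.2+).

```python
import threading


class LockTimeoutError(RuntimeError):
    """Hypothetical exception type; not part of AWSIoTPythonSDK."""


def acquire_or_raise(lock, timeout_sec):
    """Acquire lock, raising LockTimeoutError if it is not free in time."""
    # Lock.acquire(timeout=...) returns False when the timeout expires
    # instead of blocking indefinitely like a plain acquire().
    if not lock.acquire(timeout=timeout_sec):
        raise LockTimeoutError(
            "could not acquire lock within %.1f sec" % timeout_sec
        )


# Simulate contention: the lock is already held and never released.
contended = threading.Lock()
contended.acquire()

try:
    acquire_or_raise(contended, 0.1)
    timed_out = False
except LockTimeoutError:
    timed_out = True
```

With a plain `contended.acquire()` in place of the helper, the second acquisition would never return, which matches the symptom described here.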
It's a frustrating issue as it's difficult to detect when it has occurred. If there are contention issues for the Locks, I'd much rather the SDK throw an exception than hang forever so that my application can still recover.
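As an application-side stopgap, a blocking call can be run in a daemon thread and abandoned after a deadline. The sketch below assumes Python 3; `call_with_timeout` is a hypothetical helper, not an SDK function, and it only detects the hang rather than fixing it.

```python
import threading


def call_with_timeout(fn, timeout_sec):
    """Run fn() in a daemon thread; raise TimeoutError if it does not return in time."""
    result = {}

    def worker():
        try:
            result["value"] = fn()
        except BaseException as exc:  # hand any failure back to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_sec)

    if thread.is_alive():
        # The worker is still blocked; it is leaked, but the caller regains control.
        raise TimeoutError("call did not return within %.1f sec" % timeout_sec)
    if "error" in result:
        raise result["error"]
    return result["value"]
```

Usage would look something like `call_with_timeout(client.connect, 30)`, where `client` is the AWSIoTMQTTClient instance. If the call times out, the stuck thread may still hold internal locks, so it is probably safest to discard that client and create a new one before retrying.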
I also haven't found a way to reproduce this yet.
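Since the hang can't be reproduced on demand, one way to capture where it is stuck the next time it happens is the standard library's `faulthandler` module, which can dump every thread's stack if a call runs past a deadline. This is a sketch assuming Python 3.3+; `run_with_stack_dump` is a hypothetical helper name.

```python
import faulthandler


def run_with_stack_dump(fn, dump_after_sec, dump_file):
    """Call fn(); if it is still running after dump_after_sec, write all thread stacks to dump_file."""
    # dump_file must be a real file object (it needs a file descriptor).
    faulthandler.dump_traceback_later(dump_after_sec, file=dump_file)
    try:
        return fn()
    finally:
        # Cancel the pending dump if fn() returned before the deadline.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the connect call this way, with `dump_file` set to an open log file or `sys.stderr`, should show which line each thread is blocked on, which would confirm or rule out the Lock.acquire theory.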
Appreciate any help or insight!