Occasional hang on AWSIoTMQTTClient.connect() #197

Closed
@samvrlewis

I'm occasionally seeing AWSIoTMQTTClient.connect() hang indefinitely. This seems to be the same issue as reported in #40, but no proper resolution was found there.

Logs when this happens:

Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,819 - AWSIoTPythonSDK.core.protocol.internal.clients - DEBUG - Initializing MQTT layer...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,824 - AWSIoTPythonSDK.core.protocol.internal.clients - DEBUG - Registering internal event callbacks to MQTT layer...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,825 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - MqttCore initialized
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,825 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Client id: 020000035
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,826 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Protocol version: MQTTv3.1.1
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,827 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Authentication type: TLSv1.2 certificate based Mutual Auth.
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,828 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring endpoint...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,828 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring certificates...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,830 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring offline requests queueing: max queue size: 0
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,834 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring offline requests queue draining interval: 0.500000 sec
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,836 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring connect/disconnect time out: 10.000000 sec
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,837 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring MQTT operation time out: 30.000000 sec
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,838 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Performing sync connect...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,839 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Performing async connect...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,839 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Keep-alive: 600.000000 sec
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,846 - AWSIoTPythonSDK.core.protocol.internal.workers - DEBUG - Event consuming thread started
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,847 - AWSIoTPythonSDK.core.protocol.mqtt_core - DEBUG - Passing in general notification callbacks to internal client...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,848 - AWSIoTPythonSDK.core.protocol.internal.clients - DEBUG - Filling in fixed event callbacks: CONNACK, DISCONNECT, MESSAGE

Comparing this to the logs when everything works, it looks as though it's hanging before the "Starting network I/O thread..." log message is printed from clients.py:

self._logger.debug("Starting network I/O thread...")

Which leads me to believe it's probably hanging on one of the Lock.acquire calls in either reconnect:

or connect_async:

def connect_async(self, host, port=1883, keepalive=60, bind_address=""):
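
To illustrate the failure mode I suspect, here is a minimal sketch. It is not the SDK's code, and `acquire_or_raise` is a hypothetical helper of mine. It assumes Python 3, where `Lock.acquire` accepts a `timeout` argument. A plain `acquire()` blocks forever if the holder never releases, whereas a timed acquire returns `False` and can be turned into an exception:

```python
import threading


def acquire_or_raise(lock, timeout_sec):
    """Acquire `lock`, raising TimeoutError instead of blocking forever."""
    # acquire(timeout=...) returns False if the lock was not obtained in time.
    if not lock.acquire(timeout=timeout_sec):
        raise TimeoutError("could not acquire lock within %.1f s" % timeout_sec)


lock = threading.Lock()
lock.acquire()  # simulate a holder that never releases the lock

try:
    acquire_or_raise(lock, 0.1)
    outcome = "acquired"
except TimeoutError:
    outcome = "timed out"
```

With an untimed `lock.acquire()` in place of the helper, the second acquire would never return, which matches the symptom of the log simply stopping.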

It's a frustrating issue as it's difficult to detect when it has occurred. If there are contention issues for the Locks, I'd much rather the SDK throw an exception than hang forever so that my application can still recover.
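
As a stopgap on the application side, something like the following watchdog wrapper might work. This is a rough sketch under my own assumptions, not an SDK feature: `call_with_timeout` and `CallTimedOut` are hypothetical names, and it assumes Python 3. It runs the blocking call in a daemon thread and raises if the call has not returned in time:

```python
import threading


class CallTimedOut(Exception):
    """Raised when the wrapped call does not return within the timeout."""


def call_with_timeout(fn, timeout_sec, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread; raise if it takes too long."""
    result = {}

    def runner():
        try:
            result["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # hand any failure back to the caller
            result["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_sec)
    if worker.is_alive():
        # The worker is still blocked. It cannot be killed, only abandoned.
        raise CallTimedOut("call did not return within %.1f s" % timeout_sec)
    if "error" in result:
        raise result["error"]
    return result["value"]


# Stand-in for a connect() that never returns.
never_set = threading.Event()

try:
    call_with_timeout(never_set.wait, 0.1)
    recovered = False
except CallTimedOut:
    recovered = True  # the application regains control here
```

In my application this would look roughly like `call_with_timeout(client.connect, 15)`. The catch is that the stuck thread is only abandoned, so if it is blocked on a lock inside the client, that client object is probably unusable afterwards and the realistic recovery is to build a new client or restart the process.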

Frustratingly, I haven't found a way to replicate this yet.
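
Since I can't reproduce it on demand, one option for catching it in the field is the standard library's `faulthandler` module, which can dump the stack of every thread if a call is still running after a deadline. A minimal sketch, assuming Python 3; `run_with_hang_dump` is a hypothetical helper name, and the `file` argument must be a real file with a file descriptor:

```python
import faulthandler
import sys


def run_with_hang_dump(fn, dump_after_sec, file=sys.stderr):
    """Call fn(); if it is still running after dump_after_sec, dump all thread stacks."""
    # exit=False: only write the tracebacks, do not terminate the process.
    faulthandler.dump_traceback_later(dump_after_sec, exit=False, file=file)
    try:
        return fn()
    finally:
        # The call returned (or raised) in time, so cancel the pending dump.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the connect call this way, for example `run_with_hang_dump(client.connect, 30)`, should leave a traceback in the journal showing exactly which line each thread is blocked on the next time the hang occurs.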

Appreciate any help or insight!
