Occasional hang on AWSIoTMQTTClient.connect() #197
Comments
Greetings! Sorry to say, but this is a very old issue that is probably not getting as much attention as it deserves. We encourage you to check whether this is still an issue in the latest release, and if you find that it is still a problem, please feel free to open a new one.
@samvrlewis Did you ever resolve this? I am about to timeout the call and retry on a slightly longer interval than Additionally, I think you may be right as I observe this failure non-deterministically after a publish failure. Deep in the call stack of a publish there is the acquisition of the '_out_pack_mutex' around:
I think if the timeout occurs while this mutex is held, we would see the hang. I am not sure how likely that is, though. It seems like the fix would require either a fairly large refactor of how mutexes are handled or an additional wrapper on top of this, internal to the lib. I'll continue looking for a workaround.
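For anyone trying the same "time out the call and retry" workaround, here is a minimal sketch of one way to do it from application code. It is not part of the SDK: `call_with_timeout`, `retry_with_timeout`, and `CallTimedOut` are hypothetical names, and `fn` stands in for whatever blocking call you wrap (for example a bound `connect` method). Note that a timed-out worker thread is only abandoned, not cancelled, so if the hang really is on an internal lock, retrying on the same client object may block again and a fresh client instance may be needed.

```python
import threading


class CallTimedOut(Exception):
    """Hypothetical exception: the wrapped call did not return in time."""


def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread and wait at most timeout_s."""
    result = {}

    def runner():
        try:
            result["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # hand any failure back to the caller
            result["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # The worker is still blocked; it is abandoned, not killed.
        raise CallTimedOut("call did not finish within %.2fs" % timeout_s)
    if "error" in result:
        raise result["error"]
    return result.get("value")


def retry_with_timeout(fn, attempts, timeout_s):
    """Call fn up to `attempts` times, re-raising the last timeout."""
    last_error = None
    for _ in range(attempts):
        try:
            return call_with_timeout(fn, timeout_s)
        except CallTimedOut as exc:
            last_error = exc
    raise last_error
```

Because the worker is a daemon thread, a stuck call will not keep the process alive at shutdown, but it may still hold whatever lock it was blocked behind.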
@jackhamburger when I came across this issue I think AWS was in the process of writing the v2 Python version (https://github.com/aws/aws-iot-device-sdk-python-v2), which looks like it's ready for general use now. Maybe it's worth trying that library instead? My "solution" here at the time was to migrate to using Golang (with a non-AWS MQTT library) instead, which did work for my use case but potentially isn't very helpful for you... sorry!
I'm occasionally seeing AWSIoTMQTTClient.connect() hang indefinitely. It seems to be the same issue as reported in #40, but no proper resolution was found there.
Logs when this happens:
Comparing this to the logs when everything works, it looks as though it's hanging before the "Starting network I/O thread..." log message is printed from clients.py:
aws-iot-device-sdk-python/AWSIoTPythonSDK/core/protocol/internal/clients.py
Line 126 in 832f074
This leads me to believe it's probably hanging on one of the Lock.acquire calls, in either reconnect:
aws-iot-device-sdk-python/AWSIoTPythonSDK/core/protocol/paho/client.py
Line 736 in 832f074
or connect_async:
aws-iot-device-sdk-python/AWSIoTPythonSDK/core/protocol/paho/client.py
Line 704 in 832f074
It's a frustrating issue as it's difficult to detect when it has occurred. If there are contention issues for the Locks, I'd much rather the SDK throw an exception than hang forever so that my application can still recover.
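To illustrate the "throw instead of hang" behaviour described above, here is a rough sketch of a bounded lock acquisition. This is not the SDK's code: `acquire_or_raise` and `LockTimeoutError` are hypothetical names, and the sketch assumes Python 3.2 or later, where `threading.Lock.acquire` accepts a `timeout` argument.

```python
import threading
from contextlib import contextmanager


class LockTimeoutError(RuntimeError):
    """Hypothetical exception: a lock could not be acquired in time."""


@contextmanager
def acquire_or_raise(lock, timeout_s, name="lock"):
    """Hold `lock` for the duration of the with-block, or raise on timeout."""
    # acquire(timeout=...) returns False if the lock was not obtained in time.
    if not lock.acquire(timeout=timeout_s):
        raise LockTimeoutError(
            "could not acquire %s within %.2fs" % (name, timeout_s)
        )
    try:
        yield
    finally:
        lock.release()


# Usage: a contended lock now surfaces as an exception the caller can handle.
demo_lock = threading.Lock()
with acquire_or_raise(demo_lock, 1.0, name="demo_lock"):
    pass  # critical section
```

With a pattern like this, contention would show up as an exception that the application can catch and recover from, rather than as a silent hang.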
Frustratingly, I haven't found a way to try to replicate this yet.
Appreciate any help or insight!