openssl wrapper gets stuck if client is started before network is ready #5

Closed
larsonmpdx opened this issue May 2, 2017 · 3 comments

Comments

@larsonmpdx
Contributor

In my client I set up a connection using code copied from the PubSub sample, which uses the OpenSSL wrapper:

https://github.com/aws/aws-iot-device-sdk-cpp/tree/master/samples/PubSub

If I start my program before the system's network is available, it gets stuck: every iot_client->connect() call fails, even after the network comes up (see this line for example):

https://github.com/aws/aws-iot-device-sdk-cpp/blob/master/samples/PubSub/PubSub.cpp#L181

The error returned is -300, NETWORK_TCP_CONNECT_ERROR, with the log line "TCP Connection error", which means it came from this bit of code:

https://github.com/aws/aws-iot-device-sdk-cpp/blob/master/network/OpenSSL/OpenSSLConnection.cpp#L290

and this specific system call:

https://github.com/aws/aws-iot-device-sdk-cpp/blob/master/network/OpenSSL/OpenSSLConnection.cpp#L158

I'm going to instrument the system call to get its error code. What happens is the connection fails, my program waits N seconds and retries (with a fresh network::OpenSSLConnection object), and continues to get the same error even though my computer can ping and the network is now up. If I restart my client after the network is up it works fine.

@larsonmpdx
Contributor Author

error from connect() is ECONNREFUSED
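
For anyone who wants to instrument the same call, here is a minimal sketch of capturing errno from a failed connect(). The helper name try_connect_ipv4 is hypothetical and not part of the SDK; it only shows saving errno right after the failure and closing the socket on every path.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not SDK code): try a TCP connect to an IPv4 address.
// Returns 0 on success, otherwise the errno value of the failing call.
int try_connect_ipv4(const char* ip, std::uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return errno;
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        close(fd);
        return EINVAL;  // not a valid dotted-quad address
    }

    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        int saved = errno;  // save first: later calls may overwrite errno
        std::fprintf(stderr, "connect(%s:%u) failed: %s (errno %d)\n", ip,
                     static_cast<unsigned>(port), std::strerror(saved), saved);
        close(fd);  // do not leak the descriptor on failure
        return saved;
    }

    close(fd);
    return 0;
}
```

Comparing the return value against ECONNREFUSED, ETIMEDOUT and so on tells you whether the host actively refused the connection or was simply unreachable.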

@larsonmpdx
Contributor Author

Figured this out. glibc caches /etc/resolv.conf from the program's start, and you need a call to res_init() to reload it. gethostbyname() was returning a local IP for my AWS endpoint instead of failing, and would keep doing so forever because res_init() is never called. I am working on a patch that I'll submit as a PR. I suggest a review of all the C system and library calls being used, because I found a few other details wrong (close() not being called after connect() fails, for example, and some deprecated calls).
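
A rough sketch of the kind of fix described above, assuming a POSIX/glibc system. This is not the actual patch: connect_with_fresh_resolver is a hypothetical name, and using getaddrinfo() in place of gethostbyname() is my substitution. It calls res_init() before each lookup so a changed /etc/resolv.conf is picked up, and closes the socket whenever connect() fails. Newer glibc releases reportedly reload resolv.conf on their own, in which case the res_init() call should be harmless.

```cpp
#include <netdb.h>
#include <resolv.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <string>

// Hypothetical helper (not SDK code): resolve host:port with a refreshed
// resolver configuration and connect. Returns a connected fd, or -1.
int connect_with_fresh_resolver(const std::string& host,
                                const std::string& port) {
    // Re-read /etc/resolv.conf so a lookup made after the network came up
    // does not reuse the configuration cached at an earlier attempt.
    res_init();

    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP

    addrinfo* results = nullptr;
    if (getaddrinfo(host.c_str(), port.c_str(), &hints, &results) != 0) {
        return -1;  // resolution failed; the caller can retry later
    }

    int fd = -1;
    for (addrinfo* ai = results; ai != nullptr; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) {
            continue;
        }
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            break;  // connected; keep this descriptor
        }
        close(fd);  // release the socket before trying the next address
        fd = -1;
    }

    freeaddrinfo(results);
    return fd;
}
```

A retry loop would call this on every attempt instead of resolving once and reusing the result, and the caller owns the returned descriptor and must close() it.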

@chaurah
Contributor

chaurah commented May 3, 2017

Hi @larsonmpdx,
Thanks for figuring this out. We have decided we need to simulate this in our release pipeline somehow and make sure we cover this issue for our future releases. I will also be spending some time working only on the network code to make sure your C system call issues are addressed. I think the wrappers can be improved substantially for each of the platforms. Please do let us know if you have any further suggestions.

Rahul


2 participants