ESP8266 blocks when using TLS and tries to publish disconnected from broker #78

Closed
martinren87 opened this issue Aug 24, 2017 · 11 comments


@martinren87

Hello Joël,

I am using the library for a project where I need to connect my ESP8266 to a private broker through a secure connection.
I discovered that everything works fine while my internet connection is up, but when it fails (the ESP8266 is still connected to Wi-Fi, but no internet is available) the device stops responding.
After a lot of tests, I came to the conclusion that the problem occurs only when I use a secure connection.
Here is example code so you can test it:

#include <MQTTClient.h>
#include <SPI.h>
#include <ESP8266WiFi.h>
#include <WiFiClientSecure.h>  // the sketch uses WiFiClientSecure, so include its header

const char ssid[] = "MyWifi";
const char pass[] = "MyPassword";

WiFiClientSecure net;
MQTTClient client(128);

unsigned long lastMillis = 0;

void setup() {
  Serial.begin(115200);
  WiFi.begin(ssid, pass);

  // Note: Local domain names (e.g. "Computer.local" on OSX) are not supported by Arduino.
  // You need to set the IP address directly.
  client.begin("myserver.com", 8883, net);
  client.onMessage(messageReceived);

  connect();
}

void connect() {
  Serial.print("checking wifi...");
  while (WiFi.status() != WL_CONNECTED) {
    Serial.print(".");
    delay(1000);
  }

  Serial.print("\nconnecting...");
  while (!client.connect("arduino", "user", "123456")) {
    Serial.print(".");
    delay(1000);
  }

  Serial.println("\nconnected!");

  client.subscribe("/hello");
}

void loop() {
  client.loop();

  if (!client.connected()) {
    Serial.println("Disconnected");
    delay(1000);
    connect();
  }

  // publish a message roughly every second.
  if (millis() - lastMillis > 1000) {
    lastMillis = millis();
    Serial.println("Publishing!!");
    client.publish("/hello", "world");
  }
}

void messageReceived(String &topic, String &payload) {
  Serial.println("incoming: " + topic + " - " + payload);
}

I tested this by creating a Wi-Fi network with my mobile phone and sharing its mobile data connection. As soon as I turn off mobile data, the ESP8266 stops writing to the serial port.

If you run the same test without a secure connection, everything works fine:

WiFiClientSecure net; replaced by WiFiClient net;
client.begin("myserver.com", 8883, net); replaced by client.begin("myserver.com", 1883, net);

Am I doing something wrong?

@256dpi
Owner

256dpi commented Aug 30, 2017

That's weird. Somehow the client does not properly switch into the disconnected state. I suspect a bug in the underlying WiFiClientSecure implementation as the properly working WiFiClient shows. I need to test this myself when I'm back from my holidays in two weeks.

In the meantime: can you make sure you have the latest Arduino, ESP8266-Core, and arduino-mqtt library versions installed and, if not, update and test again?

@martinren87
Author

martinren87 commented Sep 3, 2017

I have checked and I have the latest versions.
Following your comment, I looked for other reports of this problem in the WiFiClientSecure layer.
I found esp8266/Arduino#3537, which I suspect might be related to the problem I am having.
Maybe you can try it when you are back from your holidays and see whether something can be done.

@martinren87
Author

Hi Joël, Have you been able to test yourself where the problem is?

@256dpi
Owner

256dpi commented Oct 18, 2017

I just released v2.2.0 (might take some time to appear in the library manager). Can you check this again with the latest version?

@martinren87
Author

I will try it this week and let you know whether the problem has been solved. Thank you.

@martinren87
Author

Hi Joël, I have run several tests and I can confirm that the main issue has been solved.
There is still one small problem:
When I turn off my mobile data (no internet connectivity, but Wi-Fi still connected), it takes around 10 seconds for the ESP8266 to detect that the connection to the broker has been lost. If I try to publish anything during those 10 seconds, the ESP8266 works for a few seconds and then hangs.
I think it might be a small bug related to the main issue of this thread. The behaviour is the same, except that now it only happens during the short time the ESP8266 needs to detect the disconnection.
Can you test it yourself?

@256dpi
Owner

256dpi commented Nov 17, 2017

If you have a silent MQTT connection (no incoming and outgoing messages), the client will send ping packets to test whether the connection is still alive. So, if you cut the internet connection as you described, it may take some time until the WiFi stack emits an error because of the graceful TCP keep alive. That's why you experience a delay until the client detects a disconnection. On top of that, the client uses command timeouts internally. By default any network action has a timeout of 1 sec, which means that a network read/write command can take up to that amount of time to complete. That probably causes the hangs in your code.

Both systems can be configured using void setOptions(int keepAlive, bool cleanSession, int timeout);. A lower keep alive sends ping packets more often when the client is silent and thus detects disconnections earlier. A lower timeout grants less time for network actions to complete and may reduce the duration of the hangs. But be aware that this might also decrease network stability if the value is too low.

On top of that, it might also be that the ESP8266 WiFi system has more timeouts and delays I'm not aware of.
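
A minimal sketch of how the setOptions call described above could be wrapped so the three positional arguments stay readable at the call site. MqttOptions and applyOptions are hypothetical names, not part of arduino-mqtt; the units (seconds for keepAlive, milliseconds for timeout) and the values 5 and 500 are assumptions for illustration and should be checked against the library README.

```cpp
// Hypothetical helper, not part of arduino-mqtt: names the three positional
// arguments of setOptions so the units stay visible where the call is made.
struct MqttOptions {
  int keepAliveSec;   // assumed unit: seconds
  bool cleanSession;
  int timeoutMs;      // assumed unit: milliseconds
};

// Works with any client type that exposes setOptions(int, bool, int),
// which is the signature quoted in the comment above.
template <typename Client>
void applyOptions(Client &client, const MqttOptions &opts) {
  client.setOptions(opts.keepAliveSec, opts.cleanSession, opts.timeoutMs);
}

// In a sketch, before client.connect() (illustrative values only):
//   applyOptions(client, MqttOptions{5, true, 500});
```

Lower values shorten the detection window and the per-call stall, at the cost of the stability risk mentioned above.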

@martinren87
Author

Thank you for your response Joël.
I have read the documentation and changed the keepAlive and timeout values. As you mentioned, both options change the time it takes to detect the disconnection.
However, the "infinite loop" is still there: when I try to publish during the short time it takes to detect the disconnection (that time can be longer or shorter depending on the options above, but it is always there), the ESP8266 hangs in an infinite loop and stops responding. The hang lasts until the connection to the broker is established again, so the device can stay in that "infinite loop" for hours.
I understand that the time to detect the disconnection can be changed, but that does not prevent the microcontroller from hanging in my code, which publishes values very often.
Can you check whether something can be done to avoid this behaviour? Do you need example code to test it? Let me know if I can help in any way.

@256dpi
Owner

256dpi commented Feb 8, 2018

Sorry for leaving this issue stale. The publish methods should never block for longer than the specified command timeout. Only if the low-level connection write call itself blocks would the function block indefinitely and yield the behaviour you described. Do you still encounter the same issues? It's possible that the ESP8266 WiFi stack is doing something weird when the connection is lost. Possibly we have to look there for more answers.
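
One way to keep the sketch responsive once the client has reported a disconnection is to replace the blocking while loop in connect() with a rate-limited, non-blocking retry and to skip publishing while offline. The following is only a sketch under that assumption: RetryGate is a hypothetical helper, not part of arduino-mqtt, and it cannot help if the low-level write call itself blocks before the disconnection is detected.

```cpp
#include <cstdint>

// Hypothetical helper (not part of arduino-mqtt): rate-limits reconnect
// attempts so loop() never spins inside a blocking while-loop.
class RetryGate {
 public:
  explicit RetryGate(uint32_t intervalMs) : intervalMs_(intervalMs) {}

  // Returns true on the first call and then at most once per interval.
  // Unsigned subtraction keeps the comparison valid across millis() rollover.
  bool ready(uint32_t nowMs) {
    if (armed_ && (nowMs - lastMs_) < intervalMs_) return false;
    armed_ = true;
    lastMs_ = nowMs;
    return true;
  }

 private:
  uint32_t intervalMs_;
  uint32_t lastMs_ = 0;
  bool armed_ = false;
};

// Possible wiring in the sketch from the first comment (5000 ms is illustrative):
//
//   RetryGate reconnectGate(5000);
//
//   void loop() {
//     client.loop();
//     if (!client.connected()) {
//       if (reconnectGate.ready(millis())) {
//         client.connect("arduino", "user", "123456");  // single attempt, no while-loop
//       }
//       return;  // do not publish while the client reports it is offline
//     }
//     // ... publish as before ...
//   }
```

A single client.connect() attempt can still take a while over TLS, so this limits how often the sketch stalls rather than removing the stall entirely.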

@sw-tt-chandershekharsuthar

Hello @martinren87 & @256dpi,
I'm publishing data through the WiFiClientSecure layer (port 8883) on an ESP8266 and I'm facing the same problem. In my case, when I try to publish data, the device connects to the broker and the first publish succeeds, but on the second publish the connection breaks and the device restarts the script.
If I only subscribe, the system works properly without any problem; I can receive data on my device 100 times in a row.

So my problem is only with publishing. I tried changing keepAlive and timeout, and switching from Wi-Fi to a mobile hotspot, but the problem stays the same.
Please help me out.

@256dpi
Owner

256dpi commented Sep 13, 2019

I'm closing this as the ESP8266 project switched to a new TLS library and made many improvements to the network stack that probably automatically fixed this issue by now.

@256dpi 256dpi closed this as completed Sep 13, 2019