Device disconnects from AWS (and WiFi?) after period of inactivity #8

Closed
jbdamask opened this issue Feb 1, 2019 · 12 comments

Comments

@jbdamask
Owner

jbdamask commented Feb 1, 2019

It's not clear whether the device dropped WiFi or the AWS connection. Rebooting fixed it.
Error code:
/** Returned when the Network is disconnected and reconnect is either disabled or physical layer is disconnected */
NETWORK_DISCONNECTED_ERROR = -13,

@jbdamask changed the title from "Device disconnected from AWS after 30 mins non-use" to "Device disconnects from AWS (and WiFi?) after period of inactivity" on Mar 13, 2019
@jbdamask
Owner Author

Worked the first night. Will keep testing.

@jbdamask
Owner Author

Ok, it's not tied to inactivity. It happened this morning within minutes of me using the devices.
I'm studying the CloudWatch logs (search AWSIoTLogs using the certificate name associated with the device plus "Disconnect", e.g. 86ffe5ecfe1347f4161078954deafa36789ba889d6f8a8fccb266b83fbf73f9a Disconnect). I see that both devices issued Client Disconnect messages within seconds of each other. This suggests the client is disconnecting on purpose and telling AWS about it, which leads me to search the MQTTClient library for disconnect issues. I found something about timeouts, but I'm not sure it's related.

@jbdamask
Owner Author

jbdamask commented Mar 15, 2019

Dunno wtf happened here... I just uploaded some code, pushed the button, and things were cool, then blamo. Possibly a red herring; ignore, but keep this note.

[Starting] Opening the serial port - /dev/cu.SLAB_USBtoUART
[Info] Opened the serial port - /dev/cu.SLAB_USBtoUART
single click
Function: publish()
Calling...
New state: 1
incoming: lights - {"thing_name":"feather_esp32_4","state":"1"}
feather_esp32_4
ets Jun  8 2016 00:22:57

rst:0x1 (POWERON_RESET),boot:0x3 (DOWNLOAD_BOOT(UART0/UART1/SDIO_REI_REO_V2))
waiting for download
������������ets Jun  8 2016 00:22:57

@jbdamask
Owner Author

jbdamask commented Mar 18, 2019

Possibly related to battery? Both devices use RAVPower 6700. Both devices typically disconnect at the same time (which is interesting because they and their batteries are independent entities).

One interesting observation: the device that has been connected to my computer via USB did not disconnect, while the other, which is battery-powered, did. (Update: this may be because the battery ran out!)

Tests to run:

  1. When powered by USB battery, power-cycle the devices one minute apart (do they then disconnect at the same time?)
    1. No effect. The error may still occur simultaneously
  2. Repeat tests with one device connected to my computer
  3. Test with different brands of USB batteries (repeat Test 1 with this configuration)
  4. Test one device USB and one with LiPo
  5. Test with different ESP32 board
  6. Test one in and one out of pillow (could it be an internal temperature shutoff?)
    1. Still happens when device is outside of pillow

@jbdamask
Owner Author

jbdamask commented Mar 18, 2019

7:00 PM: Both devices have disconnected and reconnected, 6 minutes apart (this is important). They are on different batteries, and neither is in the pillow.

On 3/18/19 at 4:25, one device (feather 2) disconnected and reconnected. This device is plugged into the PKCELL battery and is in its enclosure but not in a pillow. The other device is plugged into the RAVPower and didn't lose connection.

@jbdamask
Owner Author

As per Issue #17, I coded a button long-press event to restart the ESP chip. Today, the device disconnected and I pressed the button. It did not reset the chip, but a few random pixels on the strip lit up as if in calling state. I can replicate the behavior by pressing it again. However, it doesn't restart.

So what does this mean? It means the button press registers but the logic doesn't execute correctly...

@jbdamask
Owner Author

feather_esp32_2 is plugged into a 5V power switch (not a battery) and went for nearly 24 hours. Overnight, though, it disconnected (and reconnected) several times, while my other device (running FreeRTOS) didn't disconnect. I'm thinking more and more that this has to do with the MQTT client library (its notes say to put a delay(10) after client.loop()).

@jbdamask
Owner Author

I'm now reading about the MQTT Keep Alive concept. Notably, it says, "If the client does not send a messages during the keep-alive period, it must send a PINGREQ packet to the broker to confirm that it is available and to make sure that the broker is also still available. The broker must disconnect a client that does not send a a message or a PINGREQ packet in one and a half times the keep alive interval. Likewise, the client is expected to close the connection if it does not receive a response from the broker in a reasonable amount of time."
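
The "one and a half times the keep alive interval" rule in the quote above can be sketched roughly as follows. This is only an illustration of the timing rule, not code from any broker or from the MQTT library; the function names `brokerTimeoutMs` and `brokerWouldDisconnect` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how long a broker tolerates silence from a client,
// i.e. 1.5x the keep-alive interval, expressed in milliseconds.
constexpr uint32_t brokerTimeoutMs(uint16_t keepAliveSeconds) {
    return static_cast<uint32_t>(keepAliveSeconds) * 1500u;
}

// Hypothetical helper: true when the silence since the client's last packet
// exceeds the broker's tolerance. A keep-alive of 0 disables the mechanism.
// Both timestamps are assumed to come from the same millisecond clock.
constexpr bool brokerWouldDisconnect(uint32_t lastPacketMs, uint32_t nowMs,
                                     uint16_t keepAliveSeconds) {
    return keepAliveSeconds != 0 &&
           (nowMs - lastPacketMs) > brokerTimeoutMs(keepAliveSeconds);
}
```

With a 10-second keep-alive, a client that stays silent for more than 15 seconds would be dropped under this rule; with 1200 seconds, the window grows to 30 minutes.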

Note that the library's keep alive is 10 seconds by default (configurable) and the longest allowed by the protocol is 18h 12m 15s (go figure).
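
The odd-looking 18h 12m 15s limit falls out of the keep-alive being a 16-bit number of seconds in the MQTT CONNECT packet, so the largest value is 65,535 seconds. A small sketch of the arithmetic (the names `kMaxKeepAliveSeconds` and `formatDuration` are made up for this example):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Largest keep-alive that fits in a 16-bit field: 65535 seconds.
constexpr uint32_t kMaxKeepAliveSeconds = std::numeric_limits<uint16_t>::max();

// Hypothetical helper: split a number of seconds into hours, minutes, seconds.
std::string formatDuration(uint32_t totalSeconds) {
    const uint32_t hours = totalSeconds / 3600;
    const uint32_t minutes = (totalSeconds % 3600) / 60;
    const uint32_t seconds = totalSeconds % 60;
    return std::to_string(hours) + "h " + std::to_string(minutes) + "m " +
           std::to_string(seconds) + "s";
}
```

Calling `formatDuration(kMaxKeepAliveSeconds)` gives "18h 12m 15s", matching the figure above.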

"Usually, a disconnected client tries to reconnect. Sometimes, the broker still has an half-open connection for the client. In MQTT, if the broker detects a half-open connection, it performs a ‘client take-over’. The broker closes the previous connection to the same client (determined by the client identifier), and establishes a new connection with the client. This behavior ensures that the half-open connection does not stop the disconnected client from re-establishing a connection."

Maybe I'm dealing with some cases of half-open connections and AWS doesn't detect it properly...

jbdamask added a commit that referenced this issue Mar 21, 2019
@jbdamask
Owner Author

jbdamask commented Mar 21, 2019

AWS's embedded C SDK uses a default keep-alive of 1200 seconds (!). I wonder why theirs is so different from the 10-second default in arduino mqtt...

Anyway, I changed the setting in my code and uploaded to feather_esp32_4
client.setOptions(1200, true, 1000);  // keep-alive 1200 s, clean session on, 1000 ms command timeout

Note that feather_esp32_2 is still running using the default setting

@jbdamask
Owner Author

Ok, I've seen feather_esp32_2 disconnect a couple of times but not feather_esp32_4.
So what does this really mean? If we assume that the device doesn't publish messages frequently, then with the 10-second default the client sends a keep-alive ping (PINGREQ) to the broker up to 8,640 times per day (86,400 s / 10 s). That means there are up to 8,640 chances per day for a ping to hit a network fluctuation and for the connection to be treated as dropped. By changing the keep-alive to 1200 seconds I reduce the number of chances of hitting a hiccup considerably; the ping now runs at most 72 times per day. I can raise this higher if I need to.
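
The ping-count arithmetic can be checked with a few lines. This is just a back-of-the-envelope sketch that assumes the device publishes nothing else, so every keep-alive interval produces one ping; `maxPingsPerDay` is a made-up name.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kSecondsPerDay = 24u * 60u * 60u;  // 86400

// Hypothetical helper: upper bound on keep-alive pings per day for an
// otherwise idle connection. A keep-alive of 0 means no pings at all.
constexpr uint32_t maxPingsPerDay(uint16_t keepAliveSeconds) {
    return keepAliveSeconds == 0 ? 0u : kSecondsPerDay / keepAliveSeconds;
}
```

For a 10-second keep-alive this gives 8,640 pings per day, and for 1200 seconds it gives 72, a 120-fold reduction in opportunities to hit a network hiccup.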

@jbdamask
Owner Author

jbdamask commented Mar 22, 2019

Update - The device still disconnects and causes a UX error. It just messes up less often. I wish the freaking library could figure out it's not connected and fix itself...I'm sure there's a way but the standard check of client.connect() isn't working. Oh well, I'll go back to resetting the chip every hour I guess....or add a heartbeat
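
One way the heartbeat idea could look, as a minimal sketch rather than anything the project actually implemented: the device periodically publishes a heartbeat, expects to hear something back from the broker (for example its own heartbeat on a topic it subscribes to), and treats a long silence as a dead connection. The class `HeartbeatWatchdog` and all its method names are hypothetical and are not part of the MQTT library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical application-level watchdog. Timestamps are milliseconds
// from a free-running clock (on Arduino, millis()); unsigned subtraction
// keeps the comparisons correct across counter wraparound.
class HeartbeatWatchdog {
public:
    HeartbeatWatchdog(uint32_t intervalMs, uint32_t timeoutMs)
        : intervalMs_(intervalMs), timeoutMs_(timeoutMs) {}

    // Call whenever any message arrives from the broker.
    void onMessageReceived(uint32_t nowMs) { lastSeenMs_ = nowMs; }

    // Returns true when the next heartbeat is due, and records the send time.
    bool shouldSendHeartbeat(uint32_t nowMs) {
        if (nowMs - lastSentMs_ >= intervalMs_) {
            lastSentMs_ = nowMs;
            return true;
        }
        return false;
    }

    // Returns true when nothing has been heard for longer than the timeout.
    bool connectionLooksDead(uint32_t nowMs) const {
        return nowMs - lastSeenMs_ > timeoutMs_;
    }

private:
    uint32_t intervalMs_;
    uint32_t timeoutMs_;
    uint32_t lastSentMs_ = 0;
    uint32_t lastSeenMs_ = 0;
};
```

In the sketch's main loop, the idea would be to publish when `shouldSendHeartbeat(millis())` is true, call `onMessageReceived(millis())` from the message callback, and force a reconnect (or restart the chip) when `connectionLooksDead(millis())` is true. The interval and timeout values are a tuning choice; a timeout of a few heartbeat intervals avoids reacting to a single lost message.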

@jbdamask
Owner Author

jbdamask commented Apr 9, 2019

Closing issue. While this isn't fully understood, periodic restarts seem to help

@jbdamask jbdamask closed this as completed Apr 9, 2019