Intermittent Exception 28 Crash/Reboots on Testing Build #1643
Comments
I created a custom build that only has the plugins I need. That is to say, all unused plugins were removed from the define_plugin_sets.h file. The plugins allowed in my build are:
The entire flash was erased during the code upload so ESPEasy defaulted to AP mode on first boot. WiFi was configured and the previous settings were loaded from file. FreeRam increased from 14KB (normal builds) to about 20KB. This lighter weight build hasn't experienced the exception 28 crashes. The performance tests were ended after 45 hours without incident. I flashed a second NodeMCU (identical hardware) with the same exact software and restored the configuration. Surprisingly, it only had 16KB of FreeRam. I expected it to be about 20KB. I reviewed both devices and all settings were identical. Reboots did not change the reported FreeRam. The FreeRam memory difference seemed significant. So I clean reflashed both boards and restored the configuration. Now their FreeRam was similar (but not identical), around 16KB. The 20KB seen earlier seems like a significant observation. Maybe an uninitialized memory pointer?
|
Do you also have some chart showing memory usage over time? |
@TD-er: FreeRam memory usage has remained relatively consistent during run time. Normal builds' FreeRam settles to about 15K a couple of minutes after boot. I'll see a minor reduction after a few hours, but FreeRam mostly remains above 14K. My lightweight build that ran with 20KB FreeRam for 45 hours appears to be an anomaly. I've tried dozens of cold and soft reboots and I cannot reproduce the 20KB. All I see now is around 15-16KB. A couple of weeks ago I saw one of my boards slowly lose FreeRam over several hours. I rebooted it when it hit 9KB. The problem went away after the reboot. My backup test board (with the lightweight build) was at 16KB when I went to bed last night. This morning it was at 6KB, but still running. Unfortunately the log was disabled. I've rebooted it and the memory has remained steady at 15KB. Also, sometime last night my main board (with the lightweight build) rebooted. Today I caught it doing a panic reboot. But I had serial log turned on, so I suspect the serial log related reboot problem is still haunting me. I've disabled serial log and cold booted it. Everything seems OK now. I've enabled syslog and I will post the log if the memory leak re-appears this afternoon. Regarding my self-builds, my NodeMCU boards are 4MB. I have been selecting the 3MB SPIFFS size during compile. The other choice is 1MB SPIFFS. However, I've tried both settings and don't see any difference in operation. What is your recommendation, 1MB or 3MB SPIFFS?
|
You could also add a Generic plugin set to free mem and log that to your OpenHAB. About the SPIFFS: I guess OTA updates need some space and I think it writes them to SPIFFS, but I am not sure about that. That may be an issue. |
I'm using syslog (level: info) and sending all the activity to my NAS. That way I get FreeRam at 5 sec intervals (from the generic plugin) plus all the other actions. BTW, Tools->Factory Reset does not reset the settings files. It only reboots my NodeMCUs. Not sure if that is unique to my builds or if it is affecting everyone. Maybe it is an important symptom related to SPIFFS R/W access?
|
I've had more reports about factory reset. If it can be reproduced by others, then it is worth an issue I guess. |
I can report the factory reset issue. I'll do it after all my existing ESPEasy drama is resolved.
|
Hmm, let's hope your memory of that issue will last that long ;) |
At this point I can easily brag that my memory is better than ESPEasy's.
|
Some progress to report. Here's the latest status: I'm using Arduino ESP8266 board core V2.4.1. Issue #4497 on GitHub reported that there is a WiFi related memory leak on this release (not present on V2.4.0). Two days ago V2.4.2 was released with the leak patch and other improvements. So I've installed V2.4.2. Details: esp8266/Arduino#4497 When I recompiled my "lightweight" build the FreeRam was 25KB! That fantastic news gave me reason to revert to the full [Testing] build. Now FreeRam on it is 18KB. Not bad, about 4KB improvement. I suspect this will fix my random memory leak. Fingers are crossed that it also eliminates the exception 28 reboots and/or the other memory issues I've experienced. Even if it doesn't, the increased FreeRam is nice.
|
They are indeed improving free RAM in the last few releases. |
The two stock [Testing] builds (self compile on V2.4.2) are still running (15 hours so far). No reboots, memory near 18KB but has decreased a small amount. Looks promising.
|
Yesterday one of the test systems rebooted after running for 18 hours. I didn't record the log so the details are a mystery. The main test system lost its WiFi connection after running for 21 hrs. It remained offline for about two hours, then rebooted on its own. While offline it continued to run, as confirmed by the Nextion display that was reporting run time and local time. The Syslog recording ended when the WiFi was lost, but the last entries didn't show any obvious problems. Both systems were restarted and have been running for about 22 hours so far. The memory leak hasn't appeared yet. Random memory leaks are the work of the devil. FWIW: Present Status: Going Forward: I've also edited the Nextion plugin and added reserve() to the Strings that concatenate. But not yet tested. Epilog:
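To illustrate the reserve() idea mentioned above, here is a minimal sketch. It uses std::string so it compiles off-device; the Arduino String class offers a comparable reserve() call. The helper name `buildNextionCommand` and the exact command being built are assumptions for illustration only, not the actual Nextion plugin code.

```cpp
#include <string>

// Hypothetical helper (not taken from the Nextion plugin): builds a
// component.txt="value" style command. Reserving the final length up front
// lets all the appends land in one heap block instead of regrowing the
// buffer on each +=, which is the kind of churn that can fragment a small heap.
std::string buildNextionCommand(const std::string& component, const std::string& value) {
    static const std::string kMid = ".txt=\"";
    static const std::string kEnd = "\"";

    std::string cmd;
    cmd.reserve(component.size() + kMid.size() + value.size() + kEnd.size());

    cmd += component;
    cmd += kMid;
    cmd += value;
    cmd += kEnd;
    return cmd;
}
```

Whether this helps on the ESP8266 depends on how often the strings are built and how large they get, so it is worth comparing FreeRam before and after the change.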
|
Can you also run one of the last builds, which is based on the last PlatformIO code (1.8.0), which is using core libraries 2.4.2 |
Good idea. I'll try out the latest [Testing] build after the next crash/reboot appears.
|
Can you test this PR? |
Can you post/email the ESP_Easy_mega_test_ESP8266_4096.bin? This will ensure I use a build that is identical to yours.
|
That's a valid point :) |
I've loaded #1664 on two NodeMCU's. Testing has started. I see you added some graphics to the menu bar. Are your new icons the fix to the missing "3-bar" menu that has affected some small screen browsers?
Results summary from Aug 17 - 19 test run:
|
Yep, those are to add some recognizable pictogram to the tabs. I am eager to know if the fix I made here does fix at least the Exception reboots. I still have to look at the HTTP handling, so there may still be crashes, but it would be nice if reproducibility was reduced ;) |
Update: My workshop ("test") system rebooted after 7 hours.
Creating good looking GUI graphics is something I wish I could do.
|
My main "production" board is still running (>22 hours). But last night it lost the WiFi connection and now it only has local control. Losing WiFi seems to be an issue that started sometime with the Aug releases and this is the third occurrence I have experienced. I never saw it with the July code releases I had been using. Perhaps my WiFi lost connection problem is related to issue #1640. Like that installation I also use static IP. I noticed that his build used ESP8266 2.4.1 whereas all my affected firmware is on 2.4.2. The device is about 3 meters from my TPlink router so the WiFi RF signal is strong (-40dBm). As an experiment I rebooted my router and the ESP still remains offline. So I don't believe the issue is a DHCP versus static IP problem. Before I reboot the ESPEasy board do you have any experiments for me to try?
|
Just a thought, can you change the WiFi channel of the access point? |
Good idea, but does not help. Still offline. Normally the router is set to AUTO, but for this test I manually set it to several different channels. My working ESPEasy devices (another NodeMCU, two Sonoff TH10's) followed the channel changes and reconnected in a few seconds after each new channel setting. Anything else before I reboot?
|
I am afraid I am also out of ideas. I may add some check for connection errors in the controllers and if that is above some level perform a wifi reset (wifi off/on) |
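A rough sketch of how such a check could be structured, written as plain C++ so it can be tried off-device. The class name `ConnectionFailureTracker` and the threshold are hypothetical, and the actual WiFi off/on call is left to the caller since this is not ESPEasy code.

```cpp
// Hypothetical tracker (not ESPEasy code): counts consecutive controller
// connection failures and signals when a WiFi reset should be attempted.
class ConnectionFailureTracker {
public:
    explicit ConnectionFailureTracker(unsigned threshold) : threshold_(threshold) {}

    // Record the result of one connection attempt.
    // Returns true when the failure streak reaches the threshold; the streak
    // is then cleared so the next reset needs a fresh run of failures.
    bool record(bool success) {
        if (success) {
            failures_ = 0;
            return false;
        }
        ++failures_;
        if (failures_ < threshold_) {
            return false;
        }
        failures_ = 0;
        return true;
    }

    unsigned failures() const { return failures_; }

private:
    unsigned threshold_;
    unsigned failures_ = 0;
};
```

The caller would feed in the result of each controller publish attempt and toggle WiFi off/on when `record()` returns true. Counting only consecutive failures keeps a single dropped MQTT message from triggering a reset.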
My Nextion menus include a system status page. The SSID is missing (shown as - - ) and the static IP is still present. The RSSI is reported as +31dBm, which is a default value when WiFi is disconnected. |
Maybe you can add a button on the nextion and couple that with digital disconnect command :) I will also check this -31 dB value you mention, maybe it is also a good check for WiFi status. |
I've thought about adding a reboot rule when RSSI is +31dBm for more than a minute. But that's a last resort, primitive band-aid.
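For reference, the timing logic behind such a rule could be sketched like this in plain C++. The class name `DisconnectWatchdog` is made up, the +31 sentinel is only what was observed on these NodeMCU modules, and the reboot itself is left to the caller.

```cpp
#include <cstdint>

// Hypothetical watchdog (not ESPEasy code): reports when the RSSI reading has
// stayed at a "disconnected" sentinel value for longer than a timeout.
class DisconnectWatchdog {
public:
    DisconnectWatchdog(int sentinelRssi, uint32_t timeoutMs)
        : sentinelRssi_(sentinelRssi), timeoutMs_(timeoutMs) {}

    // Feed one RSSI sample with its timestamp in milliseconds.
    // Returns true once the sentinel has been seen continuously for more
    // than timeoutMs. Any other reading restarts the timer.
    bool update(int rssiDbm, uint32_t nowMs) {
        if (rssiDbm != sentinelRssi_) {
            tracking_ = false;
            return false;
        }
        if (!tracking_) {
            tracking_ = true;
            sinceMs_ = nowMs;
        }
        // Unsigned subtraction stays correct if the millisecond counter wraps.
        return static_cast<uint32_t>(nowMs - sinceMs_) > timeoutMs_;
    }

private:
    int sentinelRssi_;
    uint32_t timeoutMs_;
    bool tracking_ = false;
    uint32_t sinceMs_ = 0;
};
```

Constructed as `DisconnectWatchdog(31, 60000)` and fed a sample every few seconds, it would trip after a bit more than a minute of continuous +31 readings and stay quiet as soon as a normal negative RSSI shows up again.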
Actually, rssi is **+**31dBm (not -31) when wifi is disconnected. At least that's what I get on my NodeMCU modules.
|
OK, I will look into the source to see what it means :) |
This old open issue continues to be reported by others. Since there are now several open tickets on ESPEasy's random reboot problems I have closed this one to reduce the "noise."
|
Summary of the problem/feature request
The Testing build is prone to Exception 28 crash reboots. I suspect that the higher memory usage from the additional plugins is inviting memory allocation issues.
This problem was reported in issue #1625. But it is not related to the yield panic problem. So I have created a new ticket for the Exception 28 problem.
Expected behavior
Exception 28 is a fatal memory allocation problem. These should never occur.
Actual behavior
The exception 28 crash reboots occur randomly. The device may crash a couple of times a day, or may run for days without incident.
Steps to reproduce
System configuration
Hardware:
The ESP8266 boards are NodeMCU clones (LoLin 0.1 V3) with 4MB memory (memory chip ID 001640EF, Speed 40000000, IDE Mode DIO).
ESP Easy version: ESPEasy_mega-20180808
I self compile using Arduino 1.8.5 and ESP8266 core 2.4.1.
Arduino Settings: Board NodeMCU 1.0 (ESP-12E), Flash Size 4M (3M SPIFFS)
ESP Easy settings/screenshots:
The following plugins are being used:
P001 Switch Input
P026 SysInfo
P075 Nextion
P045 MPU6050
Controller is OpenHAB MQTT. Message Interval is 500 ms. Rules and NTP are enabled. Serial log is disabled.
Rules or log data