Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modules with mega_20210503 all off-line after unpredictable time #3649

Closed
blb4github opened this issue May 20, 2021 · 40 comments · Fixed by #5059
Closed

Modules with mega_20210503 all off-line after unpredictable time #3649

blb4github opened this issue May 20, 2021 · 40 comments · Fixed by #5059
Labels
Category: Wifi Related to the network connectivity Status: Needs Info Needs more info before action can be taken Type: Bug Considered a bug

Comments

@blb4github
Copy link

As briefly mentioned in #3638 (comment)_

After updating to mega_20210503 all my (7) Wemos D1 modules become off-line after unpredictable time. Some after minutes, 1 after a couple of days. Before this happened they became aslo very unresponsive. In the mean time I have downgraded to 20210223 and they are all up and running for days again.

@TD-er
Copy link
Member

TD-er commented May 20, 2021

Yep, there's something funky with the WiFi not reconnecting.
What build do you need, to test the current code base? Hopefully I have already fixed it, but right now I can't reproduce it.

@TD-er TD-er added Category: Wifi Related to the network connectivity Status: Needs Info Needs more info before action can be taken Type: Bug Considered a bug labels May 20, 2021
@ghtester
Copy link

ghtester commented May 21, 2021

Yes I have also experienced 2 times that ESP node with a custom firmware built from latest sources could not reconnect even after RESET by button - it was necessary to unplug it from power and plug it again. I'll try to reproduce the issue again. I am not sure but it looks like an old, already fixed issue, reverted back - discussed here : #2757 and here: #2721 (comment)

BTW. I have just tried to run a WiFi scan from Tools page on one node which is connected to WiFi & working and to my surprise it said: No Access Points found (there are plenty of around for sure).

@TD-er
Copy link
Member

TD-er commented May 21, 2021

OK, that's an important clue for me.
There is a check in the WiFi scan function to see if there are unprocessed WiFi events.
Apparently there are, so if you could fetch via the serial console some of those 30-sec log lines stating the WiFi connection state, that would be great.

@TD-er
Copy link
Member

TD-er commented May 21, 2021

Can you test with this PR: #3650 ?

@ghtester
Copy link

ghtester commented May 21, 2021

Encountered the issue again, this is a debug log:
100922411 : Info : WD : Uptime 1682 ConnectFailures 26 FreeMem 10232 WiFiStatus WL_IDLE_STATUS ESPeasy internal wifi status: DISCONNECTED
scandone
100923180 : Info : WiFi : Scan finished, found: 11
100923185 : Info : WiFi : Best AP candidate: AP03 00:12:0E:32:56:B3 Ch:1 (-78dBm) WPA2/PSK
100923199 : Info : WIFI : Connecting AP03 B0:73:5D:B3:E1:02 Ch:9 (0dBm) attempt # 2833
100924235 : Info : ADC : Analog value: 15 = 15.00
100924297 : Info : EVENT: Analog#Input=15.00
scandone
no AP03 found, reconnect after 1s
reconnect
100925522 : Info : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2202 msscandone

100925625 : Info : WIFI : Set WiFi to OFFdel if0
usl
mode : null
force slp enable,type: 2
fpm open,type:2 0

100925860 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100925861 : Info : WiFi : Start network scan all channelfpm close 3
mode : sta(8c:aa:b5:51:7d:89)
add if0
scandone
s
100928050 : Info : WiFi : Scan finished, found: 12
100928055 : Info : WiFi : Best AP candidate: AP03 00:12:0E:32:56:B3 Ch:1 (-78dBm) WPA2/PSK
100928056 : Info : WIFI : Set WiFi to APdel if0
usl
mode : softAP(8e:aa:b5:51:7d:89)
add if1
dhcp server start:(ip:192.168.4.1,mask:255.255.255.0,gw:192.168.4.1)
bcn 100

100928170 : Error : WIFI : Error while starting AP Mode with SSID: ESP21 IP: 192.168.4.1
100928173 : Info : WIFI : Connecting AP03 B0:73:5D:B3:E1:02 Ch:9 (0mode : sta(8c:aa:b5:51:7d:89) + softAP(8e:aa:b5:51:7d:89)
add if0
dBm) attempt # 2834
100928173 : Info : WIFI : Set WiFi to AP+STA
100928370 : Info : EVENT: WiFi#Disconnected
scandone
no AP03 found, reconnect after 1s
reconnect
100930592 : Info : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2309 msscandone

100930694 : Info : WIFI : Set WiFi to OFFbcn 0
del if1
del if0
usl
mode : null
force slp enable,type: 2
fpm open,type:2 0

100930929 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100930974 : Info : EVENT: WiFi#Disconnected
100931105 : Info : EVENT: WiFi#APmodeDisabled
fpm close 3
100931703 : Info : WIFI : Set WiFi to STAmode : sta(8c:aa:b5:51:7d:89)
add if0
sleep enable,type: 2

100931807 : Info : WiFi : Start network scan all channels
scandone
100934012 : Info : WiFi : Scan finished, found: 18
100934018 : Info : WiFi : Best AP candidate: AP03 00:12:0E:32:56:B3 Ch:1 (-79dBm) WPA2/PSK
100934023 : Info : WIFI : Connecting AP03 B0:73:5D:B3:E1:02 Ch:9 (0dBm) attempt # 2835
100934203 : Info : ADC : Analog value: 15 = 15.00
100934268 : Info : EVENT: Analog#Input=15.00
scandone
no AP03 found, reconnect after 1s
reconnect
100936343 : Info : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2199 msscandone

100936447 : Info : WIFI : Set WiFi to OFFdel if0
usl
mode : null
force slp enable,type: 2
fpm open,type:2 0

100936681 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100936738 : Info : EVENT: WiFi#Disconnected
fpm close 3
100937466 : Info : WIFI : Set WiFi to STAmode : sta(8c:aa:b5:51:7d:89)
add if0
sleep enable,type: 2

100937570 : Info : WiFi : Start network scan all channels
scandone
100939773 : Info : WiFi : Scan finished, found: 16
100939779 : Info : WiFi : Best AP candidate: AP03 00:12:0E:32:56:B3 Ch:1 (-79dBm) WPA2/PSK
100939783 : Info : WIFI : Connecting AP03 B0:73:5D:B3:E1:02 Ch:9 (0dBm) attempt # 2836
scandone
no AP03 found, reconnect after 1s
reconnect
100942090 : Info : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2196 msscandone

100942194 : Info : WIFI : Set WiFi to OFFdel if0
usl
mode : null
force slp enable,type: 2
fpm open,type:2 0

100942428 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100942463 : Info : EVENT: WiFi#Disconnected
fpm close 3
100943204 : Info : WIFI : Set WiFi to STAmode : sta(8c:aa:b5:51:7d:89)
add if0
sleep enable,type: 2

100943308 : Info : WiFi : Start network scan all channels
100944153 : Info : ADC : Analog value: 8 = 8.00
100944196 : Info : EVENT: Analog#Input=8.00
scandone
100945527 : Info : WiFi : Scan finished, found: 14
100945533 : Info : WiFi : Best AP candidate: AP03 00:12:0E:32:56:B3 Ch:1 (-79dBm) WPA2/PSK
100945537 : Info : WIFI : Connecting AP03 B0:73:5D:B3:E1:02 Ch:9 (0dBm) attempt # 2837
scandone
no AP03 found, reconnect after 1s
reconnect
100947862 : Info : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2197 msscandone

100947966 : Info : WIFI : Set WiFi to OFFdel if0
usl
mode : null
force slp enable,type: 2
fpm open,type:2 0

100948200 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100948201 : Info : WiFi : Start network scan all channefpm close 3
mode : sta(8c:aa:b5:51:7d:89)
add if0
scandone
ls
100950390 : Info : WiFi : Scan finished, found: 16
100950396 : Info : WiFi : Best AP candidate: AP03 00:12:0E:32:56:B3 Ch:1 (-82dBm) WPA2/PSK
100950397 : Info : WIFI : Set WiFi to APdel if0
usl
mode : softAP(8e:aa:b5:51:7d:89)
add if1
dhcp server start:(ip:192.168.4.1,mask:255.255.255.0,gw:192.168.4.1)
bcn 100

100950511 : Error : WIFI : Error while starting AP Mode with SSID: ESP21 IP: 192.168.4.1
100950513 : Info : WIFI : Connecting AP03 B0:73:5D:B3:E1:02 Ch:9 (0dmode : sta(8c:aa:b5:51:7d:89) + softAP(8e:aa:b5:51:7d:89)
add if0
Bm) attempt # 2838
100950514 : Info : WIFI : Set WiFi to AP+STA
100950712 : Info : EVENT: WiFi#Disconnected
100952444 : Info : WD : Uptime 1683 ConnectFailures 26 FreeMem 9608 WiFiStatus WL_DISCONNECTED ESPeasy internal wifi status: DISCONNECTED
scandone
no AP03 found, reconnect after 1s
reconnect
100952954 : Info : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2309 msscandone

100953056 : Info : WIFI : Set WiFi to OFFbcn 0
del if1
del if0
usl
mode : null
force slp enable,type: 2
fpm open,type:2 0

100953291 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100953336 : Info : EVENT: WiFi#Disconnected
100953467 : Info : EVENT: WiFi#APmodeDisabled
fpm close 3
100954065 : Info : WIFI : Set WiFi to STAmode : sta(8c:aa:b5:51:7d:89)
add if0
sleep enable,type: 2

100954169 : Info : WiFi : Start network scan all channels
100954230 : Info : ADC : Analog value: 8 = 8.00
100954306 : Info : EVENT: Analog#Input=8.00
scandone
100956380 : Info : WiFi : Scan finished, found: 17
100956387 : Info : WiFi : Best AP candidate: AP03 00:12:0E:32:56:B3 Ch:1 (-77dBm) WPA2/PSK
100956401 : Info : WIFI : Connecting AP03 B0:73:5D:B3:E1:02 Ch:9 (0dBm) attempt # 2839
scandone
no AP03 found, reconnect after 1s
reconnect
100958717 : Info : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2196 msscandone

100958820 : Info : WIFI : Set WiFi to OFFdel if0
usl
mode : null
force slp enable,type: 2
fpm open,type:2 0

100959055 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100959078 : Info : EVENT: WiFi#Disconnected
fpm close 3
100959841 : Info : WIFI : Set WiFi to STAmode : sta(8c:aa:b5:51:7d:89)
add if0
sleep enable,type: 2

100959945 : Info : WiFi : Start network scan all channels
scandone
100962149 : Info : WiFi : Scan finished, found: 17
100962155 : Info : WiFi : Best AP candidate: AP03 00:12:0E:32:56:B3 Ch:1 (-79dBm) WPA2/PSK
100962159 : Info : WIFI : Connecting AP03 B0:73:5D:B3:E1:02 Ch:9 (0dBm) attempt # 2840
100964202 : Info : ADC : Analog value: 8 = 8.00
100964264 : Info : EVENT: Analog#Input=8.00
scandone
no AP03 found, reconnect after 1s
reconnect
100964467 : Info : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2198 msscandone

100964571 : Info : WIFI : Set WiFi to OFFdel if0
usl
mode : null
force slp enable,type: 2
fpm open,type:2 0

100964805 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100964829 : Info : EVENT: WiFi#Disconnected
fpm close 3
100965591 : Info : WIFI : Set WiFi to STAmode : sta(8c:aa:b5:51:7d:89)
add if0
sleep enable,type: 2

100965696 : Info : WiFi : Start network scan all channels
scandone
100967915 : Info : WiFi : Scan finished, found: 19
100967922 : Info : WiFi : Best AP candidate: AP03 00:12:0E:32:56:B3 Ch:1 (-80dBm) WPA2/PSK
100967925 : Info : WIFI : Connecting AP03 B0:73:5D:B3:E1:02 Ch:9 (0dBm) attempt # 2841
scandone
no AP03 found, reconnect after 1s
reconnect
100970230 : Info : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2196 msscandone

100970333 : Info : WIFI : Set WiFi to OFFdel if0
usl
mode : null
force slp enable,type: 2
fpm open,type:2 0

100970568 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100970569 : Info : WiFi : Start network scan all channelfpm close 3
mode : sta(8c:aa:b5:51:7d:89)
add if0
scandone
s
100972758 : Info : WiFi : Scan finished, found: 14
100972763 : Info : WiFi : Best AP candidate: AP03 00:12:0E:32:56:B3 Ch:1 (-80dBm) WPA2/PSK
100972764 : Info : WIFI : Set WiFi to APdel if0
usl
mode : softAP(8e:aa:b5:51:7d:89)
add if1
dhcp server start:(ip:192.168.4.1,mask:255.255.255.0,gw:192.168.4.1)
bcn 100

100972879 : Error : WIFI : Error while starting AP Mode with SSID: ESP21 IP: 192.168.4.1
100972881 : Info : WIFI : Connecting AP03 B0:73:5D:B3:E1:02 Ch:9 (0dmode : sta(8c:aa:b5:51:7d:89) + softAP(8e:aa:b5:51:7d:89)
add if0
Bm) attempt # 2842
100972882 : Info : WIFI : Set WiFi to AP+STA
100973058 : Info : EVENT: WiFi#Disconnected
100974192 : Info : ADC : Analog value: 8 = 8.00
100974264 : Info : EVENT: Analog#Input=8.00
scandone
no AP03 found, reconnect after 1s
reconnect
100975301 : Info : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2306 msscandone

100975403 : Info : WIFI : Set WiFi to OFFbcn 0
del if1
del if0
usl
mode : null
force slp enable,type: 2
fpm open,type:2 0

100975639 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100975674 : Info : EVENT: WiFi#Disconnected
100975814 : Info : EVENT: WiFi#APmodeDisabled
fpm close 3
100976413 : Info : WIFI : Set WiFi to STAmode : sta(8c:aa:b5:51:7d:89)
add if0
sleep enable,type: 2

100976517 : Info : WiFi : Start network scan all channels
scandone
100978720 : Info : WiFi : Scan finished, found: 24
100978728 : Info : WiFi : Best AP candidate: AP03 00:12:0E:32:56:B3 Ch:1 (-80dBm) WPA2/PSK
100978733 : Info : WIFI : Connecting AP03 B0:73:5D:B3:E1:02 Ch:9 (0dBm) attempt # 2843
scandone
no AP03 found, reconnect after 1s
reconnect
100981054 : Info : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2195 msscandone

100981157 : Info : WIFI : Set WiFi to OFFdel if0
usl
mode : null
force slp enable,type: 2
fpm open,type:2 0

100981392 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100981449 : Info : EVENT: WiFi#Disconnected
100981513 : Info : EVENT: Clock#Time=Fri,15:32
fpm close 3
100982164 : Info : WIFI : Set WiFi to STAmode : sta(8c:aa:b5:51:7d:89)
add if0
sleep enable,type: 2

100982268 : Info : WiFi : Start network scan all channels
100982397 : Info : WD : Uptime 1683 ConnectFailures 26 FreeMem 10760 WiFiStatus WL_IDLE_STATUS ESPeasy internal wifi status: DISCONNECTED
100984178 : Info : ADC : Analog value: 8 = 8.00
100984242 : Info : EVENT: Analog#Input=8.00
scandone
100984486 : Info : WiFi : Scan finished, found: 14
100984492 : Info : WiFi : Best AP candidate: AP03 00:12:0E:32:56:B3 Ch:1 (-80dBm) WPA2/PSK
100984495 : Info : WIFI : Connecting AP03 B0:73:5D:B3:E1:02 Ch:9 (0dBm) attempt # 2844
scandone
no AP03 found, reconnect after 1s
reconnect
100986824 : Info : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2196 msscandone

100986927 : Info : WIFI : Set WiFi to OFFdel if0
usl
mode : null
force slp enable,type: 2
fpm open,type:2 0

100987162 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100987185 : Info : EVENT: WiFi#Disconnected
fpm close 3
100987937 : Info : WIFI : Set WiFi to STAmode : sta(8c:aa:b5:51:7d:89)
add if0
sleep enable,type: 2

100988041 : Info : WiFi : Start network scan all channels
scandone
100990248 : Info : WiFi : Scan finished, found: 15
100990254 : Info : WiFi : Best AP candidate: AP03 00:12:0E:32:56:B3 Ch:1 (-81dBm) WPA2/PSK
100990258 : Info : WIFI : Connecting AP03 B0:73:5D:B3:E1:02 Ch:9 (0dBm) attempt # 2845


I see it tries to connect to other BSSID than the Best AP candidate...
wifiscan lists 36 APs around but the BSSID which the node tries to connect (B0:73:5D:B3:E1:02) is not there... AP03's MAC is 00:12:0E:32:56:B3
I can also confirm the RESET does NOT help and the reconnect is not possible until the node is powered off / on. But it looks a bit different from the old - already fixed - issue as now still a many APs are listed (in past the wifiscan discovered much less APs when the issue happened). Now after power off / on just 21 APs are detected so it's opposite... :-/
Yes I'll try to compile & test the PR # 3650 later today if the log above is in accordance with your expectations so you believe it's fixed.

@TD-er
Copy link
Member

TD-er commented May 21, 2021

I already merged it, so you can just build using the latest sources.

@ghtester
Copy link

ghtester commented May 21, 2021

Thanks for the info, I am just compiling it.
FYI I have experienced a node crash (the firmware compiled 2 days ago), after Exception the note reconnected to WiFi.
18370362 : Error : MQTT : Connection lost, state: Disconnected
18371431 : Error : MQTT : Broker C005 connection failed (2354/0)

Fatal exception 29(StoreProhibitedCause):
epc1=0x4000df64, epc2=0x00000000, epc3=0x4029f89e, excvaddr=0x00000000, depc=0x00000000

                                                                                                                            --------------- CUT HERE FOR EXCEPTION DECODER ---------------

Exception (29):
epc1=0x4000df64 epc2=0x00000000 epc3=0x4029f89e excvaddr=0x00000000 depc=0x00000000

stack>>>

ctx: sys
sp: 3ffffc40 end: 3fffffb0 offset: 0190
3ffffdd0: 3ffee040 3ffeeb18 00000002 4010100c
3ffffde0: 402a65cf 00040000 00000000 402a6564
3ffffdf0: 00000002 402a650b 00000002 402a5668
3ffffe00: 402a5691 3ffffeb0 3ffeeb18 00000016
3ffffe10: 402a30f4 3ffffeb0 3ffee9c4 3ffee350
3ffffe20: 3ffec720 3ffffeb0 3ffffeb0 3fffc278
3ffffe30: 2d212d21 3fff0021 00000022 00080100
3ffffe40: 00000000 00000000 0000001f 401006ec
3ffffe50: 4029f465 00000018 3fffc228 40105d31
3ffffe60: ffffffa5 3ffee4d8 3ffec730 3ffeeb18
3ffffe70: 3ffed980 00000005 00000000 402a3deb
3ffffe80: 00000000 3fff3004 ffffffa5 00000000
3ffffe90: 00000000 3ffeeb18 3ffee44c 00000000
3ffffea0: 000000a0 3ffee248 3ffe9d24 3ffed980
3ffffeb0: 00000000 04110101 00640104 00000044
3ffffec0: 3ffec744 00000042 3ffec758 3ffec738
3ffffed0: 00000000 3ffec744 3ffec74b 3ffec761
3ffffee0: 00000000 00000000 3ffec767 00000000
3ffffef0: 00000000 00000000 00000000 00000000
3fffff00: 00000000 00000000 00000005 00000000
3fffff10: 3fff868c 402a385e 3ffed980 3fff3004
3fffff20: 00000000 3ffeeb18 3ffed980 3ffec720
3fffff30: 3ffec720 0000006a 00000000 00000005
3fffff40: 00000000 3ffec72a 402ad3a7 3ffed980
3fffff50: 3ffec714 3fffdcc0 3ffe9fb8 3ffe9fb8
3fffff60: 00000080 3ffed980 00000000 3ffe8624
3fffff70: 402acc67 3fffdab0 00000000 40279e20
3fffff80: 3ffe9fb8 40000f49 3fffdab0 40000f49
3fffff90: 40000e19 00000005 000dde90 00000000
3fffffa0: 00000000 aa55aa55 000000ed 40105591
<<<stack<<<

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

ets Jan 8 2013,rst cause:2, boot mode:(3,7)

load 0x4010f000, len 3584, room 16
tail 0
chksum 0xb0
csum 0xb0
v2843a5ac
~ld
▒U47 : Info :

INIT : Booting version: My Build: May 19 2021 09:45:40 (ESP82xx Core 2843a5ac, NONOS SDK 2.2.2-dev(38a443e), LWIP: 2.1.2 PUYA support)
48 : Info : INIT : Free RAM:33344
50 : Info : INIT : Exception # 1 - Restart Reason: Exception
50 : Info : FS : Mounting...
76 : Info : FS : Mount successful, used 77810 bytes of 957314
97 : Info : CRC : SecuritySettings CRC ...OK
100 : Info : WiFi : Start network scan all channels
136 : Info : INIT : Free RAM:29776

@TD-er
Copy link
Member

TD-er commented May 21, 2021

I did notice the MQTT reconnect attempts can happen quite fast after each other, so maybe that's hammering a bit to quickly on the data structures in the LWIP stack, which has shown more quirks in the past.
In a testrun today I noticed it retried only like 10 - 30 msec apart, so the logging alone can also cause crashes then at such a high rate.

@ghtester
Copy link

ghtester commented May 21, 2021

Well, the unwanted quick MQTT reconnecting was due to a (probably) bad IR sensor which is connected to this node and suddenly started to receive a noise. :-( As it is configured to send the data to MQTT controller, it was the reason of crash... This should not happen under normal conditions.
Just upgraded to fresh custom build & started testing.

Firmware

Build:⋄ | 20113 - Mega
System Libraries:⋄ | ESP82xx Core 2843a5ac, NONOS SDK 2.2.2-dev(38a443e), LWIP: 2.1.2 PUYA support
Git Build:⋄ | My Build: May 21 2021 19:25:24
Plugin Count:⋄ | 35
Build Origin: | Vagrant
Build Time:⋄ | May 21 2021 19:24:26
Binary Filename:⋄ | ESP_Easy_mega_20210521_custom_IR_ESP8266_4M1M
Build Platform:⋄ | Linux-4.15.0-51-generic-x86_64-with-glibc2.27
Git HEAD:⋄ | mega_f3ce88e

@TD-er
Copy link
Member

TD-er commented May 22, 2021

I looked into your log again and I guess there is still another improvement to be made in clearing the unhandled WiFi events when WiFi is turned off and on again.
No need to keep flags set when we start again from scratch.

@ghtester
Copy link

ghtester commented May 25, 2021

FYI in my custom build, created with PR3655, sometimes (but for a long time, even after refreshing the page) I see on Main page RSSI: of Secondary AP (Configured SSID2) despite it's in fact connected to Primary AP (Configured SSID1). The same I see on More info page - WiFi - SSID: Secondary AP (incorrect), Channel: channel of Primary AP (correct)
There are different WiFi channels and DHCP ranges so it's clear to which SSID is the ESP Easy node connected (SSID1) - the connection is working.

Connected: 1d15h55m
Number Reconnects: 4

Firmware

Build:⋄ | 20113 - Mega
System Libraries:⋄ | ESP82xx Core 2843a5ac, NONOS SDK 2.2.2-dev(38a443e), LWIP: 2.1.2 PUYA support
Git Build:⋄ | My Build: May 23 2021 08:56:56
Plugin Count:⋄ | 35
Build Origin: | Self built
Build Time:⋄ | May 23 2021 08:55:56
Binary Filename:⋄ | ESP_Easy_mega_20210523_custom_IR_ESP8266_4M1M
Build Platform:⋄ | Linux-4.15.0-51-generic-x86_64-with-glibc2.27
Git HEAD:⋄ | HEAD_71c6ffa

144163148: WD : Uptime 2403 ConnectFailures 12619 FreeMem 12464 WiFiStatus WL_CONNECTED ESPeasy internal wifi status: Conn. IP Init

@TD-er
Copy link
Member

TD-er commented May 25, 2021

The important question is, is it still working? I guess it is given your reply.
Whether or not connecting to the other SSID is correct or incorrect is debatable.
If one connection is lost before it is marked stable, the next connection is attempted.
This timeout is 5 minutes.

So if a node is connected to the other SSID, it can mean:

  • strongest SSID was not available/seen when attempted to scan
  • strongest SSID connection was lost within 5 minutes after connecting.

@TD-er
Copy link
Member

TD-er commented May 25, 2021

Oh, just re-read your post.... The displayed connection isn't the actual connection.
That's a bug for sure.

@ghtester
Copy link

Yeah, everything was working fine. Just a "cosmetic" issue in Main page / Main - More info page.
The wrong SSID was displayed there for a long time (looked as permanent), I did not analyze what exactly happened with connecting / reconnecting, both APs have RSSID about 88 - 92 which is not good.

The displayed connection isn't the actual connection.

Exatly, this was the case, I am not sure if / how it is reproducible - to be watched.

@ghtester
Copy link

I think this issue can be closed, with latest builds which already has the PR merged, from my perspective everything looks OK with WiFi reconnect.

@TD-er
Copy link
Member

TD-er commented May 30, 2021

Thanks for reporting back.
Do you see the node may report the wrong connected AP, as was reported by one user. (you??)

@ghtester
Copy link

Yes that was me, reporting above the wrong info about AP connected.
As I still don't know how to reproduce it, can't say much but currently I am not experiencing this issue with Today's custom build.
After reboot due Exceptions, sometimes the node connected to Primary AP, sometimes to Backup AP and the SSID was always correct. Also connection to AP was always reasonable quick.
Maybe this issue was also already fixed. If not, maybe it could occur when the switch between APs is made during operation (without node reboot). I'll try to test and will report if i encounter it again.
Anyway, thanks for your great work!

@blb4github
Copy link
Author

To test I have to compile myself, right?

@TD-er
Copy link
Member

TD-er commented May 30, 2021

Or tell me what you need, so I can compile it for you.

@blb4github
Copy link
Author

Hi TD-er,

That would be great, i'll need the equivalent for ESP_Easy_mega_20210223_normal_ESP8266_4M1M

@TD-er
Copy link
Member

TD-er commented May 30, 2021

Can you let me know how it performs (especially regarding the reported memory issues)
ESP_Easy_mega_20210530_normal_ESP8266_4M1M.bin

@blb4github
Copy link
Author

Thanks! I'll monitor & will report back.

@blb4github
Copy link
Author

I updated 5 out of 7 modules with this firmware and all 5 are still on-line and didn't reset since then (now 9 hours ago). Good result so far! How can I monitor the memory issues?

@TD-er
Copy link
Member

TD-er commented May 31, 2021

You can add a task with the system info plugin and send it to a controller, so you can collect the data.
See here for the discussion related to the memory issues: #3660 (comment)

@blb4github
Copy link
Author

Is this useful?
Schermafbeelding 2021-05-31 om 21 34 10

@TD-er
Copy link
Member

TD-er commented May 31, 2021

It is more useful if we know what node uses what firmware version and whether or not it is comparable with another node.
For example, can we compare the blue line and the purple line regarding included (and enabled/used) plugins, but with different builds?

@ghtester
Copy link

ghtester commented Jun 7, 2021

Well, unfortunately I have to report that one of my ESP nodes is still experiencing the issue, even with the Custom build created 2 days ago:

Firmware

Build:⋄ | 20114 - Mega
System Libraries:⋄ | ESP82xx Core 2843a5ac, NONOS SDK 2.2.2-dev(38a443e), LWIP: 2.1.2 PUYA support
Git Build:⋄ | My Build: Jun 5 2021 09:06:41
Plugin Count:⋄ | 35
Build Origin: | Vagrant
Build Time:⋄ | Jun 5 2021 09:05:42
Binary Filename:⋄ | ESP_Easy_mega_20210605_custom_IR_ESP8266_4M1M
Build Platform:⋄ | Linux-4.15.0-51-generic-x86_64-with-glibc2.27
Git HEAD:⋄ | mega_f25fa16

It looks even though the Best AP candidate's name and BSSID are correct, the BSSID to which the ESP node is Connecting is wrong so it never connects until cold boot (RESET by button is not enough).
167647219 : Info : WiFi : Best AP candidate: AP04 00:12:0E:32:68:A0 Ch:1 (-82dBm) WPA2/PSK
167647233 : Info : WIFI : Connecting AP04 9A:B7:2B:C6:04:EE Ch:6 (0dBm) Unknown attempt # 64

@TD-er
Copy link
Member

TD-er commented Jun 7, 2021

OK, so there is some discrepency between what is considered the 'active' AP candidate and what is being used.
Will look into it.

@ghtester
Copy link

ghtester commented Jun 7, 2021

I still don't know how to reproduce it, just experienced a reboot due to Exception and the node reconnected succesfully.
But perhaps it has something to do with RTC Struct (could a wrong BSSID sometimes saved to RTC memory?).

RTC Struct
We store several variables in this struct that may be useful to survive a reboot.
WiFi related settings for a quick reconnect:
Last connected WiFi channel
Last WiFi Settings Index
**Last BSSID (48 bit)
**

@blb4github
Copy link
Author

Sorry not providing updates for a while. I'm using now ESP_Easy_mega_20210530_normal_ESP8266_4M1M.bin for all my modules. They all stay reachable via WIFI but they are not very stable. In average they are rebooting within 3 days.

@TD-er
Copy link
Member

TD-er commented Jun 20, 2021

Can you perhaps try this build: Experimental web flasher
If you don't want to flash them via serial, the bin files are located here

@blb4github
Copy link
Author

Thanks TD-er, I've update all 7 modules with ESP_Easy_mega_20210615_normal_ESP8266_4M1M.bin, I'll monitor their behavior.

@blb4github
Copy link
Author

1st update: 5 out of 7 modules didn't reset since firmwware update, now all running for approx 23 hours. The other 2 did reset at least once, 1 is running for 20 hours, the other for only 3 hours

@TD-er
Copy link
Member

TD-er commented Jun 23, 2021

Do you have any stats of the crashing units?
For example RSSI value, trend of used memory, etc.

@blb4github
Copy link
Author

blb4github commented Jun 23, 2021

Schermafbeelding 2021-06-23 om 15 29 52

I did flash them around 22/06 01:00u, there is a big change in free memory before / after. I do see a free ram change on the moment that the 2 modules (PPM and KK) are resetting

Schermafbeelding 2021-06-23 om 15 34 14

Schermafbeelding 2021-06-23 om 15 36 32

@TD-er
Copy link
Member

TD-er commented Jun 23, 2021

Do these units have such extremely constant memory allocations, or do you have a lot less samples?

@TD-er
Copy link
Member

TD-er commented Apr 1, 2022

Should be fixed by these PRs:

Please let me know if it is indeed fixed now... finally

@blb4github
Copy link
Author

For me most of my modules where very stable and working for weeks without any reset. I have updated the FW for all of them 3 days ago. Free memory is higher, 5 out of 7 didn't reset in the mean time, the 2 which did did so more often before.
afbeelding

@TD-er
Copy link
Member

TD-er commented Apr 1, 2022

Both PRs I linked were from today and yesterday, so I guess those were not yet present 3 days ago ;)

If you don't have several access points with the same SSID, then I guess it will not make much of a difference with the WiFi PR.
The main fix in the MQTT PR is about more frequently checking if there is an event to be processed and not going in overdrive to reconnect to the MQTT broker if the broker is unreachable.

@blb4github
Copy link
Author

blb4github commented Apr 1, 2022

Aha 😀. I thought it was linked with the new version from 3 days ago ….. I do have 3 access point with same SSID and using MQTT so this change can maybe help in my situation as well. I’ll see with next firmware update release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Category: Wifi Related to the network connectivity Status: Needs Info Needs more info before action can be taken Type: Bug Considered a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants