Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Got Exception 28 when lost internet and reconnect internet using MQTT #4134

Closed
tarzan115 opened this issue Jan 11, 2018 · 23 comments
Closed
Labels
waiting for feedback Waiting on additional info. If it's not received, the issue may be closed.

Comments

@tarzan115
Copy link
Contributor

Basic Infos

Hardware

Hardware: ESP-12E
Core Version: 2.4.0

Description

I used library async mqtt to connect mqtt, but when I already connected to the broker. I plug out the line of the internet, wifi still present. after 1 minute I plug in to have internet. and my device auto reconnect to broker and crash with Exception 28

Settings in IDE

Module: Generic ESP8266 Module
Flash Size: 4MB/3MB
CPU Frequency: 80Mhz
Flash Mode: dio
Flash Frequency: 40Mhz
Upload Using: SERIAL
Reset Method: ck
LwIP Variant: both v2 Prebuild (MSS=536) and v2 Prebuild (MSS=1460 , unstable) got that error

Sketch

#include <Arduino.h>

void setup() {

}

void loop() {

}

Debug Messages

Exception 28: LoadProhibited: A load referenced a page mapped with an attribute that does not permit loads
Decoding 12 results
0x40231e0a: pbuf_cat at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/pbuf.c line 848 (discriminator 1)
0x4023189b: ip_chksum_pseudo at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/inet_chksum.c line 542
0x4022d91c: udp_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/udp.c line 393
0x40230f18: ip4_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c line 679
0x4022a461: ethernet_input_LWIP2 at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c line 182
0x401045cc: esp2glue_ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 406
0x4025c492: ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 349
0x4025c4c0: ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 357
0x402573df: ppPeocessRxPktHdr at ?? line ?
0x4025e543: ets_snprintf at ?? line ?
@d-a-v
Copy link
Collaborator

d-a-v commented Jan 11, 2018

A possible fix (#4105) happened after 2.4.0 release. Can you retry with current git version of core ?
(link)

@devyte devyte added the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Jan 11, 2018
@tarzan115
Copy link
Contributor Author

tarzan115 commented Jan 12, 2018

I use git version, with latest commit 02259a4.
This issue still present with log. My case is not lost Wi-Fi signature but my case is lost internet and reconnect internet. Wi-Fi still keep connected

with v2 MSS=536

Exception 29: StoreProhibited: A store referenced a page mapped with an attribute that does not permit stores
Decoding 61 results
0x40224fba: memset at /Users/igrokhotkov/e/newlib-xtensa/xtensa-lx106-elf/newlib/libc/machine/xtensa/../../../../.././newlib/libc/machine/xtensa/memset.S line 133
0x40237ddb: bi_divide at crypto/bigint.c line 407
0x40238109: bi_set_mod at crypto/bigint.c line 774
0x402371ec: trim at crypto/bigint.c line 672
0x402392c5: RSA_pub_key_new at crypto/rsa.c line 255
0x4023cf39: asn1_get_big_int at ssl/asn1.c line 323
0x4023d3d5: asn1_public_key at ssl/asn1.c line 556
0x4023d261: asn1_get_printable_str at ssl/asn1.c line 470
:  (inlined by) asn1_name at ssl/asn1.c line 502
0x4023d102: asn1_validity at ssl/asn1.c line 410
0x4023537a: x509_new at ssl/x509.c line 128
0x40104ef1: ets_timer_disarm at ?? line ?
0x40239c7f: SHA256_Process at crypto/sha256.c line 139
0x4023c3cc: SHA256_Update at crypto/sha256.c line 231
0x402388d4: hmac_sha256_v at crypto/hmac.c line 165
0x4023c475: SHA256_Final at crypto/sha256.c line 273 (discriminator 3)
0x402388d4: hmac_sha256_v at crypto/hmac.c line 165
0x40100ec6: pp_post at ?? line ?
0x401042a8: lmacTxFrame at ?? line ?
0x401034c7: lmacRecycleMPDU at ?? line ?
0x4010392a: lmacRecycleMPDU at ?? line ?
0x40101db2: trc_NeedRTS at ?? line ?
0x40100ec6: pp_post at ?? line ?
0x4010340e: lmacProcessTxSuccess at ?? line ?
0x4010228f: wDev_ProcessFiq at ?? line ?
0x40101db2: trc_NeedRTS at ?? line ?
0x40101f8c: wDev_ProcessFiq at ?? line ?
0x40239c56: SHA256_Process at crypto/sha256.c line 137
0x4023c3a5: SHA256_Update at crypto/sha256.c line 228
0x402582ab: pp_attach at ?? line ?
0x4010320d: lmacProcessTXStartData at ?? line ?
0x4010320a: lmacProcessTXStartData at ?? line ?
0x40102136: wDev_ProcessFiq at ?? line ?
0x402573c7: ppTxPkt at ?? line ?
0x40233cf0: basic_read at ssl/tls1.c line 2007
0x40232e31: process_certificate at ssl/tls1.c line 2007
0x402345ce: do_clnt_handshake at ssl/tls1_clnt.c line 107
0x4010020c: _umm_free at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266\umm_malloc/umm_malloc.c line 1291
0x402014d8: ax_port_read at D:\Documents\Arduino\libraries\ESPAsyncTCP-master\src/tcp_axtls.c line 589
0x40234018: do_handshake at ssl/tls1.c line 2007
:  (inlined by) basic_read at ssl/tls1.c line 1483
0x4022cd35: tcp_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c line 348
0x4022c535: tcp_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c line 218
0x40234180: ssl_read at ssl/tls1.c line 2007
0x40101f8c: wDev_ProcessFiq at ?? line ?
0x402012a5: tcp_ssl_read at D:\Documents\Arduino\libraries\ESPAsyncTCP-master\src/tcp_axtls.c line 477
0x4025c21b: ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 348
0x4025c219: ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 348
0x40213337: AsyncClient::_recv(tcp_pcb*, pbuf*, long) at D:\Documents\Arduino\libraries\ESPAsyncTCP-master\src/ESPAsyncTCP.cpp line 659
0x40213360: AsyncClient::_s_recv(void*, tcp_pcb*, pbuf*, long) at D:\Documents\Arduino\libraries\ESPAsyncTCP-master\src/ESPAsyncTCP.cpp line 659
0x4022cc54: tcp_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c line 435 (discriminator 1)
0x40230ca9: ip4_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c line 685
0x4022a1e5: ethernet_input_LWIP2 at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c line 182
0x401045d4: esp2glue_ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 406
0x4025c22a: ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 356
0x4025c258: ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 364
0x40257032: ppPeocessRxPktHdr at ?? line ?
0x4025e2df: ets_snprintf at ?? line ?

with v2 MSS=1460

Exception 9: LoadStoreAlignmentCause: Load or store to an unaligned address
Decoding 12 results
0x40231b86: pbuf_cat at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/pbuf.c line 848 (discriminator 1)
0x40231657: ip_chksum_pseudo at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/inet_chksum.c line 542
0x4022d6d8: udp_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/udp.c line 393
0x40230cd4: ip4_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c line 679
0x4022a1e1: ethernet_input_LWIP2 at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c line 182
0x401045d4: esp2glue_ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 406
0x4025c222: ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 356
0x4025c250: ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 364
0x4025702a: ppPeocessRxPktHdr at ?? line ?
0x4025e2d7: ets_snprintf at ?? line ?

@ztittle
Copy link
Contributor

ztittle commented Jan 12, 2018

I just updated an hour ago and this is happening to me as well. In my case, I load client certificates for mutual authentication to aws iot.

Fatal exception 28(LoadProhibitedCause):
epc1=0x4020bb67, epc2=0x00000000, epc3=0x00000000, excvaddr=0x007e8008, depc=0x00000000

Exception (28):
epc1=0x4020bb67 epc2=0x00000000 epc3=0x00000000 excvaddr=0x007e8008 depc=0x00000000

ctx: cont
sp: 3fff1840 end: 3fff1b90 offset: 01a0

>>>stack>>>
3fff19e0:  62755362 65696c43 205d746e 65747441
3fff19f0:  6974706d 4d20676e 20545451 402122b0
3fff1a00:  3fff1a40 3fff0740 00000000 4020f931
3fff1a10:  c025c177 3fff1a70 3fff0688 3ffe93c8
3fff1a20:  3fff3104 3ffea930 3fff3104 00000000
3fff1a30:  3fff1dbc 000004c8 000004c8 3ffe93c8
3fff1a40:  3fff0a74 3fff0740 3fff09c4 4020fab0
3fff1a50:  00000000 00000000 3fff0a9c 402022bc
3fff1a60:  3fff0a74 3fff0740 3fff09c4 40213f84
3fff1a70:  00000000 00000000 00000000 40217be4
3fff1a80:  3fff0a74 3fff09c4 3fff0740 40214420
3fff1a90:  3ffe93c8 00000000 000003e8 00000000
3fff1aa0:  3fff608c 3fff60cc 3ffe93c8 00000000
3fff1ab0:  000003e8 3fff1b10 3fff3e04 3fff3fcc
3fff1ac0:  3ffe93c8 00000000 000003e8 40201951
3fff1ad0:  3fff3d6c 3fff2cac 3fff3e44 4010020c
3fff1ae0:  40213c8c 4021414c 3fff1b10 40100690
3fff1af0:  3fff06dc 3fff1b10 3fff0a9c 3fff071c
3fff1b00:  00000324 3fff06dc 3fff06c8 40214cbe
3fff1b10:  00000000 00000000 00000000 feefeffe
3fff1b20:  feefeffe feefeffe feefeffe feefeffe
3fff1b30:  feefeffe feefeffe feefeffe feefeffe
3fff1b40:  feefeffe feefeffe feefeffe feefeffe
3fff1b50:  feefeffe feefeffe feefeffe 3fff0b5c
3fff1b60:  3fffdad0 00000000 3fff0b54 40216850
3fff1b70:  feefeffe feefeffe feefeffe 40202f84
3fff1b80:  feefeffe feefeffe 3fff0b70 4020536c
<<<stack<<<

 ets Jan  8 2013,rst cause:2, boot mode:(1,7)


 ets Jan  8 2013,rst cause:4, boot mode:(1,7)

wdt reset

Exception 28: LoadProhibited: A load referenced a page mapped with an attribute that does not permit loads
Decoding 18 results
0x4020bb67: WiFiClientSecure::connected() at ?? line ?
0x4020bb67: WiFiClientSecure::connected() at ?? line ?
0x402122b0: PubSubClient::connected() at ?? line ?
0x4020f931: PubSubClient::connect(char const*, char const*, char const*, char const*, unsigned char, unsigned char, char const*) at ?? line ?
0x4020fab0: PubSubClient::connect(char const*, char const*, char const*) at ?? line ?
0x402022bc: String::~String() at ?? line ?
0x40213f84: InterlockPubSubClient::reconnect() at ?? line ?
0x40217be4: std::_Function_base::~_Function_base() at ?? line ?
0x40214420: InterlockPubSubClient::initialize() at ?? line ?
0x40201951: Print::write(char const*) at ?? line ?
0x4010020c: _umm_free at umm_malloc.c line ?
0x40213c8c: std::_Function_base::_Base_manager ::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager  const&, std::_Manager_operation) at InterlockPubSubClient.cpp line ?
0x4021414c: std::_Function_handler ::_M_invoke(std::_Any_data const&, char*, unsigned char*, unsigned int) at InterlockPubSubClient.cpp line ?
0x40100690: free at ?? line ?
0x40214cbe: Program::initialize() at ?? line ?
0x40216850: setup at ?? line ?
0x40202f84: loop_wrapper() at core_esp8266_main.cpp line ?
0x4020536c: cont_norm at cont.o line ?

@devyte
Copy link
Collaborator

devyte commented Jan 12, 2018

There is an option in the Arduino menu to enable debug oom. Please enable, rebuild, and retest. I'd like to know if there is an oom message in the log just before the crash.

@ztittle
Copy link
Contributor

ztittle commented Jan 12, 2018

I enabled that option, but I did not see any oom messages. I did some digging and 8765da2 is the last stable commit for me. Checking out the following commit bd1c7ce triggers the exception 28.

@ztittle
Copy link
Contributor

ztittle commented Jan 12, 2018

I was able to resolve the Exception 28 in the case of WifiClientSecure. It appears to be due to a recent refactor where the client context lost initialization to nullptr. Created a PR for it #4149. I don't believe it's related to this specific issue as my issue is not with the async library.

@tarzan115
Copy link
Contributor Author

thank @ztittle,
but my issue still present. I think maybe is was problem with ESPAsyncTCP library

Exception (29):
epc1=0x40225952 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

ctx: sys 
sp: 3ffffcb0 end: 3fffffb0 offset: 01a0

>>>stack>>>
3ffffe50:  00000000 00001ad0 00001adc 401007d9  
3ffffe60:  00000002 00001ad0 3fffa004 4020229d  
3ffffe70:  00000000 3fff97d4 3fffa004 402330f1  
3ffffe80:  00000000 3fff97d4 3fff5b6c 40234dbc  
3ffffe90:  00000000 0000006f 3fff97d4 4020229d  
3ffffea0:  3fff283c 3fff595c 3fff24d8 3fff97d4  
3ffffeb0:  00000000 3fff595c 3fff5b6c 402011c7  
3ffffec0:  00000006 00000001 3fff614c 3fff3bf8  
3ffffed0:  3fff595c 3fff3bdc 3fff1cf0 40213857  
3ffffee0:  3fff595c 3fff3bdc 3fff3bdd 402138bc  
3ffffef0:  3fff595c 3fff3bdc 3fff3bdd 4022d306  
3fffff00:  c889a8c0 00000012 00000002 401007ac  
3fffff10:  40240000 02d41da5 3fffff80 3fff61ca  
3fffff20:  3fff3d60 3fff61ac 3fff3e48 40231679  
3fffff30:  00000014 00000348 00000348 3fff3d60  
3fffff40:  3fffdc80 3fff44b4 3fff5a04 3fff4534  
3fffff50:  00000008 3fff3d60 3fff61ac 4022ab79  
3fffff60:  3fffdc80 3fff44b4 3fff5a0c 401048ec  
3fffff70:  4025cb5a 3fff44b4 3fff5a0c 4025cb88  
3fffff80:  3fff61bc 3fff61ac 00000000 3fff2940  
3fffff90:  40257a8f 00000000 3fff5a0c 4025ec0b  
3fffffa0:  40000f49 3fffdab0 3fffdab0 40000f49  
<<<stack<<<
Decoding 21 results
0x40225952: memset at /Users/igrokhotkov/e/newlib-xtensa/xtensa-lx106-elf/newlib/libc/machine/xtensa/../../../../.././newlib/libc/machine/xtensa/memset.S line 133
0x401007d9: umm_calloc at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266\umm_malloc/umm_malloc.c line 1688
0x4020229d: calloc at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266/heap.c line 82
0x402330f1: ssl_new at ssl/tls1.c line 2007
0x40234dbc: ssl_client_new at ssl/tls1_clnt.c line 69
0x4020229d: calloc at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266/heap.c line 82
0x402011c7: tcp_ssl_new_client at D:\Documents\Arduino\libraries\ESPAsyncTCP-master\src/tcp_axtls.c line 477
0x40213857: AsyncClient::_connected(void*, long) at D:\Documents\Arduino\libraries\ESPAsyncTCP-master\src/ESPAsyncTCP.cpp line 659
0x402138bc: AsyncClient::_s_connected(void*, void*, long) at D:\Documents\Arduino\libraries\ESPAsyncTCP-master\src/ESPAsyncTCP.cpp line 659
0x4022d306: tcp_process at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c line 821
:  (inlined by) tcp_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c line 372
0x401007ac: umm_malloc at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266\umm_malloc/umm_malloc.c line 1668
0x40240000: crypto_mod_exp at ?? line ?
0x40231679: ip4_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c line 685
0x4022ab79: ethernet_input_LWIP2 at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c line 182
0x401048ec: esp2glue_ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 406
0x4025cb5a: ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 356
0x4025cb88: ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 364
0x40257a8f: ppPeocessRxPktHdr at ?? line ?
0x4025ec0b: ets_snprintf at ?? line ?

@tarzan115
Copy link
Contributor Author

I have been tested with LwIP v1.4 (prebuild) and it works perfectly.
something wrong with LwIP v2!

@d-a-v
Copy link
Collaborator

d-a-v commented Jan 19, 2018

edit: not yet in master, check #4078

A fix for WiFi dis/reconnection is now in master (not really about "lost internet").

@tarzan115 @ztittle Can you please retry with current git master with debug options enabled (serial debug output and CORE+WIFI+...+OOM) ?

If the problem persist, it will help a lot for us to have a sketch so we can locally reproduce the issue and try to fix it.

@tarzan115
Copy link
Contributor Author

tarzan115 commented Jan 20, 2018

my issue still valid with this log. option is v2 prebuild (MSS=1460)

MQTT connected
:urn 36
:urd 12, 36, 13
:urch 36, 36
:urd 12, 36, 13
:urch 36, 74
:urch 74, 28
:urd 4, 28, 13
:urch 28, 28
:urd 4, 28, 13
Disconnected from MQTT. Because: TCP disconnected
:urch 28, 36
:urd 12, 36, 13
:urch 36, 74
:urch 74, 36
:urd 12, 36, 13
Fatal exception 29(StoreProhibitedCause):
epc1=0x402269be, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000

decode stack

Exception 29: StoreProhibited: A store referenced a page mapped with an attribute that does not permit stores
Decoding 21 results
0x402269be: memset at /Users/igrokhotkov/e/newlib-xtensa/xtensa-lx106-elf/newlib/libc/machine/xtensa/../../../../.././newlib/libc/machine/xtensa/memset.S line 133
0x401007d9: umm_calloc at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266\umm_malloc/umm_malloc.c line 1688
0x402022a9: calloc at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266/heap.c line 82
0x40234135: ssl_new at ssl/tls1.c line 2007
0x40235e00: ssl_client_new at ssl/tls1_clnt.c line 69
0x402022a9: calloc at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266/heap.c line 82
0x402011d3: tcp_ssl_new_client at D:\Documents\Arduino\libraries\ESPAsyncTCP-master\src/tcp_axtls.c line 483
0x40218203: AsyncClient::_connected(void*, long) at D:\Documents\Arduino\libraries\ESPAsyncTCP-master\src/ESPAsyncTCP.cpp line 664
0x40218268: AsyncClient::_s_connected(void*, void*, long) at D:\Documents\Arduino\libraries\ESPAsyncTCP-master\src/ESPAsyncTCP.cpp line 664
0x4022e372: tcp_process at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c line 821
:  (inlined by) tcp_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c line 372
0x401007ac: umm_malloc at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266\umm_malloc/umm_malloc.c line 1668
0x40240000: aes_decrypt_init at ?? line ?
0x402326e5: ip4_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c line 685
0x4022bbe5: ethernet_input_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c line 182
0x40104854: esp2glue_ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 406
0x4025dac4: ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 356
0x4025dad6: ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 364
0x402589a2: ppPeocessRxPktHdr at ?? line ?
0x4025fad3: ets_snprintf at ?? line ?

and sometime got this

Exception 9: LoadStoreAlignmentCause: Load or store to an unaligned address
Decoding 12 results
0x40231b86: pbuf_cat at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/pbuf.c line 848 (discriminator 1)
0x40231657: ip_chksum_pseudo at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/inet_chksum.c line 542
0x4022d6d8: udp_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/udp.c line 393
0x40230cd4: ip4_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c line 679
0x4022a1e1: ethernet_input_LWIP2 at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c line 182
0x401045d4: esp2glue_ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 406
0x4025c222: ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 356
0x4025c250: ethernet_input at /home/david/dev/esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 364
0x4025702a: ppPeocessRxPktHdr at ?? line ?
0x4025e2d7: ets_snprintf at ?? line ?
Exception 29: StoreProhibited: A store referenced a page mapped with an attribute that does not permit stores
Decoding 21 results
0x402269c6: ieee80211_send_mgmt at ?? line ?
0x401007d9: umm_calloc at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266\umm_malloc/umm_malloc.c line 1688
0x402022a9: uart_init at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266/uart.c line 464
0x4023413d: rc_only_sta_trc at ?? line ?
0x40235e08: pm_suspend at ?? line ?
0x402022a9: uart_init at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266/uart.c line 464
0x402011d3: tcp_ssl_new_client at D:\Documents\Arduino\libraries\ESPAsyncTCP-master\src/tcp_axtls.c line 483
0x4021820b: do_handshake at ssl/tls1.c line 2007
:  (inlined by) basic_read at ssl/tls1.c line 1483
0x40218270: basic_read at ssl/tls1.c line 2007
0x4022e37a: wpa_remove_ptk at ?? line ?
0x401007ac: umm_malloc at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266\umm_malloc/umm_malloc.c line 1668
0x40240000: tx_cap_init at ?? line ?
0x402326ed: ppTxPkt at ?? line ?
0x4022bbed: ieee80211_recv_action_register at ?? line ?
0x40104854: esp2glue_ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 406
Exception 9: LoadStoreAlignmentCause: Load or store to an unaligned address
Decoding 15 results
0x40233376: pbuf_cat at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/pbuf.c line 848 (discriminator 1)
0x4020e434: UdpContext::_s_recv(void*, udp_pcb*, pbuf*, ip4_addr const*, unsigned short) at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\libraries\ESP8266mDNS/ESP8266mDNS.cpp line 396
0x40232e6f: ip_chksum_pseudo at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/inet_chksum.c line 542
0x4022eef4: udp_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/udp.c line 393
0x401007ac: umm_malloc at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266\umm_malloc/umm_malloc.c line 1668
0x40240000: dh5_derive_shared at ?? line ?
0x402324f0: ip4_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c line 679
0x4022b9fd: ethernet_input_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c line 182
0x40104854: esp2glue_ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 406
0x4025d8dc: ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 356
0x4025d8ee: ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 364
0x402587ba: ppPeocessRxPktHdr at ?? line ?
0x4025f8eb: ets_snprintf at ?? line ?

@tarzan115
Copy link
Contributor Author

I use libraries async-mqtt-client and ESPAsyncTCP with config ASYNC_TCP_SSL_ENABLED = 1 in file async_config.h of ESPAsyncTCP

:urch 28, 74
:urch 74, 36
:urd 12, 36, 13
:urch 36, 74
:urch 74, 36
:urd 12, 36, 13
:urch 36, 74
:urch 74, 36
:urd 12, 36, 13
:urch 36, 74
:urch 74, 36
:urd 12, 36, 13
:urch 36, 74
:urch 74, 28
:urd 4, 28, 13
:urch 28, 57
:urch 57, 57
:urch 57, 36
:urd 12, 36, 13
:urch 36, 36
:urd 12, 36, 13
:urch 36, 74
:urch 74, 74
:oom(260)@?
Fatal exception 29(StoreProhibitedCause):
epc1=0x4022634a, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000004, depc=0x00000000

Exception (29):
epc1=0x4022634a epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000004 depc=0x00000000
Exception 29: StoreProhibited: A store referenced a page mapped with an attribute that does not permit stores
Decoding 40 results
0x4022634a: memset at /Users/igrokhotkov/e/newlib-xtensa/xtensa-lx106-elf/newlib/libc/machine/xtensa/../../../../.././newlib/libc/machine/xtensa/memset.S line 133
0x40238466: more_comps at crypto/bigint.c line 672
0x40238660: alloc at crypto/bigint.c line 672
0x40238a26: regular_multiply at crypto/bigint.c line 672
0x4023850c: trim at crypto/bigint.c line 672
0x40238971: regular_multiply at crypto/bigint.c line 672
0x402394d1: bi_barrett at crypto/bigint.c line 1293
0x401007ac: umm_malloc at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266\umm_malloc/umm_malloc.c line 1668
0x40239699: bi_mod_power at crypto/bigint.c line 1400
0x4023850c: trim at crypto/bigint.c line 672
0x4023a879: RSA_public at crypto/rsa.c line 254
:  (inlined by) RSA_encrypt at crypto/rsa.c line 288
0x40235548: send_client_key_xchg at ssl/tls1_clnt.c line 409
0x402359dd: do_clnt_handshake at ssl/tls1_clnt.c line 123
0x4022e5a6: pbuf_copy_partial at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/pbuf.c line 1306
0x401007ec: umm_calloc at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266\umm_malloc/umm_malloc.c line 1693
0x4010020c: _umm_free at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266\umm_malloc/umm_malloc.c line 1291
0x401009a4: free at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266\umm_malloc/umm_malloc.c line 1737
0x402014cc: ax_port_read at D:\Documents\Arduino\libraries\ESPAsyncTCP\src/tcp_axtls.c line 581
0x40235350: do_handshake at ssl/tls1.c line 2007
:  (inlined by) basic_read at ssl/tls1.c line 1483
0x4022c66c: cyclic_timer at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 165
0x40234658: send_raw_packet at ssl/tls1.c line 2007
:  (inlined by) send_packet at ssl/tls1.c line 1269
0x40232278: ip4_output_if_opt_src at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c line 962
0x402354b8: ssl_read at ssl/tls1.c line 2007
0x402012a9: tcp_ssl_read at D:\Documents\Arduino\libraries\ESPAsyncTCP\src/tcp_axtls.c line 477
0x401009a4: free at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266\umm_malloc/umm_malloc.c line 1737
0x40232908: mem_free at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c line 152
0x40217b3b: AsyncClient::_recv(tcp_pcb*, pbuf*, long) at D:\Documents\Arduino\libraries\ESPAsyncTCP\src/ESPAsyncTCP.cpp line 665
0x40217b64: AsyncClient::_s_recv(void*, tcp_pcb*, pbuf*, long) at D:\Documents\Arduino\libraries\ESPAsyncTCP\src/ESPAsyncTCP.cpp line 665
0x4022e01c: tcp_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_in.c line 435 (discriminator 1)
0x401007ac: umm_malloc at C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\cores\esp8266\umm_malloc/umm_malloc.c line 1668
0x40240000: sha256_init at ?? line ?
0x40232071: ip4_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c line 685
0x4022b571: ethernet_input_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c line 182
0x40104854: esp2glue_ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 406
0x4025d450: ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 356
0x4025d462: ethernet_input at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 364
0x4025845b: ppPeocessRxPktHdr at ?? line ?
0x4025f45f: ets_snprintf at ?? line ?

@d-a-v
Copy link
Collaborator

d-a-v commented Jan 20, 2018

Looks like you are running out of memory as per the:oom message
and because a *alloc() is quite often near the top of your stacktraces.

@igrr this line seems to return NULL.

@devyte suggested that :oom should always be enabled by default.

More generally, would we have new functions like *alloc_check() along with our *alloc() standard functions.

  • *alloc() would panic()+reset() if oom occurs,
  • *alloc_check() wouid not but users of these *_check() functions promise to check the return value ensuring the sketch can go on ?

I would be please to provide such a PR for the core.

@earlephilhower what do you think ?

@igrr
Copy link
Member

igrr commented Jan 20, 2018

AxTLS is pretty lax in checking for null pointers (it mostly doesn't have any checks). Making allocs in axTLS to cause panic on failure would be an improvement, but only a cosmetic one — that doesn't help to get the problem fixed.
I'm not against doing that, though.

I don't like the idea of panicking on failed standard malloc. There are a few places where malloc failure is checked and handled in a reasonable way (e.g. in LwIP).

Another option (more general one) is to print some heap info in the panic handler. E.g. "last allocation has failed" kind of thing.

By the way, there is an open issue about axTLS crashing on oom. Suggest merging this one with that issue. I'm on mobile so can't find it right now...

@earlephilhower
Copy link
Collaborator

My own opinion is that I'm a bit leery of changing the behavior of such a basic operation. Malloc() can always return failure. If the app coder doesn't check for it. that's their own fault. May apps can handle a NULL return, but I'd bet that almost none of them would ever implement a mallloc_check(), just for the ESP8266 Arduino. So, I'd thumbs down changing malloc() to SEGV.

UMM spews :oom errors when that flag's enabled, right? Do we dump stack (but continue) at that point so at some later point when a real SEGV happens because of the NULL pointer? W/o call stack, the fact that a malloc failed isn't so useful for debugging where it happened.

Just thinking out loud, maybe somewhere between @igrr and @d-a-v, we can stash the app-level malloc() caller address when a malloc fails, and then dump that on a panic. That way your well-behaved app will still work fine when it gets a NULL, but a bad app will crash and have the last failed malloc() ID'd for examination? No need to spew :oom in that case.

@d-a-v
Copy link
Collaborator

d-a-v commented Jan 21, 2018

@igrr is it #2201 ?

@earlephilhower :oom does not dump nor save stack when triggered but shows caller's FILE/LINE.
I like your third point, :oom saves stack then it is dumped on panic.
Do we end up with:

  • :oom always enabled by default as @devyte suggested
  • :oom counts null, saves (then overwrites) the calling stack when triggered
  • counter and saved stack are displayed on panic
  • this is all disabled with -DNDEBUG

@earlephilhower
Copy link
Collaborator

So even the memory manager itself sometimes doesn't check for null on malloc. 😞

void *umm_calloc( size_t num, size_t item_size ) {
...
  size += POISON_SIZE(size);
  ret = _umm_malloc(size);
  memset(ret, 0x00, size);
...
}

@devyte
Copy link
Collaborator

devyte commented Jan 21, 2018

This merits a local fix, and opening an issue in the umm repo.

@earlephilhower
Copy link
Collaborator

Assuming this is thr right source repo, it's already fixed there:

https://github.com/rhempel/umm_malloc/blob/a058c1a16f1a272ec2775b7f40181d6c50cfbd94/src/umm_malloc.c#L616

void *umm_calloc( size_t num, size_t item_size ) {
  void *ret;

  ret = umm_malloc((size_t)(item_size * num));
  if (ret)
      memset(ret, 0x00, (size_t)(item_size * num));

  return ret;
}

I'll open a bug and a patch in the ESP8266 repo.

@tarzan115
Copy link
Contributor Author

my issue is gone. Thanks, guys!
I think this issue should be close. 😄

@d-a-v
Copy link
Collaborator

d-a-v commented Jan 22, 2018

@earlephilhower This is the right author but his code was not on github when @igrr started to use it.
On top of that since the realloc bug last month, a new branch (issue-11) was created with a wider fix than ours. @rhempel is wondering if we can validate it before merge. #4047 is a reminder for this task.

@earlephilhower
Copy link
Collaborator

@d-a-v I see you've pulled in the realloc() free()ing on OOM fix. The rest of his changes seems to be an optimization of the realloc algorithm, but it's only been in his codebase a couple weeks. Is that the bit you're talking about? Rest of the code seems untouched in that branch.

Any particular case you're trying to fix/optimize, that can be tested against his changes? W/o a compelling reason, IMO it's a bit premature to pull in such fresh code in a bit of the core that's so important.

@rhempel
Copy link

rhempel commented Feb 4, 2018

Hi folks, I am the author of umm_malloc() and would like to make sure that this library meets your exact needs. The specific case that that branch (issue-11) was trying to fix was that a failled re-allocation did not leave the original memory intact.
If there are cases where you feel that umm_malloc() is not meeting your needs please feel free to open an issue and I'll do my best to address it.
I would, however appreciate that when I do provide a fix that you test up against your application to verify the fix. Of course I have a test suite but it's only as good as the failures that get reported :-)

@devyte
Copy link
Collaborator

devyte commented Feb 4, 2018

Hi @rhempel , your support is greatly appreciated! We're currently in a sort of code freeze waiting for the imminent sdk 2.2 release, so that we can make our own much needed 2.4.1 release. Investigation of issue-11 is tracked in #4047 , and I certainly expect to look at it in detail myself, once 2.4.1 is out the door.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting for feedback Waiting on additional info. If it's not received, the issue may be closed.
Projects
None yet
Development

No branches or pull requests

7 participants