Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new operator w/o (std::nothrow) aborts, when "Exceptions: disabled" #6269

Closed
6 tasks done
mhightower83 opened this issue Jul 7, 2019 · 7 comments
Closed
6 tasks done

Comments

@mhightower83
Copy link
Contributor

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in the current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: ESP-12
  • Core Version: [76cda9b]
  • Development Env: Arduino IDE
  • Operating System: Ubuntu

Settings in IDE

  • Module: Adafruit HUZZAH

  • Flash Mode: qio

  • Flash Size: 4MB

  • lwip Variant: v2 Lower Memory

  • Reset Method: nodemcu

  • Flash Frequency: 40Mhz

  • CPU Frequency: 80Mhz

  • Upload Using: SERIAL

  • Upload Speed: 115200(serial upload only)

  • Exceptions: Disabled

Problem Description

With "Exceptions: Disabled" selected from the Arduino IDE.
A build using the new operator without (std::nothrow) causes a crash, when out of memory.
Fails with "Abort called" and stack trace.
No opportunity to test for NULL.

This test sketch also fails with Arduino ESP8266 Core 2.5.0; however, it works (returns NULL) in Core 2.4.2.

And before rerunning each test, I did a git pull against the branch and ran ./get.py from the tools directory.

MCVE Sketch

void setup() {
  Serial.begin(115200);
  delay(20);
  Serial.printf("\nUp and running ...\n");
//  char *p = new (std::nothrow) char[128000];  // This will return NULL
  char *p = new char[128000];  //This fails with "Abort called" and stack trace

  if (p == NULL) {
    Serial.printf("\n\"new\" failed as it should! PASS!\n");
  } else {
    Serial.printf("\n\"new\" worked, it should not have!, FAIL!\n");
  }
}

void loop() {
}

Debug Messages

Up and running ...

Abort called

>>>stack>>>

ctx: cont
sp: 3ffffdd0 end: 3fffffc0 offset: 01b0
3fffff80:  40201d52 00000014 3ffee294 4020a4d4  
3fffff90:  3fffdad0 00000000 3ffee294 40201058  
3fffffa0:  feefeffe feefeffe 3ffee2bc 40201874  
3fffffb0:  feefeffe feefeffe 3ffe84f4 40100a41  
<<<stack<<<

last failed alloc call: 4020A504(128000)

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 1384, room 16 
tail 8
chksum 0x2d
csum 0x2d
v76cda9bd
~ld


Decoding stack results
0x40201d52: delay(unsigned long) at /home/mhightow/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_wiring.cpp line 57
0x4020a4d4: operator new[](unsigned int) at ../../../../../dl/gcc-xtensa/libstdc++-v3/libsupc++/new_opv.cc line 33
0x40201058: setup() at /home/mhightow/Arduino/test/nothrow3/nothrow3.ino line 14
0x40201874: loop_wrapper() at /home/mhightow/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 129

This issue may be related to #6251 (comment)

@earlephilhower
Copy link
Collaborator

That's the correct, expected behavior. New still had a failure. It'll try and throw an exception, but the core will redirect it to an abort and the code will end (since it's really an unhandled exception).

No-exceptions mode just means your code isn't blown up by 2x by exception frames (ehxxx in the elf) and handler code.

@mhightower83
Copy link
Contributor Author

Sorry, I guess I missed something somewhere. Dyslexia occasionally creates some reading comprehension issues for me. I just wanted to say, the need to add (std::nothrow), is new behavior for 2.5.0, which requires editing existing libraries that worked before.

In 2.4.2 and before we didn't need the (std::nothrow) to avoid the crash, abort. If the alloc failed, we just needed to follow up with a test for NULL.

@earlephilhower
Copy link
Collaborator

You're correct about earlier versions.

Unfortunately, while you were probably conscientious and checking for new failure I would bet that the vast majority of folks just ran and crashed sometime later. This also covers the case where plain variable constructors hit a problem. Can't check for NULL or partially built objects there, so throwing is the only way to generically signal "something bad happened."

mhightower83 added a commit to mhightower83/ESPAsyncTCP that referenced this issue Jul 16, 2019
with different core releases.

Summary:

The operator "new ..." was changed to "new (std::nothrow) ...", which will
return NULL when the heap is out of memory. Without the change "soft WDT"
was the result, starting with Arduino ESP8266 Core 2.5.0. (Note, RE:"soft
WDT" - the error  reporting may improve with core 2.6.) With proir core
versions the library appears to work fine.
ref: esp8266/Arduino#6269 (comment)

To support newer lwIP versions and buffer models. All references to 1460
were replaced with TCP_MSS. If TCP_MSS is not defined (exp. 1.4v lwIP)
1460 is assumed.

The ESPAsyncTCP library should build for Arduino ESP8266 Core releases:
2.3.0, 2.4.1, 2.4.2, 2.5.1, 2.5.2. It may still build with core versions
2.4.0 and 2.5.0. I did not do any regression testing with these, since
they had too many issues and were quickly superseded.

lwIP tcp_err() callback often resulted in crashes. The problem was a
tcp_err() would come in, while processing a send or receive in the
forground. The tcp_err() callback would be passed down to a client's
registered disconnect CB. A common problem with SyncClient and other
modules as well as some client code was: the freeing of ESPAsyncTCP
AsyncClient objects via disconnect CB handlers while the library was
waiting for an operstion to  finished. Attempts to access bad pointers
followed. For SyncClient this commonly occured during a call to delay().
On return to SyncClient _client was invalid. Also the problem described by
issue me-no-dev#94 also surfaced

Use of tcp_abort() required some very special handling and was very
challenging to make work without changing client API. ERR_ABRT can only be
used once on a return to lwIP for a given connection and since the
AsyncClient structure was sometimes deleted before returning to lwIP, the
state tracking became tricky. While ugly, a global variable for this
seemed to work; however, I  abanded it when I saw a possible
reentrancy/concurrency issue. After several approaches I settled the
problem by creating "class ACErrorTracker" to manage the issue.

Additional Async Client considerations:

The client sketch must always test if the connection is still up at loop()
entry and after the return of any function call, that may have done a
delay() or yield() or any ESPAsyncTCP library family call. For example,
the connection could be lost during a call to _client->write(...). Client
sketches that delete _client as part of their onDisconnect() handler must
be very careful as _client will become invalid after calls to delay(),
yield(), etc.
@mcspr
Copy link
Collaborator

mcspr commented Jul 16, 2019

Sorry for not posting my earlier question as an issue.

While complex objects may be a problem, bunch of the code uses "old" Arduino-inherited style to allocate basic types (i.e. mixing up malloc() and new):

if(!newbuf) {

if (!_sc || !_iobuf_in || !_iobuf_out) {

If the future thing is to finally enable exceptions, the intent here is to allow user app to catch badalloc exception?

Sidenote: I happen to notice this on ESP32, which behaves the same (but with a lot more memory to spare). So this may not be as surprising thing to happen as I initially thought.

@earlephilhower
Copy link
Collaborator

Yes. Throw() and catch() worked on GCC 4.8 until we lost the non-32-b-read-on-pgmem exception handler from the pre3.0.0 SDK (since pre3.0.0 was incredibly buggy). In gcc 9.1 I've just got a patch to rewrite the eh_frame parser working w/o any non-32b accesses and I hope to backport that very soon.

@earlephilhower
Copy link
Collaborator

@devyte, @d-a-v, any thoughts? It may be possible to pull new.o from the no-exceptions libs and revert to our basic malloc() wrapper (i.e. instead of crashing on a failed alloc), but I'm not too keen on it. We've already had new std::shared_ptr fail in bad ways when not crashing on new[] failure of the contained type.

@devyte
Copy link
Collaborator

devyte commented Jul 17, 2019

The correct solution would be to fix all code that uses new and expects it to not throw, and change it to new(nothrow). That is what the language standard mandates.
We could make new behave like new(nothrow) when exceptions are disabled right now to maintain backwards compat and not have things break out there with minor releases. But for 3.x I think we should adhere to the language standard, i. e. : if you want to assure nothrow, either use new(nothrow) or malloc and new in place, per C++ documentation.

me-no-dev pushed a commit to me-no-dev/ESPAsyncTCP that referenced this issue Sep 21, 2019
* Fixes the handling tcp_err() callback and addresses build issues
with different core releases.

Summary:

The operator "new ..." was changed to "new (std::nothrow) ...", which will
return NULL when the heap is out of memory. Without the change "soft WDT"
was the result, starting with Arduino ESP8266 Core 2.5.0. (Note, RE:"soft
WDT" - the error  reporting may improve with core 2.6.) With proir core
versions the library appears to work fine.
ref: esp8266/Arduino#6269 (comment)

To support newer lwIP versions and buffer models. All references to 1460
were replaced with TCP_MSS. If TCP_MSS is not defined (exp. 1.4v lwIP)
1460 is assumed.

The ESPAsyncTCP library should build for Arduino ESP8266 Core releases:
2.3.0, 2.4.1, 2.4.2, 2.5.1, 2.5.2. It may still build with core versions
2.4.0 and 2.5.0. I did not do any regression testing with these, since
they had too many issues and were quickly superseded.

lwIP tcp_err() callback often resulted in crashes. The problem was a
tcp_err() would come in, while processing a send or receive in the
forground. The tcp_err() callback would be passed down to a client's
registered disconnect CB. A common problem with SyncClient and other
modules as well as some client code was: the freeing of ESPAsyncTCP
AsyncClient objects via disconnect CB handlers while the library was
waiting for an operstion to  finished. Attempts to access bad pointers
followed. For SyncClient this commonly occured during a call to delay().
On return to SyncClient _client was invalid. Also the problem described by
issue #94 also surfaced

Use of tcp_abort() required some very special handling and was very
challenging to make work without changing client API. ERR_ABRT can only be
used once on a return to lwIP for a given connection and since the
AsyncClient structure was sometimes deleted before returning to lwIP, the
state tracking became tricky. While ugly, a global variable for this
seemed to work; however, I  abanded it when I saw a possible
reentrancy/concurrency issue. After several approaches I settled the
problem by creating "class ACErrorTracker" to manage the issue.

Additional Async Client considerations:

The client sketch must always test if the connection is still up at loop()
entry and after the return of any function call, that may have done a
delay() or yield() or any ESPAsyncTCP library family call. For example,
the connection could be lost during a call to _client->write(...). Client
sketches that delete _client as part of their onDisconnect() handler must
be very careful as _client will become invalid after calls to delay(),
yield(), etc.

* moved DebugPrintMacros.h to the correct loccation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants