Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ultra-low overhead record last failed memory allocation, dump on a panic #4220

Merged
merged 4 commits into from
Mar 17, 2018

Conversation

earlephilhower
Copy link
Collaborator

@earlephilhower earlephilhower commented Jan 23, 2018

I know there was an issue of gitter discussion about this with @d-a-v and @devyte , but I can't seem to find it now.

This patch implements a very low overhead way of recording the caller/size of the last allocation that failed, and only printing it out on a panic. Unlike the OOM debug, there's no add'l strings carried, as we use a gcc builtin to get the malloc/etc.'s caller address (a simple offset from a1 gives the return address, so this is very low CPU overhead wven when it fails).

Combined with a patch to EspExceptionDecoder this gives a reasonably useful insight into where things are going bad w/o changing your compilation options (i.e. works w/NDEBUG to allow highest RAM/flash app usage).

-edit- Update to pull 31, which has the toolchain bug workaround
EspExceptionDecoder patch: me-no-dev/EspExceptionDecoder#31

---snip---

At runtime, whenever a realloc, malloc, calloc, or new call fails to
find enough memory, record the calling function and size requested.
Do NOT print anything or call any heavyweight functions on this, as it
is legal to return NULL to an alloc-type call.

If the main application handles this NULL properly, there should be no
panic and nothing different will happen.

If the main application panics() at any later point in time, this record
will be printed out as part of the crash log for processing with an
updated EspExceptionDecoder.java.

This adds 2-3 instructions overhead in the normal case, and around 10-12
instructions in the failing case, and requires an additional 8 bytes of
.BSS to function.

Only a single address is kept, the final failing malloc-type function call
before a panic, but it is trivial to extend to a fifo with little overhead
in the common, non-failing case.

@earlephilhower
Copy link
Collaborator Author

Sample panic output:

Connecting to WiFi
...Connecting to WiFi
...Connecting to WiFi
...Connecting to WiFi
...Connecting to WiFi
...Connecting to WiFi
Connected
Go to http://192.168.1.138/ to control the web radio.
Resuming stream from EEPROM: http://rb-bremeneins-live.cast.addradio.de/rb/bremeneins/live/mp3/128/stream.mp3, type=MP3, vol=80
Running for 21 seconds ...Free mem=7832
Changing URL to: http://rb-bremeneins-live.cast.addradio.de/rb/bremeneins/live/mp3/128/stream.mp3, vol=80
Before stop...Free mem=7832
After stop...Free mem=7832
Saved settings
created icystream
created buffer
created decoder
Decoder start...
Done start new URL
Running for 22 seconds*...Free mem=3696
Running for 23 seconds*...Free mem=3696
Running for 24 seconds*...Free mem=3696
Running for 25 seconds*...Free mem=3696
Running for 26 seconds*...Free mem=4560
Running for 27 seconds*...Free mem=4560
Running for 28 seconds*...Free mem=4560
Running for 29 seconds*...Free mem=2520
Running for 30 seconds*...Free mem=2520

Exception (29):
epc1=0x40213596 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000004 depc=0x00000000

ctx: sys 
sp: 3ffffd30 end: 3fffffb0 offset: 01a0

>>>stack>>>
3ffffed0:  00000000 3fffa5d4 3fff03ad 40213594  
3ffffee0:  3fffa5d4 3fff03ac 3fff03ad 40213620  
3ffffef0:  0000003c 00000000 00000000 40222802  
3fffff00:  8a01a8c0 00000000 3ffead90 00000000  
3fffff10:  00000000 0000003c 3fffff80 3fffab5a  
3fffff20:  3fff054c 3fffab3c 3fff0634 402268b1  
3fffff30:  00000014 00001282 00001282 3fff054c  
3fffff40:  3fffdc80 3fff0aec 3fff9a64 3fff0b3c  
3fffff50:  00000008 3fff054c 3fffab3c 4021ffcd  
3fffff60:  3fffdc80 3fff0aec 3fff9a64 4010453c  
3fffff70:  4023d7b0 3fff0aec 3fff9a64 4023d7c2  
3fffff80:  3fffab4c 3fffab3c 00000000 3fffdcc0  
3fffff90:  402387f3 00000000 3fff9a64 4023f79b  
3fffffa0:  40000f49 3fffdab0 3fffdab0 40000f49  
<<<stack<<<

last failed alloc call: 40213594(48)

 ets Jan  8 2013,rst cause:2, boot mode:(1,6)


 ets Jan  8 2013,rst cause:4, boot mode:(1,6)

wdt reset

@earlephilhower
Copy link
Collaborator Author

Sample decoder output
image

@igrr
Copy link
Member

igrr commented Jan 23, 2018

Looks good, although it is a sad that we have to replicate this code twice to get the caller address. It would be much nicer to do this in malloc, which all of {new, _malloc_r, pvPortMalloc} will call.

Maybe we should look into enabling -fno-omit-frame-pointer for better backtraces.

@earlephilhower
Copy link
Collaborator Author

The docs say that anything other than (0) (i.e. caller) may crash in the docs. I can't vouch one way or the other.

https://gcc.gnu.org/onlinedocs/gcc/Return-Address.html

Calling this function with a nonzero argument can have unpredictable effects, including crashing the calling program. As a result, calls that are considered unsafe are diagnosed when the -Wframe-address option is in effect. Such calls should only be made in debugging situations.

@igrr
Copy link
Member

igrr commented Jan 23, 2018

On Xtensa with Window ABI it can indeed crash, if one tries to get the return address beyond the end of the stack. With sufficient checking of return values, we found it to be safe in practice.

@d-a-v
Copy link
Collaborator

d-a-v commented Jan 25, 2018

I tried and -fno-omit-frame-pointer cost: ~500 Bytes (at least) - It could be automatically enabled along with debug options, or disabled with none or NDEBUG.
+1 for moving this (great) feature into umm_malloc()
+1 for the trivial fifo

@earlephilhower
Copy link
Collaborator Author

@d-a-v Good datapoint, what exactly was built and is that 500 bytes of add'l stack usage or 500 bytes of code? I'm not sure how large a full-fat stack frame is on the LX106, but if it's large this could have an impact on highly object-ified code.

We'd also need to rebuild axtls, lwip, and newlib with the option if you want to get reasonable debugging coverage.

I'd vote against changing this setting with a debug flag. This is most useful in the case when I don't have any debug enabled...if you're running w/debug, just enable OOM debug and you'll get even more info than this can provide. If you're not, at least you get a call stack (w/no-omit) or calling function (w/o no-omit) to work from.

To move into umm_malloc, it would have to use no-frame-omit. OTW you'll catch only the wrapper malloc()/new()/etc. function as the caller. Right now it works because it overwrites multiple times in some cases (i.e. operator new() calls malloc. Malloc will write the caller as new() and return NULL. Operator new() will then overwrite the caller as (__builtin_caller(0)) and return NULL. I actually had it there, too, until I realized that it was effectively a no-op since nobody but the wrappers call umm_malloc().

Maybe a stepped approach. Get it out there for a release as-is, leaving the ABI stack-frame settings as-is, see if it's useful or people clamor for the more detail that the no-omit-frame can provide.

@devyte
Copy link
Collaborator

devyte commented Jan 29, 2018

I'm confused. I see the changes in new/delete/_*alloc_r(), but how does it work when the user calls malloc/realloc/calloc() directly?

@earlephilhower
Copy link
Collaborator Author

As well you should be, @devyte .

It was there when I was testing, but I didn't include it in the patch as I thought (quite incorrectly, after examining umm_malloc_cfg.h) that the non *_rs actually fell through to the re-entrant versions (which would actually end up in infinite recursion upon closer inspection). I'll re-up the patch including the umm_malloc.c changes as well. The same if(fail) { record caller/size } 4-lines of code will be in umm_malloc, umm_realloc, umm_calloc.

These will work because it's basically brute-force overwriting the last callback function multiple times on a failure. Should the -fno-omit-frame change go through, only the umm_malloc() calls would need to be instrumented (as long as the # of recorded callbacks exceeds 2 you'll end up back in a userland function).

Is there some method/place in the repo for putting in test sketches? Something like this would be easily caught with a set of ~6 basic tests (each test would have to fault and abort so you couldn't roll them into a single one). I've been manually writing little one-offs to test, but it's obviously not methodical or easily reproducible. W/o an emulator doing CI on an embedded system is darn hard...

@devyte
Copy link
Collaborator

devyte commented Jan 29, 2018

@earlephilhower I have yet to fully dwell into the whole testing frame, so my knowledge about it is limited, but my current understanding is that there are 2 places for unit tests:

  • tests/host for Linux-side tests that run with CI. Must be code that is higher-level, buildable on Linux, and therefore not dependent on the ESP hardware.
  • tests/device for tests that are run on the ESP. I don't know how/when these are run, but suspect @igrr runs them before making a release.

@igrr
Copy link
Member

igrr commented Jan 29, 2018

I've recently added some documentation about the tests: https://github.com/esp8266/Arduino/blob/master/tests/README.md#testing-on-device

@earlephilhower
Copy link
Collaborator Author

Cool, I hadn't noticed that at all. After looking at it I don't think it can be used for exception testing, but for standard library-type things it doesn't look too bad to implement!

At runtime, whenever a realloc, malloc, calloc, or new call fails to
find enough memory, record the calling function and size requested.
Do NOT print anything or call any heavyweight functions on this, as it
is legal to return NULL to an alloc-type call.

If the main application handles this NULL properly, there should be no
panic and nothing different will happen.

If the main application panics() at any later point in time, this record
will be printed out as part of the crash log for processing with an
updated EspExceptionDecoder.java.

This adds 2-3 instructions overhead in the normal case, and around 10-12
instructions in the failing case, and requires an additional 8 bytes of
.BSS to function.

Only a single address is kept, the final failing malloc-type function call
before a panic, but it is trivial to extend to a fifo with little overhead
in the common, non-failing case.
@everslick
Copy link
Contributor

everslick commented Jan 30, 2018

may i suggest something like this:

#ifndef DISABLE_OUT_OF_MEM_CHECK
#define CHECK_OUT_OF_MEM(_SIZE_, _RET_)
#else
#define CHECK_OUT_OF_MEM(_SIZE_, _RET_) \
do { \
  if (0 != _SIZE_ && 0 == _RET_) { \
    umm_last_fail_alloc_addr = __builtin_return_address(0); \
    umm_last_fail_alloc_size = _SIZE_; \
  } \
while (0)
#endif

this will avoid some code duplication and allows the feature to be switched off via compile flags.

@igrr igrr added this to the 2.5.0 milestone Mar 8, 2018
@devyte
Copy link
Collaborator

devyte commented Mar 17, 2018

@earlephilhower what's the status of this? Is it ready for merge?

@earlephilhower
Copy link
Collaborator Author

@devyte, It'll be good for merge tomorrow after a minor merge fix due to some changes in postmortem.c.

I'll have to ping @me-no-dev to see if he'd be willing to look at the the associated pull in me-no-dev/EspExceptionDecoder#31 so the decoder will actually show this new info, but this can go in w/o it as the info will just be ignored with the old decoder.

@earlephilhower earlephilhower merged commit 4a958c8 into esp8266:master Mar 17, 2018
@earlephilhower earlephilhower deleted the oom_recorder branch March 27, 2018 22:52
bryceschober pushed a commit to bryceschober/Arduino that referenced this pull request Apr 5, 2018
At runtime, whenever a realloc, malloc, calloc, or new call fails to
find enough memory, record the calling function and size requested.
Does not print anything or call any heavyweight functions on this, as it
is legal to return NULL to an alloc-type call.

If the main application handles this NULL properly, there should be no
panic and nothing different will happen.

If the main application panics() at any later point in time, this record
will be printed out as part of the crash log for processing with an
updated EspExceptionDecoder.java.

This adds 2-3 instructions overhead in the normal case, and around 10-12
instructions in the failing case, and requires an additional 8 bytes of
.BSS to function.

Only a single address is kept, the final failing malloc-type function call
before a panic, but it is trivial to extend to a fifo with little overhead
in the common, non-failing case.

(cherry picked from commit 4a958c8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants