-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposed umm_malloc improvements #6274
Conversation
…pport alternative. Replace ets_intr_lock()/ets_intr_unlock() with uint32_t oldValue=xt_rsil(3)/xt_wrs(oldValue). Added UMM_CRITICAL_DECL macro to define storage for current state. Expanded UMM_CRITICAL_... to use unique identifiers. This helpt facilitate gather function specific timing information. Replace printf with something that is ROM or IRAM based so that a printf that occurs during an ISR malloc/new does not cause a crash. To avoid any reentry issue it should also avoid doing malloc lib calls. Refactor realloc to avoid memcpy/memmove while in critical section. This is only effective when realloc is called with interrupts enabled. The copy process alone can take over 10us (when copying more than ~498 bytes with a 80MHz CPU clock). It would be good practice for an ISR to avoid realloc. Note, while doing this might initially sound scary, this appears to be very stable. It ran on my troublesome sketch for over 3 weeks until I got back from vacation and flashed an update. Troublesome sketch - runs ESPAsyncTCP, with modified fauxmo emulation for 10 devices. It receives lost of Network traffic related to uPnP scans, which includes lots of TCP connects disconnects RSTs related to uPnP discovery. I have clocked umm_info critical lock time taking as much as 180us. A common use for the umm_info call is to get the free heap result. It is common to try and closely monitor free heap as a method to detect memory leaks. This may result in frequent calls to umm_info. There has not been a clear test case that shows an issue yet; however, I and others think they are or have had crashes related to this. I have added code that adjusts the running free heap number from _umm_malloc, _umm_realloc, and _umm_free. Removing the need to do a long interrupts disabled calculation via _umm_info. Build optional, min/max time measurements for locks held while in info, malloc, realloc, and free. Also, maintain a count of how many times each is called with INTLEVEL set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the ets_strlen() is a problem, but everything else seems minor.
Uhh, sorry, I may not have understood correctly, what I was asked there on that merge branch question? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get the comment about our special case on ets_strlen? OTW, I like it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Switch default FW to "2.2.2-dev(38a443e)" (menu:2.2.1+100) (esp8266#6272)
@earlephilhower - oh you got that already. I didn't think I had finished? When I a have an established pull request from say branch pr3 on my fork of arduino/esp8266 to master branch on https://github.com/esp8266 and I the push an update to my branch pr3 on GitHub, am I done? No need to redo the push to https://github.com/esp8266? Did I need to do the merge of https://github.com/esp8266 master back to my pr3 on GitHub? |
@mhightower83 you seem to be doing this correctly:
|
@devyte Thanks for the info, I get it now. |
I think we may need to add memory barrier to From /* NOTE: these asm macros don't modify memory, but they are marked
* as such to act as memory access barriers to the compiler because
* these macros are sometimes used to delineate critical sections;
* function calls are natural barriers (the compiler does not know
* whether a function modifies memory) unless declared to be inlined. */
# define XTOS_SET_INTLEVEL(intlevel) ({ unsigned __tmp; \
__asm__ __volatile__( "rsil %0, " XTSTR(intlevel) "\n" \
: "=a" (__tmp) : : "memory" ); \
__tmp;}) In my first unseen fix to umm_malloc I used
I switched it over to use I went back to try and understand better, the significance of the "memory" parameter. Then I realized that the current Arduino/cores/esp8266/Arduino.h Line 162 in cd6cf98
might need it. I wanted to see the results in assembly. So I did a quick compile to .S of this sample function with the old #include <Arduino.h>
#if 0
#undef xt_rsil
#define xt_rsil(level) (__extension__({uint32_t state; __asm__ __volatile__("rsil %0," __STRINGIFY(level) : "=a" (state)::"memory"); state;}))
#endif
int just4Fun(int *a, int *b) {
uint32_t old_ps;
int result = *a;
old_ps = xt_rsil(15);
*b = *a;
xt_wsr_ps(old_ps);
return result;
} Using _Z8just4FunPiS_:
l32i.n a2, a2, 0
rsil a4,15
s32i.n a2, a3, 0
wsr a4,ps; isync
ret.n Using _Z8just4FunPiS_:
l32i.n a4, a2, 0
rsil a5,15
l32i.n a2, a2, 0
s32i.n a2, a3, 0
wsr a5,ps; isync
mov.n a2, a4
ret.n Notice the change.
|
The change to xt_rsil() needs to be copied into to umm_malloc_cfg.h. Is there a better way to handle this? At the moment, when Esp-frag.cpp builds, a warning is issued so you know the mismatch exists. Which is probably better than renaming the macro locally in umm_malloc_cfg.h and not know it mismatches. I also have some comment corrections. @devyte If I do a push will it create any disturbance with your review? |
Push away :) I tend to do multiple review passes anyways because I don't always see everything at first |
@devyte, it looks like @mhightower83 is next in line for merge, once he gets the change he was talking about included. |
@earlephilhower The changes I described were there when you merged master into pr3, 1 hour ago. @devyte indicated he had a lot of comments. Based on that, I'll need to do at least one more push after he reviews. |
Yes. Please move, in this PR, Including edit: While you are at it, and because you use the same function in your PR, and if you don't mind, please add also a define like |
Arduino.h, to cores/esp8266/core_esp8266_features.h Added esp_get_cycle_count() to core_esp8266_features.h. Updated umm_malloc and Esp.h to use new defines and location.
@d-a-v Looks like xt_rsil/xt_wsr_ps move trips on the stuff in the Travis ./common/mock.h file. |
Sure, like cut-n-pasting like you said. CSE and other optimizations can happen then. My statement was that you can't move an assignment past a conditional dependent on it (we're not the Itanic :) ). The main IRQ goes like: We're probably getting down a rathole not related to UMM here, and this is already a long thread. Any update @devyte? I don't understand the concern (infinite loop of includes or something?) |
umm_realloc() to avoid exposing a transient OOM condition to ISR.
umm_realloc() to avoid exposing a transient OOM condition to ISR.
to avoid a possible OOM for an ISR.
Updated realloc() to do a preliminary free() of unused space, before performing a critical section exit and memmove. This change was applied to the current _umm_realloc(). This change should reduce the risk of an ISR getting an OOM, during a realloc memmove operation. Added additional stats for verifying correct operation.
Updated realloc() to do a preliminary free() of unused space, before performing a critical section exit and memmove. This change was applied to the current _umm_realloc(). This change should reduce the risk of an ISR getting an OOM when interrupting an active realloc memmove operation. Added additional stats for verifying correct operation. Updated: for clarity and Travis-CI fail.
@devyte, I think you're up again on this one. Don't want it to fade off the first page of PRs... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@devyte will be out of communication for a couple weeks, but he mentioned an issue with includes, which I think I've addressed with the comments. OTW, I'm good with getting this in and seeing how it runs...
travis failed. @earlephilhower any thoughts on what is needed here: Installing APT Packages
35611.72s$ travis_apt_get_update
3570.45s$ sudo -E apt-get -yq --no-install-suggests --no-install-recommends $(travis_apt_get_options) install g++-7 gcc-7
358Reading package lists...
359Building dependency tree...
360Reading state information...
361E: Unable to locate package g++-7
362E: Couldn't find any package by regex 'g++-7'
363E: Unable to locate package gcc-7
apt-get.diagnostics
364apt-get install failed
469The command "sudo -E apt-get -yq --no-install-suggests --no-install-recommends $(travis_apt_get_options) install g++-7 gcc-7" failed and exited with 100 during . |
Looks like a transient Travis.ci problem. I've restarted the job. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Hopefully @devyte can chime in. We can always pull in the DEBUGLOG macro from upstream changes at a later time.
The library versions of memmove, memcpy, and memset were in flash. Updated to use ROM functions ets_memmove, ets_memcpy, and ets_memset. Additional note, the library version of memmove does not appear to have been optimized. It took almost 10x longer than the ROM version. Renamed printf macro to DBGLOG_FUNCTION and moved to umm_malloc_cfg.h. Changed printf macro usage to use DBGLOG_FUNCTION.
@devyte @d-a-v @earlephilhower there are some diagnostic areas of the umm_malloc code that I am noticing and have concerns with:
At this time I currently have no measurements on the time to perform INTEGRITY_CHECK() or CHECK_POISON_ALL_BLOCKS() I guess I'll add that to my TODO list. I am thinking that only the GET_UNPOISONED() option should be enabled when just selecting debug serial port. The CHECK_POISON_ALL_BLOCKS() should take more work to turn on. Some of these more aggressive diagnostics could start creating problems, especially when critical sections are added. Sigh, I am never going to be though with this PR 🤕 |
CC @rhempel for @mhightower83 's concerns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I currently suspect that realloc is fundamentally unsafe and should never be called from an ISR. Similar for free.
Addresses: #6246 (comment)
Correct critical section with interrupt level preserving and nest support
alternative. Replace ets_intr_lock()/ets_intr_unlock() with uint32_t
oldValue=xt_rsil(3)/xt_wrs(oldValue). Added UMM_CRITICAL_DECL macro to define
storage for current state. Expanded UMM_CRITICAL_... to use unique
identifiers. This helpt facilitate gather function specific timing
information.
Replace printf with something that is ROM or IRAM based so that a printf
that occurs during an ISR malloc/new does not cause a crash. To avoid any
reentry issue it should also avoid doing malloc lib calls.
Refactor realloc to avoid memcpy/memmove while in critical section. This is
only effective when realloc is called with interrupts enabled. The copy
process alone can take over 10us (when copying more than ~498 bytes with a
80MHz CPU clock). It would be good practice for an ISR to avoid realloc.
Note, while doing this might initially sound scary, this appears to be very
stable. It ran on my troublesome sketch for over 3 weeks until I got back from
vacation and flashed an update. Troublesome sketch - runs ESPAsyncTCP, with
modified fauxmo emulation for 10 devices. It receives lost of Network traffic
related to uPnP scans, which includes lots of TCP connects disconnects RSTs
related to uPnP discovery.
I have clocked umm_info critical lock time taking as much as 180us. A common
use for the umm_info call is to get the free heap result. It is common
to try and closely monitor free heap as a method to detect memory leaks.
This may result in frequent calls to umm_info. There has not been a clear
test case that shows an issue yet; however, I and others think they are or
have had crashes related to this.
I have added code that adjusts the running free heap number from _umm_malloc,
_umm_realloc, and _umm_free. Removing the need to do a long interrupts
disabled calculation via _umm_info.
Build optional, min/max time measurements for locks held while in info,
malloc, realloc, and free. Also, maintain a count of how many times each is
called with INTLEVEL set.