Skip to content

Conversation

@mhightower83
Copy link
Contributor

@mhightower83 mhightower83 commented Jul 9, 2019

Addresses: #6246 (comment)

Correct critical section with interrupt level preserving and nest support
alternative. Replace ets_intr_lock()/ets_intr_unlock() with uint32_t
oldValue=xt_rsil(3)/xt_wrs(oldValue). Added UMM_CRITICAL_DECL macro to define
storage for current state. Expanded UMM_CRITICAL_... to use unique
identifiers. This helpt facilitate gather function specific timing
information.

Replace printf with something that is ROM or IRAM based so that a printf
that occurs during an ISR malloc/new does not cause a crash. To avoid any
reentry issue it should also avoid doing malloc lib calls.

Refactor realloc to avoid memcpy/memmove while in critical section. This is
only effective when realloc is called with interrupts enabled. The copy
process alone can take over 10us (when copying more than ~498 bytes with a
80MHz CPU clock). It would be good practice for an ISR to avoid realloc.
Note, while doing this might initially sound scary, this appears to be very
stable. It ran on my troublesome sketch for over 3 weeks until I got back from
vacation and flashed an update. Troublesome sketch - runs ESPAsyncTCP, with
modified fauxmo emulation for 10 devices. It receives lost of Network traffic
related to uPnP scans, which includes lots of TCP connects disconnects RSTs
related to uPnP discovery.

I have clocked umm_info critical lock time taking as much as 180us. A common
use for the umm_info call is to get the free heap result. It is common
to try and closely monitor free heap as a method to detect memory leaks.
This may result in frequent calls to umm_info. There has not been a clear
test case that shows an issue yet; however, I and others think they are or
have had crashes related to this.

I have added code that adjusts the running free heap number from _umm_malloc,
_umm_realloc, and _umm_free. Removing the need to do a long interrupts
disabled calculation via _umm_info.

Build optional, min/max time measurements for locks held while in info,
malloc, realloc, and free. Also, maintain a count of how many times each is
called with INTLEVEL set.

…pport

  alternative. Replace ets_intr_lock()/ets_intr_unlock() with uint32_t
  oldValue=xt_rsil(3)/xt_wrs(oldValue). Added UMM_CRITICAL_DECL macro to define
  storage for current state. Expanded UMM_CRITICAL_... to  use unique
  identifiers. This helpt facilitate gather function specific  timing
  information.

  Replace printf with something that is ROM or IRAM based so that a printf
  that occurs during an ISR malloc/new does not cause a crash. To avoid any
  reentry issue it should also avoid doing malloc lib calls.

  Refactor realloc to avoid memcpy/memmove while in critical section. This is
  only effective when realloc is called with interrupts enabled. The copy
  process alone can take over 10us (when copying more than ~498 bytes with a
  80MHz CPU clock). It would be good practice for an ISR to avoid realloc.
  Note, while doing this might initially sound scary, this appears to be very
  stable. It ran on my troublesome sketch for over 3 weeks until I got back from
  vacation and  flashed an update. Troublesome sketch - runs ESPAsyncTCP, with
  modified fauxmo emulation for 10 devices. It receives lost of Network traffic
  related to uPnP scans, which includes lots of TCP connects disconnects RSTs
  related to uPnP discovery.

  I have clocked umm_info critical lock time taking as much as 180us. A common
  use for the umm_info call is to get the free heap result. It is common
  to try and closely monitor free heap as a method to detect memory leaks.
  This may result in frequent calls to umm_info. There has not been a clear
  test case that shows an issue yet; however, I and others think they are or
  have had crashes related to this.

  I have added code that adjusts the running free heap number from _umm_malloc,
  _umm_realloc, and _umm_free. Removing the need to do a long interrupts
  disabled calculation via _umm_info.

  Build optional, min/max time measurements for locks held while in info,
  malloc, realloc, and free. Also, maintain a count of how many times each is
  called with INTLEVEL set.
Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ets_strlen() is a problem, but everything else seems minor.

@mhightower83
Copy link
Contributor Author

Uhh, sorry, I may not have understood correctly, what I was asked there on that merge branch question?
Please disregard.

Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get the comment about our special case on ets_strlen? OTW, I like it...

Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Switch default FW to "2.2.2-dev(38a443e)" (menu:2.2.1+100) (esp8266#6272)
@mhightower83
Copy link
Contributor Author

@earlephilhower - oh you got that already. I didn't think I had finished?
Sorry, I am really new to using GitHub with these updates using push/pull, etc to handle follow-ups.

When I a have an established pull request from say branch pr3 on my fork of arduino/esp8266 to master branch on https://github.com/esp8266 and I the push an update to my branch pr3 on GitHub, am I done?

No need to redo the push to https://github.com/esp8266?

Did I need to do the merge of https://github.com/esp8266 master back to my pr3 on GitHub?

@devyte
Copy link
Collaborator

devyte commented Jul 10, 2019

@mhightower83 you seem to be doing this correctly:

  • work in a branch in your fork of this repo
  • commit locally in your workspace
  • push to the branch in your repo once done
  • visit github (either your fork or here) and push the green button to make the PR
  • If you need to update, just push to your branch in your repo and the PR gets automagically updated
  • we can update the branch from the web interface. This can cause your local workspace to become unsynced. In that case you have to do special stuff to pull latest changes and merge them into your local workspace.

@mhightower83
Copy link
Contributor Author

@devyte Thanks for the info, I get it now.

@devyte devyte self-requested a review July 10, 2019 03:47
@mhightower83
Copy link
Contributor Author

I think we may need to add memory barrier to xt_rsil().
Based on the comments I saw here:

From ESP8266_RTOS_SDK/components/esp8266/include/xtensa/xtruntime.h

/*  NOTE: these asm macros don't modify memory, but they are marked
 *  as such to act as memory access barriers to the compiler because
 *  these macros are sometimes used to delineate critical sections;
 *  function calls are natural barriers (the compiler does not know
 *  whether a function modifies memory) unless declared to be inlined.  */
# define XTOS_SET_INTLEVEL(intlevel)            ({ unsigned __tmp; \
                        __asm__ __volatile__(   "rsil   %0, " XTSTR(intlevel) "\n" \
                                                : "=a" (__tmp) : : "memory" ); \
                        __tmp;})

In my first unseen fix to umm_malloc I used XTOS_SET_MIN_INTLEVEL similar to the above; however,
based on @devyte comment:

... The core currently doesn't support masking a specific interrupt level. That doesn't mean it can't be implemented, only that at this time locking is implemented as all or nothing.

I switched it over to use xt_rsil()/xt_wsr_ps() for the proposed fix.

I went back to try and understand better, the significance of the "memory" parameter. Then I realized that the current xt_rsil() in

#define xt_rsil(level) (__extension__({uint32_t state; __asm__ __volatile__("rsil %0," __STRINGIFY(level) : "=a" (state)); state;}))

might need it.

I wanted to see the results in assembly. So I did a quick compile to .S of this sample function with the old xt_rsil() and one with "memory" specified:

#include <Arduino.h>

#if 0
#undef xt_rsil
#define xt_rsil(level) (__extension__({uint32_t state; __asm__ __volatile__("rsil %0," __STRINGIFY(level) : "=a" (state)::"memory"); state;}))
#endif

int just4Fun(int *a, int *b) {
  uint32_t old_ps;
  int result = *a;
  old_ps = xt_rsil(15);
  *b = *a;
  xt_wsr_ps(old_ps);
  return result;
}

Using xt_rsil() without "memory" fence, trimmed .S results:

_Z8just4FunPiS_:
        l32i.n  a2, a2, 0
        rsil a4,15
        s32i.n  a2, a3, 0
        wsr a4,ps; isync
        ret.n

Using xt_rsil() with "memory" fence, trimmed .S results:

_Z8just4FunPiS_:
        l32i.n  a4, a2, 0
        rsil a5,15
        l32i.n  a2, a2, 0
        s32i.n  a2, a3, 0
        wsr a5,ps; isync
        mov.n   a2, a4
        ret.n

Notice the change.

  • The first example reads *a outside the critical area. Then saves that value to *b while in the critical area.
  • The second example reads a copy of *a outside the critical area. However, while in the critical area it reads *a again before saving it to *b.

@mhightower83
Copy link
Contributor Author

The change to xt_rsil() needs to be copied into to umm_malloc_cfg.h. Is there a better way to handle this? At the moment, when Esp-frag.cpp builds, a warning is issued so you know the mismatch exists. Which is probably better than renaming the macro locally in umm_malloc_cfg.h and not know it mismatches.

I also have some comment corrections.

@devyte If I do a push will it create any disturbance with your review?

@devyte
Copy link
Collaborator

devyte commented Jul 15, 2019

Push away :) I tend to do multiple review passes anyways because I don't always see everything at first

@earlephilhower
Copy link
Collaborator

@devyte, it looks like @mhightower83 is next in line for merge, once he gets the change he was talking about included.

@mhightower83
Copy link
Contributor Author

@earlephilhower The changes I described were there when you merged master into pr3, 1 hour ago. @devyte indicated he had a lot of comments. Based on that, I'll need to do at least one more push after he reviews.

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 16, 2019

The change to xt_rsil() needs to be copied into to umm_malloc_cfg.h. Is there a better way to handle this?

Yes. Please move, in this PR, xt_rsil&co definitions out from Arduino.h, say, to already existing cores/esp8266/core_esp8266_features.h.

Including Arduino.h just to get these macros available is painful because of include file dependencies, in umm and also everywhere else.

edit: While you are at it, and because you use the same function in your PR, and if you don't mind, please add also a define like esp_get_cycle_count() in the same file and use it (in umm and) in the bottom of Esp.h (uint32_t EspClass::getCycleCount()).

Arduino.h, to cores/esp8266/core_esp8266_features.h
Added esp_get_cycle_count() to core_esp8266_features.h.
Updated umm_malloc and Esp.h to use new defines and location.
@mhightower83
Copy link
Contributor Author

@d-a-v Looks like xt_rsil/xt_wsr_ps move trips on the stuff in the Travis ./common/mock.h file.

@earlephilhower
Copy link
Collaborator

Sure, like cut-n-pasting like you said. CSE and other optimizations can happen then. My statement was that you can't move an assignment past a conditional dependent on it (we're not the Itanic :) ). The main IRQ goes like: now=get-cycle-count; forall active-waveform-pins: if (now>trigger[pin]) do-toggle-stuff...

We're probably getting down a rathole not related to UMM here, and this is already a long thread. Any update @devyte? I don't understand the concern (infinite loop of includes or something?)

umm_realloc() to avoid exposing a transient OOM condition to ISR.
umm_realloc() to avoid exposing a transient OOM condition to ISR.
Updated realloc() to do a preliminary free() of unused space,
before performing a critical section exit and memmove.
This change was applied to the current _umm_realloc().
This change should reduce the risk of an ISR getting an
OOM, during a realloc memmove operation.
Added additional stats for verifying correct operation.
Updated realloc() to do a preliminary free() of unused space,
before performing a critical section exit and memmove.
This change was applied to the current _umm_realloc().
This change should reduce the risk of an ISR getting an
OOM when interrupting an active realloc memmove operation.
Added additional stats for verifying correct operation.
Updated: for clarity and Travis-CI fail.
@earlephilhower
Copy link
Collaborator

@devyte, I think you're up again on this one. Don't want it to fade off the first page of PRs...

Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devyte will be out of communication for a couple weeks, but he mentioned an issue with includes, which I think I've addressed with the comments. OTW, I'm good with getting this in and seeing how it runs...

@mhightower83
Copy link
Contributor Author

travis failed. @earlephilhower any thoughts on what is needed here:

Installing APT Packages
35611.72s$ travis_apt_get_update
3570.45s$ sudo -E apt-get -yq --no-install-suggests --no-install-recommends $(travis_apt_get_options) install g++-7 gcc-7
358Reading package lists...
359Building dependency tree...
360Reading state information...
361E: Unable to locate package g++-7
362E: Couldn't find any package by regex 'g++-7'
363E: Unable to locate package gcc-7
apt-get.diagnostics
364apt-get install failed
469The command "sudo -E apt-get -yq --no-install-suggests --no-install-recommends $(travis_apt_get_options) install g++-7 gcc-7" failed and exited with 100 during .

@earlephilhower
Copy link
Collaborator

Looks like a transient Travis.ci problem. I've restarted the job.

Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Hopefully @devyte can chime in. We can always pull in the DEBUGLOG macro from upstream changes at a later time.

@d-a-v d-a-v self-requested a review August 3, 2019 17:58
The library versions of memmove, memcpy, and memset were in flash.
Updated to use ROM functions ets_memmove, ets_memcpy, and ets_memset.
Additional note, the library version of memmove does not appear to
have been optimized. It took almost 10x longer than the ROM version.
Renamed printf macro to DBGLOG_FUNCTION and moved to umm_malloc_cfg.h.
Changed printf macro usage to use DBGLOG_FUNCTION.
@mhightower83
Copy link
Contributor Author

#6366 (comment)

  • WiFi disconnected/in promiscuous mode, no heap fragmenter - umm_malloc calls panic() because one of it's structures fails the integrity check - mask wrong @ umm_malloc.cpp:736.

@devyte @d-a-v @earlephilhower there are some diagnostic areas of the umm_malloc code that I am noticing and have concerns with:

  1. UMM_INTEGRITY_CHECK
  • doesn't use critical sections when scanning the heap and neither does the upstream version. The issue I see here is the heap entries changing by an ISR doing an alloc call, while integrity is running.
  • if/when critical sections are added this may push past the 10us interrupts disable guideline.
  • currently this requires editing files to enable, not a menu option which makes it less likely to be used. Therefore less likely to cause someone a problem.
  1. UMM_POISON (UMM_POSION_CHECK)
  • Two functions are enabled.
    1. CHECK_POISON_ALL_BLOCKS() - A full heap check of all active allocations.
      • doesn't use critical sections when scanning the heap and neither does the upstream version
      • if/when critical sections are added this may push past the 10us disabled IRQs guideline.
      • Is enabled when debug serial is selected. (I didn't realize this one was enabled here.)
    2. GET_UNPOISONED() - Individually check the current allocation presented through a realloc() or free().
      • This doesn't need a critical section, because our allocation cannot change w/o our action.
      • Is enabled when debug serial is selected.
      • I think this is a good option. It has lower overhead less likely to exceed 10us.
  1. CHECK_POISON_ALL_BLOCKS() is done before INTEGRITY_CHECK() I think it should be reversed. INTEGRITY_CHECK() verifies that links are good. I think it would be better to do that before CHECK_POISON_ALL_BLOCKS() goes confidently walking all the links checking for memory oversteps.
  2. Using the newlib printf() function may be a problem if the heap is damaged and it does a malloc to handle any printing. Also, an issue is if the error occurs from an ISR call, printf() is in flash.

At this time I currently have no measurements on the time to perform INTEGRITY_CHECK() or CHECK_POISON_ALL_BLOCKS() I guess I'll add that to my TODO list.

I am thinking that only the GET_UNPOISONED() option should be enabled when just selecting debug serial port. The CHECK_POISON_ALL_BLOCKS() should take more work to turn on. Some of these more aggressive diagnostics could start creating problems, especially when critical sections are added.

Sigh, I am never going to be though with this PR 🤕

@devyte
Copy link
Collaborator

devyte commented Aug 9, 2019

CC @rhempel for @mhightower83 's concerns

Copy link
Collaborator

@devyte devyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I currently suspect that realloc is fundamentally unsafe and should never be called from an ISR. Similar for free.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants