Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calls made to malloc/free will enable interrupts on return #6246

Closed
6 tasks done
mhightower83 opened this issue Jul 1, 2019 · 29 comments
Closed
6 tasks done

Calls made to malloc/free will enable interrupts on return #6246

mhightower83 opened this issue Jul 1, 2019 · 29 comments

Comments

@mhightower83
Copy link
Contributor

mhightower83 commented Jul 1, 2019

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: ESP-12
  • Core Version: 5a47cab
  • Development Env: Arduino IDE
  • Operating System: Ubuntu

Settings in IDE

  • Module: Adafruit Feather HUZZAH ESP8266
  • Flash Mode: qio
  • Flash Size: 4MB
  • lwip Variant: v2 Lower Memory
  • Reset Method: nodemcu
  • Flash Frequency: 40Mhz
  • CPU Frequency: 80Mhz
  • Upload Using: SERIAL
  • Upload Speed: 115200 (serial upload only)

Problem Description

To simulate an IRQ handler calling malloc I used xt_rsil() to set INTLEVEL.
when malloc/free are called with an elevated INTLEVEL, INTLEVEL is set to 0 on return.

I noticed that ets_intr_lock() and ets_intr_unlock() are being used to handle critical section for umm_malloc. ets_intr_unlock() does not restore the previous INTLEVEL; it does an rsil a2, 0.
Ref: #2218 (comment) Thus causing the immediate problem.

MCVE Sketch

#include <ets_sys.h>
#include <interrupts.h>

#ifndef xt_rsr_ps
#define xt_rsr_ps()  (__extension__({uint32_t state2; __asm__ __volatile__("rsr.ps %0" : "=a" (state2)); state2;}))
#endif

/* 
 * A printf routine that can safely print with interrupts disabled
 * and does no "malloc lib" calls. 
 * 
 * The default printf will do a malloc call, spoiling the test.
 */
 
// ROM _putc1, ignores CRs and sends CR/LF for LF, newline.
// Always returns character sent.
extern "C" int constexpr (*_putc1)(int) = (int (*)(int))0x40001dcc;

extern "C" void uart_buff_switch(uint8_t);

int _sz_printf_P(const size_t buf_len, const char *fmt, ...) __attribute__((format(printf, 2, 3)));
int _sz_printf_P(const size_t buf_len, const char *fmt, ...) {
  
#ifdef DEBUG_ESP_PORT
#define VALUE(x) __STRINGIFY(x)
  // Preprocessor and compiler together will optimize away the if.
  if (strcmp("Serial1", VALUE(DEBUG_ESP_PORT)) == 0) {
    uart_buff_switch(1U);
  } else {
    uart_buff_switch(0U);
  }
#else
  uart_buff_switch(0U); // Side effect, clears RX FIFO
#endif

  char __aligned(4) ram_buf[buf_len];
  ets_memcpy(ram_buf, fmt, buf_len);
  va_list argPtr;
  va_start(argPtr, fmt);
  int result = ets_vprintf(_putc1, ram_buf, argPtr);
  va_end(argPtr);
  return result;
}

#define printf(fmt, ...) _sz_printf_P((sizeof(fmt) + 3) & ~0x03U, PSTR(fmt), ##__VA_ARGS__)



void setup() {
  Serial.begin(115200);
  delay(20);
  uint32_t beforePS, afterPS;

  printf("\n\n\nShow what happens to an elevated INTLEVEL after a \"malloc\" and \"free\"?\n");
  
  printf("\n  Preset INTLEVEL with \"xt_rsil(3);\"\n");
  xt_rsil(3);
  // Confirm that printf() is not spoiling our test.
  //   * A pass while the other tests fail is confirming.
  //   * A pass while the other tests pass tells nothing.
  beforePS = xt_rsr_ps();
  printf("  %s\n", "Indirectly confirm that printf() is not calling malloc and spoiling our test?");
  afterPS = xt_rsr_ps();
  printf("  PS register before/after \"printf();\":  0x%04X/0x%04X, %s\n", beforePS, afterPS, (beforePS ==  afterPS) ? "Pass" : "Fail!");

  printf("\n  Preset INTLEVEL with \"xt_rsil(4);\"\n");
  xt_rsil(4);
  beforePS = xt_rsr_ps();
  char *pc = (char *) malloc(3);
  afterPS = xt_rsr_ps();
  printf("  PS register before/after \"malloc(3);\": 0x%04X/0x%04X, %s\n", beforePS, afterPS, (beforePS ==  afterPS) ? "Pass" : "Fail!");

  *pc = '1'; // w/o this, malloc/free appears to be optimized away.
  
  printf("\n  Preset INTLEVEL with \"xt_rsil(5);\"\n");
  xt_rsil(5);
  beforePS = xt_rsr_ps();
  free(pc);
  afterPS = xt_rsr_ps();
  printf("  PS register before/after \"free(pc);\":  0x%04X/0x%04X, %s\n", beforePS, afterPS, (beforePS ==  afterPS) ? "Pass" : "Fail!");
  
  xt_rsil(0);
  printf("\n\n");
}

void loop() {
  // Something to show we did not hang
  static size_t count = 0;
  delay(1000);
  printf(".");
  if (((++count) % 72) == 0)
    printf("\n");
}

Debug Messages

Show what happens to an elevated INTLEVEL after a "malloc" and "free"?

  Preset INTLEVEL with "xt_rsil(3);"
  Indirectly confirm that printf() is not calling malloc and spoiling our test?
  PS register before/after "printf();":  0x0023/0x0023, Pass

  Preset INTLEVEL with "xt_rsil(4);"
  PS register before/after "malloc(3);": 0x0024/0x0020, Fail!

  Preset INTLEVEL with "xt_rsil(5);"
  PS register before/after "free(pc);":  0x0025/0x0020, Fail!

Other Thoughts and Observations

The comments in the code indicate that the locking macros must be allowed to nest. In regards to the comment about _umm_malloc() having a call to _umm_free() I think the code may have been changed after that was written; however, there are calls to _umm_malloc() and _umm_free() from _umm_realloc() that would require nest support.

/*
* A couple of macros to make it easier to protect the memory allocator
* in a multitasking system. You should set these macros up to use whatever
* your system uses for this purpose. You can disable interrupts entirely, or
* just disable task switching - it's up to you
*
* NOTE WELL that these macros MUST be allowed to nest, because umm_free() is
* called from within umm_malloc()
*/
#define UMM_CRITICAL_ENTRY() ets_intr_lock()
#define UMM_CRITICAL_EXIT() ets_intr_unlock()

I have explored using XTOS_SET_MIN_INTLEVEL and XTOS_RESTORE_INTLEVEL from
ESP8266_RTOS_SDK/components/esp8266/include/xtensa/xtruntime.h for the UMM_CRITICAL_ macros. It has been working well for me. And there are comments in xtruntime.h about how it works, something I could never find for ets_intr_lock(). I like that you can specify an INTLEVEL to XTOS_SET_MIN_INTLEVEL and if the current level is higher it will keep it, instead of demoting it, as ets_intr_lock did.

An interesting side benefit from the broken nesting with the ets_intr_lock was that in one section of _umm_realloc() , after a _umm_malloc() call, interrupts were enabled before memcpy() was called. Curiously - this looks like it was safe. All variables referenced at this point were on the stack. I say benefit in the sense that a large block of data could be copied without exceeding the 10us max time for interrupts off.

/*
* Now _umm_malloc() a new/ one, copy the old data to the new block, and
* free up the old block, but only if the malloc was sucessful!
*/
if( (ptr = _umm_malloc( size )) ) {
memcpy( ptr, oldptr, curSize );
_umm_free( oldptr );
}
}
/* Release the critical section... */
UMM_CRITICAL_EXIT();

I have explored the idea of lightly refactoring _umm_realloc() to support locks off during copy/memmove to reduce the over 10us burdon. Of course, when called from an IRQ routine it has no effect, interrupts are off for the whole process. It seems to be working; however, I may not have a sketch that is demanding enough.

@devyte
Copy link
Collaborator

devyte commented Jul 5, 2019

Thanks for the analysis. The core currently doesn't support masking a specific interrupt level. That doesn't mean it can't be implemented, only that at this time locking is implemented as all or nothing.

In any case, your investigation hints at a deeper underlying issue with very wide scope.

Dev Notes:

use case as follows:
disable interrupts
do stuff
malloc //ints get re-enabled
do more stuff //this code is not protected as is expected
enable interrupts //does nothing because ints were enabled above

Considerations:
interrupt lock must nest (stack?) x16 (all int levels)
umm must not be interruptible, ever
InterruptLock changes in interrupt. h
replace ets_enable|disable with something else for umm

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 5, 2019

Your sketch passes all three tests with:

#define UMM_CRITICAL_ENTRY() esp8266::InterruptLock lockMeHere
#define UMM_CRITICAL_EXIT()  do { (void)0; } while (0)

It's not simple as that because of definitions in Arduino.h included from everywhere, so I ended up copying the class into umm for testing because some include-and-definition-location rework is required - I can try to make a PR to prepare for that change if you need so)

Thanks

@mhightower83
Copy link
Contributor Author

@d-a-v, thanks, but I am good. I have already worked through this issue for my needs. I saw this issue in the code while exploring how UMM_POISON worked. I ended out looking at how umm_malloc.cpp worked for a while and made a few other changes. For example, I monitored the min/max timings for the locks in, info/malloc/free/realloc when running my project. Now that I have my project working, I am making my way through different things I have changed and identifying the ones that are problems, that need to be reported/fixed. Writing does not come easy for me. It takes me quite a few drafts before hitting send.

Thanks to everyone for all that you do here. I don't know how you find the time to do so much.

@devyte
Copy link
Collaborator

devyte commented Jul 6, 2019

@mhightower83 your writing was good enough and your analysis detailed enough to lead my thinking right to an unknown problem that could explain a lot of odd crashes. Please uphold your practice of reporting in that way!

@devyte
Copy link
Collaborator

devyte commented Jul 6, 2019

@d-a-v os there anywhere in the umm code where:
a) more than one critical section exists in the same scope? I'm thinking of conflict due to multiple declarations of interruptLock
b) a critical section is entered in one scope but exited in another? I'm thinking of class destruction in the wrong place. Because these are defines, it should be possible to test this by adding a { before the entry and a } after the exit, which would force destruction on exit, but would fail to build if entry and exit are in different scopes.

@earlephilhower
Copy link
Collaborator

UMM's very simple, basically a single file of straight C. There are no object delete/creates, so (b) should not be an issue.

There is only 1 UMM_CRITICAL_ENTRY per function, so that's fine.

There is nested calls (realloc sets CRITICAL but then can call free() which also sets CRITICAL). However that's not a problem since only the first one will do anything.

@devyte
Copy link
Collaborator

devyte commented Jul 6, 2019

@earlephilhower the nested case you mention seems to me like a variation of the use case I described above. I took a quick look at umm, and realloc is the worst case: it calls free and malloc from inside the critical section:

realloc()
  some code
  ENTRY (ints locked)
  some code
  jump to malloc()
    some code
    ENTRY (does nothing)
    some code
    EXIT (reenables ints)
    some code (point A)
    return from malloc()
  some code (point B)
  EXIT (does nothing)
  some code
  return from realloc

The code from point A to point B inclusive won't be protected as is expected, unless we implement nested locks.

@devyte
Copy link
Collaborator

devyte commented Jul 6, 2019

Thinking further, nested locks are actually already supported via multiple instances of InterruptLock, where each instance represents one level of nesting (the instance saves the previous state, then restores it on destruction).
I think the scopes are ok, which means that @d-a-v 's solution is viable as proposed. Good thing we have umm building as C++ :)

However, @d-a-v 's solution only applies to code under our control. Is there any way to know if the ets_intr_blah() functions are called from anywhere else, e.g.: the closed libs?

@earlephilhower
Copy link
Collaborator

You can get a list of imports (undefined externals) easy enough for the .as we link in with nm, but realistically I think there's not much we can do about them.

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 8, 2019

@devyte
We can extract ets_intr_*lock() from firmware as we currently are doing for some other symbols.
We can then re-implement them properly.

@mhightower83
You found a nasty bug, and when there's a fixable one we usually try to address it asap.
Will you propose a fix soon enough ?
(that, because your finding, your credited commit)

This ets_intr_*lock() bug should be mentioned in the nonos-sdk repository.

@d-a-v d-a-v added the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Jul 8, 2019
@mhightower83
Copy link
Contributor Author

@d-a-v I did not go into it on issue description; however, I placed wrappers around ets_intr_lock/unlock thinking I would replace it with something that worked better handled nested calling and preserved INTLEVEL. I did that and things didn't make sense. I took a step back and just built a monitor wrapper around ets_intr_lock/unlock monitoring lock nesting depth and some lock time durations, max, INTLEVEL, etc. Things got weird. I found that lock depths go negative and then return to 0! I have come to the conclusion that ets_intr_lock/unlock may be doing exactly what Espressif wants for the blob, but not what we need or thought it does.

This is just a hypothetical. I have no idea if anything like this happens in the blob:

  • blob routine is called from any INTLEVEL.
  • blob routine calls ets_intr_lock, which sets INTLEVEL to 3 and saves the previous INTLEVEL
    • Note, ets_intr_unlock never uses the saved INTLEVEL, just does a rsil(0)
  • blob does a bunch of stuff and it has reached a place where it is safe to be for reentered/interrupted.
  • Then blob sees a block of memory that needs to be copied.
  • blob does not want to keep interrupts off for more than 10us.
  • blob has two choices copy now or queue it.
  • By testing the saved INTLEVEL, blob detects that interrupts were enabled when it was called.
    • The blob does ets_intr_unlock, copy the block of data and then ets_intr_lock, then resume checking for the next block of data.
  • If interrupts were disabled when blob was called, blob puts the block of data on queue for later processing.

As best I can tell ets_intr_lock/unlock is not documented. It is incidentally used in an example by Espressif and we see the function name in the ROM linker table. I think based on their names, we have made assumptions about what they are for and how they are to be used. Which is all we have to go on when documentation is weak.

IMHO I would not try to fix ets_intr_lock/unlock, it may not be broken; however, I would not use it for any purpose in my work.

For this lenght, this message was a bit quick for me. Not sure all my thoughts made it into words. I hope this is clear.

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 8, 2019

Again, thanks for these details and your findings.

After discussion with @earlephilhower and @devyte, we plan to remove exposure of ets_intr_*lock() from ROM, and add a fixed one instead. So any code using them even in nested block (from external libraries not under our control) will get something logically working.

To manage nested block, implementation could rely on an internal stack, or even better on @earlephilhower 's idea just a counter and the initial saved state that would be restored on the last ets_intr_unlock() call (counter back to 0). This is still under consideration, given the fact that closed source (blobs) use them too.

I found that lock depths go negative and then return to 0!

Do you mean that there can be more unlock() than lock(), then afterwards some lock()s to return to 0 ?

@d-a-v d-a-v removed the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Jul 8, 2019
@mhightower83
Copy link
Contributor Author

@d-a-v

Do you mean that there can be more unlock() than lock(), then afterwards some lock()s to return to 0 ?

Yes, the depth range I saw was -1 to 2. That is why I think ets_intr_lock()/ets_intr_unlock() should be left alone just for the blob to use. I don't think they are using it for nesting locks in any way or form. Just an interrupt disable/enable method. And, maybe look back to see if they were called with interrupts enabled.

BTW: I went through variations on the counter thing, saved/restored INTLEVEL on an array as the depth stepped up and down. The down is where things got weird and went negative. (my depth variable was originally unsigned :P)
That's when I switched to using linker wrappers on just ets_intr_lock()/ets_intr_unlock() to monitoring the system behavior. I also rebuilt the lwIP so it wasn't using ets_...; as well as got umm_malloc off of ets_intr_lock. So that I was mostly seeing the blobs access.

My conclusion was to just stay away from ets_intr_lock()/ets_intr_unlock(), far far away. Let the blob have it.

@mhightower83
Copy link
Contributor Author

@d-a-v
Will you propose a fix soon enough ?
(that, because your finding, your credited commit)

Oh, I wasn't sure how this was going to flow.
Let me do a fast forward and give you a list of changes I have in umm_malloc.
Let me know what you would like to see in the PR?

Changes:

  Correct critical section with interrupt level preserving and nest support
  alternative. Replace ets_intr_lock()/ets_intr_unlock() with
  uint32_t oldValue=xt_rsil(3)/xt_wrs(oldValue)

  Replace printf with something that is ROM or IRAM based so that a printf
  that occurs during an ISR malloc/new does not cause a crash. To avoid a
  reentry issue it should also avoid doing malloc lib calls

  Refactor realloc to avoid memcpy/memmove while in critical section. This is
  only effective when realloc is called with interrupts enabled. The copy
  process alone can take over 10us (when copying more than ~498 bytes with a
  80MHz CPU clock). It would be good practice for an ISR to avoid realloc.
  Note, while doing this might initially sound scary, this appears to be very stable. It ran
  on my troublesome sketch for over 3 weeks until I got back from vacation and 
  flashed an update. Troublesome sketch - runs ESPAsyncTCP, with modified fauxmo
  emulation for 10 devices. It receives lost of Network traffic related to uPnP scans,
  which includes lots of TCP connects disconnects RSTs related to uPnP discovery. 

  I have clocked umm_info critical lock time taking as much as 180us. A common
  use for the umm_info call is to get the free heap result. It is common
  to try and closely monitor free heap as a method to detect memory leaks.
  This may result in frequent calls to umm_info. There has not been a clear
  test case that shows an issue yet; however, I and others think they are or
  have had crashes related to this.

  I have added code that adjusts the running free heap number from _umm_malloc,
  _umm_realloc, and _umm_free. Removing the need to do a long interrupts
  disabled calculation via _umm_info.

  Build optional, min/max time measurements for locks held while in info,
  malloc, realloc, and free. Also, maintain a count of how many times each is
  called with INTLEVEL set.

@mhightower83
Copy link
Contributor Author

mhightower83 commented Jul 8, 2019

Some of my forgotten thoughts and concerns on the method used for the critical section:

  1. The older Non-OS SDK Manual indicated that interrupts should not be off for longer than 10us.

  2. There is a lot of math that takes place in umm_malloc.cpp. Not all of it needs to be caught up in a critical section.

  3. In my sketch, I have seen _umm_malloc()'s critical section, take up to 10us and a fraction.

  4. I have seen _umm_info()'s critical section take up to 180us.

    • I don't see much that can be done with this, other than minimize its use.
    • eg. create an alternative for getting free heap. Like maintaining a total heap value adjusted by each alloc/free.
  5. _umm_realloc() can involve calls to malloc, free, memcpy, or/and memmove. I don't have timing info on this, other than a memcpy size near 500 bytes takes ~10us.

    • I think ISR should avoid this function.
    • To reduce the interrupts off period, when (INTLEVEL== 0), I think realloc can be broken down into smaller "critical"/"not critical" sections (memcpy/memmove). Much like the client could have done a new malloc, copy old to new, free old.
  6. I really like the InterruptLock class and its variations; however, I am concerned that using it for UMM_CRITICAL_ENTRY() may capture too large of scope due to the current code structure and walking the free heap can take a little time. And walking the whole heap appears to take a lot of time.

@earlephilhower
Copy link
Collaborator

umm_info is really only for debugging and I don't think I've ever seen an app use it. It was busted, in fact, until 1-2 weeks ago when I fixed it after seeing the device test fail. It would crash the system, so it's not like it was failing silently and people not noticing.

Your suggestion of moving the memmove out of the critical section seems like a simple way to get a much shorter period of IRQs disabled! (I'd go further and say that if you're doing malloc/free/realloc/new/delete inside an ISR, you're doing something very wrong, but most users here aren't professional or shipping critical embedded systems.)

FWIW, we've not seen the time spent with IRQs disabled in UMM as a source of problems. The bug of re-enabling IRQs even when they were disabled, is of course something that needs fixing. But the actual UMM operations themselves seem pretty solid as-is.

@mhightower83
Copy link
Contributor Author

umm_info is really only for debugging and I don't think I've ever seen an app use it. It was busted, in fact, until 1-2 weeks ago when I fixed it after seeing the device test fail. It would crash the system, so it's not like it was failing silently and people not noticing.

Yes, I think that problem only surfaced when you used the print option. You fixed it before I could report it :). For me, the problem showed up, when I tried to see how a UMM_POISON failure look like.

  • And that I am afraid of is yet another issue within an issue, aka hardware WDT when INTLEVEL!=0.

However, when you use ESP.getFreeHeap() and it trickles down to umm_free_heap_size( ), umm_info() is called.

size_t ICACHE_FLASH_ATTR umm_free_heap_size( void ) {
umm_info(NULL, 0);
return (size_t)ummHeapInfo.freeBlocks * sizeof(umm_block);
}

And for some, a lot of calls are made to ESP.getFreeHeap() to track leaks.

Your suggestion of moving the memmove out of the critical section seems like a simple way to get a much shorter period of IRQs disabled! (I'd go further and say that if you're doing malloc/free/realloc/new/delete inside an ISR, you're doing something very wrong, but most users here aren't professional or shipping critical embedded systems.)

FWIW, we've not seen the time spent with IRQs disabled in UMM as a source of problems. The bug of re-enabling IRQs even when they were disabled, is of course something that needs fixing. But the actual UMM operations themselves seem pretty solid as-is.

By fixing the critical section operation, the IRQs disabled time may increase, as memcpy will then run behind a critical section.

@earlephilhower
Copy link
Collaborator

This is really opening a can of worms, but it seems to me there's no reason umm_free_heap_size should not be O(1). Can't umm simply keep a running total of allocated blocks? It knows the size of any allocation or deallocation, no?

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 9, 2019

I remember it goes through the whole linked list every time umm_info() is called.
There is currently no cache (I saw that when I added fragmentation info - which increases time).

edit: nice !

I have added code that adjusts the running free heap number from _umm_malloc,
_umm_realloc, and _umm_free. Removing the need to do a long interrupts
disabled calculation via _umm_info.

side note: If we improve umm, maybe we could do it with minimum changes in file structure so we can also propose them upstream.

@mhightower83
Copy link
Contributor Author

Sorry, I am sometimes too suttle. I have changes for that. I'll do a pull request as sneak peak at what I have. When I get back home. (About an hour) And we can reduce it from there.

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 9, 2019

Thanks for all details. To summarize my understanding:

  • we cannot modify the ROM anyway
  • we should leave ets_intr_*lock as-is for blobs (closed source firmware) (at least for now)
  • we must track and replace them in sources, including lwIP (v1, v2) with a new version based on a counter, with a C interface (lwIP...)
  • we can improve umm with a cache for freeHeap (maybe fragmentation too)
  • 👍 for mem* () operations outside from critical sections when possible

side notes:

@mhightower83
Copy link
Contributor Author

mhightower83 commented Jul 9, 2019

  • would it be interesting to put some parts of umm in IRAM for speed

Actually, umm_malloc.cpp gets special treatment by the linker to be in IRAM. Only the parts explicitly tagged with ICACHE_FLASH_ATTR are in flash. It accidentally went into flash for a short while, when it was renamed from .c to .cpp.

I'll have to take another look at that. I remember following that part of the code trying to understand how the error was handled. And looked up how it was supposed to be handled. I must have had one of my moments of confusion. [Edited] I thought it worked.

@mhightower83
Copy link
Contributor Author

@d-a-v before I forget again, thank you for the summary acknowledgment that was reassuring.

  • we can improve umm with a cache for freeHeap (maybe fragmentation too)

For "maybe fragmentation too" I went back to some of my earlier edits of umm_malloc.cpp to try and remember some details.

My concern with the fragmentation calculation involves the multiplies and I was concerned that this could add significantly to the run time to do a malloc or free; however, I never explored if that was true.

On the other hand, unless I am in error, I think the existing calculation for freeSize2 could be improved by moving the summing of the square of sizeof(umm_block) outside the while loop. Then freeEntries * square of sizeof(umm_block) would give you the final adjustment to add-in.

Also, there was something about removing DBG_LOG_FORCE with forced printing - this improved something. I think it was mainly size (I did try this in IRAM for a while), maybe a little speed, since it removes an if from the loop. But, then I don't know if this is a valued feature. Maybe support it sa a build option. Sorry, recollection is weak on this one.

Hmm, I am starting to realize that there isn't much I have touched on in this module. I get distracted and go in way too many directions when I work on a problem. I guess I have a little ADHD.

I think this is something different, I wrote I was thinking of this change you made: d-a-v@0b68126

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 10, 2019

Then freeEntries * square of sizeof(umm_block) would give you the final adjustment to add-in.

The metric is the Euclidean/SRSS/L² norm on free space chunks (free chunk = contiguous free blocks): sqrt(sum(on all free chunks)(chunk-size-in-bytes^2)) (opposed to sum(on all free chunks)(chunk-size-in-bytes), they are equals when there is only one chunk of free blocks, fragmentation = 0).
For a cache I think we need to keep track of the square of chunk size.

According to @earlephilhower we don't have HW multiplier for 32bits*32bits=>64bits but I think we have it for 32bits*32bits=>32bits and that's the one used because the result is stored into a 32bits int. This may have to be checked. And even in that case I don't know the cost. I think however this calculation could only be enabled in debug mode.

about removing DBG_LOG_FORCE with forced printing - this improved something

Only in debug mode no ?
Because the if (force) test is not compiled while in normal mode.

upstream umm has an improved realloc

I think this is something different,

Yes, it's the logic from upstream that has been improved by its author I think, but I haven't looked into details. I was simply mentioning it because sources are about to diverge.
The change made is only a bugfix (independently applied upstream too).

@mhightower83
Copy link
Contributor Author

Only in debug mode no ?
Because the if (force) test is not compiled while in normal mode.

Actually, the define for DBG_LOG_FORCE is always there. There are currently no conditional for that debug macro.

#if DBG_LOG_LEVEL >= 1
# define DBG_LOG_INFO( format, ... ) printf( format, ## __VA_ARGS__ )
#else
# define DBG_LOG_INFO( format, ... )
#endif
#define DBG_LOG_FORCE( force, format, ... ) {if(force) {printf( format, ## __VA_ARGS__ );}}

The metric is the Euclidean/SRSS/L² norm on free space chunks (free chunk = contiguous free blocks): sqrt(sum(on all free chunks)(chunk-size-in-bytes^2)) (opposed to sum(on all free chunks)(chunk-size-in-bytes), they are equals when there is only one chunk of free blocks, fragmentation = 0).
For a cache I think we need to keep track of the square of chunk size.

Sorry I got my factorization wrong. I believe there is a square constant that can be factored out and multiplied back at the end of the sum. Reducing the total number of multiplies within the while loop.

Maybe this will illustrate my thinking:

This is what we effectively have now:

   unsigned long constexpr k = sizeof(umm_block)^2

   freeSize2 =   blocks1^2 * k  +  blocks2^2 * k  +  blocks3^2 * k  + ...

Factor out k and we have:

   freeSize2 = ( blocks1^2      +  blocks2^2      +  blocks3^2      + ... ) * k

And reduce to something equivalent to

   freeSize2 = ( blocks1^2      +  blocks2^2      +  blocks3^2      + ... )
   freeSize2 *= k;

@ChocolateFrogsNuts
Copy link
Contributor

Thanks for all details. To summarize my understanding:
* we cannot modify the ROM anyway
* we should leave ets_intr_*lock as-is for blobs (closed source firmware) (at least for now)
* we must track and replace them in sources, including lwIP (v1, v2) with a new version based on a counter, with a C interface (lwIP...)

I just had a look at the esp8266 boot rom disassembly by @trebisky and a read of the Xtensa instruction set reference manual....

ets_intr_lock() actually returns the old interrupt level - by accident or design I'm not sure, but it's baked in rom now... Looking at the disassembly:

ets_intr_lock:
40000f74:	206300        rsil   a2, 3
40000f77:	31feff          l32r  a3, [0x3fffdcc0]	; 0x40000f70
40000f7a:	2903      	  s32i.n	a2, a3, 0
40000f7c: 0df0            ret.n

rsil reads the register into a2, then updates it with '3'.
the l32r / s32i store the old interrupt level - for no apparent reason.
then the ret... now at this point, a2 still contains the old interrupt level register... which is handy because a2 is also the place where the compiler expects int values to be returned from functions.

  int ets_intr_lock();
  int fl1=ets_intr_lock();
  int fl2=ets_intr_lock();
  ets_intr_unlock();
  int fl3=ets_intr_lock();
  ets_intr_unlock();
  Serial.printf("fl1=%02x, fl2=%02x  fl3=%02x\n", fl1, fl2, fl3);

proves it works, and

  int saved = ets_intr_lock();
  // do work
  if ((saved & 0x0F)==0) ets_intr_unlock();

will restore the register as appropriate. note the (saved & 0x0F) is necessary as the whole register is returned and only the low 4 bits contain the interrupt level.

We could certainly do with some better locking mechanisms, and applying this knowledge to my copy of umm_malloc and lwIP doesn't seem to have made any difference, but hey, there it is :-)

@devyte
Copy link
Collaborator

devyte commented Aug 10, 2019

The ets_intr_blah() functions have one major flaw: the intr reg stored/returned is never restored, but instead the unlock re-enables all interrupts. That means e. g. you can't have nested critical sections, such as is currently the case in umm realloc, because the code part between the inner unlock and the outer unlock won't be protected.
I have an upstream PR to unnest umm realloc. There is currently a PR here with proposed umm improvements by @mhightower83. There is work planned to build a call tree of functions in the ROM that eventually call ets_intr_blah(), and replace their linkage with fixed versions compiled into our core (this implies possibly fixing the sdk closed source libs as well).

I don't know if any of the above is the cause of what you're seeing, but any help is appreciated, either confirming, implementing, finding some other cause of instability, etc.

@ChocolateFrogsNuts
Copy link
Contributor

Agreed that not being able to select the interrupt level is a flaw, and there needs to be the option to use that if required.
My point being that we assumed ets_intr_lock did not return a value because the piece of example code did not use it. In reality it returns the old interrupt level, which can be used to enable interrupts only when required, as in my example. If Espressif have used it in this way, their code is not necessarily broken :-)
Looking at the calling code in the blob is the only way to be sure.

@devyte
Copy link
Collaborator

devyte commented Aug 19, 2019

#6274 is merged, the original issue here is addressed. For whatever is pending, let's discuss in a new issue.

@devyte devyte closed this as completed Aug 19, 2019
@devyte devyte self-assigned this Aug 19, 2019
@devyte devyte added this to the 2.6.0 milestone Aug 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants