-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update mmu_get... and mmu_set... #8290
Update mmu_get... and mmu_set... #8290
Conversation
Added 32-bit dependency injections as needed to guard against compiler optimizing 32-bit loads from IRAM to 8-bit or 16-bit loads.
Added 32-bit dependency injections as needed to guard against compiler optimizing 32-bit loads from IRAM to 8-bit or 16-bit loads.
…83/Arduino into pr-mmu-inline-strict-aliasing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, besides the comments
(and it is broken as-is, so I'd rather push this sooner than later)
/* | ||
Verify mmu_get_uint16()'s compliance with strict-aliasing rules under | ||
different optimizations. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already here, but would this and the rest of the file make sense as a device test file at /tests/device/test_irammem/
like there's already a umm malloc one? And given the structure of the things below with success and failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know much about how the optimizations might vary across processor vendors, my thinking was this way it is tested with a matching vendor/version of the ESP8266 GCC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The basic idea is to have a separate set of 'tests' that have a clearly defined success and failures, which is useful to have grouped together instead of remembering to run the example code to sanity-check certain Core implementation details. 'Device' is just the board running the test program, and... we are fighting the c++ object lifetime assumptions done by the compiler, not the machine code specifics?
Just to note, idk if it actually plays correctly with something supposed to be hw-only and not mixed with MOCK tests
Corrected start of DRAM constant in mmu_is_dram(). Replaced #define(s) with const to properly limit scope. Compiler appears to optomize it down to the same size. In some places used ld variables and core-isa.h defines to set range checking values.
#include <sys/config.h> // For config/core-isa.h | ||
/* | ||
Cautiously use XCHAL_..._VADDR values where possible. | ||
While XCHAL_..._VADDR values in core-isa.h may define the Xtensa processor | ||
CONFIG options, they are not always an indication of DRAM, IRAM, or ROM | ||
size or position in the address space. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess make sense with the available definitions, naming would also simplify migration to anything else.
Looking at the RTOS, there is esp8266/eagle_soc.h declaring various IRAM_..., DRAM_..., RTC_... preprocessor tokens and funcs related to ranges. This is out-of-scope here, but I wonder if something like this is warranted for the repo as well (and since even the 3.x NONOS does not have anything of sorts)
These changes are needed to address bugs that can emerge with the improved optimization from the GCC 10.3 compiler. Updated performance inline functions `mmu_get_uint8()`, ... and `mmu_set_uint8()`, ... to comply with strict-aliasing rules. Without this change, stale data may be referenced. This issue was revealed in discussions on esp8266#8261 (comment) Changes to avoid over-optimization of 32-bit wide transfers from IRAM, turning into 8-bit or 16-bit transfers by the new GCC 10.3 compiler. This has been a reoccurring/tricky problem for me with the new compiler. So far referencing the 32-bit value loaded by way of an Extended ASM R/W output register has stopped the compiler from optimizing down to an 8-bit or 16-bit transfer. Example: ```cpp uint32_t val; __builtin_memcpy(&val, v32, sizeof(uint32_t)); asm volatile ("" :"+r"(val)); // inject 32-bit dependency ... ``` Updated example `irammem.ino` * do a simple test of compliance to strict-aliasing rules * For `mmu_get_uint8()`, added tests to evaluate if 32-bit wide transfers were converted to an 8-bit transfer.
These changes are needed to address bugs that can emerge with the improved optimization from the GCC 10.3 compiler.
Updated performance inline functions
mmu_get_uint8()
, ... andmmu_set_uint8()
, ... to comply with strict-aliasing rules.Without this change, stale data may be referenced. This issue was revealed in discussions on #8261 (comment)
Changes to avoid over-optimization of 32-bit wide transfers from IRAM, turning into 8-bit or 16-bit transfers by the new GCC 10.3 compiler. This has been a reoccurring/tricky problem for me with the new compiler.
So far referencing the 32-bit value loaded by way of an Extended ASM R/W output register has stopped the compiler from optimizing down to an 8-bit or 16-bit transfer.
Example:
Updated example
irammem.ino
mmu_get_uint8()
, added tests to evaluate if 32-bit wide transfers were converted to an 8-bit transfer.Edited: Added missing references and clarifications.