Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update mmu_get... and mmu_set... #8290

Merged
merged 10 commits into from
Oct 13, 2021

Conversation

mhightower83
Copy link
Contributor

@mhightower83 mhightower83 commented Aug 24, 2021

These changes are needed to address bugs that can emerge with the improved optimization from the GCC 10.3 compiler.

Updated performance inline functions mmu_get_uint8(), ... and mmu_set_uint8(), ... to comply with strict-aliasing rules.
Without this change, stale data may be referenced. This issue was revealed in discussions on #8261 (comment)

Changes to avoid over-optimization of 32-bit wide transfers from IRAM, turning into 8-bit or 16-bit transfers by the new GCC 10.3 compiler. This has been a reoccurring/tricky problem for me with the new compiler.

So far referencing the 32-bit value loaded by way of an Extended ASM R/W output register has stopped the compiler from optimizing down to an 8-bit or 16-bit transfer.

Example:

  uint32_t val;
  __builtin_memcpy(&val, v32, sizeof(uint32_t));
  asm volatile ("" :"+r"(val)); // inject 32-bit dependency
  ...

Updated example irammem.ino

  • do a simple test of compliance to strict-aliasing rules
  • For mmu_get_uint8(), added tests to evaluate if 32-bit wide transfers were converted to an 8-bit transfer.

Edited: Added missing references and clarifications.

Added 32-bit dependency injections as needed to guard against compiler
optimizing 32-bit loads from IRAM to 8-bit or 16-bit loads.
Added 32-bit dependency injections as needed to guard against compiler
optimizing 32-bit loads from IRAM to 8-bit or 16-bit loads.
…83/Arduino into pr-mmu-inline-strict-aliasing
Copy link
Collaborator

@mcspr mcspr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, besides the comments
(and it is broken as-is, so I'd rather push this sooner than later)

Comment on lines +17 to +20
/*
Verify mmu_get_uint16()'s compliance with strict-aliasing rules under
different optimizations.
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already here, but would this and the rest of the file make sense as a device test file at /tests/device/test_irammem/ like there's already a umm malloc one? And given the structure of the things below with success and failures

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about how the optimizations might vary across processor vendors, my thinking was this way it is tested with a matching vendor/version of the ESP8266 GCC.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basic idea is to have a separate set of 'tests' that have a clearly defined success and failures, which is useful to have grouped together instead of remembering to run the example code to sanity-check certain Core implementation details. 'Device' is just the board running the test program, and... we are fighting the c++ object lifetime assumptions done by the compiler, not the machine code specifics?

Just to note, idk if it actually plays correctly with something supposed to be hw-only and not mixed with MOCK tests

libraries/esp8266/examples/irammem/irammem.ino Outdated Show resolved Hide resolved
cores/esp8266/mmu_iram.h Outdated Show resolved Hide resolved
@d-a-v d-a-v added the alpha included in alpha release label Sep 23, 2021
Corrected start of DRAM constant in mmu_is_dram().
Replaced #define(s) with const to properly limit scope. Compiler appears to
optomize it down to the same size.
In some places used ld variables and core-isa.h defines to set range checking values.
Comment on lines +40 to +46
#include <sys/config.h> // For config/core-isa.h
/*
Cautiously use XCHAL_..._VADDR values where possible.
While XCHAL_..._VADDR values in core-isa.h may define the Xtensa processor
CONFIG options, they are not always an indication of DRAM, IRAM, or ROM
size or position in the address space.
*/
Copy link
Collaborator

@mcspr mcspr Sep 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess make sense with the available definitions, naming would also simplify migration to anything else.

Looking at the RTOS, there is esp8266/eagle_soc.h declaring various IRAM_..., DRAM_..., RTC_... preprocessor tokens and funcs related to ranges. This is out-of-scope here, but I wonder if something like this is warranted for the repo as well (and since even the 3.x NONOS does not have anything of sorts)

@mcspr mcspr merged commit b7a2f44 into esp8266:master Oct 13, 2021
@mhightower83 mhightower83 deleted the pr-mmu-inline-strict-aliasing branch December 10, 2021 06:57
hasenradball pushed a commit to hasenradball/Arduino that referenced this pull request Nov 18, 2024
These changes are needed to address bugs that can emerge with the improved optimization from the GCC 10.3 compiler.

Updated performance inline functions `mmu_get_uint8()`, ... and `mmu_set_uint8()`, ...  to comply with strict-aliasing rules. 
Without this change, stale data may be referenced. This issue was revealed in discussions on esp8266#8261 (comment) 

Changes to avoid over-optimization of 32-bit wide transfers from IRAM, turning into 8-bit or 16-bit transfers by the new GCC 10.3 compiler. This has been a reoccurring/tricky problem for me with the new compiler. 

So far referencing the 32-bit value loaded by way of an Extended ASM R/W output register has stopped the compiler from optimizing down to an 8-bit or 16-bit transfer.

Example:
```cpp
  uint32_t val;
  __builtin_memcpy(&val, v32, sizeof(uint32_t));
  asm volatile ("" :"+r"(val)); // inject 32-bit dependency
  ...
```

Updated example `irammem.ino`
* do a simple test of compliance to strict-aliasing rules
* For `mmu_get_uint8()`, added tests to evaluate if 32-bit wide transfers were converted to an 8-bit transfer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
alpha included in alpha release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants