sys/atomic_utils: Functions for atomic access #14331

maribu · 2020-06-22T20:34:58Z

Contribution description

This PR provides a set of functions to perform atomic accesses. These functions closely resemble the C11 atomic library functions, but are implemented more efficiently:

Lock-free atomic read-modify-write operations are usually slower than just disabling IRQs, even on big fat STM32F7 CPUs. Still, the compiler will assume lock-free implementations to always be the fastest option. This assumption does not hold on bare metal embedded targets.
If no lock-free implementation is used, the compiler will emit a library call. Just disabling IRQs inline is more efficient (in terms of CPU cycles and ROM)
In special cases (e.g. only one thread is writing to a variable), only the write part of the read-modify-write operation needs to be atomically. This can be implemented cheaper than full atomic operations and is provided here.
- This is useful to fix sys/tsrb is not thread safe #9882
Under certain conditions (e.g. when IRQs are disabled), mixing atomic accesses with non-atomic accesses is possible and faster. This library allows for this.
Atomic operations for setting and clearing bits are provided
- Unlike atomic_fetch_or(&mask, 1 << bit) or atomic_fetch_and(&mask, ~(1 << bit)) this can be implemented lock-free and efficiently with bit-banding on Cortex M3 / M4 MCUs

Testing procedure

Note: This currently only works for CortexM MCUs.

Performance Check

Run the provided benchmark in tests/bench_atomic_util
It contains a README.md. If the given documentation is not sufficient, I really should fix it there rather than here

Arithmetic Correctness Check

Run the added unit tests: BOARD=<SOME_CORTEX_M_BOARD> make -C tests/unittests/ tests-atomic_utils flash test
- This will be most interesting for Cortex M3, M4, and M4F boards

Testing "Atomic-ness"

Run the test in tests/sys_atomic_utils
- You can use make test, but you might want to manually test as well
- The README.md and the in-application documentation should be sufficient. If not, I really should fix it there rather than here.

Issues/PRs references

Useful for #9882

kaspar030 · 2020-06-22T21:18:21Z

Nice one!

Should we already add atomic single bit functions? atomic_set_bit_u32(uint32_t *var, bit) etc.
We can then easily hook them up to use bit banding if a platform supports it.

maribu · 2020-06-24T07:42:26Z

We can then easily hook them up to use bit banding if a platform supports it.

Great idea. Do you have any good documentation for this? E.g. the few examples for using this I found that didn't use the bit-banding periph region did never allocate memory in the SRAM bit-banding region, but just created a wild pointer pointing somewhere in there. That might not be the best idea...

I think there should be some __attribute__((region("bit-banding"))) magic or similar to make sure a variable actually gets placed in there. But I didn't find anything the like. If such exists, the API should also provide an attribute for bitmasks, which by default is an empty preprocessor macro.

The alternative would be to check within every atomic_set_bit_u<WIDTH>() function if the address of the target mask is within the SRAM bit banding region and else fall back to atomic_fetch_or(). But I much rather add a precondition to the function that bitmask are allocated with the magic keyword, so that users get predictable and consistent performance.

maribu · 2020-06-24T08:22:54Z

I think it is __attribute__((section(''.srambb''))) :-) Let me check if this works without touching the linker scripts

maribu · 2020-06-24T19:40:32Z

@kaspar030: I implemented the atomic set_bit / clear_bit functions as suggested using bit banding. Thanks for pointing me to this.

I added some unit tests for arithmetic correctness of the atomic functions; which directly has proven to be worth the effort.

maribu · 2020-06-27T22:16:56Z

I added one more test, which currently is only applicable to the atomic_fetch_<OP>_u<BITS>() family of functions. It also contains a broken implementation volatile_fetch_<OP>_u<BITS>() for reference, which can also be tested. The test seems to perform quite well to detect issues with the volatile_fetch_<OP>_u<BITS>() functions and does not detect any issues for the atomic_*() family of functions :-)

I still need to extend the test to also allow checking the atomic_set_bit_u<BITS>() and atomic_clear_bit_u<BITS>() functions. Also, some documentation will be needed for the test to be actually useful.

maribu · 2020-09-23T13:34:13Z

This is now ready for review.

kaspar030 · 2020-11-05T10:49:53Z

re-ACK. I gave the PR a last complete look. Ran benchmarks, test and unittests on nrf52840dk. No issues.

Let's give murdock another run with tests in samr21-xpro and esp32.

tests/unittests/tests-atomic_utils/tests-atomic_utils.c

fjmolinas · 2020-11-10T09:17:17Z

All tests seem to be ok except `tests/pkg_libhydrogen/samr21-xpro:llvm, seems like its OK to squash @maribu

This tests can be used to test for implementations violating atomic attribute.

chrysn · 2020-11-10T10:24:56Z

In the comparison of IRQs vs. atomics for performance, was this evaluated only with SEQ_CST or also with ACQUIRE and RELEASE? After all, some say (citation needed, I've read an article but can't find it any more) that if you do SEC_CST you're almost always asking more of the platform than you need.

maribu · 2020-11-10T13:34:43Z

After all, some say (citation needed, I've read an article but can't find it any more) that if you do SEC_CST you're almost always asking more of the platform than you need.

tl;dr: No, I currently don't intent to extent this API to offer other guarantees than sequentially consistence. IMO that would gain only limited benefit in our context, but we would pay for that with increased complexity and confused developers.

The comparison is only for sequentially consistent. IMO, adding more variants lead to more complexity and confusion. I already had quite a few highly emotional, exhausting, and lengthy discussions on whether just adding volatile is enough for inter-context-communication (thread-to-thread or thread-to-isr) via shared memory. I think getting the "safest" atomic access variant as fast as possible is time better spent on than on reasoning on whether relaxing the guarantees is possible, documenting it, and discussing it.

Note that overhead for sequentially consistent atomic accesses in the context of embedded systems is often significantly less than it is in multi-core superscalar out-of-order CPUs. First, disabling IRQs briefly is much cheaper than a context switch to the OS and back for operations that lack lock-free implementations. And e.g. a lock on the memory bus that might cause other CPU cores to block during loads and stores from/to RAM is something we also don't need to worry about. (When we add support for multi-core MCUs, I think a high level primitive like an yet to implement mutli_core_mutex_t and carefully trying to avoid inter-core communication wherever possible is the better route to go.)

aabadie · 2020-11-13T07:24:29Z

Hmm, I broke Murdock when merging this one: the unittests don't fit anymore in arm7 boards (msba2 and mcb2388). I suggest to move the atomic function unittest to its own test application (instead of simply excluding these boards from the unittests) ?

aabadie · 2020-11-13T07:41:45Z

Let's see if #15438 is fixing the problem

maribu added State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR Area: sys Area: System labels Jun 22, 2020

maribu requested review from aabadie, DipSwitch, fjmolinas, haukepetersen, leandrolanzieri, MichelRottleuthner, smlng and thomaseichinger as code owners June 22, 2020 20:34

leandrolanzieri mentioned this pull request Jun 23, 2020

posix: initial import of select() function (only support sockets for now) #12975

Merged

maribu mentioned this pull request Jun 23, 2020

sys/pm: Correctly access pm_blocker #13977

Merged

maribu requested a review from miri64 as a code owner June 24, 2020 19:32

maribu force-pushed the atomic_utils branch from 7a903e9 to 27cf219 Compare September 23, 2020 13:31

maribu removed Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet labels Sep 23, 2020

benpicco added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Sep 28, 2020

maribu requested review from bergzand, gschorcht, kaspar030, kYc0o and roberthartung as code owners September 28, 2020 14:34

fjmolinas reviewed Nov 10, 2020

View reviewed changes

tests/unittests/tests-atomic_utils/tests-atomic_utils.c Outdated Show resolved Hide resolved

maribu added 11 commits November 10, 2020 10:55

cpu/cortexm_common: Add atomic_utils_arch.h

a3e2d27

cpu/arm7_common: Add atomic_utils_arch.h

9c6aed7

cpu/esp_common: Add atomic_utils_arch.h

ed6b88d

cpu/mips32r2_common: Add atomic_utils_arch.h

1d2e059

cpu/fe310: Add atomic_utils_arch.h

ce09824

cpu/msp430_common: Add atomic_utils_arch.h

3f4577d

cpu/atmega_common: Add atomic_utils_arch.h

a892c1a

cpu/native: Add atomic_utils_arch.h

56a54a7

tests: Add benchmark for sys/atomic_utils

23c91db

tests/unittests: Added unit test for atomic_utils

07fc051

tests: Add another test for atomic_utils

1352ce6

This tests can be used to test for implementations violating atomic attribute.

maribu force-pushed the atomic_utils branch from 17cd09c to 1352ce6 Compare November 10, 2020 09:55

maribu removed the CI: run tests If set, CI server will run tests on hardware for the labeled PR label Nov 10, 2020

benpicco added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Nov 10, 2020

aabadie merged commit 792e031 into RIOT-OS:master Nov 12, 2020

aabadie mentioned this pull request Nov 13, 2020

tests/unittests: split out atomic_utils unittests #15438

Merged

maribu deleted the atomic_utils branch November 13, 2020 08:58

kaspar030 mentioned this pull request Oct 1, 2021

core/rmutex: use atomic utils #16919

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sys/atomic_utils: Functions for atomic access #14331

sys/atomic_utils: Functions for atomic access #14331

maribu commented Jun 22, 2020 •

edited

Loading

kaspar030 commented Jun 22, 2020

maribu commented Jun 24, 2020

maribu commented Jun 24, 2020

maribu commented Jun 24, 2020

maribu commented Jun 27, 2020

maribu commented Sep 23, 2020

kaspar030 commented Nov 5, 2020

fjmolinas commented Nov 10, 2020

chrysn commented Nov 10, 2020

maribu commented Nov 10, 2020

aabadie commented Nov 13, 2020

aabadie commented Nov 13, 2020 •

edited

Loading

sys/atomic_utils: Functions for atomic access #14331

sys/atomic_utils: Functions for atomic access #14331

Conversation

maribu commented Jun 22, 2020 • edited Loading

Contribution description

Testing procedure

Performance Check

Arithmetic Correctness Check

Testing "Atomic-ness"

Issues/PRs references

kaspar030 commented Jun 22, 2020

maribu commented Jun 24, 2020

maribu commented Jun 24, 2020

maribu commented Jun 24, 2020

maribu commented Jun 27, 2020

maribu commented Sep 23, 2020

kaspar030 commented Nov 5, 2020

fjmolinas commented Nov 10, 2020

chrysn commented Nov 10, 2020

maribu commented Nov 10, 2020

aabadie commented Nov 13, 2020

aabadie commented Nov 13, 2020 • edited Loading

maribu commented Jun 22, 2020 •

edited

Loading

aabadie commented Nov 13, 2020 •

edited

Loading