Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys/atomic_utils: Functions for atomic access #14331

Merged
merged 12 commits into from
Nov 12, 2020
Merged

Conversation

maribu
Copy link
Member

@maribu maribu commented Jun 22, 2020

Contribution description

This PR provides a set of functions to perform atomic accesses. These functions closely resemble the C11 atomic library functions, but are implemented more efficiently:

  • Lock-free atomic read-modify-write operations are usually slower than just disabling IRQs, even on big fat STM32F7 CPUs. Still, the compiler will assume lock-free implementations to always be the fastest option. This assumption does not hold on bare metal embedded targets.
  • If no lock-free implementation is used, the compiler will emit a library call. Just disabling IRQs inline is more efficient (in terms of CPU cycles and ROM)
  • In special cases (e.g. only one thread is writing to a variable), only the write part of the read-modify-write operation needs to be atomically. This can be implemented cheaper than full atomic operations and is provided here.
  • Under certain conditions (e.g. when IRQs are disabled), mixing atomic accesses with non-atomic accesses is possible and faster. This library allows for this.
  • Atomic operations for setting and clearing bits are provided
    • Unlike atomic_fetch_or(&mask, 1 << bit) or atomic_fetch_and(&mask, ~(1 << bit)) this can be implemented lock-free and efficiently with bit-banding on Cortex M3 / M4 MCUs

Testing procedure

Note: This currently only works for CortexM MCUs.

Performance Check

  • Run the provided benchmark in tests/bench_atomic_util
  • It contains a README.md. If the given documentation is not sufficient, I really should fix it there rather than here
Arithmetic Correctness Check
  • Run the added unit tests: BOARD=<SOME_CORTEX_M_BOARD> make -C tests/unittests/ tests-atomic_utils flash test
    • This will be most interesting for Cortex M3, M4, and M4F boards

Testing "Atomic-ness"

  • Run the test in tests/sys_atomic_utils
    • You can use make test, but you might want to manually test as well
    • The README.md and the in-application documentation should be sufficient. If not, I really should fix it there rather than here.

Issues/PRs references

Useful for #9882

@maribu maribu added State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR Area: sys Area: System labels Jun 22, 2020
@kaspar030
Copy link
Contributor

Nice one!

Should we already add atomic single bit functions? atomic_set_bit_u32(uint32_t *var, bit) etc.
We can then easily hook them up to use bit banding if a platform supports it.

@maribu
Copy link
Member Author

maribu commented Jun 24, 2020

We can then easily hook them up to use bit banding if a platform supports it.

Great idea. Do you have any good documentation for this? E.g. the few examples for using this I found that didn't use the bit-banding periph region did never allocate memory in the SRAM bit-banding region, but just created a wild pointer pointing somewhere in there. That might not be the best idea...

I think there should be some __attribute__((region("bit-banding"))) magic or similar to make sure a variable actually gets placed in there. But I didn't find anything the like. If such exists, the API should also provide an attribute for bitmasks, which by default is an empty preprocessor macro.

The alternative would be to check within every atomic_set_bit_u<WIDTH>() function if the address of the target mask is within the SRAM bit banding region and else fall back to atomic_fetch_or(). But I much rather add a precondition to the function that bitmask are allocated with the magic keyword, so that users get predictable and consistent performance.

@maribu
Copy link
Member Author

maribu commented Jun 24, 2020

I think it is __attribute__((section(''.srambb''))) :-) Let me check if this works without touching the linker scripts

@maribu maribu requested a review from miri64 as a code owner June 24, 2020 19:32
@maribu
Copy link
Member Author

maribu commented Jun 24, 2020

@kaspar030: I implemented the atomic set_bit / clear_bit functions as suggested using bit banding. Thanks for pointing me to this.

I added some unit tests for arithmetic correctness of the atomic functions; which directly has proven to be worth the effort.

@maribu
Copy link
Member Author

maribu commented Jun 27, 2020

I added one more test, which currently is only applicable to the atomic_fetch_<OP>_u<BITS>() family of functions. It also contains a broken implementation volatile_fetch_<OP>_u<BITS>() for reference, which can also be tested. The test seems to perform quite well to detect issues with the volatile_fetch_<OP>_u<BITS>() functions and does not detect any issues for the atomic_*() family of functions :-)

I still need to extend the test to also allow checking the atomic_set_bit_u<BITS>() and atomic_clear_bit_u<BITS>() functions. Also, some documentation will be needed for the test to be actually useful.

@maribu maribu removed Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet labels Sep 23, 2020
@maribu
Copy link
Member Author

maribu commented Sep 23, 2020

This is now ready for review.

@benpicco benpicco added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Sep 28, 2020
@kaspar030 kaspar030 added Reviewed: 3-testing The PR was tested according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines labels Nov 5, 2020
@kaspar030
Copy link
Contributor

re-ACK. I gave the PR a last complete look. Ran benchmarks, test and unittests on nrf52840dk. No issues.

Let's give murdock another run with tests in samr21-xpro and esp32.

@fjmolinas
Copy link
Contributor

All tests seem to be ok except `tests/pkg_libhydrogen/samr21-xpro:llvm, seems like its OK to squash @maribu

@chrysn
Copy link
Member

chrysn commented Nov 10, 2020

In the comparison of IRQs vs. atomics for performance, was this evaluated only with SEQ_CST or also with ACQUIRE and RELEASE? After all, some say (citation needed, I've read an article but can't find it any more) that if you do SEC_CST you're almost always asking more of the platform than you need.

@maribu
Copy link
Member Author

maribu commented Nov 10, 2020

After all, some say (citation needed, I've read an article but can't find it any more) that if you do SEC_CST you're almost always asking more of the platform than you need.

tl;dr: No, I currently don't intent to extent this API to offer other guarantees than sequentially consistence. IMO that would gain only limited benefit in our context, but we would pay for that with increased complexity and confused developers.


The comparison is only for sequentially consistent. IMO, adding more variants lead to more complexity and confusion. I already had quite a few highly emotional, exhausting, and lengthy discussions on whether just adding volatile is enough for inter-context-communication (thread-to-thread or thread-to-isr) via shared memory. I think getting the "safest" atomic access variant as fast as possible is time better spent on than on reasoning on whether relaxing the guarantees is possible, documenting it, and discussing it.

Note that overhead for sequentially consistent atomic accesses in the context of embedded systems is often significantly less than it is in multi-core superscalar out-of-order CPUs. First, disabling IRQs briefly is much cheaper than a context switch to the OS and back for operations that lack lock-free implementations. And e.g. a lock on the memory bus that might cause other CPU cores to block during loads and stores from/to RAM is something we also don't need to worry about. (When we add support for multi-core MCUs, I think a high level primitive like an yet to implement mutli_core_mutex_t and carefully trying to avoid inter-core communication wherever possible is the better route to go.)

@maribu maribu removed the CI: run tests If set, CI server will run tests on hardware for the labeled PR label Nov 10, 2020
@benpicco benpicco added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Nov 10, 2020
@aabadie aabadie merged commit 792e031 into RIOT-OS:master Nov 12, 2020
@aabadie
Copy link
Contributor

aabadie commented Nov 13, 2020

Hmm, I broke Murdock when merging this one: the unittests don't fit anymore in arm7 boards (msba2 and mcb2388). I suggest to move the atomic function unittest to its own test application (instead of simply excluding these boards from the unittests) ?

@aabadie
Copy link
Contributor

aabadie commented Nov 13, 2020

Let's see if #15438 is fixing the problem

@maribu maribu deleted the atomic_utils branch November 13, 2020 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: sys Area: System CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 3-testing The PR was tested according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines Type: new feature The issue requests / The PR implemements a new feature for RIOT
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sys/tsrb is not thread safe
6 participants