Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP-IDF targets' Atomic*64 is not lock-free #117305

Closed
taiki-e opened this issue Oct 28, 2023 · 8 comments · Fixed by #117307
Closed

ESP-IDF targets' Atomic*64 is not lock-free #117305

taiki-e opened this issue Oct 28, 2023 · 8 comments · Fixed by #117307
Labels
A-atomic Area: Atomics, barriers, and sync primitives C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@taiki-e
Copy link
Member

taiki-e commented Oct 28, 2023

max_atomic_width of these targets is currently set to 64


However, as mentioned in #115577 (comment), esp-idf's 64-bit atomic builtins use global spinlocks on multi-core systems:
https://github.com/espressif/esp-idf/blob/3b748a6cb76c2db7c6368a0dea32a88bc58bc44d/components/newlib/stdatomic.c#L64

And it violates what the standard library is intended to guarantee.

https://doc.rust-lang.org/nightly/std/sync/atomic/index.html#portability

All atomic types in this module are guaranteed to be lock-free if they’re available. This means they don’t internally acquire a global mutex.

Since riscv32 does not have 64-bit atomic instructions, I do not believe there is any way to fix this problem other than setting max_atomic_width of these targets to 32.

cc @ivmarkov @MabezDev (mentioned because you both are target maintainers)


@rustbot label +A-atomic +I-unsound

@taiki-e taiki-e added the C-bug Category: This is a bug. label Oct 28, 2023
@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. A-atomic Area: Atomics, barriers, and sync primitives I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 28, 2023
@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 28, 2023
@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-low

@rustbot rustbot added P-low Low priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 30, 2023
@apiraino apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 30, 2023
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 1, 2023
Set max_atomic_width for riscv32*-esp-espidf to 32

Fixes rust-lang#117305

> Since riscv32 does not have 64-bit atomic instructions, I do not believe there is any way to fix this problem other than setting max_atomic_width of these targets to 32.

This is a breaking change because Atomic\*64 will become unavailable, but all affected targets are tier 3, and the current Atomic*64 violates the standard library's API contract and can cause problems with code that rely on the standard library's atomic types being lock-free.

r? `@Amanieu`
cc `@ivmarkov` `@MabezDev`
@bors bors closed this as completed in f3457db Nov 1, 2023
@ivmarkov
Copy link
Contributor

max_atomic_width of these targets is currently set to 64

However, as mentioned in #115577 (comment), esp-idf's 64-bit atomic builtins use global spinlocks on multi-core systems: https://github.com/espressif/esp-idf/blob/3b748a6cb76c2db7c6368a0dea32a88bc58bc44d/components/newlib/stdatomic.c#L64

And it violates what the standard library is intended to guarantee.

https://doc.rust-lang.org/nightly/std/sync/atomic/index.html#portability

All atomic types in this module are guaranteed to be lock-free if they’re available. This means they don’t internally acquire a global mutex.

Since riscv32 does not have 64-bit atomic instructions, I do not believe there is any way to fix this problem other than setting max_atomic_width of these targets to 32.

cc @ivmarkov @MabezDev (mentioned because you both are target maintainers)

@rustbot label +A-atomic +I-unsound

@taiki-e

Sorry to revive this old issue. At the time I just trusted your judgement when you removed the AtomicU64 implementation from the riscv32imac-esp-espidf as my understanding of this subject mater was (and still is) limited.

The problem is, we now we have this new issue - and - without diving into the details of the new issue just yet - we (or I) really need to understand what exactly a "lock-free" atomic does mean and how exactly an atomic that is allowed to "wait" is different from a "lockful" atomic (all atomics in core::sync::atomic are allowed to wait but are NOT allowed to use locks as per the documentation).

To give you a concrete example where I struggle: for the riscv32imac-esp-espidf target, you removed the support for AtomicU64 on the basis that it uses a global spinlock (in the presence of multiple cores).

But why did you consider this spinlock a "lock" and not a "wait" (as per the terminology of the core::sync::atomic documentation from above)?
Because this spinlock (oversimplifying a bit here) is nothing else than a loop. A "wait", if you wish.

The spinlock implementation is essentially (in pseudocode):

static hardware_atomic_bool_spinlock; // A `bool` **hardware** atomic declared statically somewhere

fn software_cmpxchg_for_something_big(something: *mut SomethingBig, old: SomethingBig, new: SomethingBig) -> SomethingBig {
    disable_all_interrupts_for_the_core_that_runs_this_code();

    while hardware_cmpxchg(&hardware_atomic_bool_spinlock, false, true) == true {
        // Ah, the spinlock is set to true; must be the other core; re-try in a busy-loop until we succeed
    }

    // Here we have interrupts disabled for our core, _and_ we have indicated to the other core (via the `hardware_atomic_bool_spinlock`) to wait until we do our job so we can just manipulate `something` now
    let current = unsafe { *something };
    if current == old {
        unsafe { *something = new; }
    }

    let ret = hardware_cmpxchg(&hardware_atomic_bool_spinlock, true, false);
    assert!(ret); // Should always return `true`

    enable_all_interrupts_for_the_core_that_runs_this_code();

    ret
}

So why is the above ^^^ a "lock" and not just a "wait"? And could you give an example of a "wait" which is not a "lock"?

@bjorn3
Copy link
Member

bjorn3 commented Aug 31, 2024

But why did you consider this spinlock a "lock" and not a "wait" (as per the terminology of the core::sync::atomic documentation from above)?

To quote wikipedia

Wait-freedom is the strongest non-blocking guarantee of progress, combining guaranteed system-wide throughput with starvation-freedom. An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes.[15] This property is critical for real-time systems and is always nice to have as long as the performance cost is not too high.
[...]
Lock-freedom allows individual threads to starve but guarantees system-wide throughput. An algorithm is lock-free if, when the program threads are run for a sufficiently long time, at least one of the threads makes progress (for some sensible definition of progress). All wait-free algorithms are lock-free.

Your software_cmpxchg_for_something_big would starve all threads if any thread that calls it permanently gets stopped while the lock on hardware_atomic_bool_spinlock is held. CPU's generally have NMI's which bypass disable_all_interrupts_for_the_core_that_runs_this_code();, so if the NMI handler gets stuck (which it is allowed to, eg because it depends on hardware_atomic_bool_spinlock itself) or NMI's are triggered back to back without any instruction executing in between, disable_all_interrupts_for_the_core_that_runs_this_code(); has no effect at all on the lock-freedom.

A lock-free but non-wait-free implementation allows some threads get stuck, but still requires that at least one thread can continue at some point. Only a non-lock-free implementation is allowed to have all threads that attempt to perform atomic operations to get permanently stuck.

@ivmarkov
Copy link
Contributor

But why did you consider this spinlock a "lock" and not a "wait" (as per the terminology of the core::sync::atomic documentation from above)?

To quote wikipedia

Wait-freedom is the strongest non-blocking guarantee of progress, combining guaranteed system-wide throughput with starvation-freedom. An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes.[15] This property is critical for real-time systems and is always nice to have as long as the performance cost is not too high.
[...]
Lock-freedom allows individual threads to starve but guarantees system-wide throughput. An algorithm is lock-free if, when the program threads are run for a sufficiently long time, at least one of the threads makes progress (for some sensible definition of progress). All wait-free algorithms are lock-free.

Your software_cmpxchg_for_something_big would starve all threads if any thread that calls it permanently gets stopped while the lock on hardware_atomic_bool_spinlock is held. CPU's generally have NMI's which bypass disable_all_interrupts_for_the_core_that_runs_this_code();, so if the NMI handler gets stuck (which it is allowed to, eg because it depends on hardware_atomic_bool_spinlock itself) or NMI's are triggered back to back without any instruction executing in between, disable_all_interrupts_for_the_core_that_runs_this_code(); has no effect at all on the lock-freedom.

A lock-free but non-wait-free implementation allows some threads get stuck, but still requires that at least one thread can continue at some point. Only a non-lock-free implementation is allowed to have all threads that attempt to perform atomic operations to get permanently stuck.

So are you saying that

  • In the presence of NMI the above code is NOT lock free
  • However in the absence of NMI, the above code would be lock free? (it is another topic if we can guarantee the absence of NMI or its weaker variant - that the NMI code never touches hardware_atomic_bool_spinlock)?

@ivmarkov
Copy link
Contributor

As for

or NMI's are triggered back to back without any instruction executing in between, disable_all_interrupts_for_the_core_that_runs_this_code(); has no effect at all on the lock-freedom.

Wouldn't that mean the whole "user-space" (non-interrupt code) = all threads would starve, and nothing would make progress anyway. Including threads that utilize native atomics only. Asking, because I don't see how that particular example (NMIs firing back to back) applies here, as it would (obviously?) starve absolutely everything else, hardware atomics or not?

@bjorn3
Copy link
Member

bjorn3 commented Aug 31, 2024

However in the absence of NMI, the above code would be lock free?

I think it would be assuming that the underlying CPU core ensures that it will always keep making forward progress itself. So for example another core must not be able to shutdown any core that has hardware_atomic_bool_spinlock held.

(it is another topic if we can guarantee the absence of NMI or its weaker variant - that the NMI code never touches hardware_atomic_bool_spinlock)?

Even an NMI handler which immediately returns would make it non-lock-free if there is a possibility that NMI's arrive back to back. If however the CPU guarantees to execute at least one instruction between returning from an NMI and calling the NMI handler again (eg because the only source of NMI's is a watchdog timer and the NMI handler is fast enough), then it should be fine when the NMI handler doesn't touch hardware_atomic_bool_spinlock.

Wouldn't that mean the whole "user-space" (non-interrupt code) = all threads would starve, and nothing would make progress anyway. Including threads that utilize native atomics only. Asking, because I don't see how that particular example (NMIs firing back to back) applies here, as it would (obviously?) starve absolutely everything else, hardware atomics or not?

With lock-free atomics if you suspend all userspace threads except for one, that one thread will always be able to make forward progress with regards to executing atomic operations. In case of a non-lock-free implementation it is possible that it blocks forever waiting on one of the suspended threads releasing the lock.

@ivmarkov
Copy link
Contributor

ivmarkov commented Aug 31, 2024

I think I might understand it a bit now.
If you could confirm my two examples/statements below that would be greatly appreciated (thank you for bearing with me!):

=====================

  1. Example why spinlock+disable-all-interrupts-except-nmi is not lock free in the presence of any NMI:
  • Two cores, core0 and core1
  • Two threads, thread0 - pinned to core0, and thread1 - pinned to core1, both currently inside software_cmpxchg_for_something_big, but thread0 also got the "lock" and is in the midst of mutating something, while thread1 still busyloops on the spinlock
  • Suddenly thread0 gets suspended while still mutating something because core0 suddenly receives a storm of never-ending empty NMIs and therefore thread0 cannot progress
    => ...but that's OK, because it is "suspended" by the NMI storm
  • thread1 - unfortunately - cannot progress either, as it is stuck in the spinlock busy-loop - waiting on thread0, which is suspended forever because of the never-ending NMI storm
    => that's NOT OK, because thread1 is not suspended (because the NMI storm happens on the other core) - yet - it cannot make any progress

=====================

  1. Why a simple "disable-all-interrupts-except-NMI" implementation of software atomics (obviously without a spinlock as it is a single core) on single core systems is lock-free:
  • Because - even though a thread might be in the midst of modifying the value of such a software atomic when interrupted by an avalanche of never-ending NMIs, by your definition, this thread (as well as all others on the single-core system) are "suspended" (by the NMI avalanche). So these not making progress is actually fine and the algorithms still qualifies as lock-free.

Note also that the condition that the NMI handlers should not touch such software atomics is a MUST for this case (2), or else such a software implementation would of course be unsound. It is a must for the "lockful" implementation in case (1) from above as well, but for other reason (the NMI might deadlock).

@bjorn3
Copy link
Member

bjorn3 commented Aug 31, 2024

Both examples are correct afaik.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-atomic Area: Atomics, barriers, and sync primitives C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants