CP15 barrier instructions should be emitted before the exclusives loops (arm) #60605

Open
@zrzka

Symptoms

Environment

  • Linux kernel with CP15_BARRIER_EMULATION=y
  • abi.cp15_barrier set to 1 (emulate)
  • arm-unknown-linux-gnueabihf toolchain

CP15 barrier instructions

  • They have been deprecated since ARMv7
  • The Linux kernel can either emulate them or execute them in hardware (HW exec)
    • abi.cp15_barrier is set to 2 (HW exec) -> there's no issue
      • The CPU must support them
      • ARMv8 in our case, which still supports them
    • abi.cp15_barrier is set to 1 (emulate) -> there's this issue
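
To check which mode a given machine is in, the sysctl can be read from procfs. The following is a minimal sketch, assuming the sysctl is exposed at /proc/sys/abi/cp15_barrier and that the value 0 means the instruction is left undefined (values 1 and 2 are as listed above). The enum and function names are hypothetical and only for illustration.

```rust
use std::fs;

/// Modes of the abi.cp15_barrier sysctl.
/// 1 and 2 are the values described in this issue; 0 is assumed to mean
/// "undefined instruction" (no emulation, no hardware execution).
#[derive(Debug, PartialEq, Eq)]
enum Cp15BarrierMode {
    Undef,
    Emulate,
    Hardware,
}

/// Parse the raw sysctl contents (for example "1\n") into a mode.
fn parse_cp15_barrier_mode(raw: &str) -> Option<Cp15BarrierMode> {
    match raw.trim() {
        "0" => Some(Cp15BarrierMode::Undef),
        "1" => Some(Cp15BarrierMode::Emulate),
        "2" => Some(Cp15BarrierMode::Hardware),
        _ => None,
    }
}

/// Read the current mode. The path is an assumption; this returns None if
/// the file is missing (e.g. the kernel was built without the emulation
/// option) or contains an unexpected value.
fn read_cp15_barrier_mode() -> Option<Cp15BarrierMode> {
    let raw = fs::read_to_string("/proc/sys/abi/cp15_barrier").ok()?;
    parse_cp15_barrier_mode(&raw)
}
```

If `read_cp15_barrier_mode()` returns `Some(Cp15BarrierMode::Emulate)`, the machine is in the configuration affected by this issue.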

Issue description

parking_lot author:

This seems to be closer to an LLVM bug than a parking_lot bug. The source of the problem is the CP15 emulation in the kernel. Essentially the mcr p15, #0x0, r12, c7, c10, #0x5 is trapping to the kernel every time, which invalidates the exclusive monitor between the ldrex and strex instructions. This results in the strex never succeeding and looping indefinitely.
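
For reference, the kind of source-level operation involved is a compare-and-swap loop that requests release ordering on success. The sketch below is illustrative only: the function name is hypothetical, it is not taken from parking_lot, and the exact instructions emitted depend on the target and LLVM version.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Increment `counter` with release ordering on success and relaxed
/// ordering on failure; returns the previous value.
/// On the affected target this kind of operation is lowered to an
/// ldrex/strex loop, and the barrier needed for the release ordering
/// is what ends up trapping to the kernel.
fn increment_release(counter: &AtomicUsize) -> usize {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange_weak(
            current,
            current.wrapping_add(1),
            Ordering::Release, // ordering applied if the exchange succeeds
            Ordering::Relaxed, // ordering applied if it fails
        ) {
            Ok(previous) => return previous,
            // Another thread changed the value (or the weak exchange
            // failed spuriously); retry with the freshly observed value.
            Err(actual) => current = actual,
        }
    }
}
```

On an unaffected system this loop terminates as soon as one exchange succeeds; under CP15 barrier emulation the exclusive store inside it never succeeds, so the loop spins forever.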

(Image: instructions-loop)

ARM engineer (Will Deacon) response on this:

Hi again, Robert,

Just a quick update on this:

  1. CP15 barriers remain deprecated in the Armv8 architecture, and so
    may be removed entirely from future CPUs.

  2. Because of (1), the kernel defaults to trap+emulate, so that it can
    warn about the use of these instructions. I think this is the right
    thing to do because, once the instructions have been removed, we
    will have no choice but to trap+emulate (this happened for the SWP
    instruction already). This trapping will prevent your exclusives loop
    from ever succeeding.

  3. The right place to address this issue is in LLVM, where atomic
    read-modify-write operations with conditional release semantics (i.e.
    release on success) should actually emit the CP15 barrier before the
    exclusives loop. Assuming that contention is rare (which it kind of
    needs to be for performant compare-and-swap anyway), I don't see this
    having a meaningful impact on performance.

I've reached out to one of our upstream LLVM developers, and I'll be talking
with him face-to-face next week about getting this fixed.

Will

Solution

Will's third point:

Atomic read-modify-write operations with conditional release semantics (i.e.
release on success) should actually emit the CP15 barrier before the
exclusives loop. Assuming that contention is rare (which it kind of
needs to be for performant compare-and-swap anyway), I don't see this
having a meaningful impact on performance.

And:

I've reached out to one of our upstream LLVM developers, and I'll be talking
with him face-to-face next week about getting this fixed.

I asked for the LLVM bug number so I can track it, but I have had no response yet.
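
Until the compiler does this itself, the placement Will describes can be approximated at the source level by issuing an explicit release fence before the loop and using relaxed ordering for the exchange. This is a sketch under stated assumptions, not a verified workaround: the function name is hypothetical, and whether the generated code actually keeps the barrier outside the ldrex/strex loop on this target has not been checked here.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

/// Increment `counter`, publishing earlier writes via a release fence
/// placed before the compare-and-swap loop; returns the previous value.
fn increment_with_leading_fence(counter: &AtomicUsize) -> usize {
    // Barrier issued once, before the loop. A release fence followed by a
    // relaxed read-modify-write still lets an acquire load that observes
    // the new value see the writes made before the fence.
    fence(Ordering::Release);

    let mut current = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange_weak(
            current,
            current.wrapping_add(1),
            Ordering::Relaxed, // no ordering requested inside the loop
            Ordering::Relaxed,
        ) {
            Ok(previous) => return previous,
            Err(actual) => current = actual,
        }
    }
}
```

The trade-off matches the one in the quote: the barrier is paid unconditionally, even when the exchange has to be retried, which should only matter under heavy contention.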

Way forward

  • Fix it in the Rust LLVM fork
  • Wait for upstream LLVM to fix this, then wait until Rust's fork syncs

The second option could delay the fix by weeks or months. I'm not sure how quickly LLVM itself moves or how often Rust's fork syncs with it, which is the main reason I reported it here as well.

No fix

Without a fix, people won't be able to use Rust on Linux with CP15_BARRIER_EMULATION=y, abi.cp15_barrier=1 (emulation, the default value), and the arm-unknown-linux-gnueabihf toolchain.

Labels

  • A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
  • A-atomic (Area: Atomics, barriers, and sync primitives)
  • C-bug (Category: This is a bug.)
  • O-Arm (Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state)
  • T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
