CP15 barrier instructions should be emitted before the exclusives loops (arm) #60605

Open
zrzka opened this issue May 7, 2019 · 7 comments
Labels
A-atomic Area: Atomics, barriers, and sync primitives A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments


zrzka commented May 7, 2019

Symptoms

Environment

  • Linux kernel with CP15_BARRIER_EMULATION=y
  • abi.cp15_barrier set to 1 (emulate)
  • arm-unknown-linux-gnueabihf toolchain

CP15 barrier instructions

  • They have been deprecated since ARMv7
  • The Linux kernel can either emulate them or let the hardware execute them (HW exec)
    • abi.cp15_barrier is set to 2 (HW exec) -> there's no issue
      • The CPU must support them
      • ARMv8 in our case, which still supports them
    • abi.cp15_barrier is set to 1 (emulate) -> there's this issue
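To check which mode a machine is in, the sysctl can be read from procfs. The following is a minimal sketch, not part of the original report: the enum and function names are made up for illustration, and the meaning of the value 0 (instruction treated as undefined) is taken from the kernel's legacy-instruction handling as I understand it, not from this issue.

```rust
use std::fs;

/// Handling modes for the deprecated CP15 barrier instructions.
/// The names are hypothetical; only the values 1 and 2 are described in this issue.
#[derive(Debug, PartialEq)]
enum Cp15BarrierMode {
    Undefined, // 0: assumption, not mentioned in this issue
    Emulate,   // 1: trap to the kernel and emulate (the problematic case)
    HwExec,    // 2: let the CPU execute the instruction
}

/// Parse the raw sysctl value, for example "1\n".
fn parse_mode(raw: &str) -> Option<Cp15BarrierMode> {
    match raw.trim() {
        "0" => Some(Cp15BarrierMode::Undefined),
        "1" => Some(Cp15BarrierMode::Emulate),
        "2" => Some(Cp15BarrierMode::HwExec),
        _ => None,
    }
}

fn main() {
    // abi.cp15_barrier is exposed under /proc/sys like any other sysctl.
    let path = "/proc/sys/abi/cp15_barrier";
    match fs::read_to_string(path) {
        Ok(raw) => match parse_mode(&raw) {
            Some(Cp15BarrierMode::Emulate) => {
                println!("abi.cp15_barrier=1 (emulate): this issue may apply")
            }
            Some(mode) => println!("abi.cp15_barrier mode: {:?}", mode),
            None => println!("unexpected value: {:?}", raw.trim()),
        },
        // The file does not exist on kernels without CP15_BARRIER_EMULATION.
        Err(err) => println!("could not read {}: {}", path, err),
    }
}
```

On a machine without the sysctl the program only reports that the file could not be read.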

Issue description

parking_lot author:

This seems to be closer to an LLVM bug than a parking_lot bug. The source of the problem is the CP15 emulation in the kernel. Essentially the mcr p15, #0x0, r12, c7, c10, #0x5 is trapping to the kernel every time, which invalidates the exclusive monitor between the ldrex and strex instructions. This results in the strex never succeeding and looping indefinitely.

[Screenshot: instructions-loop]

ARM engineer (Will Deacon) response on this:

Hi again, Robert,

Just a quick update on this:

  1. CP15 barriers remain deprecated in the Armv8 architecture, and so
    may be removed entirely from future CPUs.

  2. Because of (1), the kernel defaults to trap+emulate, so that it can
    warn about the use of these instructions. I think this is the right
    thing to do because, once the instructions have been removed, we
    will have no choice but to trap+emulate (this happened for the SWP
    instruction already). This trapping will prevent your exclusives loop
    from ever succeeding.

  3. The right place to address this issue is in LLVM, where atomic
    read-modify-write operations with conditional release semantics (i.e.
    release on success) should actually emit the CP15 barrier before the
    exclusives loop. Assuming that contention is rare (which it kind of
    needs to be for performant compare-and-swap anyway), I don't see this
    having a meaningful impact on performance.

I've reached out to one of our upstream LLVM developers, and I'll be talking
with him face-to-face next week about getting this fixed.

Will

Solution

Will's third point:

Atomic read-modify-write operations with conditional release semantics (i.e.
release on success) should actually emit the CP15 barrier before the
exclusives loop. Assuming that contention is rare (which it kind of
needs to be for performant compare-and-swap anyway), I don't see this
having a meaningful impact on performance.

And:

I've reached out to one of our upstream LLVM developers, and I'll be talking
with him face-to-face next week about getting this fixed.

I asked for the LLVM bug # to track it, but still no response.
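The barrier placement from Will's third point can be pictured at the source level. This is only a sketch under assumptions: the two helper functions are hypothetical, and how LLVM actually lowers each of them for arm-unknown-linux-gnueabihf has not been verified here. The first asks for release-on-success on the compare-and-swap itself, the second issues a release fence once up front and then does a relaxed compare-and-swap, so no barrier is needed between the exclusive load and store.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

/// Compare-and-swap with release semantics on success, requested on the
/// operation itself. The compiler decides where the barrier goes.
fn cas_release(cell: &AtomicUsize, current: usize, new: usize) -> bool {
    cell.compare_exchange(current, new, Ordering::Release, Ordering::Relaxed)
        .is_ok()
}

/// Same update, but with an explicit release fence before the operation and
/// relaxed orderings on the compare-and-swap. The fence is paid even when the
/// compare-and-swap fails, which is the trade-off Will mentions.
fn cas_fence_first(cell: &AtomicUsize, current: usize, new: usize) -> bool {
    fence(Ordering::Release);
    cell.compare_exchange(current, new, Ordering::Relaxed, Ordering::Relaxed)
        .is_ok()
}

fn main() {
    let cell = AtomicUsize::new(0);
    assert!(cas_release(&cell, 0, 1));
    assert!(cas_fence_first(&cell, 1, 2));
    // The expected value no longer matches, so this attempt fails.
    assert!(!cas_fence_first(&cell, 1, 3));
    println!("final value: {}", cell.load(Ordering::Relaxed)); // prints "final value: 2"
}
```

Whether the second form avoids the hang on an affected kernel is untested. It only illustrates the idea of moving the barrier out of the exclusives loop.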

Way forward

  • Fix it on the Rust LLVM fork
  • Wait for LLVM to fix this, then wait until Rust's fork syncs

The second option could delay the fix by weeks or months. I'm not sure how quickly LLVM itself moves or how often Rust's fork syncs with it, which is the main reason I reported the issue here as well.

No fix

Without a fix, people won't be able to use Rust on Linux with CP15_BARRIER_EMULATION=y & abi.cp15_barrier=1 (emulation, the default value) & the arm-unknown-linux-gnueabihf toolchain.

@jonas-schievink jonas-schievink added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 7, 2019
Author

zrzka commented May 8, 2019

In case anyone else runs into this issue ...

Rust installation workaround

Force the installer to use curl instead of the reqwest crate.

export RUSTUP_USE_CURL=1
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

NOTE: This workaround only lets you install the Rust toolchain; it doesn't fix the issue. Compile the example below, run it, and you'll see.

Example which reproduces this problem

Cargo.toml:

[package]
name = "cp15"
version = "0.1.0"
edition = "2018"

[dependencies]
reqwest = "0.9"

src/main.rs:

fn main() {
    let body = reqwest::get("https://www.rust-lang.org");
    println!("Response: {:?}", body);
}

Compile with the arm-unknown-linux-gnueabihf toolchain and run on a kernel with CP15_BARRIER_EMULATION=y & abi.cp15_barrier=1 => infinite loop, the program never finishes.
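A smaller candidate reproducer that needs no dependencies might look like the sketch below. It is untested on the affected setup, and it rests on an assumption: that a compare-and-swap with a Release success ordering is what makes the compiler place the CP15 barrier inside the exclusives loop on this target. The helper name bump is made up for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Increment the counter with a compare-and-swap that has release semantics
/// on success, and return the value seen before the increment.
fn bump(counter: &AtomicUsize) -> usize {
    let mut seen = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange(seen, seen + 1, Ordering::Release, Ordering::Relaxed) {
            Ok(previous) => return previous,
            // Another thread changed the value; retry with what we observed.
            Err(actual) => seen = actual,
        }
    }
}

fn main() {
    let counter = AtomicUsize::new(0);
    for _ in 0..3 {
        bump(&counter);
    }
    // On an unaffected system this prints "counter = 3".
    println!("counter = {}", counter.load(Ordering::Relaxed));
}
```

If the assumption holds, the program would never reach the final println! when built with arm-unknown-linux-gnueabihf and run with abi.cp15_barrier=1.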

Member

nagisa commented May 9, 2019

Very nicely investigated issue.

Fix it on the Rust LLVM fork

Fixing this in our fork involves knowing what the fix should look like. I imagine the easiest and fastest approach will be to wait for LLVM to get a patch from one of ARM's engineers and then include that patch in our fork.

Author

zrzka commented May 10, 2019

Agreed. I'll ping Will again to get the LLVM bug # (if one exists) and the patch that fixes this.

Author

zrzka commented May 13, 2019

Reported it in LLVM's Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41856

Contributor

comex commented May 13, 2019

It occurs to me that even once this bug is fixed, making every compare-and-swap (or at least every one using those semantics) trap to the kernel is going to be rather slow. The non-deprecated equivalent of the CP15 barrier is the DMB instruction, but that's only supported starting with ARMv7, whereas Rust's arm-unknown-linux-gnueabihf target only assumes ARMv6. However, Rust also has armv7-unknown-linux-gnueabihf which should be ABI compatible and avoid the issue. Since the cp15 trapping apparently only exists on arm64 kernels, aarch64-unknown-linux-gnu is also worth a mention. Both armv7 and aarch64 variants are tier 2 Rust targets, same as the arm variant, so I guess there's not much for Rust to do here.

But I am curious why rustup is picking arm, since it identifies architecture based on uname -m, which should reflect the kernel architecture. Even if you're running a 32-bit userland and a 64-bit kernel, it should still show aarch64. But if you're running a 32-bit kernel, why do you have cp15_barrier, whose implementation lives under arch/arm64?...

Author

zrzka commented May 13, 2019

@comex the short answer is cross compilation (in quotes).

But I am curious why rustup is picking arm.

Our customers have dozens of different devices: RPi Zero, RPi 1, RPi 3, ... These devices are ARMv6, v7, ... Part of our infrastructure is a service called builder. You can push to a special git origin and your service will be built on our builder. Building on an RPi Zero or via QEMU is slow, so our build servers have ARMv8 CPUs. We use tricks like personality, setarch, the ability to run a 32-bit user space on a 64-bit kernel, Docker, ... This solves a couple of issues:

  • builds are fast,
  • same toolchain is used as on real HW,
  • same OS variant is used (like armhf Debian for RPi Zero),
  • ...

It's like ...

Build server (ARMv8 CPU, 64-bit kernel)
  |- Docker (ARMv6 personality -> RPi Zero builds)
  |- Docker (ARMv7 personality -> RPi 3 builds)
  |- ...
Build server (ARMv8 CPU, 64-bit kernel)
  |- Docker (ARMv6 personality -> RPi Zero builds)
  |- Docker (ARMv7 personality -> RPi 3 builds)
  |- ...

We solved our problem by setting abi.cp15_barrier to 2, which means HW exec. But once future architecture versions (ARMv9, v10, v11, ...) are released and these instructions are (probably) removed, we will have to set abi.cp15_barrier to 1, and then none of these tricks will work.


MichaIng commented Apr 19, 2023

I faced the same problem on an ARMv8 platform with a 64-bit kernel but a 32-bit userland when enforcing the armv7-unknown-linux-gnueabihf host triple. Many thanks for providing the workaround export RUSTUP_USE_CURL=1 👍, I was lost until finding this.

EDIT: Ah nope, this solves "downloading" cargo etc ("reqwest-interna" (pid) uses deprecated CP15 Barrier instruction), but when "installing" it, the same loop happens: "rustup-init" (pid) uses deprecated CP15 Barrier instruction

@zrzka
Where do you set "abi.cp15_barrier to 2"?
EDIT: Found it:

echo 'abi.cp15_barrier=2' > /etc/sysctl.d/99-cp15_barrier.conf
sysctl -p /etc/sysctl.d/99-cp15_barrier.conf

Since the LLVM Bugzilla has been archived, for completeness I'm linking the auto-generated/migrated GitHub issue here: llvm/llvm-project#41201
