Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The (stable) neon aarch64 target feature is unsound: it changes the float ABI #131058

Open
Tracked by #69098
RalfJung opened this issue Sep 30, 2024 · 29 comments
Open
Tracked by #69098
Labels
A-ABI Area: Concerning the application binary interface (ABI) A-rust-for-linux Relevant for the Rust-for-Linux project A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

This is an instance of #116344, but since it affects a target feature marked as "stable" I made a separate issue. See rust-lang/compiler-team#780 for the MCP that approved forbidding the toggling of target features that are unsound due to their ABI impact. Stabilization of neon happened in #90621. I am not sure where the FCP for this occurred.

The summary of the problem is that code compiled with -C target-feature=-neon on an aarch64 target, if it calls any pre-compiled function from the standard library that involves f32/f64 arguments, will use the wrong ABI and hence cause UB. I am working towards marking such features as "forbidden" so that we can fix this soundness hole. But now we are hitting the same where a relevant feature is also already stable, so making it "forbidden" would be a breaking change... so we'll have to figure out something more clever. Like maybe only forbidding disabling the feature? Note however that on the aarch64-unknown-none-softfloat target, +neon is unsound for the same reason.

Cc @Amanieu @workingjubilee

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 30, 2024
@jacobbramley
Copy link
Contributor

aarch64-unknown-none-softfloat +neon could potentially use some soft-float procedure-call standard but use Neon/FP inside functions. That's what the AArch32 softfp ABIs used to do, and the use-case is possibly similar. It'd be a change of behaviour for code compiled with different Rust versions, perhaps.

@jieyouxu jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness C-bug Category: This is a bug. A-ABI Area: Concerning the application binary interface (ABI) A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. labels Sep 30, 2024
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Sep 30, 2024
@RalfJung
Copy link
Member Author

aarch64-unknown-none-softfloat +neon could potentially use some soft-float procedure-call standard but use Neon/FP inside functions.

I don't think LLVM currently supports this. On other architectures, it is possible to force a soft-float ABI while still having FP instructions and registers available inside functions (e.g. +soft-float,+sse2 on x86), but aarch64 does not seem to have that option right now.

@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high

@rustbot rustbot added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Sep 30, 2024
@workingjubilee
Copy link
Member

workingjubilee commented Sep 30, 2024

imo the major reason that we support neon as a feature flag is so that people can write cfg(target_feature = "neon") or target_feature(enable = "neon") for aarch64 embedded targets (since while the FPU should always be present, you do have to enable it first, and that might not be automatic for some embedded programs...), not so they can unsoundly globally disable it on targets that assume its presence.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 1, 2024

target_feature(enable = "neon") only makes sense on targets that don't already have that target feature... I guess that would be aarch64-unknown-none-softfloat?

However, it's also a super dangerous attribute. If that function takes or returns f32/f64, or calls any function that takes or returns f32/f64, we have ABI conflicts. Here's a demo of that. We could allow enabling neon without enabling fp-armv8, but unfortunately Rust enables both when just enabling neon...

But given current LLVM I don't think there is a sound way to tell it to build code that makes use of the FPU on an aarch64 softfloat target. So seems like the proper fix is to match what other targets do, and add a soft-float target feature that forces the use of the softfloat ABI even if neon / an FPU is present?

@RalfJung
Copy link
Member Author

RalfJung commented Oct 1, 2024

I filed an LLVM issue; I hope I used the right words here since there's still a lot I don't understand around target features. ;)
llvm/llvm-project#110632

@pinskia
Copy link

pinskia commented Oct 1, 2024

Note for aarch64 there is NO defined soft-float ABI. Even though advanced simd (and fp) are optional part of armv8-a, there is no parts out there where it is not included. I have no idea why LLVM has a softfloat aarch64 target. There is a way to forbid all simd and fp usage (in both LLVM and GCC) but that does not change the ABI.

@workingjubilee
Copy link
Member

However, it's also a super dangerous attribute. If that function takes or returns f32/f64, or calls any function that takes or returns f32/f64, we have ABI conflicts. Here's a demo of that. We could allow enabling neon without enabling fp-armv8, but unfortunately Rust enables both when just enabling neon...

I took the position that we should do this because

If FEAT_FP is implemented, then FEAT_AdvSIMD is implemented.

There is no situation where one is enabled and the other is not.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 1, 2024

There is a way to forbid all simd and fp usage (in both LLVM and GCC) but that does not change the ABI.

It does change the ABI though, as one can easily verify on godbolt: https://rust.godbolt.org/z/aP55vT1bf

@RalfJung
Copy link
Member Author

RalfJung commented Oct 1, 2024

There is no situation where one is enabled and the other is not.

Yeah that makes perfect sense if one just looks at which features the hardware supports, but sadly LLVM conflates that aspect with whether a softfloat or hardfloat ABI should be used.

@workingjubilee
Copy link
Member

LLVM has many bugs, yes.

@workingjubilee
Copy link
Member

arguably the first bug is "having a soft-float ABI for AArch64".

@RalfJung
Copy link
Member Author

RalfJung commented Oct 1, 2024

arguably the first bug is "having a soft-float ABI for AArch64".

That's the argument being brought up in the LLVM issue, yeah.

@pinskia
Copy link

pinskia commented Oct 1, 2024

That's the argument being brought up in the LLVM issue, yeah.

Actually it is defined here: ARM-software/abi-aa#232 . But it is only for the R cores rather than the A cores.

@pinskia
Copy link

pinskia commented Oct 1, 2024

From the ABI: "This variant is incompatible with the base procedure call standard and toolchains are not required to support it."

@workingjubilee
Copy link
Member

huh, interesting.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 1, 2024 via email

@RalfJung
Copy link
Member Author

RalfJung commented Oct 1, 2024

Given that there is no standard softfloat ABI for this target, maybe the right thing to do is to make sure that on an aarch64 softfloat target, we never use the LLVM-defined ABI for floats, but instead use PassMode::Cast and pass floats like integers (for all ABIs). Then we are entirely independent from LLVM on those targets.

(Suggested by @Amanieu , or at least I think that is what they meant.)

We'd still have to forbid disabling neon on aarch64 hardfloat targets. But that doesn't seem unreasonable, we're basically saying those targets require float and SIMD support.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 1, 2024

The other (valid) question being raised on the LLVM side is, what is even the usecase for ever allowing the use of neon or the FPU on the softfloat target? The target exists for kernels that want to avoid the cost of saving float registers on a context switch. So that usecase certainly does not want to ever enable the neon feature, right? Who does need the neon feature? Why did we bother stabilizing it?

@workingjubilee
Copy link
Member

workingjubilee commented Oct 3, 2024

The other (valid) question being raised on the LLVM side is, what is even the usecase for ever allowing the use of neon or the FPU on the softfloat target? The target exists for kernels that want to avoid the cost of saving float registers on a context switch. So that usecase certainly does not want to ever enable the neon feature, right? Who does need the neon feature? Why did we bother stabilizing it?

If code is compiled such that it uses float registers implicitly for a kernel, then on EVERY entry and exit of the switch into and from the kernel, it may run into code that has to use those. So it has to pay the cost of saving those registers on EVERY switch, even if it only uses integer operations in actual fact.

If a kernel is a truly hyperminimalist interpretation of a microkernel, this is effectively the end of the story. Such a kernel never does anything that can be done by user code, so it only ever does the absolute minimum in the escalated privilege state.

...However, most kernels are not hyper-minimal. This is not because they are poorly designed, but rather because if something would require an egregious number of context switches to implement this way, it would not be efficient, compared to staying in kernelspace and running a bit more code. So they may need to do something that would be made significant factors faster by paying the cost anyways, like a bit of cryptographic or rendering code. So they explicitly do an additional context save of the extended register state. Then they perform the kernelspace computation with the float register state. Then they restore the float register state. Obviously, this requires tightly guarding the regions where this is done, but there are many such tightly-roped-off segments in e.g. the Linux kernel, and CPUs even build in additional support for doing this faster!

@RalfJung
Copy link
Member Author

RalfJung commented Oct 3, 2024

Is that possible to do with clang? I guess you have to build separate files with separate flags, then you can mix some code that has FPU access with other code that does not.

@workingjubilee
Copy link
Member

workingjubilee commented Oct 3, 2024

Correct. That is exactly what they do:

 TL;DR summary
-------------
* Use only NEON instructions, or VFP instructions that don't rely on support
  code
* Isolate your NEON code in a separate compilation unit, and compile it with
  '-mfpu=neon -mfloat-abi=softfp'
* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
  NEON code
* Don't sleep in your NEON code, and be aware that it will be executed with
  preemption disabled

And yes, the kernel is built with clang for aarch64 devices.

Those instructions are actually in the "Arm" folder and not the "Arm64" folder, but they follow the same protocol for AArch64, e.g.
https://github.com/search?q=repo%3Atorvalds%2Flinux%20crypto_nhpoly1305_update_helper&type=code

@RalfJung
Copy link
Member Author

RalfJung commented Oct 3, 2024 via email

@workingjubilee
Copy link
Member

...I mean we can just do the RFL ping at this point :^)

@workingjubilee
Copy link
Member

workingjubilee commented Oct 3, 2024

It looks like on the C side they throw the -mgeneral-regs-only switch for the general build, which aiui kills compilation if their code emitter tries to use the FPU regs. But then they have the other set of flags for turning on the FPU, or maybe they just don't have to disable it...? And it's not clear what those are going to be because they're a shell command...? https://github.com/torvalds/linux/blob/7ec462100ef9142344ddbf86f2c3008b97acddbe/arch/arm64/Makefile#L39-L51

and so parts alternate between...? https://github.com/Rust-for-Linux/linux/blob/a2f11547052001bd448ccec81dd1e68409078fbb/arch/arm64/lib/Makefile#L8-L12

hmm. and they definitely use -neon in the RUSTFLAGS...
cc @Darksonn @fbq Is my understanding so far correct...? I don't think I understand the precise build process going on here, which we're trying to understand so we understand RFL's requirements?

@workingjubilee workingjubilee added the A-rust-for-linux Relevant for the Rust-for-Linux project label Oct 3, 2024
@RalfJung
Copy link
Member Author

RalfJung commented Oct 3, 2024

I guess they are using pointers or integer types to pass data into "FPU land", so the question of a softfloat ABI does not come up.

@Darksonn
Copy link
Contributor

Darksonn commented Oct 3, 2024

Right now we don't have any Rust code using the FPU, so we just always turn off neon in Rust. What's happening on the C side is that you can modify the flags passed to specific compilation units like this:

CFLAGS_foo.o += $(CC_FLAGS_FPU)
CFLAGS_REMOVE_foo.o += $(CC_FLAGS_NO_FPU)

which removes the flags in CC_FLAGS_NO_FPU and adds the flags in CC_FLAGS_FPU when compiling that particular compilation unit. I strongly doubt we have float types on the boundary between these different types of compilation units.

See this page in the kernel docs.

@workingjubilee
Copy link
Member

I'd expect they're smuggling everything in structs or longs, yeah.

@Darksonn
Copy link
Contributor

Darksonn commented Oct 3, 2024

In practice, it seems like most users of kernel_fpu_begin/kernel_fpu_end use assembly to access the registers rather than turning it on for a C compilation unit. For example arch/x86/include/asm/xor_32.h with inline assembly or arch/x86/crypto/chacha_glue.c which links with an .S file containing the relevant asm.

@fmease fmease removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ABI Area: Concerning the application binary interface (ABI) A-rust-for-linux Relevant for the Rust-for-Linux project A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants