std_detect: Use riscv_hwprobe on RISC-V Linux/Android #1762

Closed
wants to merge 6 commits into from

taiki-e
Member

@taiki-e taiki-e commented Mar 30, 2025

On RISC-V, detection using auxv only supports single-letter extensions.
So this PR uses riscv_hwprobe, which also supports multi-letter extensions, whenever it is available, so that more target features can be detected. https://www.kernel.org/doc/html/latest/arch/riscv/hwprobe.html
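
For illustration, here is a minimal sketch of why auxv-based detection is limited to single-letter extensions: each bit of the AT_HWCAP value stands for one letter of the alphabet, so there is no room for names such as zba or zvkt. The function name `hwcap_has_letter` is hypothetical and is not part of std_detect.

```rust
/// Returns true if the single-letter extension `letter` (for example b'v')
/// is set in an AT_HWCAP-style bitmask, where bit 0 is 'a', bit 1 is 'b',
/// and so on up to bit 25 for 'z'.
fn hwcap_has_letter(hwcap: u64, letter: u8) -> bool {
    let letter = letter.to_ascii_lowercase();
    if !letter.is_ascii_lowercase() {
        // Anything that is not a letter cannot be expressed in this bitmask.
        return false;
    }
    (hwcap >> (letter - b'a')) & 1 == 1
}
```

With a mask that has only the bits for i, m, a, f, d and c set, `hwcap_has_letter(mask, b'c')` is true and `hwcap_has_letter(mask, b'v')` is false.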

I had originally planned to open this PR after the nightly release that includes rust-lang/rust#138742 (merged today), but there was a question about this (rust-lang/rust#139139), so I opened it earlier.

Closes rust-lang/rust#139139

Tested with QEMU 9.2.1 user mode:

riscv64gc-unknown-linux-gnu
rv32i: false
rv32e: false
rv64i: true
rv128i: false
zicsr: true
zicntr: false
zihpm: false
zifencei: false
zihintpause: false
m: true
a: true
zacas: true
zawrs: false
zam: false
ztso: true
f: true
d: true
q: false
zfh: true
zfhmin: true
zfinx: false
zdinx: false
zhinx: false
zhinxmin: false
c: true
zba: true
zbb: true
zbc: true
zbs: true
zbkb: true
zbkc: true
zbkx: true
zknd: true
zkne: true
zknh: true
zksed: true
zksh: true
zkr: false
zksed: true
zksh: true
zkr: false
zkn: true
zks: true
zkt: true
v: true
zvfh: true
zvfhmin: true
zve32x: true
zve32f: true
zve64x: true
zve64f: true
zve64d: true
zvkb: true
zvbb: true
zvbc: true
zvkg: true
zvkned: true
zvknha: true
zvknhb: true
zvksed: true
zvksh: true
zvkn: true
zvknc: true
zvkng: true
zvks: true
zvksc: true
zvksg: true
zvkt: true
unaligned-scalar-mem: true
unaligned-vector-mem: false
j: false
p: false

riscv32gc-unknown-linux-gnu
rv32i: true
rv32e: false
rv64i: false
rv128i: false
zicsr: true
zicntr: false
zihpm: false
zifencei: false
zihintpause: false
m: true
a: true
zacas: true
zawrs: false
zam: false
ztso: true
f: true
d: true
q: false
zfh: true
zfhmin: true
zfinx: false
zdinx: false
zhinx: false
zhinxmin: false
c: true
zba: true
zbb: true
zbc: true
zbs: true
zbkb: true
zbkc: true
zbkx: true
zknd: true
zkne: true
zknh: true
zksed: true
zksh: true
zkr: false
zksed: true
zksh: true
zkr: false
zkn: true
zks: true
zkt: true
v: true
zvfh: true
zvfhmin: true
zve32x: true
zve32f: true
zve64x: true
zve64f: true
zve64d: true
zvkb: true
zvbb: true
zvbc: true
zvkg: true
zvkned: true
zvknha: true
zvknhb: true
zvksed: true
zvksh: true
zvkn: true
zvknc: true
zvkng: true
zvks: true
zvksc: true
zvksg: true
zvkt: true
unaligned-scalar-mem: true
unaligned-vector-mem: false
j: false
p: false

@rustbot
Collaborator

rustbot commented Mar 30, 2025

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Comment on lines 318 to 327
// FIXME: e is not exposed in any of asm/hwcap.h, uapi/asm/hwcap.h, uapi/asm/hwprobe.h
#[cfg(target_arch = "riscv32")]
enable_feature(
&mut value,
Feature::rv32e,
bit::test(auxv.hwcap, (b'e' - b'a').into()),
);
// FIXME: h is not exposed in uapi/asm/hwcap.h and uapi/asm/hwprobe.h
enable_feature(
&mut value,
Feature::h,

Member Author

I guess there is no chance for these to be enabled in the current Linux user mode, although...

Member

I would argue that all the supervisor features should probably just be removed from feature detection and from the compiler. They are not relevant for the language/compiler.

flags: libc::c_uint,
) -> libc::c_long {
unsafe {
libc::syscall(

Member Author

We could use inline assembly for Linux here (like portable-atomic does), but since this module is currently behind the libc feature, and since std currently always depends on libc on all Linux targets, there is no need to complicate the code here.

@taiki-e taiki-e force-pushed the std-detect-riscv-linux branch from 3203fd5 to 1ece49d Compare March 31, 2025 01:35
@a4lg
Contributor

a4lg commented Mar 31, 2025

That's pretty much the same as what I've been testing, and thanks for making progress earlier than me.
So, I'll review some of the details as a RISC-V toolchain maintainer (mainly on GNU).

@a4lg
Contributor

a4lg commented Apr 1, 2025

@taiki-e @Amanieu

We need to discuss one topic before merging this: Linux supports disabling vector support per thread (for the current thread, or for the next context created by a following execve call) through a prctl call: PR_RISCV_V_SET_CONTROL.

The point is that the vector extension support reported by a riscv_hwprobe call is not affected by whether the vector extension is enabled for the current thread (or for the next context created by a following execve call).

IMHO, the solution here would be to check (on the first feature test call) whether vector extensions are enabled for the current thread, using prctl with PR_RISCV_V_GET_CONTROL, and to disable vector-related extensions if they are not. This method cannot track per-thread status or later enablement after the feature detection, but it should be fine in most cases. Force-enabling vector extensions is also a solution, but it can be a problem in a library.
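
As a rough sketch of the check proposed above, the return value of prctl(PR_RISCV_V_GET_CONTROL) could be decoded as follows. The constant values are taken from Linux's include/uapi/linux/prctl.h; the enum and function names are hypothetical, and the actual prctl call is left out so that the decoding logic stands on its own.

```rust
// Values from Linux's include/uapi/linux/prctl.h.
const PR_RISCV_V_VSTATE_CTRL_DEFAULT: i32 = 0;
const PR_RISCV_V_VSTATE_CTRL_OFF: i32 = 1;
const PR_RISCV_V_VSTATE_CTRL_ON: i32 = 2;
const PR_RISCV_V_VSTATE_CTRL_CUR_MASK: i32 = 0x3;

/// Vector control state of the current thread (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VectorControl {
    /// Follow the system-wide default.
    Default,
    /// Vector instructions are disallowed for this thread.
    Off,
    /// Vector instructions are allowed for this thread.
    On,
}

/// Decodes the return value of prctl(PR_RISCV_V_GET_CONTROL).
/// Returns None when the call failed (negative return value) or when the
/// low two bits hold a value this sketch does not know about.
fn decode_current_vector_control(ret: i32) -> Option<VectorControl> {
    if ret < 0 {
        return None;
    }
    match ret & PR_RISCV_V_VSTATE_CTRL_CUR_MASK {
        PR_RISCV_V_VSTATE_CTRL_DEFAULT => Some(VectorControl::Default),
        PR_RISCV_V_VSTATE_CTRL_OFF => Some(VectorControl::Off),
        PR_RISCV_V_VSTATE_CTRL_ON => Some(VectorControl::On),
        _ => None,
    }
}
```

A caller would presumably keep the vector-related features from riscv_hwprobe only for `On`, drop them for `Off`, and need some extra policy for `Default` and for a failed call.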

@taiki-e
Member Author

taiki-e commented Apr 1, 2025

Given the structure of Rust/LLVM's target features, I wonder if it is UB to call code that is not written in (inline or external) assembly in the time between disallowing vector instructions and re-allowing them.

For example, if I compile code with -C target-feature=+v, the compiler will use vector instructions for code that copies a value larger than a register from memory, but if I disallow vector instructions and then reach that code, I will get a trap, right?

@Amanieu
Member

Amanieu commented Apr 1, 2025

My reading of https://docs.kernel.org/arch/riscv/vector.html is that it's mainly intended for system-level management of vector availability. They explicitly say that:

To get the availability of V in an ELF program, please read COMPAT_HWCAP_ISA_V bit of ELF_HWCAP in the auxiliary vector.

So we should just do that: hwcaps will reflect whether vector support is enabled for the current process.

Note that hwprobe does not disable V if vector support is disabled for the process, so we need to use hwcaps in this case (or manually check with prctl(PR_RISCV_V_GET_CONTROL)).
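
A minimal sketch of combining the two sources, assuming the IMA_EXT_0 value from riscv_hwprobe and the AT_HWCAP value are already available: the V bit from hwprobe is kept only when hwcap also reports V. `RISCV_HWPROBE_IMA_V` is the value from the kernel's uapi/asm/hwprobe.h; the function name is hypothetical, and a real implementation would also have to clear every extension that depends on V.

```rust
/// V bit in the RISCV_HWPROBE_KEY_IMA_EXT_0 value (uapi/asm/hwprobe.h).
const RISCV_HWPROBE_IMA_V: u64 = 1 << 2;
/// V bit in AT_HWCAP: one bit per letter, counted from 'a'.
const HWCAP_ISA_V: u64 = 1 << (b'v' - b'a');

/// Clears the V bit reported by hwprobe when hwcap says that vector support
/// is not enabled for this process. Only the V bit itself is handled here.
fn mask_vector_by_hwcap(hwcap: u64, ima_ext_0: u64) -> u64 {
    if hwcap & HWCAP_ISA_V == 0 {
        ima_ext_0 & !RISCV_HWPROBE_IMA_V
    } else {
        ima_ext_0
    }
}
```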

@a4lg
Contributor

a4lg commented Apr 1, 2025

@Amanieu I confirmed that ELF_HWCAP does in fact reflect the availability of the V extension when the program starts, unlike riscv_hwprobe. (I had misremembered how ELF_HWCAP behaves and was worried about a breaking change if it were not affected by prctl; that turned out not to be the case.)
Still, I think we had better use prctl so that we do not accidentally enable vector extensions through riscv_hwprobe.

@a4lg
Contributor

a4lg commented Apr 1, 2025

@taiki-e I think that's correct, and I'm not worried about static feature checking (if we use -C target-feature=+v, we expect the V extension to be enabled at runtime as well, right?).
I'm worried about the situation where the vector extensions are soft-disabled at runtime (see the intended usage described in https://docs.kernel.org/arch/riscv/vector.html).

@taiki-e taiki-e force-pushed the std-detect-riscv-linux branch from 525a848 to e26a29f Compare April 1, 2025 18:40
It reflects Vector enablement status, unlike hwprobe.
@taiki-e taiki-e force-pushed the std-detect-riscv-linux branch from e26a29f to 1474c1d Compare April 1, 2025 19:01
@taiki-e
Member Author

taiki-e commented Apr 1, 2025

Updated PR (mostly based on feedback).


> Note that hwprobe does not disable V if vector support is disabled for the process, so we need to use hwcaps in this case (or manually check with prctl(PR_RISCV_V_GET_CONTROL)).

> Still, I think we'd better use prctl not to accidentally enable vector extensions through riscv_hwprobe.

I thought that using hwprobe together with prctl would probably be preferable, because hwcap only supports detection of the full V extension, whereas hwprobe also supports detecting cases where only a subset of the V extension is available.

However, qemu-user 9.2.1 does not seem to support that prctl, so for now I implemented it using hwcap.

[crates/std_detect/src/detect/os/linux/riscv.rs:211:25] libc::prctl(70) = -1

@a4lg
Contributor

a4lg commented Apr 2, 2025

Give me some time to test that (I'm struggling to prepare a 32-bit RISC-V Linux environment with QEMU system emulation, while preparing the 64-bit counterpart worked well).

@taiki-e
Member Author

taiki-e commented Apr 2, 2025

(I tested this with qemu-riscv32 (user mode) using the toolchain used in setup-cross-toolchain-action, and updated the PR description to include the result.)

@a4lg
Contributor

a4lg commented Apr 3, 2025

Okay, I've finally built full Linux 6.14 + Buildroot + glibc RISC-V environments (32/64) plus a Rust toolchain, and confirmed that your code works as expected (I could not accept your changes before they had been tested in a real Linux environment). I will likely do some tidying, but only after you have made most of your changes (since I found your branch with the prctl-based version).

Recommendation: Approval

Side note: in addition to changing QEMU's -cpu option, I also confirmed that the sysctl entry /proc/sys/abi/riscv_v_default_allow (1: enabled, 0: disabled) changes the V extension result reported by std::arch::is_riscv_feature_detected!.
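
For anyone who wants to repeat that check, a small sketch of reading the sysctl entry from Rust follows. The path and the meaning of 0 and 1 come from the comment above; both function names are hypothetical, and any other content is treated as unknown.

```rust
use std::fs;

/// Parses the contents of /proc/sys/abi/riscv_v_default_allow.
/// "1" means vector is allowed by default, "0" means it is not;
/// anything else is reported as unknown (None).
fn parse_v_default_allow(contents: &str) -> Option<bool> {
    match contents.trim() {
        "0" => Some(false),
        "1" => Some(true),
        _ => None,
    }
}

/// Reads the sysctl entry. Returns None if the file does not exist
/// (for example on a non-RISC-V kernel) or cannot be parsed.
fn read_v_default_allow() -> Option<bool> {
    let contents = fs::read_to_string("/proc/sys/abi/riscv_v_default_allow").ok()?;
    parse_v_default_allow(&contents)
}
```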

Comment on lines +77 to +78
// const RISCV_HWPROBE_EXT_SUPM: u64 = 1 << 49;

Member Author

By the way, Linux 6.15 will also add the following.
(Although we cannot include them in this PR because 6.15 is not yet released and could be changed.)

const RISCV_HWPROBE_EXT_ZICNTR: u64 = 1 << 50;
const RISCV_HWPROBE_EXT_ZIHPM: u64 = 1 << 51;
const RISCV_HWPROBE_EXT_ZFBFMIN: u64 = 1 << 52;
const RISCV_HWPROBE_EXT_ZVFBFMIN: u64 = 1 << 53;
const RISCV_HWPROBE_EXT_ZVFBFWMA: u64 = 1 << 54;
const RISCV_HWPROBE_EXT_ZICBOM: u64 = 1 << 55;
const RISCV_HWPROBE_EXT_ZAAMO: u64 = 1 << 56;
const RISCV_HWPROBE_EXT_ZALRSC: u64 = 1 << 57;

Contributor

Wow, I'm writing full riscv_hwprobe feature probing (including OS-independent logic for stdarch) and related feature sets (for Rust) and... I think that will exceed 93, the current limit of the 12-byte cache.
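
To make the number 93 concrete, here is a sketch of the arithmetic under one assumption that this thread does not spell out: that the 12-byte cache consists of three 32-bit words and that each word reserves one bit as an "initialized" flag. All names below are hypothetical and only illustrate how a feature index would map to a slot under that assumption.

```rust
/// Assumed layout: three 32-bit words (12 bytes in total).
const CACHE_WORDS: u32 = 3;
/// Assumed: one bit per word is reserved as an "initialized" flag.
const USABLE_BITS_PER_WORD: u32 = 32 - 1;
/// Total number of features that fit under these assumptions.
const CACHE_CAPACITY: u32 = CACHE_WORDS * USABLE_BITS_PER_WORD;

/// Maps a feature index to (word index, bit index within the word),
/// or None when the index does not fit into the cache.
fn cache_slot(feature_index: u32) -> Option<(u32, u32)> {
    if feature_index >= CACHE_CAPACITY {
        return None;
    }
    Some((
        feature_index / USABLE_BITS_PER_WORD,
        feature_index % USABLE_BITS_PER_WORD,
    ))
}
```

Under these assumptions the last usable index is 92, so a feature set with more than 93 entries would need a larger cache.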

Comment on lines 208 to 209
// FIXME: we can implement this by getting the current vlen
// zvl*b: Minimum Vector Length Standard Extensions

Contributor

Although I can accept this comment on its own (so that you can keep this PR moving), I actually disagree with adding all of the Zvl*b extensions (12 in total, with * being a power of two between 32 and 65536 inclusive). They are too redundant (e.g. 256 implies 128, 128 implies 64, and so on) and are either:

  • Unlikely to be referenced by real-world programs (due to the software-side nature of RVV), or
  • Likely to be insufficient for specialized programs, where just knowing the vector length is not enough. It is said that implementing RVV in hardware is harder than implementing regular SIMD instructions, and some reports suggest that the actual performance gain depends heavily on the vector engine implementation, so adding Zvl*b alone will not help those programs much.

I acknowledge that there's still a need to retrieve the vector register size and I think an intrinsic (which returns the vector register size) would be a better solution.
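
The redundancy can be sketched in a few lines: once the vector register size (VLEN, in bits) is known, every Zvl*b extension up to that size follows from it. The function below is a hypothetical illustration, not a proposed API, and it assumes the VLEN value has already been obtained by some other means.

```rust
/// Lists the Zvl*b extensions implied by a given VLEN (in bits).
/// The candidates are the powers of two from 32 to 65536 inclusive,
/// which gives the 12 extensions mentioned above.
fn implied_zvl_extensions(vlen_bits: u32) -> Vec<String> {
    (5..=16u32)
        .map(|exp| 1u32 << exp)
        .filter(|&n| n <= vlen_bits)
        .map(|n| format!("zvl{}b", n))
        .collect()
}
```

For example, `implied_zvl_extensions(128)` yields zvl32b, zvl64b and zvl128b, which is why a single length value carries the same information as the whole set of flags.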

Member Author

It indeed makes more sense to have it as a core::arch intrinsic, since nothing OS-specific is needed to get the VLEN, and the actual length information can be obtained in a single call.

Removed that code comment.

@a4lg
Contributor

a4lg commented Apr 5, 2025

I noticed that I haven't pressed the "Submit review" button.

}
};
if out[0].key != -1 {
let ima_ext_0 = out[0].value;

Contributor

It seems using this key alone is not robust enough (checking the IMA base is necessary).

I opened #1770 for my similar proposal with OS-independent extension implication logic and with fixes including this part.

Member Author

Good catch. Earlier versions of this PR actually included that check, but it was accidentally removed when I changed to use hwcap.
3203fd5#diff-dfab5b675012c4be84a409e8c4f92c8e172e797caf07a23516d1fda390fb9818R152
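
A minimal sketch of the more robust check discussed in this thread, assuming the caller requested RISCV_HWPROBE_KEY_BASE_BEHAVIOR and RISCV_HWPROBE_KEY_IMA_EXT_0 as a pair: the extension bitmask is trusted only if the kernel recognized both keys and reports the IMA base. The key and flag values are from the kernel's uapi/asm/hwprobe.h; the struct and function names are hypothetical, and the syscall itself is omitted.

```rust
/// Key/value pair with the same layout as the kernel's struct riscv_hwprobe.
#[repr(C)]
#[derive(Clone, Copy)]
struct RiscvHwprobe {
    key: i64,
    value: u64,
}

const RISCV_HWPROBE_KEY_BASE_BEHAVIOR: i64 = 3;
const RISCV_HWPROBE_BASE_BEHAVIOR_IMA: u64 = 1 << 0;
const RISCV_HWPROBE_KEY_IMA_EXT_0: i64 = 4;

/// Returns the IMA_EXT_0 bitmask only if it can be trusted.
/// `out[0]` is expected to be the BASE_BEHAVIOR pair and `out[1]` the
/// IMA_EXT_0 pair, as filled in by the kernel. The kernel sets `key` to -1
/// for keys it does not recognize, so both keys are compared explicitly.
fn trusted_ima_ext_0(out: &[RiscvHwprobe; 2]) -> Option<u64> {
    let (base, ext) = (out[0], out[1]);
    if base.key != RISCV_HWPROBE_KEY_BASE_BEHAVIOR {
        return None;
    }
    if (base.value & RISCV_HWPROBE_BASE_BEHAVIOR_IMA) == 0 {
        // Without the IMA base, the IMA_EXT_0 bits are not meaningful.
        return None;
    }
    if ext.key != RISCV_HWPROBE_KEY_IMA_EXT_0 {
        return None;
    }
    Some(ext.value)
}
```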

@a4lg
Contributor

a4lg commented Apr 11, 2025

I opened #1770 to shape the best feature detection logic in my mind.

@a4lg
Contributor

a4lg commented Apr 11, 2025

Interestingly, in the process of reviewing your PR, I found that your direct syscall approach (not calling glibc's __riscv_hwprobe and the userland vDSO) works around a weird behavior I found when calling glibc's __riscv_hwprobe, which calls the vDSO prepared by the Linux kernel (likely a Linux-side bug, possibly related to the vDSO).

@taiki-e
Member Author

taiki-e commented Apr 11, 2025

Closing in favor of #1770.

@taiki-e taiki-e closed this Apr 11, 2025
Successfully merging this pull request may close these issues.

is_riscv_feature_detected doesn't seem to actually detect anything at runtime
4 participants