std::arch SIMD intrinsics #171

gnzlbg · 2018-11-20T20:05:56Z

Currently the SIMD intrinsics are implemented in stdsimd using link_llvm_intrinsics to directly call the llvm intrinsics via their C ABI, and using a handful of "generic" simd intrinsics.

Is there a way to directly call Cretonne intrinsics?
Is there a cfg() macro available to detect whether the codegen backend is LLVM or Cranelift ?

The text was updated successfully, but these errors were encountered:

bjorn3 · 2018-11-21T07:29:07Z

Is there a way to directly call Cretonne intrinsics?

Cranelift doesn't have intrinsics. It does have simd types, which can be used with normal instructiosn like iadd. However I dont think all simd intrinsics have a cranelift instruction counterpart and implementing all of them takes time I would rather use to implement other things.

Is there a cfg() macro available to detect whether the codegen backend is LLVM or Cranelift?

Not yet, have been thinking about adding one though.

gnzlbg · 2018-11-21T07:59:53Z

@bjorn3 check this out:

https://github.com/rust-lang-nursery/stdsimd/blob/master/coresimd/x86/sse.rs#L1986

This is how std::arch calls into the different simd instructions (not all of them, but many of them). The question is, would it be possible to write similar code to target Cranelift ? Or should we move all of these into platform-intrinsics and add the abstraction layer at the Rust codegen level ?

It does have simd types, which can be used with normal instructiosn like iadd

That sounds more like what packed_simd does, which uses some of rustc's generic simd intrinsics, e.g., see here: https://github.com/rust-lang-nursery/packed_simd/blob/master/src/codegen/llvm.rs

It might be easier to get packed_simd to work with Cranelift than to get std::arch working, but note that packed_simd also uses std::arch intrinsics in many cases to work arounds codegen bugs in LLVM... A cfg macro to detect the backend would be needed here to detect Cranelift and remove these workarounds, but we might have to potentially add other workarounds for Cranelift, by "somehow" calling Cranelift SIMD operations.

Even if this work is not there yet, I think work to "prepare" std::arch and packed_simdfor Cranelift can already start, and ideally using such a cfg macro we would get one single std::arch intrinsic first, write down the process to "support Cranelift", and try to get people involved, mentor them, etc.

Removing link_llvm_intrinsics from std::arch would be a bit of work, but it is possible. And there are python generators in Rust upstream that generate platform-intrinsics for these "automatically". Maybe porting those to Cranelift would be a way forward. We can then use cfg macros in std::arch to only expose the simd instructions that Cranelift supports, and can work on adding support to Cranelift for more simd instructions until we reach parity.

bjorn3 · 2018-11-21T12:54:56Z

cg_clif currently puts non-primitives in stackslots which would kill simd performance.

bjorn3 · 2018-11-21T12:57:01Z

The question is, would it be possible to write similar code to target Cranelift ?

Not without changes to cg_clif to intercept those intrinsics.

Even if this work is not there yet, I think work to "prepare" std::arch and packed_simdfor Cranelift can already start, and ideally using such a cfg macro we would get one single std::arch intrinsic first, write down the process to "support Cranelift", and try to get people involved, mentor them, etc.

+1

gnzlbg · 2018-11-21T13:03:44Z

What would be desirable to start preparing stdsimd and packed_simd, is to clarify how the result should look like in those crates. cc @eddyb @sunfishcode

@eddyb was of the strong opinion that stdsimd should stop using link_llvm_intrinsics and start using platform intrinsics instead, but not all platform-intrinsics might be available for cranelift at least initially, so having some #[cfg(rustc_backend_llvm)] and #[cfg(rustc_backend_cranelift)] macros behind a feature gate in rustc would be useful for that, and also, to gate the llvm-specific workarounds of packed_simd.

The lowest-hanging fruit is probably to get packed_simd to work with cranelift, since many crates using std::arch also offer a packed_simd version behind a cargo feature to make their crate portable (e.g. rand). If we put all llvm workarounds behind feature gates in packed_simd, we would "only" have to implement the ~20 generic SIMD intrinsics for the cranelift backend.

eddyb · 2018-11-21T13:50:20Z

I don't think littering stdsimd with #[cfg]s is a good idea - the proper solution is to allow mixing LLVM and Cranelift codegen units like @sunfishcode suggested, and long-term have something like platform-intrinsics built into Cranelift, used as the source of truth (many of the instructions can be dealt with in a compact declarative fashion), with a mapping to LLVM names as a secondary interface.

bjorn3 · 2018-11-21T13:50:38Z

Using platform-intrinsics to get packed_simd working seems doable. The simd_reduce_* family doesn't seem to have cranelift counterparts though. (cc @sunfishcode)

bjorn3 · 2018-11-21T13:55:16Z

the proper solution is to allow mixing LLVM and Cranelift codegen units like @sunfishcode suggested

I like the idea. There is currently an abi incompatibility between cg_clif and cg_llvm. (cg_clif always passes non primitives by-ref and uses the cranelift fast calling convention instead of System-V like cg_llvm) Other than that it kind of works already today. (metadata is put in the same place and symbol names are made the same way)

gnzlbg · 2018-11-21T14:27:59Z

The simd_reduce_* family doesn't seem to have cranelift counterparts though.

I am not sure if these are necessary for an MVP of packed_simd working with cranelift or not. Maybe we could workaround these in cranelift, at least initially (e.g. by falling back to scalar code).

eddyb · 2018-11-21T14:34:03Z

Yeah I suspect any upstreamed backend to use e.g. FnType and strictly adhere to the ABI.
We can even come up with a Cranelift-friendly ABI that we use for LLVM too - we just have to be consistent and do everything through FnType.

bjorn3 · 2018-11-21T14:39:11Z

I don't understand what to do when getting PassMode::Cast.

eddyb · 2018-11-21T14:43:37Z

You're supposed to pass the type's bytes as one or more immediates, as indicated by the information in the cast (e.g. rustc_target::abi::call::Reg tells you the register kind and size).

bjorn3 · 2018-11-21T14:45:04Z

Got it. Thanks!

bjorn3 · 2019-07-27T15:55:41Z

I implemented support for some simd_* intrinsics on the simd_emulation branch.

gnzlbg · 2019-07-27T16:13:14Z

I suppose that doing a scalar emulation might be initially ok, but doesn't cranelift support emitting the appropriate instructions?

bjorn3 · 2019-07-27T17:18:41Z

Yes, it does for most intrinsics, but cranelift is currently implementing real simd, instead of emulation like I did here. (bytecodealliance/cranelift#833, bytecodealliance/cranelift#855, bytecodealliance/cranelift#868) Because of this I think for example adding two vectors is broken. (cc @abrown, am I correct?) Also using real SIMD will not give much performance benifit yet, until some changes to the rest of cg_clif to not always store the vectors on the stack are performed and inlining of the std::arch functions is performed.

abrown · 2019-07-29T16:19:02Z

Because of this I think for example adding two vectors is broken

I think this is only broken if you turn on the enable_simd setting because only a handful of SIMD instructions are implemented. Otherwise, the vectors are split up and--as far as I understand cranelift--the vector addition should work (see https://github.com/CraneStation/cranelift/blob/a5b17e8a0f044ec03b6982baea3757af43e70b7b/cranelift-codegen/src/isa/x86/abi.rs#L87-L95). More SIMD instructions are coming but we need to review and merge foundational stuff like bytecodealliance/cranelift#855 and bytecodealliance/cranelift#868 --any help is appreciated 😄.

bjorn3 · 2019-07-29T16:27:43Z

I think this is only broken if you turn on the enable_simd setting

Of cource, forgot about that setting.

bjorn3 · 2019-07-30T12:57:08Z

Opened #650.

bjorn3 · 2023-07-24T13:30:41Z

https://github.com/bjorn3/rustc_codegen_cranelift/pull/1378 and https://github.com/bjorn3/rustc_codegen_cranelift/pull/1380 implemented a bunch more intrinsics.

bjorn3 · 2023-10-14T13:55:40Z

Enough intrinsics are supported now that it seems like removing the hack to make is_x86_feature_detected!() return false in https://github.com/bjorn3/rustc_codegen_cranelift/pull/1397 didn't cause issues for anyone. Or at least nobody reported an issue because of this change.

bjorn3 · 2023-10-24T12:32:46Z

Rav1e and image now thanks to e5ba1e8 and a558968 respectively.

bjorn3 · 2023-11-07T18:54:25Z

#1417 implemented a lot of intrinsics that were found to be missing.

benwis · 2023-11-08T19:23:56Z

So I'm testing cranelift a bit in our group, so far we've hit these missing instructions. Not sure there's a workaround for that right now. Also curious how people figure out what dep is calling these, and how to best track when PRs land in rust nightly

core::arch::x86::avx::_mm256_ldqu_si256 llvm.x86.avx.ldu.dq.256
llvm.x86.aesni.aesenc
llvm.x86.aesni.aesimc
llvm.x86.aesni.aesenclast

I think the first one might be fixed by #1417. Loving cranelift where we can use this though!

bjorn3 · 2023-11-08T19:39:17Z

Also curious how people figure out what dep is calling these

I just get a backtrace at the crash site using a debugger.

and how to best track when PRs land in rust nightly

I will post a comment in the respective issue once it lands on nightly.

Not sure there's a workaround for that right now.

For the aes intrinsics you can use RUSTFLAGS="--cfg aes_force_soft" when compiling as workaround to force the aes crate to use a software implementation instead of the simd intrinsics.

AppleSheeple · 2023-11-09T04:15:04Z

Also curious how people figure out what dep is calling these

I just get a backtrace at the crash site using a debugger.

FWIW, I went with a simple cargo vendor and grep approach, which lead to #1410.

This should be simpler, as it doesn't burden you with running the code and trying to hit the traps. And also, there could be more than one dependency using the same missing intrinsics reported, so run and debug may miss some usage cases in dependencies.

pothos · 2023-11-26T18:52:14Z

For making the codegen production-ready I think it's important to fail compilation when an unsupported intrinsic is hit. The replacement with a trap should be opt-in (probably only through a debug env var) for a safe behaviour. Think about not noticing a regression from a crate update in a build/release pipeline because there is just this warning instead of failing compilation hard.

Edit: Also, IDE users might easily miss the warning if the IDE triggers the compilation in the background.

bjorn3 · 2023-11-26T19:27:53Z

Rustc_codegen_cranelift is meant for use during development. For release builds I recommend the default LLVM backend or in the future the GCC backend as both produce much faster executables. In the future I agree that it makes sense to turn this into a hard error by default, but until there are enough simd intrinsics implemented for that to rarely result in a compile error, I did rather keep it as warning + trap at runtime.

bjorn3 · 2024-01-03T19:44:27Z

#1443 by @Nilstrieb implemented a couple of x86 pack intrinsics and fixed a couple other of these intrinsics.

oriongonza · 2025-03-07T13:52:58Z

is there a -Ctarget_cpu that prevents the emission of the unsupported intrinsics?

bjorn3 · 2025-03-07T14:06:30Z

Unsupported intrinsics are already codegened as aborting functions. However many crates at runtime detect if cpu features are available and if so try to use the associated intrinsics. There is no reliable way for cg_clif to prevent the cpu features from being detected as the standard library which implements this may have been compiled with the LLVM backend.

caspark · 2025-04-08T06:28:28Z

When running cranelift-built (most recently using nightly d5b4c2e4f 2025-04-02) debug builds of my game I almost immediately got a crash with:

trap at Instance { def: Item(DefId(1:15726 ~ core[2546]::core_arch::x86::sse::_mm_cvtss_si32)), args: [] } (_ZN4core9core_arch3x863sse14_mm_cvtss_si3217h92700e12c6bbbdd8E): llvm.x86.sse.cvtss2si

Eventually, after a year,¹ I stumbled on this issue's cargo vendor + grep approach, which together with cargo tree finally let me pin it down to my nalgebra (which uses wide, which uses safe_arch) and fontdue dependencies.

Knowing that, I was able to set my non-workspace dependencies to use the llvm backend (i.e. profile.dev.package."*".codegen-backend = "llvm" in Cargo.toml), which worked around the issue. (I could have set just the problematic crates to use llvm, but the extra speed in llvm-compiled code is helpful in a game.)

So, to give some constructive suggestions that might help the next person, here are some things that would have helped me figure this out much much faster:

A listing of all intrinsics that are unsupported, in code or an issue. I googled for llvm.x86.sse.cvtss2si cranelift and similar but got no relevant hits.
Even better, a link to a tracking issue for each (group of?) unsupported intrinsics directly from the error message - or even just to this issue if it's SIMD intrinsics.
Readme instructions for "how to figure out which crate is causing me to hit the unsupported intrinstic" and "how to swap back to the llvm backend for problematic or all non-workspace crates". Or even encouraging folks to still use the llvm backend for non-workspace crates from the get-go.

A year of on-again, off-again experimentation, spending a few hours every few months - partly because I wasn't sure if the issue was expected and would fix itself if I just waited for further work to happen on cranelift (I'm still not sure, to be honest - is the plan to (at least) emulate all such intrinsics eventually?). I also couldn't figure out how to get a backtrace from that error either, which would have been really useful. ↩

bjorn3 · 2025-04-08T10:10:49Z

A listing of all intrinsics that are unsupported, in code or an issue. I googled for llvm.x86.sse.cvtss2si cranelift and similar but got no relevant hits.

There are 5482 LLVM intrinsics that std::arch uses. Only a 125 of these are currently supported by cg_clif. Listing all remaining 5357 unimplemented intrinsics is not really feasible. And the set changes between rustc versions anyway.

Even better, a link to a tracking issue for each (group of?) unsupported intrinsics directly from the error message - or even just to this issue if it's SIMD intrinsics.

Makes sense. Will add a link to this issue.

Readme instructions for "how to figure out which crate is causing me to hit the unsupported intrinstic" and "how to swap back to the llvm backend for problematic or all non-workspace crates". Or even encouraging folks to still use the llvm backend for non-workspace crates from the get-go.

On some targets there are still a couple of abi issues remaining that make it feel a bit iffy to me to suggest.

I also couldn't figure out how to get a backtrace from that error either, which would have been really useful.

You need a debugger for that. This is not a panic, but printing a message using puts from libc followed by an illegal instruction to crash the process. I guess I could make it a non-unwinding panic nowadays. These didn't yet exist back when this error was introduced.

Edit: Done in #1568

(I'm still not sure, to be honest - is the plan to (at least) emulate all such intrinsics eventually?).

#1547 is probably the way forward. Manually emulating all intrinsics is just way too much work.

This will show a backtrace. Also added a reference to #171 in the unimplemented intrinsic error message.

bjorn3 mentioned this issue Dec 25, 2018

rustc: Rewrite platform intrinsics crate rust-lang/rust#57048

Closed

bjorn3 added the optimize-speed The speed of the generated executable label Feb 5, 2019

bjorn3 mentioned this issue Feb 8, 2019

Remove the need for core/std patches #337

Closed

16 tasks

bjorn3 mentioned this issue Jul 30, 2019

Simd emulation #650

Merged

bjorn3 added the A-core-arch Area: Necessary for full core::arch support label Sep 14, 2019

bjorn3 mentioned this issue Dec 23, 2019

Codegen simd intrinsics as simd clif instructions #842

Merged

bjorn3 mentioned this issue Dec 15, 2022

AES encryption unsupported intrinsics warning #1224

Closed

bjorn3 mentioned this issue Nov 7, 2023

Implement a lot of SIMD intrinsics #1417

Merged

bjorn3 mentioned this issue Nov 11, 2023

Implement AES-NI and SHA256 crypto intrinsics using inline asm #1425

Merged

bjorn3 mentioned this issue Nov 24, 2023

Implement another batch of vendor intrinsics #1431

Merged

bjorn3 mentioned this issue Sep 20, 2024

Aarch64 macOS trap on vld1q #1534

Closed

bjorn3 added a commit that referenced this issue Apr 8, 2025

Replace trap_unimplemented calls with codegen_panic_nounwind

6424f0a

This will show a backtrace. Also added a reference to #171 in the unimplemented intrinsic error message.

bjorn3 mentioned this issue Apr 8, 2025

Replace trap_unimplemented calls with codegen_panic_nounwind #1568

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

std::arch SIMD intrinsics #171

std::arch SIMD intrinsics #171

gnzlbg commented Nov 20, 2018 •

edited

Loading

bjorn3 commented Nov 21, 2018 •

edited

Loading

gnzlbg commented Nov 21, 2018 •

edited

Loading

bjorn3 commented Nov 21, 2018

bjorn3 commented Nov 21, 2018

gnzlbg commented Nov 21, 2018

eddyb commented Nov 21, 2018

bjorn3 commented Nov 21, 2018

bjorn3 commented Nov 21, 2018

gnzlbg commented Nov 21, 2018 •

edited

Loading

eddyb commented Nov 21, 2018

bjorn3 commented Nov 21, 2018

eddyb commented Nov 21, 2018

bjorn3 commented Nov 21, 2018

bjorn3 commented Jul 27, 2019

gnzlbg commented Jul 27, 2019

bjorn3 commented Jul 27, 2019 •

edited

Loading

abrown commented Jul 29, 2019 •

edited

Loading

bjorn3 commented Jul 29, 2019

bjorn3 commented Jul 30, 2019

bjorn3 commented Jul 24, 2023

bjorn3 commented Oct 14, 2023

bjorn3 commented Oct 24, 2023

bjorn3 commented Nov 7, 2023

benwis commented Nov 8, 2023

bjorn3 commented Nov 8, 2023

AppleSheeple commented Nov 9, 2023

pothos commented Nov 26, 2023 •

edited

Loading

bjorn3 commented Nov 26, 2023

bjorn3 commented Jan 3, 2024

oriongonza commented Mar 7, 2025

bjorn3 commented Mar 7, 2025

caspark commented Apr 8, 2025

bjorn3 commented Apr 8, 2025 •

edited

Loading

std::arch SIMD intrinsics #171

std::arch SIMD intrinsics #171

Comments

gnzlbg commented Nov 20, 2018 • edited Loading

bjorn3 commented Nov 21, 2018 • edited Loading

gnzlbg commented Nov 21, 2018 • edited Loading

bjorn3 commented Nov 21, 2018

bjorn3 commented Nov 21, 2018

gnzlbg commented Nov 21, 2018

eddyb commented Nov 21, 2018

bjorn3 commented Nov 21, 2018

bjorn3 commented Nov 21, 2018

gnzlbg commented Nov 21, 2018 • edited Loading

eddyb commented Nov 21, 2018

bjorn3 commented Nov 21, 2018

eddyb commented Nov 21, 2018

bjorn3 commented Nov 21, 2018

bjorn3 commented Jul 27, 2019

gnzlbg commented Jul 27, 2019

bjorn3 commented Jul 27, 2019 • edited Loading

abrown commented Jul 29, 2019 • edited Loading

bjorn3 commented Jul 29, 2019

bjorn3 commented Jul 30, 2019

bjorn3 commented Jul 24, 2023

bjorn3 commented Oct 14, 2023

bjorn3 commented Oct 24, 2023

bjorn3 commented Nov 7, 2023

benwis commented Nov 8, 2023

bjorn3 commented Nov 8, 2023

AppleSheeple commented Nov 9, 2023

pothos commented Nov 26, 2023 • edited Loading

bjorn3 commented Nov 26, 2023

bjorn3 commented Jan 3, 2024

oriongonza commented Mar 7, 2025

bjorn3 commented Mar 7, 2025

caspark commented Apr 8, 2025

Footnotes

bjorn3 commented Apr 8, 2025 • edited Loading

gnzlbg commented Nov 20, 2018 •

edited

Loading

bjorn3 commented Nov 21, 2018 •

edited

Loading

gnzlbg commented Nov 21, 2018 •

edited

Loading

gnzlbg commented Nov 21, 2018 •

edited

Loading

bjorn3 commented Jul 27, 2019 •

edited

Loading

abrown commented Jul 29, 2019 •

edited

Loading

pothos commented Nov 26, 2023 •

edited

Loading

bjorn3 commented Apr 8, 2025 •

edited

Loading