add simd_splat intrinsic
#151346
This would also be useful for @rust-lang/project-portable-simd.
r? @workingjubilee (or anyone else with intrinsic/simd knowledge)
Some changes occurred to the platform-builtins intrinsics. Make sure the
cc @antoyo, @GuillaumeGomez, @bjorn3, @calebzulawski, @programmerjake
Some changes occurred to the CTFE machinery
Some changes occurred in compiler/rustc_codegen_cranelift
cc @bjorn3
Some changes occurred in compiler/rustc_codegen_gcc
Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr
Some changes occurred to the CTFE / Miri interpreter
cc @rust-lang/miri
r=me on the const_eval and codegen_ssa changes.
or make
It does, but different CI jobs use different
I think PR CI should at least catch this; having rollups fail because of it is such a waste of time.
Could you file an issue? I think I agree with Jubilee's option of just not applying noopt to codegen tests; those tests are expected to be highly sensitive to optimization flags. However, for some tests it's the opt flags of the standard library build that matter, which could be trickier to deal with.
@bors r=workingjubilee
add `simd_splat` intrinsic
Add `simd_splat` which lowers to the LLVM canonical splat sequence.
```llvm
%v0 = insertelement <N x elem> poison, elem %x, i32 0
%v1 = shufflevector <N x elem> %v0, <N x elem> poison, <N x i32> zeroinitializer
```
Right now we try to fake it using one of
```rust
fn splat(x: u32) -> u32x8 {
    u32x8::from_array([x; 8])
}
```
or (in `stdarch`)
```rust
fn splat(value: $elem_type) -> $name {
    #[derive(Copy, Clone)]
    #[repr(simd)]
    struct JustOne([$elem_type; 1]);

    let one = JustOne([value]);
    // SAFETY: 0 is always in-bounds because we're shuffling
    // a simd type with exactly one element.
    unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```
Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:
- rust-lang#60637
- rust-lang#137407
- rust-lang#122623
- rust-lang#97804
---
As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.
Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.
Currently this just adds the intrinsic; it is not actually used anywhere yet.
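To make the canonical splat sequence from the description concrete, here is a minimal scalar model in plain stable Rust. It is only a sketch of the semantics, not the intrinsic or any backend's implementation: the names `insert_lane0`, `shuffle`, and `splat_model` are hypothetical, vectors are modeled as arrays, poison lanes are modeled as `None`, and the shuffle is simplified to a single input (the real `shufflevector` takes two, the second being poison here). It assumes `N >= 1`.

```rust
/// Models `insertelement <N x T> poison, T %x, i32 0`:
/// lane 0 holds `x`, every other lane is poison (modeled as `None`).
/// Hypothetical helper; panics if `N == 0`.
fn insert_lane0<T: Copy, const N: usize>(x: T) -> [Option<T>; N] {
    let mut v = [None; N];
    v[0] = Some(x);
    v
}

/// Simplified single-input model of `shufflevector`:
/// output lane `i` is a copy of input lane `mask[i]`.
fn shuffle<T: Copy, const N: usize>(v: [Option<T>; N], mask: [usize; N]) -> [Option<T>; N] {
    let mut out = [None; N];
    for (lane, &src) in out.iter_mut().zip(mask.iter()) {
        *lane = v[src];
    }
    out
}

/// Models the whole splat: insert into lane 0, then shuffle with an
/// all-zero mask (`zeroinitializer`) so every output lane reads lane 0.
fn splat_model<T: Copy, const N: usize>(x: T) -> [T; N] {
    let v0 = insert_lane0::<T, N>(x);
    let v1 = shuffle(v0, [0; N]);
    // No poison lane survives, because the mask only ever selects lane 0.
    v1.map(|lane| lane.expect("lane 0 is always defined"))
}
```

For example, `splat_model::<u32, 8>(7)` yields `[7; 8]`, matching what `u32x8::from_array([x; 8])` computes lane by lane. The point of the model is that the all-zero mask never observes the poison lanes, which is why the two-instruction sequence is well defined.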
Rollup of 11 pull requests
Successful merges:
- #149962 (Promote powerpc64-unknown-linux-musl to tier 2 with host tools)
- #150138 (Add new Tier 3 targets for ARMv6)
- #150905 (Fix(lib/win/net): Remove hostname support under Win7)
- #151094 (remote-test-server: Fix compilation on UEFI targets)
- #151346 (add `simd_splat` intrinsic)
- #151353 (compiletest: Make `aux-crate` directive explicitly handle `--extern` modifiers)
- #151538 (std: `sleep_until` on Motor and VEX)
- #151098 (Add Korean translation to Rust By Example)
- #151157 (Extend build-manifest local test guide)
- #151403 (std: use 64-bit `clock_nanosleep` on GNU/Linux if available)
- #151571 (Fix cstring-merging test for Hexagon target)
Rollup merge of #151346 - folkertdev:simd-splat, r=workingjubilee