-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Add AVX broadcast and conversion intrinsics #32140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This defines `_mm256_broadcast_ps` and `_mm256_broadcast_pd`. The `_ss` and `_sd` variants are not supported by LLVM. In Clang these intrinsics are implemented as inline functions in C++. Intel reference: https://software.intel.com/en-us/node/514144. Note: the argument type should really be "0hPc" (a pointer to a vector of half the width), but internally the LLVM intrinsic takes a pointer to a signed integer, and for any other type LLVM will complain. This means that a transmute is required to call these intrinsics. The AVX2 broadcast intrinsics `_mm256_broadcastss_ps` and `_mm256_broadcastsd_pd` are not available as LLVM intrinsics. In Clang they are implemented using the shufflevector builtin.
This defines the following intrinsics: * `_mm256_cvtepi32_pd` * `_mm256_cvtepi32_ps` * `_mm256_cvtpd_epi32` * `_mm256_cvtpd_ps` * `_mm256_cvtps_epi32` * `_mm256_cvtps_pd` * `_mm256_cvttpd_epi32` * `_mm256_cvttps_epi32` Intel reference: https://software.intel.com/en-us/node/514130.
The exact command used was: $ cd src/etc/platform-intrinsics/x86 $ python2 ../generator.py --format compiler-defs -i info.json \ sse.json sse2.json sse3.json ssse3.json sse41.json sse42.json \ avx.json avx2.json fma.json \ > ../../../librustc_platform_intrinsics/x86.rs
I also have a test for these, but I am not sure where to add that. There is #![feature(repr_simd, platform_intrinsics)]
use std::mem::transmute;
#[repr(simd)]
#[derive(Debug, PartialEq)]
struct F64x2(f64, f64);
#[repr(simd)]
#[derive(Debug, PartialEq)]
struct F64x4(f64, f64, f64, f64);
#[repr(simd)]
#[derive(Debug, PartialEq)]
struct F32x4(f32, f32, f32, f32);
#[repr(simd)]
#[derive(Debug, PartialEq)]
struct F32x8(f32, f32, f32, f32, f32, f32, f32, f32);
#[repr(simd)]
#[derive(Debug, PartialEq)]
struct I32x4(i32, i32, i32, i32);
#[repr(simd)]
#[derive(Debug, PartialEq)]
struct I32x8(i32, i32, i32, i32, i32, i32, i32, i32);
extern "platform-intrinsic" {
fn x86_mm256_broadcast_pd(ptr: *const i8) -> F64x4;
fn x86_mm256_broadcast_ps(ptr: *const i8) -> F32x8;
fn x86_mm256_cvtepi32_pd(x: I32x4) -> F64x4;
fn x86_mm256_cvtepi32_ps(x: I32x8) -> F32x8;
fn x86_mm256_cvtpd_epi32(x: F64x4) -> I32x4;
fn x86_mm256_cvtpd_ps(x: F64x4) -> F32x4;
fn x86_mm256_cvtps_epi32(x: F32x8) -> I32x8;
fn x86_mm256_cvtps_pd(x: F32x4) -> F64x4;
fn x86_mm256_cvttpd_epi32(x: F64x4) -> I32x4;
fn x86_mm256_cvttps_epi32(x: F32x8) -> I32x8;
}
fn main() {
let a = F64x2(1.0, 2.0);
let b = unsafe { x86_mm256_broadcast_pd(transmute(&a)) };
let c = F64x4(1.0, 2.0, 1.0, 2.0);
assert_eq!(b, c);
let a = F32x4(1.0, 2.0, 3.0, 4.0);
let b = unsafe { x86_mm256_broadcast_ps(transmute(&a)) };
let c = F32x8(1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0);
assert_eq!(b, c);
let a = I32x4(1, -2, 3, -4);
let b = unsafe { x86_mm256_cvtepi32_pd(a) };
let c = F64x4(1.0, -2.0, 3.0, -4.0);
assert_eq!(b, c);
let a = I32x8(1, -2, 3, -4, 5, -6, 7, -8);
let b = unsafe { x86_mm256_cvtepi32_ps(a) };
let c = F32x8(1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0);
assert_eq!(b, c);
let a = F64x4(1.0, -2.0, 3.0, -4.0);
let b = unsafe { x86_mm256_cvtpd_epi32(a) };
let c = I32x4(1, -2, 3, -4);
assert_eq!(b, c);
let a = F64x4(1.0, -2.0, 3.0, -4.0);
let b = unsafe { x86_mm256_cvtpd_ps(a) };
let c = F32x4(1.0, -2.0, 3.0, -4.0);
assert_eq!(b, c);
let a = F32x8(1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0);
let b = unsafe { x86_mm256_cvtps_epi32(a) };
let c = I32x8(1, -2, 3, -4, 5, -6, 7, -8);
assert_eq!(b, c);
let a = F32x4(1.0, -2.0, 3.0, -4.0);
let b = unsafe { x86_mm256_cvtps_pd(a) };
let c = F64x4(1.0, -2.0, 3.0, -4.0);
assert_eq!(b, c);
let a = F64x4(1.6, -2.0, 3.0, -4.0);
let b = unsafe { x86_mm256_cvttpd_epi32(a) };
let c = I32x4(1, -2, 3, -4);
assert_eq!(b, c);
let a = F32x8(1.6, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0);
let b = unsafe { x86_mm256_cvttps_epi32(a) };
let c = I32x8(1, -2, 3, -4, 5, -6, 7, -8);
assert_eq!(b, c);
} |
…hton Add AVX broadcast and conversion intrinsics This adds the following intrinsics: * `_mm256_broadcast_pd` * `_mm256_broadcast_ps` * `_mm256_cvtepi32_pd` * `_mm256_cvtepi32_ps` * `_mm256_cvtpd_epi32` * `_mm256_cvtpd_ps` * `_mm256_cvtps_epi32` * `_mm256_cvtps_pd` * `_mm256_cvttpd_epi32` * `_mm256_cvttps_epi32` The "avx" codegen feature must be enabled to use these.
⌛ Testing commit c306853 with merge cfcbbf1... |
💔 Test failed - auto-win-gnu-32-nopt-t |
@bors: retry On Sat, Mar 12, 2016 at 5:23 AM, bors notifications@github.com wrote:
|
Add AVX broadcast and conversion intrinsics This adds the following intrinsics: * `_mm256_broadcast_pd` * `_mm256_broadcast_ps` * `_mm256_cvtepi32_pd` * `_mm256_cvtepi32_ps` * `_mm256_cvtpd_epi32` * `_mm256_cvtpd_ps` * `_mm256_cvtps_epi32` * `_mm256_cvtps_pd` * `_mm256_cvttpd_epi32` * `_mm256_cvttps_epi32` The "avx" codegen feature must be enabled to use these.
This adds the following intrinsics:
_mm256_broadcast_pd
_mm256_broadcast_ps
_mm256_cvtepi32_pd
_mm256_cvtepi32_ps
_mm256_cvtpd_epi32
_mm256_cvtpd_ps
_mm256_cvtps_epi32
_mm256_cvtps_pd
_mm256_cvttpd_epi32
_mm256_cvttps_epi32
The "avx" codegen feature must be enabled to use these.