Add AVX broadcast and conversion intrinsics #32140

ruuda · 2016-03-09T00:23:11Z

This adds the following intrinsics:

_mm256_broadcast_pd
_mm256_broadcast_ps
_mm256_cvtepi32_pd
_mm256_cvtepi32_ps
_mm256_cvtpd_epi32
_mm256_cvtpd_ps
_mm256_cvtps_epi32
_mm256_cvtps_pd
_mm256_cvttpd_epi32
_mm256_cvttps_epi32

The "avx" codegen feature must be enabled to use these.

This defines `_mm256_broadcast_ps` and `_mm256_broadcast_pd`. The `_ss` and `_sd` variants are not supported by LLVM. In Clang these intrinsics are implemented as inline functions in C++. Intel reference: https://software.intel.com/en-us/node/514144. Note: the argument type should really be "0hPc" (a pointer to a vector of half the width), but internally the LLVM intrinsic takes a pointer to a signed integer, and for any other type LLVM will complain. This means that a transmute is required to call these intrinsics. The AVX2 broadcast intrinsics `_mm256_broadcastss_ps` and `_mm256_broadcastsd_pd` are not available as LLVM intrinsics. In Clang they are implemented using the shufflevector builtin.

This defines the following intrinsics: * `_mm256_cvtepi32_pd` * `_mm256_cvtepi32_ps` * `_mm256_cvtpd_epi32` * `_mm256_cvtpd_ps` * `_mm256_cvtps_epi32` * `_mm256_cvtps_pd` * `_mm256_cvttpd_epi32` * `_mm256_cvttps_epi32` Intel reference: https://software.intel.com/en-us/node/514130.

The exact command used was: $ cd src/etc/platform-intrinsics/x86 $ python2 ../generator.py --format compiler-defs -i info.json \ sse.json sse2.json sse3.json ssse3.json sse41.json sse42.json \ avx.json avx2.json fma.json \ > ../../../librustc_platform_intrinsics/x86.rs

ruuda · 2016-03-09T00:28:35Z

I also have a test for these, but I am not sure where to add that. There is tests/run-pass/simd-generic.rs, but that one works even if no SIMD features are enabled. To test these specific intrinsics, the file has to be compiled with -C target-feature=+avx, and running the test requires a CPU with AVX support.

#![feature(repr_simd, platform_intrinsics)]

use std::mem::transmute;

#[repr(simd)]
#[derive(Debug, PartialEq)]
struct F64x2(f64, f64); 

#[repr(simd)]
#[derive(Debug, PartialEq)]
struct F64x4(f64, f64, f64, f64);

#[repr(simd)]
#[derive(Debug, PartialEq)]
struct F32x4(f32, f32, f32, f32);

#[repr(simd)]
#[derive(Debug, PartialEq)]
struct F32x8(f32, f32, f32, f32, f32, f32, f32, f32);

#[repr(simd)]
#[derive(Debug, PartialEq)]
struct I32x4(i32, i32, i32, i32);

#[repr(simd)]
#[derive(Debug, PartialEq)]
struct I32x8(i32, i32, i32, i32, i32, i32, i32, i32);

extern "platform-intrinsic" {
    fn x86_mm256_broadcast_pd(ptr: *const i8) -> F64x4;
    fn x86_mm256_broadcast_ps(ptr: *const i8) -> F32x8;
    fn x86_mm256_cvtepi32_pd(x: I32x4) -> F64x4;
    fn x86_mm256_cvtepi32_ps(x: I32x8) -> F32x8;
    fn x86_mm256_cvtpd_epi32(x: F64x4) -> I32x4;
    fn x86_mm256_cvtpd_ps(x: F64x4) -> F32x4;
    fn x86_mm256_cvtps_epi32(x: F32x8) -> I32x8;
    fn x86_mm256_cvtps_pd(x: F32x4) -> F64x4;
    fn x86_mm256_cvttpd_epi32(x: F64x4) -> I32x4;
    fn x86_mm256_cvttps_epi32(x: F32x8) -> I32x8;
}

fn main() {
    let a = F64x2(1.0, 2.0);
    let b = unsafe { x86_mm256_broadcast_pd(transmute(&a)) };
    let c = F64x4(1.0, 2.0, 1.0, 2.0);
    assert_eq!(b, c);

    let a = F32x4(1.0, 2.0, 3.0, 4.0);
    let b = unsafe { x86_mm256_broadcast_ps(transmute(&a)) };
    let c = F32x8(1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0);
    assert_eq!(b, c);

    let a = I32x4(1, -2, 3, -4);
    let b = unsafe { x86_mm256_cvtepi32_pd(a) };
    let c = F64x4(1.0, -2.0, 3.0, -4.0);
    assert_eq!(b, c);

    let a = I32x8(1, -2, 3, -4, 5, -6, 7, -8);
    let b = unsafe { x86_mm256_cvtepi32_ps(a) };
    let c = F32x8(1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0);
    assert_eq!(b, c);

    let a = F64x4(1.0, -2.0, 3.0, -4.0);
    let b = unsafe { x86_mm256_cvtpd_epi32(a) };
    let c = I32x4(1, -2, 3, -4);
    assert_eq!(b, c);

    let a = F64x4(1.0, -2.0, 3.0, -4.0);
    let b = unsafe { x86_mm256_cvtpd_ps(a) };
    let c = F32x4(1.0, -2.0, 3.0, -4.0);
    assert_eq!(b, c);

    let a = F32x8(1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0);
    let b = unsafe { x86_mm256_cvtps_epi32(a) };
    let c = I32x8(1, -2, 3, -4, 5, -6, 7, -8);
    assert_eq!(b, c);

    let a = F32x4(1.0, -2.0, 3.0, -4.0);
    let b = unsafe { x86_mm256_cvtps_pd(a) };
    let c = F64x4(1.0, -2.0, 3.0, -4.0);
    assert_eq!(b, c);

    let a = F64x4(1.6, -2.0, 3.0, -4.0);
    let b = unsafe { x86_mm256_cvttpd_epi32(a) };
    let c = I32x4(1, -2, 3, -4);
    assert_eq!(b, c);

    let a = F32x8(1.6, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0);
    let b = unsafe { x86_mm256_cvttps_epi32(a) };
    let c = I32x8(1, -2, 3, -4, 5, -6, 7, -8);
    assert_eq!(b, c);
}

alexcrichton · 2016-03-09T01:44:54Z

@bors: r+ c306853

Thanks!

Yeah we don't have many tests for the other SIMD intrinsics so it's fine that you've tested these locally for now.

…hton Add AVX broadcast and conversion intrinsics This adds the following intrinsics: * `_mm256_broadcast_pd` * `_mm256_broadcast_ps` * `_mm256_cvtepi32_pd` * `_mm256_cvtepi32_ps` * `_mm256_cvtpd_epi32` * `_mm256_cvtpd_ps` * `_mm256_cvtps_epi32` * `_mm256_cvtps_pd` * `_mm256_cvttpd_epi32` * `_mm256_cvttps_epi32` The "avx" codegen feature must be enabled to use these.

bors · 2016-03-12T12:12:29Z

⌛ Testing commit c306853 with merge cfcbbf1...

bors · 2016-03-12T13:23:05Z

💔 Test failed - auto-win-gnu-32-nopt-t

alexcrichton · 2016-03-12T17:57:40Z

@bors: retry

On Sat, Mar 12, 2016 at 5:23 AM, bors notifications@github.com wrote:

[image: 💔] Test failed - auto-win-gnu-32-nopt-t
http://buildbot.rust-lang.org/builders/auto-win-gnu-32-nopt-t/builds/3397

—
Reply to this email directly or view it on GitHub
#32140 (comment).

bors · 2016-03-13T00:18:35Z

⌛ Testing commit c306853 with merge 531b928...

Add AVX broadcast and conversion intrinsics This adds the following intrinsics: * `_mm256_broadcast_pd` * `_mm256_broadcast_ps` * `_mm256_cvtepi32_pd` * `_mm256_cvtepi32_ps` * `_mm256_cvtpd_epi32` * `_mm256_cvtpd_ps` * `_mm256_cvtps_epi32` * `_mm256_cvtps_pd` * `_mm256_cvttpd_epi32` * `_mm256_cvttps_epi32` The "avx" codegen feature must be enabled to use these.

bors · 2016-03-13T02:58:13Z

☀️ Test successful - auto-linux-32-nopt-t, auto-linux-32-opt, auto-linux-64-debug-opt, auto-linux-64-nopt-t, auto-linux-64-opt, auto-linux-64-opt-rustbuild, auto-linux-64-x-android-t, auto-linux-cross-opt, auto-linux-musl-64-opt, auto-mac-32-opt, auto-mac-64-nopt-t, auto-mac-64-opt, auto-mac-ios-opt, auto-win-gnu-32-nopt-t, auto-win-gnu-32-opt, auto-win-gnu-64-nopt-t, auto-win-gnu-64-opt, auto-win-msvc-32-opt, auto-win-msvc-64-opt

ruuda added 3 commits March 9, 2016 01:18

alexcrichton self-assigned this Mar 10, 2016

Manishearth mentioned this pull request Mar 12, 2016

Rollup of 8 pull requests #32209

Closed

bors merged commit c306853 into rust-lang:master Mar 13, 2016

ruuda deleted the avx-intrinsics branch November 30, 2016 09:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add AVX broadcast and conversion intrinsics #32140

Add AVX broadcast and conversion intrinsics #32140

ruuda commented Mar 9, 2016

ruuda commented Mar 9, 2016

alexcrichton commented Mar 9, 2016

bors commented Mar 12, 2016

bors commented Mar 12, 2016

alexcrichton commented Mar 12, 2016

bors commented Mar 13, 2016

bors commented Mar 13, 2016

Add AVX broadcast and conversion intrinsics #32140

Add AVX broadcast and conversion intrinsics #32140

Conversation

ruuda commented Mar 9, 2016

ruuda commented Mar 9, 2016

alexcrichton commented Mar 9, 2016

bors commented Mar 12, 2016

bors commented Mar 12, 2016

alexcrichton commented Mar 12, 2016

bors commented Mar 13, 2016

bors commented Mar 13, 2016