Skip to content

Add AVX broadcast and conversion intrinsics #32140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 13, 2016
Merged

Conversation

ruuda
Copy link
Contributor

@ruuda ruuda commented Mar 9, 2016

This adds the following intrinsics:

  • _mm256_broadcast_pd
  • _mm256_broadcast_ps
  • _mm256_cvtepi32_pd
  • _mm256_cvtepi32_ps
  • _mm256_cvtpd_epi32
  • _mm256_cvtpd_ps
  • _mm256_cvtps_epi32
  • _mm256_cvtps_pd
  • _mm256_cvttpd_epi32
  • _mm256_cvttps_epi32

The "avx" codegen feature must be enabled to use these.

ruuda added 3 commits March 9, 2016 01:18
This defines `_mm256_broadcast_ps` and `_mm256_broadcast_pd`. The `_ss`
and `_sd` variants are not supported by LLVM. In Clang these intrinsics
are implemented as inline functions in C++.

Intel reference: https://software.intel.com/en-us/node/514144.

Note: the argument type should really be "0hPc" (a pointer to a vector
of half the width), but internally the LLVM intrinsic takes a pointer to
a signed integer, and for any other type LLVM will complain. This means
that a transmute is required to call these intrinsics.

The AVX2 broadcast intrinsics `_mm256_broadcastss_ps` and
`_mm256_broadcastsd_pd` are not available as LLVM intrinsics. In Clang
they are implemented using the shufflevector builtin.
This defines the following intrinsics:

 * `_mm256_cvtepi32_pd`
 * `_mm256_cvtepi32_ps`
 * `_mm256_cvtpd_epi32`
 * `_mm256_cvtpd_ps`
 * `_mm256_cvtps_epi32`
 * `_mm256_cvtps_pd`
 * `_mm256_cvttpd_epi32`
 * `_mm256_cvttps_epi32`

Intel reference: https://software.intel.com/en-us/node/514130.
The exact command used was:

    $ cd src/etc/platform-intrinsics/x86
    $ python2 ../generator.py --format compiler-defs -i info.json   \
      sse.json sse2.json sse3.json ssse3.json sse41.json sse42.json \
      avx.json avx2.json fma.json                                   \
      > ../../../librustc_platform_intrinsics/x86.rs
@ruuda
Copy link
Contributor Author

ruuda commented Mar 9, 2016

I also have a test for these, but I am not sure where to add that. There is tests/run-pass/simd-generic.rs, but that one works even if no SIMD features are enabled. To test these specific intrinsics, the file has to be compiled with -C target-feature=+avx, and running the test requires a CPU with AVX support.

#![feature(repr_simd, platform_intrinsics)]

use std::mem::transmute;

#[repr(simd)]
#[derive(Debug, PartialEq)]
struct F64x2(f64, f64); 

#[repr(simd)]
#[derive(Debug, PartialEq)]
struct F64x4(f64, f64, f64, f64);

#[repr(simd)]
#[derive(Debug, PartialEq)]
struct F32x4(f32, f32, f32, f32);

#[repr(simd)]
#[derive(Debug, PartialEq)]
struct F32x8(f32, f32, f32, f32, f32, f32, f32, f32);

#[repr(simd)]
#[derive(Debug, PartialEq)]
struct I32x4(i32, i32, i32, i32);

#[repr(simd)]
#[derive(Debug, PartialEq)]
struct I32x8(i32, i32, i32, i32, i32, i32, i32, i32);

extern "platform-intrinsic" {
    fn x86_mm256_broadcast_pd(ptr: *const i8) -> F64x4;
    fn x86_mm256_broadcast_ps(ptr: *const i8) -> F32x8;
    fn x86_mm256_cvtepi32_pd(x: I32x4) -> F64x4;
    fn x86_mm256_cvtepi32_ps(x: I32x8) -> F32x8;
    fn x86_mm256_cvtpd_epi32(x: F64x4) -> I32x4;
    fn x86_mm256_cvtpd_ps(x: F64x4) -> F32x4;
    fn x86_mm256_cvtps_epi32(x: F32x8) -> I32x8;
    fn x86_mm256_cvtps_pd(x: F32x4) -> F64x4;
    fn x86_mm256_cvttpd_epi32(x: F64x4) -> I32x4;
    fn x86_mm256_cvttps_epi32(x: F32x8) -> I32x8;
}

fn main() {
    let a = F64x2(1.0, 2.0);
    let b = unsafe { x86_mm256_broadcast_pd(transmute(&a)) };
    let c = F64x4(1.0, 2.0, 1.0, 2.0);
    assert_eq!(b, c);

    let a = F32x4(1.0, 2.0, 3.0, 4.0);
    let b = unsafe { x86_mm256_broadcast_ps(transmute(&a)) };
    let c = F32x8(1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0);
    assert_eq!(b, c);

    let a = I32x4(1, -2, 3, -4);
    let b = unsafe { x86_mm256_cvtepi32_pd(a) };
    let c = F64x4(1.0, -2.0, 3.0, -4.0);
    assert_eq!(b, c);

    let a = I32x8(1, -2, 3, -4, 5, -6, 7, -8);
    let b = unsafe { x86_mm256_cvtepi32_ps(a) };
    let c = F32x8(1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0);
    assert_eq!(b, c);

    let a = F64x4(1.0, -2.0, 3.0, -4.0);
    let b = unsafe { x86_mm256_cvtpd_epi32(a) };
    let c = I32x4(1, -2, 3, -4);
    assert_eq!(b, c);

    let a = F64x4(1.0, -2.0, 3.0, -4.0);
    let b = unsafe { x86_mm256_cvtpd_ps(a) };
    let c = F32x4(1.0, -2.0, 3.0, -4.0);
    assert_eq!(b, c);

    let a = F32x8(1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0);
    let b = unsafe { x86_mm256_cvtps_epi32(a) };
    let c = I32x8(1, -2, 3, -4, 5, -6, 7, -8);
    assert_eq!(b, c);

    let a = F32x4(1.0, -2.0, 3.0, -4.0);
    let b = unsafe { x86_mm256_cvtps_pd(a) };
    let c = F64x4(1.0, -2.0, 3.0, -4.0);
    assert_eq!(b, c);

    let a = F64x4(1.6, -2.0, 3.0, -4.0);
    let b = unsafe { x86_mm256_cvttpd_epi32(a) };
    let c = I32x4(1, -2, 3, -4);
    assert_eq!(b, c);

    let a = F32x8(1.6, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0);
    let b = unsafe { x86_mm256_cvttps_epi32(a) };
    let c = I32x8(1, -2, 3, -4, 5, -6, 7, -8);
    assert_eq!(b, c);
}

@alexcrichton
Copy link
Member

@bors: r+ c306853

Thanks!

Yeah we don't have many tests for the other SIMD intrinsics so it's fine that you've tested these locally for now.

@alexcrichton alexcrichton self-assigned this Mar 10, 2016
Manishearth added a commit to Manishearth/rust that referenced this pull request Mar 12, 2016
…hton

Add AVX broadcast and conversion intrinsics

This adds the following intrinsics:

 * `_mm256_broadcast_pd`
 * `_mm256_broadcast_ps`
 * `_mm256_cvtepi32_pd`
 * `_mm256_cvtepi32_ps`
 * `_mm256_cvtpd_epi32`
 * `_mm256_cvtpd_ps`
 * `_mm256_cvtps_epi32`
 * `_mm256_cvtps_pd`
 * `_mm256_cvttpd_epi32`
 * `_mm256_cvttps_epi32`

The "avx" codegen feature must be enabled to use these.
@bors
Copy link
Collaborator

bors commented Mar 12, 2016

⌛ Testing commit c306853 with merge cfcbbf1...

@bors
Copy link
Collaborator

bors commented Mar 12, 2016

💔 Test failed - auto-win-gnu-32-nopt-t

@alexcrichton
Copy link
Member

@bors: retry

On Sat, Mar 12, 2016 at 5:23 AM, bors notifications@github.com wrote:

[image: 💔] Test failed - auto-win-gnu-32-nopt-t
http://buildbot.rust-lang.org/builders/auto-win-gnu-32-nopt-t/builds/3397


Reply to this email directly or view it on GitHub
#32140 (comment).

@bors
Copy link
Collaborator

bors commented Mar 13, 2016

⌛ Testing commit c306853 with merge 531b928...

bors added a commit that referenced this pull request Mar 13, 2016
Add AVX broadcast and conversion intrinsics

This adds the following intrinsics:

 * `_mm256_broadcast_pd`
 * `_mm256_broadcast_ps`
 * `_mm256_cvtepi32_pd`
 * `_mm256_cvtepi32_ps`
 * `_mm256_cvtpd_epi32`
 * `_mm256_cvtpd_ps`
 * `_mm256_cvtps_epi32`
 * `_mm256_cvtps_pd`
 * `_mm256_cvttpd_epi32`
 * `_mm256_cvttps_epi32`

The "avx" codegen feature must be enabled to use these.
@bors bors merged commit c306853 into rust-lang:master Mar 13, 2016
@ruuda ruuda deleted the avx-intrinsics branch November 30, 2016 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants