Commit

teddy: replace _mm_extract_epi64 with transmute
It turns out that _mm_extract_epi64 requires SSE 4.1. While it would be
fine to just require that (virtually all CPUs have it available), the
rest of Teddy only requires SSSE3. I don't love bumping the minimum
required just to get the lanes out of a vector. So just replace it with
a transmute.

The AVX2 variant isn't impacted by this since AVX2 came with
_mm256_extract_epi64.

Kudos to llogiq/bytecount#85 for making me
check this.
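
As a rough illustration of the replacement described above, the following sketch reads both 64-bit lanes of a 128-bit vector by transmuting it to `[u64; 2]`, which needs no instruction beyond the x86_64 baseline. The helper name `lanes_via_transmute` and the non-x86_64 fallback are assumptions made for this example and are not part of the commit.

```rust
/// Hypothetical helper (not from the commit): builds a 128-bit vector from
/// two 64-bit values and reads both lanes back with a transmute instead of
/// `_mm_extract_epi64`, which requires SSE 4.1.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
pub fn lanes_via_transmute(lo: u64, hi: u64) -> [u64; 2] {
    use core::arch::x86_64::{__m128i, _mm_set_epi64x};

    // SAFETY: SSE2 is part of the x86_64 baseline, so this intrinsic is
    // always available. The first argument is the high lane.
    let v: __m128i = unsafe { _mm_set_epi64x(hi as i64, lo as i64) };

    // SAFETY: `__m128i` and `[u64; 2]` are both 16 bytes, and every bit
    // pattern is valid for both types.
    unsafe { core::mem::transmute::<__m128i, [u64; 2]>(v) }
}

/// Fallback so the sketch also compiles on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn lanes_via_transmute(lo: u64, hi: u64) -> [u64; 2] {
    [lo, hi]
}

fn main() {
    // x86_64 is little-endian, so index 0 holds the low 64 bits.
    let lanes = lanes_via_transmute(0x1111, 0x2222);
    println!("{:#x?}", lanes);
}
```

Because x86_64 is little-endian, index 0 of the array corresponds to the lane that `_mm_extract_epi64(v, 0)` would have returned, and index 1 to lane 1.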
BurntSushi committed Sep 20, 2023
1 parent 3ed2c38 commit 3c811c2
Showing 1 changed file with 7 additions and 5 deletions.
12 changes: 7 additions & 5 deletions src/packed/vector.rs
@@ -320,7 +320,7 @@ pub(crate) trait FatVector: Vector {
 mod x86_64_ssse3 {
     use core::arch::x86_64::*;

-    use crate::util::int::{I32, I64, I8};
+    use crate::util::int::{I32, I8};

     use super::Vector;

@@ -394,12 +394,14 @@ mod x86_64_ssse3 {
             self,
             mut f: impl FnMut(usize, u64) -> Option<T>,
         ) -> Option<T> {
-            let lane = _mm_extract_epi64(self, 0).to_bits();
-            if let Some(t) = f(0, lane) {
+            // We could just use _mm_extract_epi64 here, but that requires
+            // SSE 4.1. It isn't necessarily a problem to just require SSE 4.1,
+            // but everything else works with SSSE3 so we stick to that subset.
+            let lanes: [u64; 2] = core::mem::transmute(self);
+            if let Some(t) = f(0, lanes[0]) {
                 return Some(t);
             }
-            let lane = _mm_extract_epi64(self, 1).to_bits();
-            if let Some(t) = f(1, lane) {
+            if let Some(t) = f(1, lanes[1]) {
                 return Some(t);
             }
             None
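
The changed method visits the lanes in order and stops at the first closure call that returns `Some`. A minimal, architecture-independent sketch of that contract is given below. It takes the already-extracted `[u64; 2]` lanes rather than a vector, and `first_nonzero_lane` is a hypothetical caller invented for this example. Neither function is the crate's actual code.

```rust
/// Calls `f` with each lane index and value, in order, and returns the
/// first `Some` result. Returns `None` if `f` never produces a value.
/// This operates on already-extracted lanes, an assumption for the sketch.
pub fn for_each_64bit_lane<T>(
    lanes: [u64; 2],
    mut f: impl FnMut(usize, u64) -> Option<T>,
) -> Option<T> {
    for (i, lane) in lanes.iter().copied().enumerate() {
        if let Some(t) = f(i, lane) {
            return Some(t);
        }
    }
    None
}

/// Hypothetical caller: index of the first lane with any bit set.
pub fn first_nonzero_lane(lanes: [u64; 2]) -> Option<usize> {
    for_each_64bit_lane(lanes, |i, lane| if lane != 0 { Some(i) } else { None })
}

fn main() {
    println!("{:?}", first_nonzero_lane([0, 7]));
    println!("{:?}", first_nonzero_lane([0, 0]));
}
```

Returning early matters here because the closure may have side effects or be expensive, so lane 1 is only inspected when lane 0 yields nothing.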