-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AVX2 related regression introduced by rust 1.56.0 and onwards #91839
Comments
I can confirm that the test failure can be tracked down to #87254, though I currently have no idea why that change would cause an issue here. A reduced testcase would help with the investigation. As an aside, functions or trait methods will only be inlined if the caller is compiled with the same (or, unreliably, a superset of the) target features as the callee. E.g. after the change introduced in cloudflare/sliceslice-rs@820a28f calls from |
Thank you for looking into this @hkratz !
Here is a reduced testcase, even though it's still quite big#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;
trait Vector: Copy + std::fmt::Debug {
const LANES: usize;
unsafe fn set1_epi8(a: i8) -> Self;
unsafe fn loadu_si(a: *const u8) -> Self;
unsafe fn cmpeq_epi8(a: Self, b: Self) -> Self;
unsafe fn and_si(a: Self, b: Self) -> Self;
unsafe fn movemask_epi8(a: Self) -> i32;
}
impl Vector for __m256i {
const LANES: usize = 32;
#[inline]
#[target_feature(enable = "avx2")]
unsafe fn set1_epi8(a: i8) -> Self {
_mm256_set1_epi8(a)
}
#[inline]
#[target_feature(enable = "avx2")]
unsafe fn loadu_si(a: *const u8) -> Self {
_mm256_loadu_si256(a as *const Self)
}
#[inline]
#[target_feature(enable = "avx2")]
unsafe fn cmpeq_epi8(a: Self, b: Self) -> Self {
_mm256_cmpeq_epi8(a, b)
}
#[inline]
#[target_feature(enable = "avx2")]
unsafe fn and_si(a: Self, b: Self) -> Self {
_mm256_and_si256(a, b)
}
#[inline]
#[target_feature(enable = "avx2")]
unsafe fn movemask_epi8(a: Self) -> i32 {
_mm256_movemask_epi8(a)
}
}
struct VectorHash<V: Vector> {
first: V,
last: V,
}
impl<V: Vector> VectorHash<V> {
unsafe fn new(first: u8, last: u8) -> Self {
Self {
first: Vector::set1_epi8(first as i8),
last: Vector::set1_epi8(last as i8),
}
}
}
pub struct Avx2Searcher {
position: usize,
avx2_hash: VectorHash<__m256i>,
needle_len: usize,
}
impl Avx2Searcher {
#[target_feature(enable = "avx2")]
pub unsafe fn with_position(needle: &[u8], position: usize) -> Self {
// Implicitly checks that the needle is not empty because position is an
// unsized integer.
assert!(position < needle.len());
let avx2_hash = VectorHash::new(needle[0], needle[position]);
Self {
position,
avx2_hash,
needle_len: needle.len(),
}
}
/// Performs a substring search for the `needle` within `haystack`.
#[target_feature(enable = "avx2")]
pub unsafe fn search_in(&self, haystack: &[u8]) -> bool {
println!("[search_in] haystack({})={:?}", haystack.len(), haystack);
let end = haystack.len() - self.needle_len + 1;
self.vector_search_in(haystack, end, &self.avx2_hash)
}
}
trait Searcher {
fn position(&self) -> usize;
#[inline]
unsafe fn vector_search_in_chunk<V: Vector>(
&self,
hash: &VectorHash<V>,
start: *const u8,
mask: i32,
) -> bool {
let first = Vector::loadu_si(start);
let last = Vector::loadu_si(start.add(self.position()));
// Uncommenting the following lines makes it work
// println!("[vector_search_in_chunk] hash.first={:?}", hash.first);
println!("[vector_search_in_chunk] first={:?}", first);
let eq_first = Vector::cmpeq_epi8(hash.first, first);
println!(
"[vector_search_in_chunk] eq_first={:?}",
eq_first
);
let eq_last = Vector::cmpeq_epi8(hash.last, last);
println!(
"[vector_search_in_chunk] eq_last ={:?}",
eq_last
);
let eq = Vector::and_si(eq_first, eq_last);
println!(
"[vector_search_in_chunk] eq ={:?}",
eq
);
let eq = (Vector::movemask_epi8(eq) & mask) as u32;
println!(
"[vector_search_in_chunk] eq ={:?}",
eq
);
eq != 0
}
#[inline]
// Uncommenting the following lines makes it work
// #[target_feature(enable = "avx2")]
unsafe fn vector_search_in<V: Vector>(
&self,
haystack: &[u8],
end: usize,
hash: &VectorHash<V>,
) -> bool {
println!(
"[vector_search_in] haystack({})={:?}, end={}",
haystack.len(),
haystack,
end
);
// debug_assert!(haystack.len() >= self.needle().size());
let mut chunks = haystack[..end].chunks_exact(V::LANES);
for chunk in &mut chunks {
println!("[vector_search_in] chunk({})={:?}", chunk.len(), chunk);
if self.vector_search_in_chunk(hash, chunk.as_ptr(), -1) {
return true;
}
}
let remainder = chunks.remainder().len();
println!("[vector_search_in] remainder: {}", remainder);
if remainder > 0 {
let start = haystack.as_ptr().add(end - V::LANES);
let mask = -1 << (V::LANES - remainder);
if self.vector_search_in_chunk(hash, start, mask) {
return true;
}
}
false
}
}
impl Searcher for Avx2Searcher {
#[inline(always)]
fn position(&self) -> usize {
self.position
}
}
pub fn main() {
let needle = b"consectetur";
let haystack = b"Lorem ipsum dolor sit amet, consectetur adipiscing elit";
let searcher = unsafe { Avx2Searcher::with_position(needle, 0) };
assert!(unsafe { searcher.search_in(haystack) });
} Compiled in debug mode with rust 1.57.0:
In the working case, Compiled with opt-level=3 with rust 1.57.0:
Here, for whatever reason, Not reproducible with opt-level=1 or opt-level=2
That's kind of what I was experimenting with. Is there a technical reason that forbids inlining here? The trait is only called through methods marked with This a bit of a distraction compared to what this issue is about but what are my options here? Use macro_rules instead? |
Depending on the level optimization, it seems |
Thanks for the testcase. Will need some quiet time to look into this.
This is a restriction of both Rust (see RFC 2045) and LLVM (see e.g. the implementation for X86). Essentially the backend optimizes based on the assumption that functions are the smallest cohesive units for feature selection. What you would probably want here is for the compiler to prove that a monomorphised method can only be called from AVX2-enabled contexts and thus promote it to AVX2 as well. We don't have that (yet).
Either use macros like I did in simdutf8 or use something like @calebzulawski's multiversion crate. |
FWIW miscompilations in this area can be related to target_features mismatches due to an LLVM bug, see #79865. I haven't checked whether this is the case here or not. |
Minimized: #[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;
pub struct TwoVecs(__m256i, __m256i);
#[target_feature(enable = "avx2")]
pub unsafe fn avx2_any_upper_bit_set(vecs: &TwoVecs) -> bool {
inlined_non_avx2(&vecs)
}
#[inline]
unsafe fn inlined_non_avx2(
vecs: &TwoVecs
) -> bool {
non_inlined_non_avx2(vecs)
}
#[inline(never)]
unsafe fn non_inlined_non_avx2(
vecs: &TwoVecs,
) -> bool {
let eq = _mm256_or_si256(vecs.0, vecs.1);
_mm256_movemask_epi8(eq) != 0
}
pub fn main() {
unsafe {
let vecs = TwoVecs(_mm256_set1_epi8(0x00_u8 as i8), _mm256_set1_epi8(0x00_u8 as i8));
assert!(!avx2_any_upper_bit_set(&vecs));
}
} This asserts in release but not debug. Essentially it is an ABI mismatch between #87254 exposes this problem by removing zero-sized padding fields, which previously prevented the (broken) LLVM optimization of putting the struct values into |
Wow that was fast! Thank you so much @hkratz ! 👍
This kind of issue is really outside of my area of expertise. What's the process to create an llvm bug report? I must admit that I don't feel comfortable doing it myself. |
Okay, that does look like the same issue as #79865. I'll file a bug with LLVM. |
Opened llvm/llvm-project#52660. |
Thank you very much @nikic ! In the mean time, would it be possible (or even desirable) to detect such ABI mismatch cases on the rust compiler side and issue an error? |
Assigning priority as discussed in the Zulip thread of the Prioritization Working Group. @rustbot label -I-prioritize +P-medium |
Closing this as a duplicate of #79865, as they have the same root cause. |
Code
I tried this code:
I expected to see this happen: tests pass with success
Instead, this happened: some tests are failing:
Version it worked on
It most recently worked on: Rust 1.55.0
Version with regression
rustc +1.56.0 --version --verbose
:and onwards (1.56.1, 1.57.0, beta and nightly)
How to reproduce
Just run the following command with the previously mentioned commit checked out:
What does commit cloudflare/sliceslice-rs@a7d4556 do
It just moves method
vector_search_in
from theAvx2Searcher
struct to theSearcher
trait. Loosing the#[target_feature(enable = "avx2")]
attribute at the same time.I found two (unsatisfactory) way to make the tests pass:
But that won't work ultimately because the trait is going to be implemented for other architectures.
hash.first
to the standard output: cloudflare/sliceslice-rs@a7d4556#diff-b1a35a68f14e696205874893c07fd24fdb88882b47c23cc0e0c80a30c7d53759R212Absolutely no idea why printing something would have any influence!
Everything works properly for rust versions <= 1.55.0
Bisection attempt
I have tried to use
cargo-bisect-rustc
to bisect and it seems introduced between nightly-2021-08-10 and nightly-2021-08-12 (there are no nightly for 2021-08-11).Further (manual) bisection points to #87254 but I am not sure my bisection is actually correct so any help is welcome 👍
The text was updated successfully, but these errors were encountered: