This is the issue version of this Zulip thread https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd/topic/Very.20bad.20.60u16x8.3A.3Asplat.60.20codegen.20on.20x86_64. (It's plausible we'll want to link to this in a comment in the code for however we solve it)
I have a case where, on x86_64, `core::simd::u16x8::splat(v)` compiles to a genuinely huge mess of instructions, compared to `core::arch::x86_64::_mm_set1_epi16`, which does the same thing in nothing (well, in the normal ~2 instruction sequence).

In https://godbolt.org/z/ccWKT7q19 I have the example (it should have some comments explaining the situation and what the code is doing, in case that's not clear). Take a look at `sloppy_memset_pat16_core_arch` (good but nonportable) vs `sloppy_memset_pat16_portable` (shockingly bad).

FWIW, the other `core::arch::$vector::splat` seem to be completely fine (these versions are also in the file further down as `sloppy_memset_pat$N_portable`). It's just making a `u16x8` (or `i16x8` -- both are the same here) that is a real issue for portable simd for some reason.

Is this a known issue? Or something someone has a hunch about? Does anybody have insight into what's up here?
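For readers without the godbolt link handy, here is a minimal sketch of the two shapes being compared. It is not the code from the link and not the actual `core::simd` source: `splat_u16x8_array` and `splat_u16x8_sse2` are hypothetical names, and modelling the portable splat as a plain array fill is an assumption made only so the example builds on stable Rust. The only real APIs used are `_mm_set1_epi16` and `_mm_storeu_si128` from `core::arch::x86_64`.

```rust
/// Hypothetical stand-in for a portable splat: fill every lane of a
/// plain array. (Assumption: this only models the *shape* of the
/// portable code; it is not how `core::simd` is implemented.)
pub fn splat_u16x8_array(v: u16) -> [u16; 8] {
    [v; 8]
}

/// The non-portable version: broadcast with the SSE2 intrinsic and
/// store the eight lanes back into an array so the result can be compared.
#[cfg(target_arch = "x86_64")]
pub fn splat_u16x8_sse2(v: u16) -> [u16; 8] {
    use core::arch::x86_64::{__m128i, _mm_set1_epi16, _mm_storeu_si128};

    let mut out = [0u16; 8];
    // SAFETY: SSE2 is part of the x86_64 baseline, `out` is exactly
    // 16 bytes, and the store used here does not require alignment.
    unsafe {
        let lanes = _mm_set1_epi16(v as i16);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, lanes);
    }
    out
}

/// Fallback so the sketch still compiles on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn splat_u16x8_sse2(v: u16) -> [u16; 8] {
    splat_u16x8_array(v)
}
```

Both functions should produce the same eight lanes for any input; the issue is only about the instructions generated to get there, which this sketch does not attempt to reproduce.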
As @programmerjake points out, it seems likely that the loop we get from `splat` is confusing LLVM somewhat. As such, we have a few options:

- Defining `splat` in terms of a new `splat_lane` (Add splat_lane #282).
- Adding a new `simd_splat` intrinsic.
- ???
I'm weakly in favor of the 2nd, since I think `V::splat` will be very common, and special-casing it isn't that bad. (It should also improve compile times of SIMD code, although perhaps there won't be enough SIMD code for this to be a metric worth sweating.)