Very bad u16x8::splat codegen on x86_64 #283

@thomcc

This is the issue version of this Zulip thread: https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd/topic/Very.20bad.20.60u16x8.3A.3Asplat.60.20codegen.20on.20x86_64 (it's plausible we'll want to link to this in a comment in the code for however we solve it).

I have a case where, on x86_64, core::simd::u16x8::splat(v) compiles to a genuinely huge mess of instructions, whereas core::arch::x86_64::_mm_set1_epi16, which does the same thing, compiles to next to nothing (well, to the normal ~2 instruction sequence).

In https://godbolt.org/z/ccWKT7q19 I have the example (it should have some comments explaining the situation and what the code is doing, in case that's not clear). Take a look at sloppy_memset_pat16_core_arch (good but nonportable) vs sloppy_memset_pat16_portable (shockingly bad).

FWIW, the other core::simd::$vector::splat functions seem to be completely fine (those versions are also in the file further down, as sloppy_memset_pat$N_portable). It's just making a u16x8 (or i16x8; both behave the same here) that is a real issue for portable SIMD, for some reason.
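For anyone who can't open the Godbolt link, here is a rough sketch of the two spellings being compared. It is not the code from the link. To keep it buildable on stable Rust, the portable side is modelled on a plain `[u16; 8]` array rather than the nightly-only `core::simd::u16x8`, and the function names `splat_portable_model` and `splat_core_arch` are made up for illustration.

```rust
/// Stand-in for the portable `u16x8::splat`, written on a plain array so it
/// builds on stable Rust. This is an assumption-laden model, not the real type.
pub fn splat_portable_model(v: u16) -> [u16; 8] {
    [v; 8]
}

/// The non-portable spelling: broadcast through the SSE2 intrinsic, then
/// copy the register out to an array so the two results can be compared.
#[cfg(target_arch = "x86_64")]
pub fn splat_core_arch(v: u16) -> [u16; 8] {
    use core::arch::x86_64::{__m128i, _mm_set1_epi16, _mm_storeu_si128};
    let mut out = [0u16; 8];
    // SAFETY: SSE2 is part of the x86_64 baseline, `out` is exactly 16 bytes,
    // and the unaligned store has no alignment requirement.
    unsafe {
        let reg = _mm_set1_epi16(v as i16);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, reg);
    }
    out
}

/// Fallback so the sketch still compiles on non-x86_64 targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn splat_core_arch(v: u16) -> [u16; 8] {
    splat_portable_model(v)
}
```

Both functions should produce the same eight lanes; the complaint in this issue is only about the instructions generated for the portable version, which this stable-Rust model does not necessarily reproduce.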

Is this a known issue? Or something someone has a hunch about? Does anybody have insight into what's up here?

As @programmerjake points out, it seems likely that the loop we get from splat is confusing LLVM somewhat. As such, we have a few options:

  1. Defining splat in terms of a new splat_lane (Add splat_lane #282).
  2. Adding a new simd_splat intrinsic.
  3. ???

I'm weakly in favor of the 2nd, since I think V::splat will be very common, and special-casing it isn't that bad. It should also improve compile times of SIMD code, although perhaps there won't be enough SIMD code for that to be a metric worth sweating.
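To make option 1 concrete, here is a minimal sketch of what "splat in terms of splat_lane" could look like, again modelled on a plain array so it builds on stable Rust. The signature of `splat_lane` is hypothetical (I am not quoting #282), and `splat_via_lane` is an illustrative name. The shape is "put the scalar in lane 0, then broadcast lane 0", which I believe mirrors the insertelement plus zero-mask shufflevector form that LLVM usually treats as a broadcast.

```rust
/// Hypothetical `splat_lane`: copy lane `LANE` of `v` into every lane.
/// Panics at runtime if `LANE` is out of range for the 8-lane array.
pub fn splat_lane<const LANE: usize>(v: [u16; 8]) -> [u16; 8] {
    [v[LANE]; 8]
}

/// Option 1 from the list above: build `splat` as
/// "insert the scalar into lane 0, then broadcast lane 0".
pub fn splat_via_lane(x: u16) -> [u16; 8] {
    let mut v = [0u16; 8];
    v[0] = x;
    splat_lane::<0>(v)
}
```

Option 2 would skip this composition entirely and hand the backend a single dedicated operation, so there is nothing to sketch for it at this level.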
