Translate SIMD construction as insertelements and a single store. #18615

Merged (1 commit), Nov 7, 2014

huonw commented Nov 4, 2014

This almost completely avoids GEPis and pointer manipulation,
postponing it until the end with one big write of the whole vector. This
leads to a tiny speed-up in compilation and makes it easier for LLVM
to work with the values. For example, with `--opt-level=0`,

    pub fn foo() -> f32x4 {
        f32x4(0.,0.,0.,0.)
    }

was previously compiled to

    define <4 x float> @_ZN3foo20h74913e8b13d89666eaaE() unnamed_addr #0 {
    entry-block:
      %sret_slot = alloca <4 x float>
      %0 = getelementptr inbounds <4 x float>* %sret_slot, i32 0, i32 0
      store float 0.000000e+00, float* %0
      %1 = getelementptr inbounds <4 x float>* %sret_slot, i32 0, i32 1
      store float 0.000000e+00, float* %1
      %2 = getelementptr inbounds <4 x float>* %sret_slot, i32 0, i32 2
      store float 0.000000e+00, float* %2
      %3 = getelementptr inbounds <4 x float>* %sret_slot, i32 0, i32 3
      store float 0.000000e+00, float* %3
      %4 = load <4 x float>* %sret_slot
      ret <4 x float> %4
    }

but now becomes

    define <4 x float> @_ZN3foo20h74913e8b13d89666eaaE() unnamed_addr #0 {
    entry-block:
      ret <4 x float> zeroinitializer
    }

(And similarly non-zero constants become a literal vector expression.)
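
When the lanes are not constants, nothing can be folded away, and what should remain is the shape named in the title: a chain of `insertelement` instructions followed by one vector store. The sketch below is only an illustration of that difference, not the code in this PR. It is a toy Rust emitter that prints both shapes as text, using the typed-pointer IR syntax from the listings above. The function names, the `%slot` destination, and the `%pN`/`%vN` temporaries are hypothetical, and the IR rustc actually produces may differ in detail.

```rust
/// Old shape (illustrative): one GEP plus one scalar store per lane.
/// `dest` and the lane operands are hypothetical SSA value names.
fn emit_per_lane_stores(dest: &str, lanes: &[&str]) -> Vec<String> {
    let n = lanes.len();
    let mut out = Vec::new();
    for (i, lane) in lanes.iter().enumerate() {
        // Compute the address of lane `i` inside the vector slot...
        out.push(format!(
            "%p{} = getelementptr inbounds <{} x float>* {}, i32 0, i32 {}",
            i, n, dest, i
        ));
        // ...and write that single lane through the pointer.
        out.push(format!("store float {}, float* %p{}", lane, i));
    }
    out
}

/// New shape (illustrative): build the vector as an SSA value with
/// `insertelement`, then write it out with a single store.
fn emit_insertelement_chain(dest: &str, lanes: &[&str]) -> Vec<String> {
    let n = lanes.len();
    let mut out = Vec::new();
    // Start from an undefined vector and fill in one lane at a time.
    let mut acc = String::from("undef");
    for (i, lane) in lanes.iter().enumerate() {
        let next = format!("%v{}", i);
        out.push(format!(
            "{} = insertelement <{} x float> {}, float {}, i32 {}",
            next, n, acc, lane, i
        ));
        acc = next;
    }
    // The only memory write: the whole vector at once.
    out.push(format!(
        "store <{} x float> {}, <{} x float>* {}",
        n, acc, n, dest
    ));
    out
}
```

Calling `emit_insertelement_chain("%slot", &["%a", "%b", "%c", "%d"])` yields four `insertelement` lines and one `store`, whereas `emit_per_lane_stores` with the same arguments yields four GEP/store pairs. Keeping the vector as a plain SSA value until the final store is what lets LLVM fold the all-constant case into a literal even without optimisation passes.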


huonw commented Nov 4, 2014

cc #18147, #18148

(As the example in the PR shows, this makes it easier for LLVM to do those, but doesn't actually go all the way.)

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Nov 6, 2014
bors merged commit 071c411 into rust-lang:master Nov 7, 2014