Translate SIMD construction as `insertelement`s and a single store.
This almost completely avoids GEPi's and pointer manipulation, postponing it until the end with one big write of the whole vector. This leads to a small speed-up in compilation, and makes it easier for LLVM to work with the values, e.g. with `--opt-level=0`,

    pub fn foo() -> f32x4 {
        f32x4(0.,0.,0.,0.)
    }

was previously compiled to

    define <4 x float> @_ZN3foo20h74913e8b13d89666eaaE() unnamed_addr #0 {
    entry-block:
      %sret_slot = alloca <4 x float>
      %0 = getelementptr inbounds <4 x float>* %sret_slot, i32 0, i32 0
      store float 0.000000e+00, float* %0
      %1 = getelementptr inbounds <4 x float>* %sret_slot, i32 0, i32 1
      store float 0.000000e+00, float* %1
      %2 = getelementptr inbounds <4 x float>* %sret_slot, i32 0, i32 2
      store float 0.000000e+00, float* %2
      %3 = getelementptr inbounds <4 x float>* %sret_slot, i32 0, i32 3
      store float 0.000000e+00, float* %3
      %4 = load <4 x float>* %sret_slot
      ret <4 x float> %4
    }

but now becomes

    define <4 x float> @_ZN3foo20h74913e8b13d89666eaaE() unnamed_addr #0 {
    entry-block:
      ret <4 x float> zeroinitializer
    }
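The all-zero example folds to a constant, so it hides the general shape of the change. As a rough illustration only, the toy Rust model below contrasts the two strategies for arbitrary lane values. It is not rustc's translation code: `Vec4`, `insert_element`, `build_per_lane_stores` and `build_single_store` are hypothetical names, and a plain array stands in for an LLVM vector value.

```rust
// Toy model of the two construction strategies (hypothetical names,
// not rustc internals). A `[f32; 4]` stands in for `<4 x float>`.
type Vec4 = [f32; 4];

/// Models LLVM's `insertelement`: takes a vector *value* and returns a
/// new value with lane `idx` replaced. No memory is involved.
fn insert_element(mut v: Vec4, x: f32, idx: usize) -> Vec4 {
    v[idx] = x;
    v
}

/// Old strategy: address each lane of the destination slot and store
/// into it separately (one GEP plus one scalar store per lane).
/// Returns the number of stores performed on the destination.
fn build_per_lane_stores(dest: &mut Vec4, elems: Vec4) -> usize {
    let mut stores = 0;
    for (i, x) in elems.iter().enumerate() {
        dest[i] = *x; // models: getelementptr + store float
        stores += 1;
    }
    stores
}

/// New strategy: build the whole vector as a value with a chain of
/// `insert_element` calls, then write it to the destination once.
/// Returns the number of stores performed on the destination.
fn build_single_store(dest: &mut Vec4, elems: Vec4) -> usize {
    // The starting contents are irrelevant because every lane is
    // overwritten; zeros are used here only as a placeholder.
    let mut v: Vec4 = [0.0; 4];
    for (i, x) in elems.iter().enumerate() {
        v = insert_element(v, *x, i);
    }
    *dest = v; // models: one store of the whole vector
    1
}
```

Both functions leave the same contents in `dest`; the second touches the destination once, which is the property the commit message describes as "one big write of the whole vector".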