Project status #2

cuihantao · 2022-12-04T17:27:04Z

Hello!

I'm new to Rust and found this repository via rust-ndarray/ndarray#46. Are the changes in this repository going to make it into the compiler? That would save a significant amount of effort in vectorization.

Thanks

SparrowLii · 2022-12-05T00:54:18Z

Thanks for your interest! However, I don't think ndarray and the compiler have much to do with each other here. Vectorization in the compiler is aimed at general numerical computation and does not need scenario-specific functionality such as ndarray.

SparrowLii · 2022-12-05T00:57:27Z

As for the project status, there was some earlier discussion on the Rust Internals forum: https://internals.rust-lang.org/t/mir-optimization-pass-that-implements-auto-vectorization/16360

In general, the community thinks that automatic vectorization should be handled by LLVM rather than by rustc.

cuihantao · 2022-12-05T02:23:08Z

Thank you for letting me know!

cuihantao · 2022-12-07T16:58:07Z

I later used Vec from the standard library instead of ndarray for element-wise multiplication. To my surprise, the compiler vectorizes the code very well. Code as simple as the following compiles to SIMD instructions.

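    // `izip!` comes from the itertools crate (`use itertools::izip;`).
    // Writes the element-wise product of p1..p5 into dest.
    pub fn g_update(&mut self) -> &Self {
        for (dest, p1, p2, p3, p4, p5) in izip!(
            &mut self.dest,
            &self.p1,
            &self.p2,
            &self.p3,
            &self.p4,
            &self.p5
        ) {
            *dest = p1 * p2 * p3 * p4 * p5;
        }
        self
    }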

The generated assembly is as follows:

    .LBB20_9:  // main loop, 4 elements per iteration
        movupd  xmm0, xmmword ptr [r8 + 8*rbx]
        movupd  xmm1, xmmword ptr [r8 + 8*rbx + 16]
        movupd  xmm2, xmmword ptr [r9 + 8*rbx]
        mulpd   xmm2, xmm0
        movupd  xmm0, xmmword ptr [r9 + 8*rbx + 16]
        mulpd   xmm0, xmm1
        movupd  xmm1, xmmword ptr [r10 + 8*rbx]
        mulpd   xmm1, xmm2
        movupd  xmm2, xmmword ptr [r10 + 8*rbx + 16]
        mulpd   xmm2, xmm0
        movupd  xmm0, xmmword ptr [rdi + 8*rbx]
        mulpd   xmm0, xmm1
        movupd  xmm1, xmmword ptr [rdi + 8*rbx + 16]
        mulpd   xmm1, xmm2
        movupd  xmm2, xmmword ptr [rsi + 8*rbx]
        mulpd   xmm2, xmm0
        movupd  xmm0, xmmword ptr [rsi + 8*rbx + 16]
        mulpd   xmm0, xmm1
        movupd  xmmword ptr [r14 + 8*rbx], xmm2
        movupd  xmmword ptr [r14 + 8*rbx + 16], xmm0
        add     rbx, 4
        cmp     r11, rbx
        jne     .LBB20_9
        cmp     r15, r11
        je      .LBB20_16
    .LBB20_11:  // remaining entries
        mov     rcx, r11
        or      rcx, 1
        test    r15b, 1
        je      .LBB20_13
        movsd   xmm0, qword ptr [r8 + 8*r11]
        mulsd   xmm0, qword ptr [r9 + 8*r11]
        mulsd   xmm0, qword ptr [r10 + 8*r11]
        mulsd   xmm0, qword ptr [rdi + 8*r11]
        mulsd   xmm0, qword ptr [rsi + 8*r11]
        movsd   qword ptr [r14 + 8*r11], xmm0
        mov     r11, rcx
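
To make the shape of that listing easier to read: the main loop advances the index by 4 (`add rbx, 4`), using two `xmm` registers that each hold two doubles, and the leftover entries go through a scalar path (`movsd`/`mulsd`). A hand-written analogue of that structure might look like the sketch below. It uses only two inputs for brevity, and the names `mul_blocked` and `demo_blocked` are hypothetical. The compiler produced this structure on its own from the plain loop, so writing it by hand is not required.

```rust
/// Hypothetical hand-written analogue of the loop shape in the listing:
/// a main loop over blocks of 4 elements, then a scalar tail.
pub fn mul_blocked(out: &mut [f64], a: &[f64], b: &[f64]) {
    // Only process as many elements as all three slices have.
    let n = out.len().min(a.len()).min(b.len());
    // Largest multiple of 4 that is <= n.
    let main = n - n % 4;

    // Main loop: 4 elements per iteration.
    for i in (0..main).step_by(4) {
        for j in i..i + 4 {
            out[j] = a[j] * b[j];
        }
    }

    // Tail: the remaining 0 to 3 elements, one at a time.
    for i in main..n {
        out[i] = a[i] * b[i];
    }
}

/// Small usage example: 7 elements, so one block of 4 plus a tail of 3.
pub fn demo_blocked() -> Vec<f64> {
    let a = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0];
    let b = vec![2.0; 7];
    let mut out = vec![0.0; 7];
    mul_blocked(&mut out, &a, &b);
    out
}
```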

I haven't been able to get ndarray code to vectorize except by using Zip or azip!, which are limited to six inputs. The standard Vec works fine for my purpose. I'm posting this for reference, though you are most likely already aware of it.
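
For anyone who wants to try the Vec version without the itertools dependency, here is a minimal self-contained sketch. The field names follow the snippet earlier in this comment. The struct name `Model`, the `f64` element type (suggested by the `mulpd`/`movsd` instructions), and the helper `demo_update` are assumptions made for illustration. Nested `zip` calls from the standard library stand in for `izip!`.

```rust
/// Hypothetical container; field names follow the original snippet,
/// the struct name and the f64 element type are assumptions.
pub struct Model {
    pub dest: Vec<f64>,
    pub p1: Vec<f64>,
    pub p2: Vec<f64>,
    pub p3: Vec<f64>,
    pub p4: Vec<f64>,
    pub p5: Vec<f64>,
}

impl Model {
    /// Writes the element-wise product p1 * p2 * p3 * p4 * p5 into `dest`.
    pub fn g_update(&mut self) -> &Self {
        // Nested std `zip` calls replace itertools' `izip!`;
        // the items come out as nested tuples.
        let inputs = self
            .p1
            .iter()
            .zip(&self.p2)
            .zip(&self.p3)
            .zip(&self.p4)
            .zip(&self.p5);
        for (dest, ((((p1, p2), p3), p4), p5)) in self.dest.iter_mut().zip(inputs) {
            *dest = p1 * p2 * p3 * p4 * p5;
        }
        self
    }
}

/// Small usage example with three elements per vector.
pub fn demo_update() -> Vec<f64> {
    let mut m = Model {
        dest: vec![0.0; 3],
        p1: vec![1.0, 2.0, 3.0],
        p2: vec![2.0; 3],
        p3: vec![0.5; 3],
        p4: vec![4.0; 3],
        p5: vec![1.0; 3],
    };
    m.g_update().dest.clone()
}
```

Whether the loop is actually vectorized depends on the optimization level and target, so it is worth checking the output of an optimized build, for example with `cargo rustc --release -- --emit asm`.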

cuihantao closed this as completed Dec 5, 2022
