Project status #2

Closed
cuihantao opened this issue Dec 4, 2022 · 4 comments

Comments

@cuihantao

Hello!

I'm new to Rust and found this repository via rust-ndarray/ndarray#46. Are the changes in this repository going to make it into the compiler? That would save a significant amount of effort in vectorization.

Thanks

@SparrowLii
Owner

Thanks for your attention! But I don't think ndarray and the compiler have much to do with each other here. Vectorization in the compiler is oriented towards more general numerical computations and does not need scenario-specific functionality such as ndarray.

@SparrowLii
Owner

SparrowLii commented Dec 5, 2022

As for the project status, there was some previous discussion on the Rust Internals forum: https://internals.rust-lang.org/t/mir-optimization-pass-that-implements-auto-vectorization/16360

In general, the community thinks that automatic vectorization should be the job of LLVM rather than rustc.

@cuihantao
Author

Thank you for letting me know!

@cuihantao
Author

I later used Vec from the standard library instead of ndarray for element-wise multiplication. To my surprise, the compiler vectorizes the code very well. Code as simple as the following compiles to SIMD instructions.

    pub fn g_update(&mut self) -> &Self {
        for (dest, p1, p2, p3, p4, p5) in izip!(
            &mut self.dest,
            &self.p1,
            &self.p2,
            &self.p3,
            &self.p4,
            &self.p5
        ) {
            *dest = p1 * p2 * p3 * p4 * p5;
        }
        self
    }

The generated assembly is below:

.LBB20_9:  // major loop for packs of 4
 movupd  xmm0, xmmword ptr [r8 + 8*rbx]
 movupd  xmm1, xmmword ptr [r8 + 8*rbx + 16]
 movupd  xmm2, xmmword ptr [r9 + 8*rbx]
 mulpd   xmm2, xmm0
 movupd  xmm0, xmmword ptr [r9 + 8*rbx + 16]
 mulpd   xmm0, xmm1
 movupd  xmm1, xmmword ptr [r10 + 8*rbx]
 mulpd   xmm1, xmm2
 movupd  xmm2, xmmword ptr [r10 + 8*rbx + 16]
 mulpd   xmm2, xmm0
 movupd  xmm0, xmmword ptr [rdi + 8*rbx]
 mulpd   xmm0, xmm1
 movupd  xmm1, xmmword ptr [rdi + 8*rbx + 16]
 mulpd   xmm1, xmm2
 movupd  xmm2, xmmword ptr [rsi + 8*rbx]
 mulpd   xmm2, xmm0
 movupd  xmm0, xmmword ptr [rsi + 8*rbx + 16]
 mulpd   xmm0, xmm1
 movupd  xmmword ptr [r14 + 8*rbx], xmm2
 movupd  xmmword ptr [r14 + 8*rbx + 16], xmm0
 add     rbx, 4
 cmp     r11, rbx
 jne     .LBB20_9
 cmp     r15, r11
 je      .LBB20_16
.LBB20_11:  // remaining entries
 mov     rcx, r11
 or      rcx, 1
 test    r15b, 1
 je      .LBB20_13
 movsd   xmm0, qword ptr [r8 + 8*r11]
 mulsd   xmm0, qword ptr [r9 + 8*r11]
 mulsd   xmm0, qword ptr [r10 + 8*r11]
 mulsd   xmm0, qword ptr [rdi + 8*r11]
 mulsd   xmm0, qword ptr [rsi + 8*r11]
 movsd   qword ptr [r14 + 8*r11], xmm0
 mov     r11, rcx
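
Read back as Rust, the listing has roughly this shape: a main loop that handles four f64 values per iteration (two 128-bit registers holding two doubles each), followed by scalar handling of the leftover entries. The sketch below is only an illustration of that structure, not what the compiler emits. The function name `product5_chunked` and the slice-based signature are hypothetical, and f64 is inferred from the `mulpd`/`mulsd` instructions.

```rust
/// Hypothetical helper: element-wise product of five slices, written by hand
/// to mirror the shape of the assembly (blocks of 4, then a scalar tail).
pub fn product5_chunked(dest: &mut [f64], p: [&[f64]; 5]) {
    let n = dest.len();
    assert!(
        p.iter().all(|s| s.len() == n),
        "all inputs must have the same length as dest"
    );

    // Number of elements covered by the 4-wide main loop.
    let main = n - n % 4;

    // Main loop: one block of 4 elements per iteration.
    let mut i = 0;
    while i < main {
        for k in i..i + 4 {
            dest[k] = p[0][k] * p[1][k] * p[2][k] * p[3][k] * p[4][k];
        }
        i += 4;
    }

    // Tail: the remaining 0..=3 elements, one at a time.
    for k in main..n {
        dest[k] = p[0][k] * p[1][k] * p[2][k] * p[3][k] * p[4][k];
    }
}
```

Writing the loop this way by hand should not be necessary; the point of the comment above is that the plain iterator loop already gets this treatment from the optimizer.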

I haven't gotten ndarray to vectorize except by using Zip or azip!, which are limited to six inputs. The standard Vec works fine for my purpose. I'm posting this for reference, though you are most likely already aware of it.
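
For anyone who wants to try the Vec-based pattern without extra dependencies, here is a self-contained sketch that uses nested `zip` calls from the standard library in place of `izip!`. The struct name `Model` is hypothetical, the field names follow the snippet above, and the element type f64 is an assumption based on the double-precision instructions in the listing. Whether this variant vectorizes the same way depends on the compiler version, optimization level, and target, so it is worth checking the generated assembly for your own build.

```rust
/// Hypothetical container; only the field names come from the snippet above.
pub struct Model {
    pub dest: Vec<f64>,
    pub p1: Vec<f64>,
    pub p2: Vec<f64>,
    pub p3: Vec<f64>,
    pub p4: Vec<f64>,
    pub p5: Vec<f64>,
}

impl Model {
    /// Writes the element-wise product p1 * p2 * p3 * p4 * p5 into `dest`.
    /// Iteration stops at the shortest of the six vectors.
    pub fn g_update(&mut self) -> &Self {
        // Nested std zips stand in for itertools' izip! (assumption: the
        // original macro came from the itertools crate).
        let inputs = self
            .p1
            .iter()
            .zip(&self.p2)
            .zip(&self.p3)
            .zip(&self.p4)
            .zip(&self.p5);

        for (dest, ((((p1, p2), p3), p4), p5)) in self.dest.iter_mut().zip(inputs) {
            *dest = p1 * p2 * p3 * p4 * p5;
        }
        self
    }
}
```

The nested tuple pattern is the price of avoiding the macro; `izip!` flattens it into the six-element tuple used in the original loop.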
