Performance regression of array::IntoIter vs slice::Iter
#115339
Comments
WG-prioritization assigning priority (Zulip discussion). cc @nikic since this regressed in the LLVM 15 update @rustbot label -I-prioritize +P-medium +T-compiler
I've seen LLVM doing these strange 1-byte-at-a-time "vectorizations" in other places too. This might be a more general problem. In this case we can probably paper over it by implementing TrustedRandomAccess for the array iter.
Unfortunately, the array iterator is fundamentally worse right now, because indexing into it -- a fundamental part of being a by-value array iterator -- keeps it from SRoAing and thus doesn't optimize well. This is why when I was making Sadly, for the foreseeable future you're better off using the "weaker" slice iterator when you can.
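For readers who want to see the difference concretely, here is a minimal sketch of the two forms being compared. This is not the reporter's original code: the function names `or_by_value` and `or_by_slice`, the `u8` element type, and the length 32 are all assumptions chosen for illustration. Both functions compute the same result and differ only in which iterator walks `b`.

```rust
/// ORs `b` into `a` element-wise, consuming `b` through the by-value
/// array iterator (`array::IntoIter`), since `zip` calls `IntoIterator`
/// on the array itself.
/// Hypothetical name and sizes, for illustration only.
pub fn or_by_value(mut a: [u8; 32], b: [u8; 32]) -> [u8; 32] {
    a.iter_mut().zip(b).for_each(|(x, y)| *x |= y);
    a
}

/// Same computation, but walking `b` by reference through `slice::Iter`,
/// which is the "weaker" iterator suggested above.
pub fn or_by_slice(mut a: [u8; 32], b: [u8; 32]) -> [u8; 32] {
    a.iter_mut().zip(b.iter()).for_each(|(x, y)| *x |= *y);
    a
}
```

Pasting both functions into a compiler explorer with optimizations enabled is one way to check whether a given toolchain still emits different code for them.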
My guess is that the SROA is missing something in the opaque pointer mode. @rustbot claim
It's not about vectorization. This optimization can be restored by using Upstream issue: llvm/llvm-project#65763.
The upstream issue is closed.
optimize zipping over array iterators

Fixes rust-lang#115339 (somewhat)

the new assembly:

```asm
zip_arrays:
	.cfi_startproc
	vmovups	(%rdx), %ymm0
	leaq	32(%rsi), %rcx
	vxorps	%xmm1, %xmm1, %xmm1
	vmovups	%xmm1, -24(%rsp)
	movq	$0, -8(%rsp)
	movq	%rsi, -88(%rsp)
	movq	%rdi, %rax
	movq	%rcx, -80(%rsp)
	vmovups	%ymm0, -72(%rsp)
	movq	$0, -40(%rsp)
	movq	$32, -32(%rsp)
	movq	-24(%rsp), %rcx
	vmovups	(%rsi,%rcx), %ymm0
	vorps	-72(%rsp,%rcx), %ymm0, %ymm0
	vmovups	%ymm0, (%rsi,%rcx)
	vmovups	(%rsi), %ymm0
	vmovups	%ymm0, (%rdi)
	vzeroupper
	retq
```

This is still longer than the slice version given in the issue but at least it eliminates the terrible `vpextrb`/`orb` chain. I guess this is due to excessive memcpys again (haven't looked at the LLVM IR)?

The `TrustedLen` specialization is a drive-by change since I had to do something for the default impl anyway to be able to specialize the `TrustedRandomAccessNoCoerce` impl.
Code
I tried this code:
I expected to see this happen: same codegen or similar in performance
Instead, this happened: array has way worse codegen
godbolt
Assembly
Version it worked on
It most recently worked on: 1.64
Version with regression
1.65 till now.
Regressed in nightly-2022-08-13, maybe due to the LLVM 15 update (#99464)?
rustc --version --verbose
@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged