Newtype impedes vectorization #24963
Comments
That second one should be
I agree, that's what I was intending @Stebalien, but it shouldn't affect codegen either. When I tried it, it didn't affect the benchmarks. Vectorization is cool btw: compiling with corei7-avx decreases the plain u32 fold's runtime even more, increasing the benchmark difference. :-)
Updated code & bench.
Ah goddammit, I actually wrote a patch that unwrapped newtypes, at least for simple cases (which includes this one), but abandoned it because I couldn't see a noticeable difference in optimisation/performance.
Saying it doesn't vectorize isn't actually correct. It does vectorize, just less efficiently. That's some puzzle to try to understand.
cc @dotdash if you have time & interest
Triage: Still an issue with
I feel like this is fixed judging by the bench results below. Please reopen if that's not the case, preferably with a summary of what we should be looking for to close this issue. Without
and with:
Nice! I can confirm that too. According to playpen right now, not fixed in stable not fixed in But it is fixed in
In the following example, the newtype is not zero-cost in practice, since it seems to impede optimizations in LLVM. The plain u32 sum vectorizes while the Foo(u32) sum does not. The newtype fold needs 3 times the runtime of the plain u32 fold.
rustc version: rustc 1.1.0-nightly (97d4e76 2015-04-27) (built 2015-04-28)
code (playpen link)
(The code has been updated)
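For readers without access to the playpen link, here is a minimal sketch of the kind of comparison described, not the original code. It assumes a `Foo(u32)` newtype with a hand-written `Add` impl; the function names `sum_plain` and `sum_newtype` and the use of wrapping addition are illustrative assumptions, not taken from the issue.

```rust
use std::ops::Add;

// Hypothetical newtype around u32, mirroring the `Foo(u32)` named in the issue.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Foo(u32);

impl Add for Foo {
    type Output = Foo;

    // Wrapping add is an assumption here, chosen so overflow never panics.
    fn add(self, rhs: Foo) -> Foo {
        Foo(self.0.wrapping_add(rhs.0))
    }
}

// Fold over plain u32 values.
fn sum_plain(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

// The same fold, but every element is wrapped in the newtype.
fn sum_newtype(xs: &[Foo]) -> Foo {
    xs.iter().fold(Foo(0), |acc, &x| acc + x)
}

fn main() {
    let plain: Vec<u32> = (0..1024).collect();
    let wrapped: Vec<Foo> = plain.iter().map(|&x| Foo(x)).collect();

    // Both folds compute the same value; only the generated code may differ.
    assert_eq!(sum_plain(&plain), sum_newtype(&wrapped).0);
    println!("plain = {}, newtype = {:?}", sum_plain(&plain), sum_newtype(&wrapped));
}
```

Both functions are semantically identical, so any runtime gap between them comes from how the optimizer treats the wrapper. Whether a given compiler version vectorizes each loop would need to be checked in the emitted assembly or LLVM IR.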
Bench results vary with compilation settings.