Hungry for more SIMD vectorization #7687
Hah. It turns out the lack of vectorization in … @ArchRobison, should I leave this open for the Float64 issue, or just close?
Let's leave it open. I tried a similar C++ example with the Intel compiler and it vectorized the code, so presumably in principle Julia could vectorize it too. I'll try it with LLVM trunk to see if it does any better.
The LLVM trunk vectorizer rejects the code because Julia codegen is generating volatile loads/stores in the loop. Here is a trivial example:
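The trivial example itself was elided from the page, but the kind of reduction being described can be sketched in C (this is my reconstruction, not the original snippet): a `volatile` qualifier models the volatile loads/stores Julia's codegen was emitting, and it forces the compiler to preserve every memory access exactly, so the loop vectorizer must reject the loop.

```c
#include <stddef.h>

/* Hypothetical reduction: the volatile accesses mirror what codegen was
 * emitting, and they prevent the loop vectorizer from touching the loop. */
void scale_volatile(volatile float *a, float s, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] * s;      /* stays scalar: every access is volatile */
}

/* The identical loop without volatile is a textbook vectorization
 * candidate for LLVM's loop vectorizer. */
void scale(float *a, float s, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] * s;      /* vectorizable: plain unit-stride accesses */
}
```

Both functions compute the same result; only the qualifier differs, which is what makes the missed vectorization purely a codegen artifact.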
The problem is in
This approach for rounding was introduced to fix issue #41. I suspect there is a cleaner way to enforce the rounding in LLVM. I'll go see what Clang does in similar circumstances. For example, C99 has a mode where extra precision is not allowed.
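For context, the workaround under discussion can be sketched in C (an assumption on my part about its shape, not the exact Julia codegen): on 32-bit x86 the x87 unit keeps intermediates at 80-bit precision, and a round trip through a `volatile` double forces the value to be rounded to true 64-bit precision before it is used again.

```c
/* Sketch of the volatile store/load rounding trick: the store must
 * materialize the value in a 64-bit memory slot (rounding away any x87
 * extra precision), and the load must read that rounded value back. */
double round_to_double(double x) {
    volatile double tmp = x;  /* store: truncates 80-bit x87 result */
    return tmp;               /* load: reads the rounded 64-bit value */
}
```

The cost is that the store/load pair is opaque to the optimizer, which is exactly why it blocks vectorization and other transformations downstream.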
What about emitting more efficient code when the source code is tagged with
I'd like to fix the problem in general since it impacts optimization of scalar code too. For example, the volatile sequence foils constant folding.
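The constant-folding point can be illustrated with a small C pair (my reduction, not taken from Julia's codegen): the plain version is folded to a constant at compile time, while the `volatile` version forces a real store and load at run time even though the value is fully known to the compiler.

```c
/* The compiler evaluates 0.1 + 0.2 at compile time and returns the
 * folded constant directly. */
double folded(void) {
    double x = 0.1 + 0.2;           /* folded at compile time */
    return x;
}

/* The volatile qualifier makes the store and the load observable
 * side effects, so the same expression cannot be folded away. */
double not_folded(void) {
    volatile double x = 0.1 + 0.2;  /* store must happen at run time */
    return x;                       /* load must happen at run time */
}
```

The two return identical values; the difference is only in the generated code, which is why fixing this helps scalar code as well as SIMD loops.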
64-bit systems tend to forsake the x87 unit except in rare circumstances. I'd like to be able to reproduce the original problem before trying the bitcast fix. Is there an easy way to configure Julia on a 64-bit system to build as a 32-bit executable?
While it would be great to fix these mixed-type operations, I bet in practice many people will use computations that are all Float64 or a mix of Float64 and Int. AFAICT those are still not vectorizing: compare
The failure of LLVM to vectorize |
@ArchRobison to build julia as a 32-bit executable, I think you'll just need to pass
As a related aside ...
It's the passing
You'll probably have to
Starting a 32-bit VM is probably easier.
@ArchRobison I have some 32-bit VMs already up and running, if you need them. I can email you ssh usernames and passwords if you want.
@staticfloat Thanks for the offer. I may take it up if the few 32-bit machines/VMs that I found here don't work out.
…ore/load to remove extra bits.
If you set ARCH=i686 (on a clean checkout), Julia will configure and build for 32-bit i686, assuming you have all the 32-bit dependencies. This is a very recent addition, and I think it is mentioned in the readme. You can optionally also set MARCH to further refine the target selection.
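The build recipe above amounts to something like the following (a sketch based on this comment and the readme of the time; the exact variable values, such as the MARCH choice shown here, should be checked against your checkout):

```shell
# From a clean Julia checkout on a 64-bit host, select a 32-bit build.
# ARCH picks the target architecture; MARCH (optional) refines the
# microarchitecture the compiler tunes for.
make ARCH=i686 MARCH=pentium4
```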
…g#41 to use volatile store/load on 32-bit x86 only.
The first of these vectorizes (given `Matrix{Float32}` inputs), but the second doesn't. However, the following vectorizes with a `Vector{Float32}` input. This makes me wonder whether there might be something relatively simple needed to turn on vectorization for `myblur_x!`. Also, it would be great to be able to vectorize for Float64, although I recognize that the incremental benefit will be smaller.
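The issue's actual `myblur_x!` source was elided from this page, so as a stand-in, here is a C sketch of the kind of kernel being discussed (function name and shape are mine, not the original): a three-tap average over a contiguous array, i.e. the `Vector{Float32}` case, where every access in the inner loop is unit-stride and LLVM's vectorizer handles it well.

```c
#include <stddef.h>

/* Hypothetical 1-D blur analogue: a three-tap moving average over a
 * contiguous float array. All loads and the store are unit-stride,
 * which is the easy case for the loop vectorizer. */
void blur1d(float *out, const float *in, size_t n) {
    for (size_t i = 1; i + 1 < n; i++)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}
```

The matrix variants differ only in how the neighbor accesses stride through memory, which is presumably where the vectorizer's cost model or legality checks start rejecting the loop.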
CC @ArchRobison.