Tuples made me sad (so I fixed them) #4042
Conversation
@@ -112,6 +115,13 @@ void __attribute__(()) __stack_chk_fail()
static DIBuilder *dbuilder;
static std::map<int, std::string> argNumberStrings;
static FunctionPassManager *FPM;
static PassManager *PM;
Am I overlooking something, or is this unused?
No, you're right. I used to have an inlining pass in there, but that didn't actually work.
This is huge! Really amazing work.
Very cool stuff. Have you observed any major speedups due to this change?
You should benchmark complex
Really cool!
Here's a fun one:
Marking this as 0.3 milestone.
Took longer than I had hoped, but I feel this is done now. Complex sqrt for example got about 2x faster.
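For anyone who wants to check this kind of speedup themselves, a rough sketch (not the benchmark used above; the array size is arbitrary):

z = complex(rand(10^6), rand(10^6))   # a million random complex numbers
sqrt(z)                               # run once to compile
@time sqrt(z)                         # elementwise complex sqrt; compare this branch against master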
Excellent! I think the
With the recent cleanup of floating point internals as part of float16, it should be much easier to get simd capabilities. I hope we will have it in 0.3.
I should have been more clear. This also adds simd types. In particular the
That is really cool.
I still need to figure out how to do SIMD loading and storing of data, and I don't know what it takes to obtain more exotic SIMD instructions like palignr, but llvmcall can definitely be used to generate some SIMD instructions. Here's another teaser, computing x*y+z and producing very clean code on a Sandy Bridge processor with 64-bit Linux:

typealias Float32x4 (Float32,Float32,Float32,Float32)

function bar(x::Float32x4, y::Float32x4, z::Float32x4)
    xy = Base.llvmcall("""%3 = fmul <4 x float> %1, %0
                          ret <4 x float> %3""", Float32x4, (Float32x4, Float32x4), x, y)
    Base.llvmcall("""%3 = fadd <4 x float> %1, %0
                      ret <4 x float> %3""", Float32x4, (Float32x4, Float32x4), xy, z)
end

julia> code_native(bar, (Float32x4, Float32x4, Float32x4))
        .text
Filename: none
Source line: 4
        push    RBP
        mov     RBP, RSP
        vmulps  XMM0, XMM0, XMM1
        vaddps  XMM0, XMM0, XMM2
Source line: 4
        pop     RBP
        ret

256-bit SIMD with AVX also works:

typealias Float32x8 (Float32, Float32, Float32, Float32, Float32, Float32, Float32, Float32)

function bar(x::Float32x8, y::Float32x8, z::Float32x8)
    xy = Base.llvmcall("""%3 = fmul <8 x float> %1, %0
                          ret <8 x float> %3""", Float32x8, (Float32x8, Float32x8), x, y)
    Base.llvmcall("""%3 = fadd <8 x float> %1, %0
                      ret <8 x float> %3""", Float32x8, (Float32x8, Float32x8), xy, z)
end

julia> code_native(bar, (Float32x8, Float32x8, Float32x8))
        .text
Filename: none
Source line: 4
        push    RBP
        mov     RBP, RSP
        vmulps  YMM0, YMM0, YMM1
        vaddps  YMM0, YMM0, YMM2
Source line: 4
        pop     RBP
        ret
I should really make
It didn't work when I tested it, but I will have another look.
Just tested, Looking forward to using
Must've been doing something wrong then. Thanks for testing.
Rebased, for those following along at home ;).
Mapping tuples to the native vector datatypes is nifty. I want to try it out, but the fork is only working partially for me. I'm on a 64-bit Ubuntu 12.0 Linux box with a Haswell processor, and running into a problem. The symptom is that
If I put the examples in an input file, it works fine. For example:
Newbie question: What's the recommended way to debug failure of the REPL? I started using gdb on
Try running it inside of
That can sometimes print out interesting information.
I believe this is going to be merged soon, as we will release 0.2 this weekend and branch.
@ArchRobison I hadn't noticed since I usually use my own REPL. That turns out to be an LLVM bug (sigh): http://llvm.org/bugs/show_bug.cgi?id=12618. I'll see if I can work around it.
Or we could just merge your REPL before we merge this patch? Hopefully we're talking a matter of days here...
Now that 0.2 is tagged, can we finally merge this?
+1
Yes. This is a non-breaking change, right? So does that mean it can be merged into the release-0.2 branch?
Conflicts: src/cgutils.cpp
@JeffBezanson, anything else? I'd love to merge this soon so I can put up some of the followup stuff.
@JeffBezanson bump. Let's get this merged soon.
This sped up the library I'm working on to 66% of its previous runtime. Well done!
oh boy oh boy oh boy oh boy
We are finally on par with Matlab's griddedInterpolant.
if(iity->getBitWidth() > 32)
    i = builder.CreateTrunc(i,T_int32);
else if(iity->getBitWidth() < 32)
    i = builder.CreateZExt(i,T_int32);
There's also a CreateZExtOrTrunc function, although it just implements this same logic.
Ah, yes. Fair point. I was trying to remember what it was called, but then again as you noted it doesn't really matter.
Prelim speed center results:
No change for sparse matvec. I was not expecting it, but tried it just in case.
I should note I was looking at "criid"
@ViralBShah Already been reported; in fact, none of the
* upstream/master: (53 commits)
  edit embedding chapter
  formatting fixes
  Fix JuliaLang#5056
  more consitent/useful description of predicate in ie all()
  NEWS for JuliaLang#4042
  Add sparse matvec to perf benchmark (#4707)
  Use warn_once instead of warn (JuliaLang#5038)
  Use warn() instead of println in base/sprase/sparsematrix.jl
  allow scale(b,A) or scale(A,b) when b is a scalar as well as a vector, don't restrict scale unnecessarily to arrays of numbers (e.g. scaling arrays of arrays should work), and improve documentation for scale!
  Added section about memory management
  added negative bitarray rolling
  More accurate linspace for types with greater precision than Float64
  add realmin realmax for BigFloat
  fix eps(realmax), add typemin and typemax for BigFloat
  fix JuliaLang#5025, ordering used by nextfloat/prevfloat
  roll back on errors during type definitions. fixes JuliaLang#5009
  improve method signature errors. fixes JuliaLang#5018
  update juliadoc
  add more asserts for casts
  Fix segfaulting on strstr() failure
  ...
Alright, this rabbit hole turned out to be a giant dungeon instead, but I think I have finally found the exit.
Basically what this does is prevent heap allocation of tuples whenever possible, instead passing them in registers or on the stack (or wherever llvm thinks is most efficient). An easy example is this:
whereas before it used to heap-allocate it (albeit it was smart enough to recognize that the result was constant, so it only did the allocation once, but still); from the old version:
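For illustration, a minimal sketch of the kind of function being discussed (the name f and the tuple values are made up, not the original example):

f() = (1, 2)

# With this change the constant result tuple can live in registers / on the
# stack; previously it was heap-allocated (though only once, since it was
# recognized as constant).
code_native(f, ())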
A perhaps more exciting and real-world application is size on various arrays, e.g.:

Note that there is now no allocation involved, whereas before (again from the old version):
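For illustration, a rough sketch of the size case described above (array shape arbitrary):

A = rand(3, 4)
size(A)                                 # returns the tuple (3,4)
# Inspect the generated code; with this branch the result tuple is not
# heap-allocated.
code_native(size, (Matrix{Float64},))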
Another exciting part of this pull request is that if your types happen to correspond to a machine vector type (aka SIMD type), I make it just that, allowing fun things like the following:
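For illustration, a sketch in the same style as the llvmcall teasers earlier in the thread, using a made-up Int32x4 alias; on x86-64, LLVM typically lowers the vector add to the paddd instruction mentioned below:

typealias Int32x4 (Int32, Int32, Int32, Int32)

function vadd(x::Int32x4, y::Int32x4)
    # add <4 x i32> generally compiles down to a single paddd on x86-64
    Base.llvmcall("""%3 = add <4 x i32> %1, %0
                     ret <4 x i32> %3""", Int32x4, (Int32x4, Int32x4), x, y)
end

code_native(vadd, (Int32x4, Int32x4))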
Note in particular the paddd in the native disassembly. The llvmcall used here is a new intrinsic to easily embed any LLVM IR into your julia functions. It is part of this pull request mostly for demonstration purposes, but it is fully featured and ready to be used; if people would rather not have it, I can rebase it out.

The tuple unboxing is currently only enabled for immutable types due to GC considerations, but I will follow up with that patch after discussion on this one is done (it already works fine for other types if you disable the GC).
Fixes #2496
Fixes #2299
/cc @JeffBezanson