Generic implementations? #15
See #14. We can certainly do quite a bit in this direction.
Here's the situation: the two major things that change between the Float32 and Float64 methods are the constants used and the degree of the polynomial approximant. Is it possible, based on the type of the argument, to select the correct constants array with zero overhead (two arrays of different length, one for Float32 and the other for Float64) and sum the polynomial via the horner macro using that array? I.e. build the two constant arrays, one for Float64 and the other for Float32, then depending on the type signature look up the correct table and apply the horner macro. I'm not sure if this can be done with zero overhead.
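One way this can work with zero overhead is to dispatch on the type to select a per-type coefficient tuple: when the tuple is a compile-time constant, Horner evaluation unrolls completely, so the "table lookup" costs nothing at runtime. A minimal sketch in current Julia syntax, using the built-in `evalpoly` in place of the `@horner` macro; the function names and coefficient values below are illustrative, not the library's actual tables:

```julia
# Hypothetical per-type coefficient tables, selected by dispatch.
# The coefficients are Taylor terms of exp near 0, purely for
# illustration; a real kernel would use minimax coefficients.
_coeffs(::Type{Float64}) = (1.0, 1.0, 0.5, 1/6, 1/24, 1/120)   # longer table
_coeffs(::Type{Float32}) = (1.0f0, 1.0f0, 0.5f0, Float32(1/6)) # shorter table

# `evalpoly` applies Horner's rule; with a constant tuple the loop
# unrolls during compilation, leaving a straight-line chain of
# fused multiply-adds of the right degree and element type.
exp_kernel(x::T) where {T<:Union{Float32,Float64}} = evalpoly(x, _coeffs(T))
```

Because `_coeffs(T)` is resolved at compile time for each concrete `T`, no runtime branch or array access survives in the generated code.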
The two easiest options would be either an inlined helper function:

```julia
function foo(x)
    ...
    y = _foo(x)
    ...
end

@inline _foo(x::Float32) = @horner x ...
@inline _foo(x::Float64) = @horner x ...
```

or a branch on the type parameter:

```julia
function foo{T}(x::T)
    ...
    if T == Float32
        y = @horner x ...
    else
        y = @horner x ...
    end
    ...
end
```
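For what it's worth, the type-branch option is also free at runtime: since `T` is known when the method is specialized, the compiler deletes the untaken branch entirely. A hedged sketch in current syntax (`where` in place of the old `foo{T}` form), with `evalpoly` standing in for `@horner` and placeholder coefficients:

```julia
# Sketch: branch on the type parameter inside one generic function.
# The `T == Float32` test is resolved during specialization, so the
# dead branch is removed. Coefficients are placeholders, not real tables.
function bar(x::T) where {T<:Union{Float32,Float64}}
    if T == Float32
        y = evalpoly(x, (1.0f0, 1.0f0, 0.5f0))       # low-degree Float32 table
    else
        y = evalpoly(x, (1.0, 1.0, 0.5, 1/6))        # higher-degree Float64 table
    end
    return y
end
```

Inspecting either method with `@code_llvm` shows no comparison or branch in the emitted code.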
The inlined function will generalise to other types.
So I discovered that in some cases having things inside `let` blocks can really throw off the compiler, resulting in very bad code generation, e.g.:

```julia
let
    function foo(x)
        ...
        y = _foo(x)
        ...
    end
    @inline _foo(x::Float32) = @horner x ...
    @inline _foo(x::Float64) = @horner x ...
end
```
Edit: already reported by Simon as a bug.
@simonbyrne do you recall the exact issue number for the bug above?
I opened JuliaLang/julia#18201, but it's due to JuliaLang/julia#15276.
In C-based libm libraries, it doesn't make too much sense to share code between `Float32` and `Float64` implementations, let alone `Complex{Float32}` and `Complex{Float64}`. The monomorphism of the language and the complexity of writing generic implementations make it counterproductive – it's easier to understand what's happening when each implementation is concretely spelled out.

In Julia, I think we could do considerably better: multiple dispatch, macros, the occasional generated function, and other features should make it possible to write very generic versions of algorithms and still get maximal performance. Generic versions are also often easier to understand, in much the same way that a more general theorem is easier to understand than a very specific application of it may be. This way we could fairly easily get efficient libm code for `Float16`, `DoubleDouble`, and ultimately `Float128` once any hardware supports that. Thoughts?
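As a small illustration of how far dispatch-based genericity can stretch, the sketch below defines one kernel that works for `Float16`, `Float32`, and `Float64` alike by converting a single coefficient table to the working precision. The names and the plain Taylor coefficients are hypothetical, not production minimax tables:

```julia
# One generic kernel for every AbstractFloat: the coefficient tuple is
# converted to T once, and for a concrete T the conversion constant-folds,
# so each specialization gets a native-precision Horner chain.
_sin_coeffs(::Type{T}) where {T} = map(T, (1.0, -1/6, 1/120))  # illustrative sine Taylor terms
sin_kernel(x::T) where {T<:AbstractFloat} = x * evalpoly(x * x, _sin_coeffs(T))

sin_kernel(Float16(0.1))  # works for Float16 with no Float16-specific code
```

New types opt in simply by being `AbstractFloat` with the usual conversions defined, which is exactly how a `DoubleDouble` or future `Float128` type would pick up these methods.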