Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use sincos from libm in cis #21589

Merged
merged 2 commits into from
May 25, 2017
Merged

Use sincos from libm in cis #21589

merged 2 commits into from
May 25, 2017

Conversation

yuyichao
Copy link
Contributor

@yuyichao yuyichao commented Apr 27, 2017

I came up with this implementation for my research code last week that doesn't need heap allocation and is almost as good as the ccall version apart from the PLT entry part (so one more branch than an actual ccall in sysimg) so I guess people might be interested.

I don't think #10442 can be closed by this since it's too late to add another function in 0.6.

Do we have benchmarks for cis on nanosoldier?

Anyway, local benchmarks (the code_warntype and code_llvm are just to check that the invoke is correctly optimized)

julia> using BenchmarkTools

julia> f1(v) = cis(v)
f1 (generic function with 1 method)

julia> f2(v) = invoke(cis, Tuple{AbstractFloat}, v)
f2 (generic function with 1 method)

julia> @code_warntype f1(1.0)
Variables:
  #self#::#f1
  v::Float64

Body:
  begin 
      return $(Expr(:invoke, MethodInstance for cis(::Float64), :(Main.cis), :(v)))
  end::Complex{Float64}

julia> @code_warntype f2(1.0)
Variables:
  #self#::#f2
  v::Float64

Body:
  begin 
      SSAValue(0) = Main.cis
      return $(Expr(:invoke, MethodInstance for cis(::Float64), SSAValue(0), :(v)))
  end::Complex{Float64}

julia> @code_llvm f2(1.0)

define void @julia_f2_61158(%Complex.61* noalias nocapture sret, double) #0 !dbg !5 {
top:
  %2 = alloca %Complex.61, align 8
  call void @julia_cis_61159(%Complex.61* noalias nocapture nonnull sret %2, double %1)
  %3 = bitcast %Complex.61* %2 to i8*
  %4 = bitcast %Complex.61* %0 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %4, i8* %3, i32 16, i32 8, i1 false)
  ret void
}

julia> @code_llvm f1(1.0)

define void @julia_f1_61160(%Complex.61* noalias nocapture sret, double) #0 !dbg !5 {
top:
  %2 = alloca %Complex.61, align 8
  call void @julia_cis_61161(%Complex.61* noalias nocapture nonnull sret %2, double %1)
  %3 = bitcast %Complex.61* %2 to i8*
  %4 = bitcast %Complex.61* %0 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %4, i8* %3, i32 16, i32 8, i1 false)
  ret void
}

julia> @benchmark f1(1.0)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     12.538 ns (0.00% GC)
  median time:      12.569 ns (0.00% GC)
  mean time:        13.719 ns (0.00% GC)
  maximum time:     54.756 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

julia> @benchmark f2(1.0)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     15.018 ns (0.00% GC)
  median time:      15.113 ns (0.00% GC)
  mean time:        15.705 ns (0.00% GC)
  maximum time:     70.411 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

julia> @benchmark f1(1f0)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.552 ns (0.00% GC)
  median time:      6.987 ns (0.00% GC)
  mean time:        7.532 ns (0.00% GC)
  maximum time:     36.562 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark f2(1f0)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     9.849 ns (0.00% GC)
  median time:      9.951 ns (0.00% GC)
  mean time:        10.802 ns (0.00% GC)
  maximum time:     52.768 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

@yuyichao yuyichao added performance Must go faster maths Mathematical functions labels Apr 27, 2017
base/complex.jl Outdated
%res = load [2 x float], [2 x float]* %pres
ret [2 x float] %res
""", Tuple{Float32,Float32}, Tuple{Float32,Ptr{Void}}, v, cglobal((:sincosf, libm_name)))
if !isnan(v) && (isnan(s) || isnan(c))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would & and | be faster here?

base/fastmath.jl Outdated
@@ -273,6 +273,36 @@ atan2_fast(x::Float64, y::Float64) =

# explicit implementations

# FIXME: Change to `ccall((:sincos, libm_name))` when `Ref` calling convention can be
# stack allocated. Also in `complex.jl`
@inline function cis_fast(v::Float64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could cis just call cis_fast to avoid the code duplication?

@yuyichao
Copy link
Contributor Author

Hmm, why is @stevengj 's comment not showing up..... probably being mis-identified as bot by @github ?

Anyway, I changed the complex.jl version to use the fastmath.jl version which I didn't do before since it was not available yet (shouldn't matter in practice as long as no one actually calls cis before fastmath.jl is included) (which also exposes a few errors since it's in the FastMath module.....)

Also noticed that I can just use the complex version of nan_dom_err so that fixes the |/& part.

@StefanKarpinski
Copy link
Member

@github seems to have misidentified @stevengj as a spambot, which is a terrible miscategorization, and a really urgent situation for them to fix ASAP.

@stevengj
Copy link
Member

Looks like it is fixed now?

@StefanKarpinski
Copy link
Member

YAY! I'm so relieved to see you back, @stevengj!!

@yuyichao yuyichao force-pushed the yyc/cis branch 2 times, most recently from d5a3d02 to fa1e3c3 Compare April 30, 2017 02:44
@yuyichao
Copy link
Contributor Author

yuyichao commented May 3, 2017

So apart from the off topic discussion that I started myself.... Any other comments about the actual PR?

base/complex.jl Outdated
function cis(v::Union{Float64,Float32})
res = FastMath.cis_fast(v) # Note: not yet defined at this point
return Math.nan_dom_err(res, v)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should maybe define cis(v::Float16) = Complex{Float16}(cis(Float32(v))), both to use cis_fast and to avoid the double conversion of v to Float32 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that MPFR defines mpfr_sin_cos, which we can use for the BigFloat version of this method.

base/complex.jl Outdated
# compute exp(im*theta)
cis(theta::Real) = Complex(cos(theta),sin(theta))
cis(theta::AbstractFloat) = Complex(cos(theta),sin(theta))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have test coverage of this method now that there are specialized Float32 and Float64 versions?

@yuyichao yuyichao force-pushed the yyc/cis branch 2 times, most recently from 2512e39 to 3228cf9 Compare May 6, 2017 18:50
@yuyichao
Copy link
Contributor Author

yuyichao commented May 6, 2017

Addressed most comments. Since we've branched. I decided to just implement sincos instead and use it in many other functions. Will need to check if any of the functions changed needs to be tested and also add tests directly for sincos.

This does no include sincosd or sincospi. There are also still other optimization opportunities for some of the functions. I don't know how to get the expected precision around 0 but it should be possible to compute sinh and cosh faster than 3x-4x that of exp....

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@@ -405,6 +405,7 @@ export
significand,
sin,
sinc,
sincos,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a docstring, and to be added to the stdlib

@yuyichao
Copy link
Contributor Author

Test and doc added.

@yuyichao yuyichao merged commit bac32d3 into master May 25, 2017
@yuyichao yuyichao deleted the yyc/cis branch May 25, 2017 14:19
@giordano
Copy link
Contributor

Should this be mentioned in NEWS.md?

@StefanKarpinski StefanKarpinski added the needs news A NEWS entry is required for this change label May 30, 2017
@StefanKarpinski
Copy link
Member

Have added the "needs news" tag, @yuyichao, could you write an entry?

@fredrikekre fredrikekre mentioned this pull request Mar 18, 2018
@KristofferC KristofferC removed the needs news A NEWS entry is required for this change label Nov 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maths Mathematical functions performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants