Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove ieee754_rem_pio2 in favor of a rem_pio2_kernel written in Julia. #22603

Merged
merged 39 commits into from
Aug 2, 2017

Conversation

pkofod
Copy link
Contributor

@pkofod pkofod commented Jun 28, 2017

fixes #22004

So, this will probably require some more proof of accuracy and performance, and I will provide it if needed. I will also add more tests. For now, I would very much like some feedback.

Much of this can benefit from further comments, more tests (as noted above), and thorough testing, but since bikeshedding will occur, let us speak names of functions upfront. What you see is just what I came up with.

I also need to add a license-comment as the cody waite part is pretty much straight out of openlibm. How should I go about this? Add the original license header somewhere?

/* @(#)k_rem_pio2.c 1.3 95/01/18 */
/*
 * ====================================================
 * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
 *
 * Developed at SunSoft, a Sun Microsystems, Inc. business.
 * Permission to use, copy, modify, and distribute this
 * software is freely granted, provided that this notice 
 * is preserved.
 * ====================================================
 */

** Performance **
The timings below are for values around 0. They are from the RemPiO2 package. Should prob do it from base as well now that I have this Pr.
timings_medium

For the positive values I ran (notice that there is a huge gap, that is why you have that downwards slope instead of a wiggly line on top), it looks something like this (x values are log(x) so the largest value is for something like 2.0^1018).
newvsoldpos

Performance is strictly better, and accuracy is about the same. Havn't found anything > 1ulp, and nothing worse than the about 0.6 ulp largest error of openlibm/openspecfun.

@pkofod
Copy link
Contributor Author

pkofod commented Jun 28, 2017

(should probably cc @simonbyrne )

@ararslan ararslan requested a review from simonbyrne June 28, 2017 21:51
@ararslan ararslan added the maths Mathematical functions label Jun 28, 2017
base/math.jl Outdated
returns the remainder of x modulo π/2 as a TwicePrecision number, along with a k
such that k mod 3 == K mod 3 where K*π/2 = x -rem.
"""

Copy link
Contributor

@tkelman tkelman Jun 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no blank line here

@tkelman
Copy link
Contributor

tkelman commented Jun 28, 2017

Since this is a derived work, I'd move it to its own file under base/special and add it to the exception lists in LICENSE.md and contrib/add_license_to_files.jl

@musm
Copy link
Contributor

musm commented Jun 28, 2017

What about the Float32 case https://github.com/JuliaLang/openlibm/blob/master/src/e_rem_pio2f.c ?

@pkofod
Copy link
Contributor Author

pkofod commented Jun 29, 2017

What about the Float32 case https://github.com/JuliaLang/openlibm/blob/master/src/e_rem_pio2f.c ?

Sure, but the current method (that I'm removing) is specifically for Float64. Since it's only used in rem2pi(x::Float64, roundingmode) I guess it would only be relevant to add at this point if a rem2pi for Float32s was added, or if Float32 methods for trig functions were added. There are fallbacks to rem2pi of course

rem2pi(x::Float32, r::RoundingMode) = Float32(rem2pi(Float64(x), r))

There's not much to the Float32 case (calculate medium case directly, or use the kernel function included in this PR for large values), but I think it makes sense to add it only when it's going to be used (PRs to come), as it's not really something we export.

@pkofod
Copy link
Contributor Author

pkofod commented Jun 29, 2017

Since this is a derived work, I'd move it to its own file under base/special and add it to the exception lists in LICENSE.md and contrib/add_license_to_files.jl

Alright, will do that.

@musm
Copy link
Contributor

musm commented Jun 29, 2017

Sure, but the current method (that I'm removing) is specifically for Float64. Since it's only used in rem2pi(x::Float64, roundingmode) I guess it would only be relevant to add at this point if a rem2pi for Float32s was added, or if Float32 methods for trig functions were added. There are fallbacks to rem2pi of course

Yep, the Float32 case should be quite straightforward after a correct Float64 method. I think we might as well also port the Float32 case, that way we can get rid of the slower fallback and just use the faster Float32 implementation for rem2pi, which I think is worth it if we are going through the trouble of the Float64 port.

@musm
Copy link
Contributor

musm commented Jun 29, 2017

BTW do you have any ideas as to why the performance and accuracy is so much better than the C version, this is really surprising to me (I would expect perhaps a ~15% speed improvement and essentially equivalent accuracy).

Not sure how much you plan on abstracting and simplifying, but a lot of the openlibm stuff can be simplified if you work at it hard enough and with careful benchmarking, i.e. the high and low splitting for comparisons is not necessary in many cases (so in principle you could defer that until you need it in the cody_waite_ext_pio2 method) This are some general comments from my experience with the openlibm code base, since I have only taken a cursory glance at the PR.

@pkofod
Copy link
Contributor Author

pkofod commented Jun 29, 2017

BTW do you have any ideas as to why the performance and accuracy is so much better than the C version, this is really surprising to me (I would expect perhaps a ~15% speed improvement and essentially equivalent accuracy).

Well, I expected the same thing. This is the sort of performance difference I get for the trig kernels for example (more like 8%-12%).

My very best bet would be that this is somehow related to the fact that I don't pass around a vector. The Payne Hanek implementation is not the same one as in openlibm (it's a modified version of the code Simon Byrne had floating around), but the relative speed-up seems consistent across x, so...

Not sure how much you plan on abstracting and simplifying, but a lot of the openlibm stuff can be simplified if you work at it hard enough and with careful benchmarking, i.e. the high and low splitting for comparisons is not necessary in many cases (so in principle you could defer that until you need it in the cody_waite_ext_pio2 method)

Yes, I guess I could defer getting the high word until reaching a branch where the more precise C&W scheme is used or reaching Payne Hanek. You think it would have a measurable impact, or is it more for "neat code" reasons? Fwiw, when I'm calling (or going to call) this function from sin, cos, etc, it'll already have the xh and xhp ready, unless that is changed again.

@pkofod
Copy link
Contributor Author

pkofod commented Jun 29, 2017

Since this is a derived work, I'd move it to its own file under base/special and add it to the exception lists in LICENSE.md and contrib/add_license_to_files.jl

Alright, I tried to do that, but now I'm getting:

/home/pkm/julia/julia/base/precompile.jl
AddrSpaceCast must be between different address spaces
  %.sroa_cast = addrspacecast i8* %65 to i8*, !dbg !75879
LLVM ERROR: Broken function found, compilation aborted!
*** This error is usually fixed by running `make clean`. If the error persists, try `make cleanall`. ***
Makefile:233: recipe for target '/home/pkm/julia/julia/usr/lib/julia/sys.o' failed
make[1]: *** [/home/pkm/julia/julia/usr/lib/julia/sys.o] Error 1
Makefile:109: recipe for target 'julia-sysimg-release' failed
make: *** [julia-sysimg-release] Error 2
pkm@pkm:~/julia/julia$ 

Wonder what I did wrong (and yes I did try to follow the clean instructions). I placed a new file in special, included it in math, included a skip in contrib, and changed LICENSE.md.

@tkelman
Copy link
Contributor

tkelman commented Jun 29, 2017

you switched branches since building llvm. do make cleanall, or failing that make -C deps distclean-llvm

@pkofod
Copy link
Contributor Author

pkofod commented Jun 29, 2017

That is actually true, will try it, thanks!

Edit: that worked.

What's the consensus on constants such as the precalculated hi and lo values of some numbers? Should they just be defined inside the functions if they're only used in one function, or outside?

LICENSE.md Outdated
@@ -34,6 +34,7 @@ Julia includes code from the following projects, which have their own licenses:
- [LLVM](http://releases.llvm.org/3.9.0/LICENSE.TXT) (for parts of src/jitlayers.cpp and src/disasm.cpp) [BSD-3, effectively]
- [MUSL](http://git.musl-libc.org/cgit/musl/tree/COPYRIGHT) (for getopt implementation on Windows) [MIT]
- [MINGW](https://sourceforge.net/p/mingw/mingw-org-wsl/ci/legacy/tree/mingwrt/mingwex/dirname.c) (for dirname implementation on Windows) [MIT]
- [OPENLIBM](https://github.com/JuliaLang/openlibm/blob/master/LICENSE.md) [MIT, BSD-2, ISC]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was this particular piece originally derived from? openlibm copied almost everything from somewhere else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FDLIBM

end

if xhp <= 0x401c463b # |x| ~<= 9pi/4, use Cody Waite with two constants
if (xhp <= 0x4015fdbc) # |x| ~<= 7pi/4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no parens around conditions

rem_pio2_kernel(x, xh, xhp)

returns the remainder of x modulo π/2 as a TwicePrecision number, along with a k
such that k mod 3 == K mod 3 where K*π/2 = x -rem.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe some code / math highlighting here? is this supposed to be x - rem or something else?

returns the remainder of x modulo π/2 as a TwicePrecision number, along with a k
such that k mod 3 == K mod 3 where K*π/2 = x -rem.
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no blank line here

"""
highword(x)

returns the high word of x as a UInt32.
Copy link
Contributor

@tkelman tkelman Jun 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capitalize and use imperative (Return ...) for docstrings, code highlight UInt32

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and x

end

function cody_waite_ext_pio2(x, xʰ⁺)
fn = x*invpio2+0x1.8p52
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given #6349 it's a bit more portable to not rely on hex floats for bootstrap-necessary code

LICENSE.md Outdated
@@ -34,6 +34,7 @@ Julia includes code from the following projects, which have their own licenses:
- [LLVM](http://releases.llvm.org/3.9.0/LICENSE.TXT) (for parts of src/jitlayers.cpp and src/disasm.cpp) [BSD-3, effectively]
- [MUSL](http://git.musl-libc.org/cgit/musl/tree/COPYRIGHT) (for getopt implementation on Windows) [MIT]
- [MINGW](https://sourceforge.net/p/mingw/mingw-org-wsl/ci/legacy/tree/mingwrt/mingwex/dirname.c) (for dirname implementation on Windows) [MIT]
- [FDLIBM](http://www.netlib.org/fdlibm/readme) [Freely distributable]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look down this file I think you also need to update and add a line below

base/special/exp.jl (see FREEBSD MSUN [FreeBSD/2-clause BSD/Simplified BSD License])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, alright, thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between this section and the one below? It's not obvious to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

projects we use vs how we use them? I think this section might actually be for core language/compiler, not base

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, yes, it's not very clear. It looks like this section only applies to core features, in which case FDLIB shouldn't be added here (just like MSUN is listed below, but not here).

BTW, if it's kept here, the new line should mention what it's used for, like existing lines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this section is the right place for non-core julia code under base

LICENSE.md Outdated
@@ -74,6 +75,7 @@ The following components of Julia's standard library have separate licenses:
- base/sparse/umfpack.jl (see [SUITESPARSE](http://faculty.cse.tamu.edu/davis/suitesparse.html))
- base/sparse/cholmod.jl (see [SUITESPARSE](http://faculty.cse.tamu.edu/davis/suitesparse.html))
- base/special/exp.jl (see [FREEBSD MSUN](https://github.com/freebsd/freebsd) [FreeBSD/2-clause BSD/Simplified BSD License])
- base/special/rem_pio2.jl [FDLIBM](http://www.netlib.org/fdlibm/readme) [Freely distributable]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should point to a copy of the license, as "freely distributable" is a bit vague (and in particular it doesn't say that the copyright attribution should be preserved when distributing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can point to http://www.netlib.org/fdlibm/e_rem_pio2.c ? There isn't really a license file in fdlibm as far as I can see. I could write "Freely distributable with preserved copyright notice."?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing both sounds like a good idea to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The license is the same as the exp code line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? I mean where is it stated?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I know next to nothing about licensing, but is it necessary to go with the msun license if they simply took it from fdlibm?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, good point, maybe someone more knowledgeable here can comment, if you do end up changing it best to also modify the one for exp.jl since they should be the same

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually what you have probably makes the most sense

test/mod2pi.jl Outdated
n, ret = Base.Math.rem_pio2_kernel(-case)
ret_sum = ret.hi+ret.lo
ulp_error = (ret_sum-ieee754_rem_pio2_return[1, i])/abs(ret_sum-nextfloat(ret_sum))
@test ulp_error < 0.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<=

@musm
Copy link
Contributor

musm commented Jul 26, 2017

lgtm (minor comments added)

Doesn't look like performance is any different than in yyc's branch 👍

@pkofod
Copy link
Contributor Author

pkofod commented Jul 30, 2017

Just for good measure, can this be benchmarked against master now that yuyichao's PR is merged (and formatting is fixed).

@KristofferC
Copy link
Member

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@pkofod
Copy link
Contributor Author

pkofod commented Jul 30, 2017

The relevant benchmarks seem okay. I would love to use a two coefficient Cody Waite up into some of the Payne Hanek interval if hardware fma is available, but let's see if that's the best use of my time. I think the current implementation is good enough that we can use it, and ditch openspecfuns.

test/mod2pi.jl Outdated
2.0^80*pi/4] # |x| >= 2.0^20π/2, idx > 0-0.22370138542135648

# ieee754_rem_pio2_return contains the returned value from the ieee754_rem_pio2
# function in openlibm https://github.com/JuliaLang/openlibm/blob/master/src/e_rem_pio2.c
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may want to point to a release tag instead of master on this link in case this file changes or gets moved around in the future

@musm
Copy link
Contributor

musm commented Jul 31, 2017

The relevant benchmarks seem okay. I would love to use a two coefficient Cody Waite up into some of the Payne Hanek interval if hardware fma is available, but let's see if that's the best use of my time. I think the current implementation is good enough that we can use it, and ditch openspecfuns.

@pkofod what's the ulp error without hardware fma for Float32 and Flaot64 in that case? Note the exp10 function for the Float64 case on non-hardware fma has error slightly greater than 1 ulp, but is otherwise (< 1 ulp) for Float32 and Float64 with hardware fma.
So it may be tolerable.

@musm
Copy link
Contributor

musm commented Jul 31, 2017

On an unrelated note, I find a lot of these files hard to read to to the lack of spaces between the arguments

@pkofod
Copy link
Contributor Author

pkofod commented Jul 31, 2017

On an unrelated note, I find a lot of these files hard to read to to the lack of spaces between the arguments

arguments as in f(a,b,c)? That should only be in math.jl as I generally add spaces, but I didn't want to change all those lines just to add spaces (is there an official style for this?). Where I may be a lot more inconsistent is in a+b vs a + b or a*b+c vs a*b + c or variants of that...

@musm
Copy link
Contributor

musm commented Jul 31, 2017

arguments as in f(a,b,c)? That should only be in math.jl as I generally add spaces, but I didn't want to change all those lines just to add spaces (is there an official style for this?). Where I may be a lot more inconsistent is in a+b vs a + b or ab+c vs ab + c or variants of that...

e.g

https://github.com/JuliaLang/julia/pull/22603/files#diff-8278b779f2ea681192ba5b020a2c3e2bR923

@pkofod
Copy link
Contributor Author

pkofod commented Jul 31, 2017

e.g

https://github.com/JuliaLang/julia/pull/22603/files#diff-8278b779f2ea681192ba5b020a2c3e2bR923

I agree, I can change that, but I figured that I just wanted to make the relevant changes.

@tkelman
Copy link
Contributor

tkelman commented Aug 1, 2017

Are we overall satisfied with the performance and accuracy of this? What else needs doing here?

@pkofod
Copy link
Contributor Author

pkofod commented Aug 1, 2017

The only real change I can think of would be to change the branch conditionals to things like abs(x)<pi/4 (so, float comparisons instead of the higword-trickery), but that could be changed later; I don't really think it's that important beyond style.

@pkofod
Copy link
Contributor Author

pkofod commented Aug 1, 2017

is the arpack error on travis JuliaLang/LinearAlgebra.jl#354 ?

@tkelman
Copy link
Contributor

tkelman commented Aug 1, 2017

Might be? Triggered by #22963 as far as I can tell, which we probably should have tried harder to verify it could get through CI (despite all the timeouts lately)

@pkofod
Copy link
Contributor Author

pkofod commented Aug 1, 2017

(despite all the timeouts lately)

yeah, that's the other failure on travis

@simonbyrne
Copy link
Contributor

Okay then, I'll pull the trigger on this tomorrow, unless there are any further objections.

@tkelman
Copy link
Contributor

tkelman commented Aug 1, 2017

definitely squash since there are a lot of little commits here and I think some of the intermediate states had failed

@ararslan
Copy link
Member

ararslan commented Aug 1, 2017

I just want to say thanks again for this. This is really exciting, because 1.) it means we can excise a binary dependency from Base, and 2.) it's further proof of how powerful Julia can really be for mathematical computing. You've done an awesome job here!

@pkofod
Copy link
Contributor Author

pkofod commented Aug 1, 2017

Okay then, I'll pull the trigger on this tomorrow, unless there are any further objections.

Cool, thanks.

definitely squash since there are a lot of little commits here and I think some of the intermediate states had failed

Definitely!

I just want to say thanks again for this. This is really exciting, because 1.) it means we can excise a binary dependency from Base, and 2.) it's further proof of how powerful Julia can really be for mathematical computing. You've done an awesome job here!

Pleasure is on my side! Learned a lot about floating point arithmetic 😄 and deadlines 😑

@simonbyrne simonbyrne merged commit 27852fd into JuliaLang:master Aug 2, 2017
@pkofod pkofod deleted the rempio2gsoc branch August 2, 2017 20:24
vchuravy pushed a commit to JuliaPackaging/LazyArtifacts.jl that referenced this pull request Oct 2, 2023
…a. (JuliaLang/julia#22603)

* Remove ieee754_rem_pio2 in favor of a rem_pio2_kernel written in Julia.

* Add missing begin key.

* Remove _approx.

* Move to separate files.

* Fix LICENSE.md to mention FDLIBM instead of Openlibm.

* Address comments.

* Strengthen test to faithfully rounded.

* Fix LICENSE.md message for rem_pio2.

* Fix style in LICENSE.md entry.

* Remove semicolons.

* Move highword up, and remove duplicate unsafe_trunc.

* Fix LICENSE.md by removing a bullet and changing license of base/special/exp.jl.

* Change license info for base/special/exp.jl.

* Small changes.

* Get and reset precision for BigFloats, and space before rem in -rem.

* setprecision do

* Add comments, move test, and switch to muladd in some places.

* Fix y1 branches of rem2pi.

* Small changes.

* Move comment in rem_pio2.jl and add test for fast branch of mod2pi.

* rint docstring fix and make it clear what the constant is.

* Update comment for INV2PI.

* Fix wrong test set name.

* Tests against ieee754_rem_pio2 output.

* Inline cody_waite functions.

* rint -> round, remove rint, remove one argument cody waite, replace Int(x) with trunc(Int, x).

* Add some tests.

* Inline rem_pio2_kernel, and rearrange code slightly.

* fix xhp

* Use DoubleFloat64.

* Move constants into functions.

* Fix escaping of mod

* Fix tests and remove specific variables.

* Fix tests

* Fix issues raised in comments.

* More appropriate ulp test (test against eps of reference number).

* Change link to a stable github link.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maths Mathematical functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improving ieee754_rem_pio2