BigFloat incorrectly rounded to Float16 (subnormals) #50642
I think this is a dup of something. The problem is that we convert to `Float64` first. I've meant to fix this for a while but haven't gotten around to it yet.
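For illustration, here is a minimal reproduction built from that mechanism (the specific input is my own construction, not one reported in this issue): a `BigFloat` just above the midpoint between zero and the smallest `Float16` subnormal must round up, but the intermediate `Float64` rounding collapses it onto the midpoint, which then ties to even, i.e. to zero:

```julia
# 2^-25 is exactly halfway between Float16(0) and the smallest Float16
# subnormal, 2^-24; nudging it up by 2^-80 means it must round up to 2^-24.
x = big"2.0"^-25 + big"2.0"^-80

Float64(x) == big"2.0"^-25  # true: the 2^-80 term is rounded away in Float64
Float16(x)                  # affected versions return Float16(0.0) -- wrong
nextfloat(Float16(0.0))     # the correctly rounded value, 2^-24 ≈ 6.0e-8
```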
Is there an easy workaround, even if not performant? I'm trying to revisit the exact rational conversion PR, and the many broken tests are bothering me.
Wait a minute, this actually means the bug is not just caused by the conversion to `Float64` first.
The bug is caused by double rounding. The workaround is relatively easy and relatively performant: you just need to read the bits out of the `BigFloat` mantissa and do the rounding yourself. It's conceptually pretty simple but somewhat annoying to implement.
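A minimal sketch of that kind of workaround, assuming a round-to-odd pass through `Float64` rather than literally extracting the mantissa limbs (`to_float16_exact` is a hypothetical name, not anything in Base): truncate toward zero, and if the truncation was inexact, force the lowest significand bit on, so the final `Float64` to `Float16` rounding still sees that bits were discarded:

```julia
# Hypothetical sketch, not Base's implementation: correctly rounded
# BigFloat -> Float16 via "round to odd" through Float64.
function to_float16_exact(x::BigFloat)
    isfinite(x) || return Float16(Float64(x))  # NaN and ±Inf convert fine as-is
    y = Float64(x, RoundToZero)                # first step truncates, never rounds up
    if BigFloat(y) != x                        # truncation discarded some bits, so
        # set the "sticky" low bit of the significand; Float64 has 53 significand
        # bits vs 11 for Float16, so this only breaks the otherwise-lost ties
        y = reinterpret(Float64, reinterpret(UInt64, y) | 0x01)
    end
    return Float16(y)                          # second rounding can no longer double-round
end
```

Round to odd works here because the intermediate format needs at least two more significand bits than the target, which `Float64` (53 bits) comfortably provides over `Float16` (11 bits).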
Currently the conversion from `BigFloat` […]. Would that be OK as a quick fix?
I'm pretty sure I made that PR and then reverted it because it was causing problems (but I forget exactly why).
MPFR determines special values according to sentinel values of the exponent field. Although these constants are not documented (they're defined in MPFR's `src/mpfr-impl.h`), they're already used in `Base.MPFR` in the `BigFloat` inner constructor and for converting IEEE 754 FP types to `BigFloat`, so I guess it makes sense to avoid the `ccall` overhead for predicates like `iszero` and `isnan`, too. The context here is that I'm working on generic conversion implementations between IEEE 754 types and `BigFloat` that would work without using MPFR and improve correctness (#50642) and performance, so this PR seems like the obvious prerequisite for being able to use the Julian predicates like `iszero` without calling libmpfr unnecessarily.
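As a rough illustration of that sentinel scheme (my own sketch, with assumed values mirroring `src/mpfr-impl.h`, where `MPFR_EXP_ZERO`, `MPFR_EXP_NAN`, and `MPFR_EXP_INF` are the exponent type's `typemin` plus 1, 2, and 3; the names below are hypothetical, not the PR's actual code):

```julia
# Assumed sentinel values, following MPFR's (undocumented) src/mpfr-impl.h:
const MPFR_EXP_MIN  = typemin(Clong)  # assuming mpfr_exp_t == Clong
const MPFR_EXP_ZERO = MPFR_EXP_MIN + 1
const MPFR_EXP_NAN  = MPFR_EXP_MIN + 2
const MPFR_EXP_INF  = MPFR_EXP_MIN + 3

# Predicates that read the BigFloat exponent field directly, with no ccall:
exp_iszero(x::BigFloat) = x.exp == MPFR_EXP_ZERO
exp_isnan(x::BigFloat)  = x.exp == MPFR_EXP_NAN
exp_isinf(x::BigFloat)  = x.exp == MPFR_EXP_INF
```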
There's lots of code, but most of it seems like it will be useful in general. For example, I think I'll use the changes in float.jl and rounding.jl to improve the #49749 PR. The changes in float.jl could also be used to refactor float.jl to remove many magic constants.

Benchmarking script:

```julia
using BenchmarkTools

f(::Type{T} = BigFloat, n::Int = 2000) where {T} = rand(T, n)
g!(u, v) = map!(eltype(u), u, v)

@btime g!(u, v) setup=(u = f(Float16); v = f();)
@btime g!(u, v) setup=(u = f(Float32); v = f();)
@btime g!(u, v) setup=(u = f(Float64); v = f();)
```

On master (dc06468):

```
46.116 μs (0 allocations: 0 bytes)
38.842 μs (0 allocations: 0 bytes)
37.039 μs (0 allocations: 0 bytes)
```

With both this commit and #50674 applied:

```
42.310 μs (0 allocations: 0 bytes)
42.661 μs (0 allocations: 0 bytes)
41.608 μs (0 allocations: 0 bytes)
```

So, with this benchmark at least, on an AMD Zen 2 laptop, conversion to `Float16` is faster, but there's a slowdown for `Float32` and `Float64`.

Fixes #50642 (exact conversion to `Float16`)

Co-authored-by: Oscar Smith <oscardssmith@gmail.com>
julia_rounding_experiment.jl