BigFloat incorrectly rounded to Float16 (subnormals) #50642
I think this is a dup of something. The problem is that we convert to `Float64` first. I've meant to fix this for a while but haven't gotten around to it yet.
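For illustration, here is a minimal reproduction built from that mechanism (the specific input is my own construction, not one reported in this issue): a `BigFloat` just above the midpoint between zero and the smallest `Float16` subnormal must round up, but the intermediate `Float64` rounding collapses it onto the midpoint, which then ties to even, i.e. to zero:

```julia
# 2^-25 is exactly halfway between Float16(0) and the smallest Float16
# subnormal, 2^-24; nudging it up by 2^-80 means it must round up to 2^-24.
x = big"2.0"^-25 + big"2.0"^-80

Float64(x) == big"2.0"^-25  # true: the 2^-80 term is rounded away in Float64
Float16(x)                  # affected versions return Float16(0.0) -- wrong
nextfloat(Float16(0.0))     # the correctly rounded value, 2^-24 ≈ 6.0e-8
```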
Is there an easy workaround, even if not performant? I'm trying to revisit the exact rational conversion PR, and the many broken tests are bothering me.
Wait a minute, this actually means the bug is not just caused by the conversion to `Float64` first.
The bug is caused by double rounding. The workaround is relatively easy and relatively performant: you just need to read the bits out of the `BigFloat` mantissa and do the rounding yourself. It's conceptually pretty simple but somewhat annoying to implement.
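A minimal sketch of that kind of workaround, assuming a round-to-odd pass through `Float64` rather than literally extracting the mantissa limbs (`to_float16_exact` is a hypothetical name, not anything in Base): truncate toward zero, and if the truncation was inexact, force the lowest significand bit on, so the final `Float64` to `Float16` rounding still sees that bits were discarded:

```julia
# Hypothetical sketch, not Base's implementation: correctly rounded
# BigFloat -> Float16 via "round to odd" through Float64.
function to_float16_exact(x::BigFloat)
    isfinite(x) || return Float16(Float64(x))  # NaN and ±Inf convert fine as-is
    y = Float64(x, RoundToZero)                # first step truncates, never rounds up
    if BigFloat(y) != x                        # truncation discarded some bits, so
        # set the "sticky" low bit of the significand; Float64 has 53 significand
        # bits vs 11 for Float16, so this only breaks the otherwise-lost ties
        y = reinterpret(Float64, reinterpret(UInt64, y) | 0x01)
    end
    return Float16(y)                          # second rounding can no longer double-round
end
```

Round to odd works here because the intermediate format needs at least two more significand bits than the target, which `Float64` (53 bits) comfortably provides over `Float16` (11 bits).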
Currently the conversion from `BigFloat` […]. Would that be OK as a quick fix?
I'm pretty sure I made that PR and then reverted it because it was causing problems (but I forget exactly why).
MPFR determines special values according to sentinel values of the exponent field. Although these constants are not documented (they're defined in MPFR's `src/mpfr-impl.h`), they're already used in `Base.MPFR` in the `BigFloat` inner constructor and for converting IEEE 754 FP types to `BigFloat`, so I guess it makes sense to avoid the `ccall` overhead for predicates like `iszero` and `isnan`, too. The context here is that I'm working on generic conversion implementations between IEEE 754 types and `BigFloat` that would work without using MPFR and improve correctness (#50642) and performance, so this PR seems like the obvious prerequisite for being able to use the Julian predicates like `iszero` without calling libmpfr unnecessarily.
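As a rough illustration of that sentinel scheme (my own sketch, with assumed values mirroring `src/mpfr-impl.h`, where `MPFR_EXP_ZERO`, `MPFR_EXP_NAN`, and `MPFR_EXP_INF` are the exponent type's `typemin` plus 1, 2, and 3; the names below are hypothetical, not the PR's actual code):

```julia
# Assumed sentinel values, following MPFR's (undocumented) src/mpfr-impl.h:
const MPFR_EXP_MIN  = typemin(Clong)  # assuming mpfr_exp_t == Clong
const MPFR_EXP_ZERO = MPFR_EXP_MIN + 1
const MPFR_EXP_NAN  = MPFR_EXP_MIN + 2
const MPFR_EXP_INF  = MPFR_EXP_MIN + 3

# Predicates that read the BigFloat exponent field directly, with no ccall:
exp_iszero(x::BigFloat) = x.exp == MPFR_EXP_ZERO
exp_isnan(x::BigFloat)  = x.exp == MPFR_EXP_NAN
exp_isinf(x::BigFloat)  = x.exp == MPFR_EXP_INF
```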
There's lots of code, but most of it seems like it will be useful in general. For example, I think I'll use the changes in float.jl and rounding.jl to improve the #49749 PR. The changes in float.jl could also be used to refactor float.jl to remove many magic constants.

Benchmarking script:

```julia
using BenchmarkTools

f(::Type{T} = BigFloat, n::Int = 2000) where {T} = rand(T, n)
g!(u, v) = map!(eltype(u), u, v)

@btime g!(u, v) setup=(u = f(Float16); v = f();)
@btime g!(u, v) setup=(u = f(Float32); v = f();)
@btime g!(u, v) setup=(u = f(Float64); v = f();)
```

On master (dc06468):

```
46.116 μs (0 allocations: 0 bytes)
38.842 μs (0 allocations: 0 bytes)
37.039 μs (0 allocations: 0 bytes)
```

With both this commit and #50674 applied:

```
42.310 μs (0 allocations: 0 bytes)
42.661 μs (0 allocations: 0 bytes)
41.608 μs (0 allocations: 0 bytes)
```

So, with this benchmark at least, on an AMD Zen 2 laptop, conversion to `Float16` is faster, but there's a slowdown for `Float32` and `Float64`.

Fixes #50642 (exact conversion to `Float16`)

Co-authored-by: Oscar Smith <oscardssmith@gmail.com>
julia_rounding_experiment.jl