BigFloat incorrectly rounded to Float16 (subnormals) #50642

Closed · nsajko opened this issue Jul 23, 2023 · 8 comments · Fixed by #50691

nsajko (Contributor) commented Jul 23, 2023

julia_rounding_experiment.jl:

```julia
distance(x, y) = abs(x - y)

function rounding_experiment(bf::BigFloat, r::AbstractFloat)
  p = prevfloat(r)
  n = nextfloat(r)

  r_dist = distance(bf, r)
  p_dist = distance(bf, p)
  n_dist = distance(bf, n)

  # `r` is a correct rounding of `bf` only if it is at least as close
  # to `bf` as both of its floating-point neighbors.
  if !((r_dist ≤ p_dist) & (r_dist ≤ n_dist))
    print("rounding $r not correct: ")
    if p_dist < n_dist
      println("$p is the correct rounding")
    else
      println("$n is the correct rounding")
    end
  end

  nothing
end

f16_rounding_experiment(bf::BigFloat) = rounding_experiment(bf, Float16(bf))

setprecision(BigFloat, 500)

f16_rounding_experiment(big"1.4901162082026128889687591176485489397376143775948511e-07")
```
```
julia> include("/home/nsajko/julia_rounding_experiment.jl")
rounding 1.0e-7 not correct: 2.0e-7 is the correct rounding

julia> versioninfo()
Julia Version 1.11.0-DEV.139
Commit bf00ff4a110 (2023-07-21 19:34 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × AMD Ryzen 3 5300U with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
  Threads: 1 on 8 virtual cores
```
oscardssmith (Member) commented

I think this is a duplicate of something: the problem is that we convert to `Float64` first. I've been meaning to fix this for a while but haven't gotten around to it yet.

nsajko (Contributor, Author) commented Jul 23, 2023

Yeah, if the BigFloat is converted to Float64 before converting it to Float16, the result is the same.

Is there an easy workaround, even if not performant? I'm trying to revisit the exact rational conversion PR, and the many broken tests are bothering me.

nsajko (Contributor, Author) commented Jul 23, 2023

> Yeah, if the BigFloat is converted to Float64 before converting it to Float16, the result is the same.

Wait a minute, doesn't this mean the bug is not just caused by the conversion to Float64?

nsajko (Contributor, Author) commented Jul 23, 2023

I mean, `f16_rounding_experiment(big(1.490116208202613e-7))` gives the same error report.
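
For reference, here is a minimal sketch collecting the failing cases reported so far (the noted outputs are those reported above for the affected build; the correct rounding is the `Float16` printed as `2.0e-7`):

```julia
setprecision(BigFloat, 500)

x = big"1.4901162082026128889687591176485489397376143775948511e-07"

Float16(x)                          # reported above: Float16(1.0e-7), incorrect
Float16(big(1.490116208202613e-7))  # reported above: same incorrect result
```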

oscardssmith (Member) commented

The bug is caused by double rounding. The workaround is relatively easy and relatively performant: you just need to read the bits out of the `BigFloat` mantissa and do the rounding yourself. It's conceptually pretty easy but somewhat annoying to implement.
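
For anyone needing a stopgap in the meantime, here is a hedged sketch of a different, slower workaround than the bit-reading approach described above: it reuses the distance test from the reproduction script to nudge the candidate to the nearest neighbor. `float16_fixup` is a hypothetical name; it assumes `bf` is finite and does not attempt to break exact ties to even.

```julia
# Hypothetical workaround (not the bit-twiddling approach described above,
# and not Base's eventual fix): take the possibly double-rounded candidate,
# then move to a neighbor if that neighbor is strictly closer to the
# original BigFloat. Double rounding is off by at most one ulp, so checking
# the two neighbors suffices. Assumes `bf` is finite; exact ties are left
# to the default path.
function float16_fixup(bf::BigFloat)
    r = Float16(Float64(bf))  # candidate from the double-rounding path
    for c in (prevfloat(r), nextfloat(r))
        if abs(bf - BigFloat(c)) < abs(bf - BigFloat(r))
            r = c
        end
    end
    return r
end
```

With the value from the original report, the neighbor check moves the result to the `Float16` value that the experiment script identifies as correct.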

nsajko (Contributor, Author) commented Jul 23, 2023

Currently the conversion from `BigFloat` to `Float16` is implemented by first converting to `Float32`, not `Float64`. I suppose the bug would manifest less often if we first converted to `Float64`, without using `Float32` at all.

Would that be OK as a quick fix?
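
Expressed as code, the proposed stopgap would be something like this hypothetical helper (per the comments above, routing through `Float64` makes the double rounding rarer but does not eliminate it):

```julia
# Hypothetical sketch of the proposed quick fix, not the change that was
# eventually merged: use Float64 as the intermediate instead of Float32.
float16_via_float64(bf::BigFloat) = Float16(Float64(bf))
```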

oscardssmith (Member) commented

I'm pretty sure I made that PR and then reverted it because it was causing problems (but I forget exactly why).

giordano (Contributor) commented

#40315?

nsajko added a commit to nsajko/julia that referenced this issue Jul 26, 2023
MPFR determines special values according to sentinel values of the
exponent field. Although these constants are not documented (they're
defined in MPFR's `src/mpfr-impl.h`), they're already used in
`Base.MPFR` in the `BigFloat` inner constructor and for converting IEEE
754 FP types to `BigFloat`, so I guess it makes sense to avoid the
`ccall` overhead for predicates like `iszero` and `isnan`, too.

The context here is that I'm working on generic IEEE 754-`BigFloat`
conversion implementations that would work without using MPFR and
improve correctness (JuliaLang#50642) and performance, so this PR seems like the
obvious prerequisite for being able to use the Julian predicates like
`iszero` without calling libmpfr unnecessarily.
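
As a hedged sketch of what such MPFR-free predicates could look like (the sentinel constants follow MPFR's `src/mpfr-impl.h` layout, the `exp` field access relies on `BigFloat` internals, and the names are illustrative; all of these are assumptions here, not the PR's actual code):

```julia
# Assumed sentinel exponents, mirroring MPFR's src/mpfr-impl.h
# (undocumented MPFR internals; treat the exact values as assumptions):
const EXP_ZERO = typemin(Clong) + 1
const EXP_NAN  = EXP_ZERO + 1
const EXP_INF  = EXP_NAN + 1

# Predicates that inspect the BigFloat's internal exponent field directly
# instead of ccall-ing into libmpfr:
bf_iszero(x::BigFloat) = x.exp == EXP_ZERO
bf_isnan(x::BigFloat)  = x.exp == EXP_NAN
bf_isinf(x::BigFloat)  = x.exp == EXP_INF
```
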
nsajko added a commit to nsajko/julia that referenced this issue Jul 27, 2023
KristofferC pushed a commit that referenced this issue Aug 14, 2023
DilumAluthge pushed a commit that referenced this issue Aug 21, 2023
There's lots of code, but most of it seems like it will be useful in
general. For example, I think I'll use the changes in float.jl and
rounding.jl to improve the #49749 PR. The changes in float.jl could also
be used to refactor float.jl to remove many magic constants.

Benchmarking script:
```julia
using BenchmarkTools
f(::Type{T} = BigFloat, n::Int = 2000) where {T} = rand(T, n)
g!(u, v) = map!(eltype(u), u, v)
@btime g!(u, v) setup=(u = f(Float16); v = f();)
@btime g!(u, v) setup=(u = f(Float32); v = f();)
@btime g!(u, v) setup=(u = f(Float64); v = f();)
```

On master (dc06468):
```
  46.116 μs (0 allocations: 0 bytes)
  38.842 μs (0 allocations: 0 bytes)
  37.039 μs (0 allocations: 0 bytes)
```

With both this commit and #50674 applied:
```
  42.310 μs (0 allocations: 0 bytes)
  42.661 μs (0 allocations: 0 bytes)
  41.608 μs (0 allocations: 0 bytes)
```

So, with this benchmark at least, on an AMD Zen 2 laptop, conversion to
`Float16` is faster, but there's a slowdown for `Float32` and `Float64`.

Fixes #50642 (exact conversion to `Float16`)

Co-authored-by: Oscar Smith <oscardssmith@gmail.com>