Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consideration of workarounds for cbrt and exp precompilation problems #425

Closed
kimikage opened this issue May 23, 2020 · 7 comments · Fixed by #491
Closed

Consideration of workarounds for cbrt and exp precompilation problems #425

kimikage opened this issue May 23, 2020 · 7 comments · Fixed by #491
Milestone

Comments

@kimikage
Copy link
Collaborator

kimikage commented May 23, 2020

Edit: For the exp case, a workaround has been introduced in PR #483.

As JuliaLang/julia#35972 revealed, cbrt and exp have problems with precompilation.

In the Colors.jl, cbrt and exp are used in XYZ-->{Lab, Luv} and XYZ <--> RGB conversions.

function fxyz2lab(v)
v > xyz_epsilon ? cbrt(v) : (xyz_kappa * v + 16) / 116
end
function cnvt(::Type{Lab{T}}, c::XYZ, wp::XYZ) where T
fx, fy, fz = fxyz2lab(c.x / wp.x), fxyz2lab(c.y / wp.y), fxyz2lab(c.z / wp.z)
Lab{T}(116fy - 16, 500(fx - fy), 200(fy - fz))
end

function cnvt(::Type{Luv{T}}, c::XYZ, wp::XYZ = WP_DEFAULT) where T
(u_wp, v_wp) = xyz_to_uv(wp)
(u_, v_) = xyz_to_uv(c)
y = c.y / wp.y
l = y > xyz_epsilon ? 116 * cbrt(y) - 16 : xyz_kappa * y
u = 13 * l * (u_ - u_wp) + zero(T)
v = 13 * l * (v_ - v_wp) + zero(T)
Luv{T}(l, u, v)
end

# `pow5_12` is called from `srgb_compand`.
# `cbrt` generates a function call, so there is little benefit of `@fastmath`.
pow5_12(x) = pow3_4(x) / cbrt(x) # 5/12 == 1/2 + 1/4 - 1/3 == 3/4 - 1/3
# `pow12_5` is called from `invert_srgb_compand`.
# x^y ≈ exp(y*log(x)) ≈ exp2(y*log2(y)); the middle form is faster
@noinline pow12_5(x) = x^2 * exp(0.4 * log(x)) # 12/5 == 2.4 == 2 + 0.4

Note that the conversion methods from the DIN99 and its variants also use exp but they don't cause this problem since they are not precompiled.

RGB{Float32}->XYZ and RGB{N0f8}->XYZ are also OK because of the specialization of invert_srgb_compand. Moreover, although I don't clearly understand the reason, RGB{Float64}->XYZ seems to have no problem. (I guess the reason is the @noinline. exp potentially and definitely have the problem.)

In the case of MWE, f() will be eventually recompiled because f() is in a "shallow" place. However, in the Colors.jl, cbrt is in "deep" places from the public API (e.g. colordiff and distinguishable_colors), so it is difficult to avoid the problem.

Even if this problem is fixed by a compiler improvement, that improvement may not be backported to older versions. (It may not even be backported to v1.4 depending on the release timing of v1.4.2 v1.4.3.) So I think we should look for the workarounds which can be handled in Colors.jl.

One simple workaround is not to precompile the functions which cause the problem. However, this detracts from the benefits of precompilation.

@kimikage
Copy link
Collaborator Author

Although I don't want to do it much, specializing cbrt in Colors.jl is an option. Since it is sufficient for Colors.jl to be able to calculate mainly for the [0, 1] domain, some optimizations may be available.

@kimikage
Copy link
Collaborator Author

This issue will be addressed with the measure for issue #477. In other words, this will not be addressed as a precompilation problem, but as a issue of speed optimizations.

@kimikage
Copy link
Collaborator Author

kimikage commented Jun 16, 2021

The following is based on the techniques used in Base.cbrt(::Float64). The main difference is that the following calculates 1 / cbrt(x) and then finds cbrt(x). The 1 / cbrt(x) could be used in the pow5_12.

# Approximation of the reciprocal of the cube root, x^(-1/3).
# Assuming that x > 0.003, the conditional branches are omitted.
@inline function rcbrt(x::Float64)
    ix = reinterpret(UInt64, x)
    e0 = (ix >> 0x34) % UInt32
    ed = e0 ÷ 0x3
    er = e0 - ed * 0x3
    a = 0x000b_f2d7 - 0x0005_5718 * er
    e = (UInt32(1363) - ed) << 0x14 | a
    t1 = reinterpret(Float64, UInt64(e) << 0x20)
    h1 = muladd(t1^2, -x * t1, 1.0)
    t2 = muladd(@evalpoly(h1, 1/3, 2/9, 14/81), h1 * t1, t1)
    h2 = muladd(t2^2, -x * t2, 1.0)
    t3 = muladd(muladd(2/9, h2, 1/3), h2 * t2, t2)
    reinterpret(Float64, reinterpret(UInt64, t3) & 0xffff_ffff_8000_0000)
end

@inline function cbrt01(x::Float64)
    r = rcbrt(x) # x^(-1/3)
    h = muladd(r^2, -x * r, 1.0)
    e = muladd(2/9, h, 1/3) * h * r
    # x * x^(-2/3)
    muladd(r, x * r, x * e * (r + r + e))
end
julia> @noinline cbrt01n(x) = cbrt01(x);

julia> xs = max.(rand(1000), 0.003);

julia> @btime cbrt.($xs);
  6.880 μs (1 allocation: 7.94 KiB)

julia> @btime cbrt01n.($xs);
  6.680 μs (1 allocation: 7.94 KiB)

julia> @btime cbrt01.($xs); # Note that this is different from the actual use case.
  2.378 μs (1 allocation: 7.94 KiB)

Although this is slightly less accurate than Base.cbrt, the accuracy of 1ULP is satisfied.

However, for pow5_12, the spline-based PR #485 is faster than using this. Also, I haven't found an implementation which is faster than Base.cbrt(::Float32).

@kimikage
Copy link
Collaborator Author

The aim of cbrt01 is to eliminate the following branch.

function fxyz2lab(v)
ka = oftype(v, 841 / 108) # (29/6)^2 / 3 = xyz_kappa / 116
kb = oftype(v, 16 / 116) # 4/29
v > oftype(v, xyz_epsilon) ? cbrt(v) : muladd(ka, v, kb)
end

For example:

@inline function fxyz2lab(v)
    ka = oftype(v, 841 / 108) # (29/6)^2 / 3 = xyz_kappa / 116
    kb = oftype(v, 16 / 116) # 4/29
    vc = @fastmath max(v, oftype(v, xyz_epsilon))
    @fastmath min(cbrt01(vc), muladd(ka, v, kb))
end

@kimikage
Copy link
Collaborator Author

The use of cbrt is limited by PR #489 and PR #491. This makes the propagation of precompilation problems less likely. However, there are still problems with some conversions.

It is important to note that functions that internally convert colors (e.g. colordiff and distinguishable_colors) may be affected by the problems.

I will open a new issue, as the discussion has moved beyond specific functions such as exp and cbrt.

@kimikage
Copy link
Collaborator Author

kimikage commented Jun 21, 2021

BTW, the next most problematic function is atan (atand). The function is slow, even if there were no precompilation problems.

In practice, the current atand(::Float32) is not so accurate.

julia> atand(50.85106f0, -24.293373f0), Float32(atand(big(50.85106f0), big(-24.293373f0)))
(115.53545f0, 115.53548f0)

@kimikage

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant