Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JET reports type-instability when sampling from Beta #1866

Open
bvdmitri opened this issue May 31, 2024 · 8 comments
Open

JET reports type-instability when sampling from Beta #1866

bvdmitri opened this issue May 31, 2024 · 8 comments

Comments

@bvdmitri
Copy link
Contributor

Consider the following:

julia> @report_opt rand(Beta(1, 1), 10)
═════ 1 possible error found ═════
┌ rand(s::Beta{Float64}, dims::Int64) @ Distributions /Users/bvdmitri/.julia/packages/Distributions/fgrZq/src/genericrand.jl:22
│┌ rand(::Random.TaskLocalRNG, ::Beta{Float64}, ::Int64) @ Distributions /Users/bvdmitri/.julia/packages/Distributions/fgrZq/src/genericrand.jl:24
││┌ rand(rng::Random.TaskLocalRNG, s::Beta{Float64}, dims::Tuple{Int64}) @ Distributions /Users/bvdmitri/.julia/packages/Distributions/fgrZq/src/genericrand.jl:52
│││ runtime dispatch detected: Distributions.rand!(rng::Random.TaskLocalRNG, %3::Distributions.BetaSampler{Float64}, %2::Vector{Float64})::Vector{Float64}
││└────────────────────

The Normal works just fine

julia> @report_opt rand(Normal(1, 1), 10)
No errors detected

Also sampling a single point is fine

julia> @report_opt rand(Beta(1, 1))
No errors detected

This harms performance, when sampling "inplace" with Beta (with rand!)

@devmotion
Copy link
Member

The reason for this observation is that sampler(Beta(...)) is not type stable: Depending on the parameters, a different algorithm is used for sampling. #1350 "fixed" the performance and allocation issues arising from this type instability with a function barrier. Based on the benchmarks in that PR it seems it doesn't harm performance anymore.

@bvdmitri
Copy link
Contributor Author

Ok, that explains a bit the situation. The real issue is that sampler(Beta(..., ...)) is type-unstable and depends on the parameters. But can it always return the same object, which would simply have an if inside depending on the parameters? E.g.

sampler(d::Beta) = BetaSampler(d, other_fields...) # type-stable, always returns the same

function rand!(rng, sampler::BetaSampler, container)
    if some_condition(sampler)
        one_algorithm!(rng, container)
    else
        another_algorithm!(rng, container)
    end
end

Or a specialized method for rand! for Beta would also solve this issue entirely, e.g.

function rand!(rng, dist::Beta, n)
    if some_condition(dist)
        rand!(rng, OnerBetaSampler(dist), n)
    else 
        rand!(rng, AnotherBetaSampler(dist), n)
    end
end

@devmotion
Copy link
Member

But can it always return the same object

The motivation for sampler is to precompute algorithm-dependent quantities when drawing multiple variates. The number and type of these quantities are quite different for different algorithms, as you can see in https://github.com/JuliaStats/Distributions.jl/blob/master/src/samplers/gamma.jl. Always computing all of these would be a bit wasteful I think.

@devmotion
Copy link
Member

Or a specialized method for rand! for Beta would also solve this issue entirely

I still wonder, is there any issue here? It seems the function barrier in #1350 fixed the performance issues.

@bvdmitri
Copy link
Contributor Author

It shouldn't recompute algorithm-dependent quantities in my second proposal with a specialized rand! since it basically exactly the same code with an explicit static if statement (multiple dispatch is really just a sophisticated runtime if statement).

To answer your question about the performance, yes, it does affect performance in the following case:

julia> foo(x, dist) = foreach(1:100_000) do _
           rand!(dist, x)
       end

julia> x = zeros(10);

julia> dist = Beta(1, 1)
Beta{Float64}=1.0, β=1.0)

julia> sr = sampler(dist)

julia> @btime foo($x, $dist);
  30.232 ms (100000 allocations: 6.10 MiB)

julia> @btime foo($x, $sr);
  27.726 ms (0 allocations: 0 bytes)

It's not much in this particular example, but it does accumulate in our larger program and it also allocates extra (while the whole idea of rand! is to sample in place)

@bvdmitri
Copy link
Contributor Author

bvdmitri commented May 31, 2024

Another thing, I mentioned that sampling a single point is fine and is in fact type-stable.

julia> @report_opt rand(Beta(1, 1))
No errors detected

That suggests that there is a potential for improvement for the in-place version

@devmotion
Copy link
Member

To answer your question about the performance, yes, it does affect performance in the following case:

The example is a bit contrieved since ideally you would use the sampler if you want to sample from the same distribution in a loop (similar to how the Random docs show that you should construct and use a Random.Sampler if you want to call rand repeatedly - BTW there's an issue to unify sampler with Random.Sampler a bit more: #1316).

The single variate sampling does not use samplers, hence it behaves differently.

@bvdmitri
Copy link
Contributor Author

In the real code the distribution is not the same unfortunately

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants