What `Distributions.pdf` returns, and when? #1820

aplavin · 2024-01-06T02:13:16Z

The docstring simply states:

  pdf(d::UnivariateDistribution, x::Real)

  Evaluate the probability density (mass) at x.

This surely isn't enough to know when it returns density, and when mass.
I thought I kinda knew the answer to this question, but recently turned out I was wrong.

I've always thought it returns density for Continous distributions, and mass for Discrete. But actually there are distributions for which pdf(d, x) sometimes returns mass and sometimes density – for the same d. One example is censored(...) distribution.

Would be nice to add some clarification to the pdf docstring to make writing generic code feasible. Currently, pdf doesn't guarantee anything useful unless one only considers specific individual distributions. Code like f(d) = ... pdf(d, x) ... without explicit constrains on d is impossible to reason about.

The text was updated successfully, but these errors were encountered:

devmotion · 2024-01-06T02:35:35Z

It always returns "the" probability density, the main question is just with respect to which base measure. Typically for discrete discrete distributions the base measure is the counting measure, which implies that the density function coincides with the probability mass function (see eg https://en.wikipedia.org/wiki/Probability_mass_function#Measure_theoretic_formulation).

aplavin · 2024-01-06T02:43:02Z

That's what I thought originally: regular reals measure for Continous, counting measure for Discrete. But what the measure is for censored() then?

sethaxen · 2024-01-06T07:27:31Z

Censored is an example of a probability measure of mixed type (see e.g. https://www.randomservices.org/random/dist/Mixed.html). While a density function is generally a Radon-Nikodym derivative of a probability measure wrt another measure (which is the measure wrt which it is absolutely continuous), I don't think this concept extends easily to mixed type probability measures. However, they have a clear notion of partial density, which depending on the base measure would give either the density of the continuous component or the discrete component.

For mixture distributions in general it might be more sensible to define the density as the probability-weighted sum of densities of the components wrt their respective measures instead of defining as absolutely continuous wrt some base measure. This definition would still support inference using censored data.

aplavin · 2024-01-06T14:50:33Z

That's a bit over my head... Let me just share my practical concern and motivation for opening this issue.

It was nice to see that Distributions defines the censored distribution with an interface seemingly consistent with all other distributions. But then I noticed that it's basically impossible to use this distribution with a generic function that does some computations based of pdf/logpdf.
For example, plugging censored(...) as part of the prior or likelihood into a bayesian analysis procedure would clearly lead to wrong inferences. pdf(d, x) of d = censored(Normal(0, 1), 0, Inf) is effectively indistinguishable from truncated(Normal(0, 1), 0, Inf) aside from 2x different normalization, and inference results will be the same for both.

Do you think it is possible at all to use Distributions.pdf generically? If so, how to avoid this kind of silently wrong results?

mschauer · 2024-01-09T10:31:47Z

julia> d = censored(Normal(0, 1), 0, Inf)
Censored(Normal{Float64}(μ=0.0, σ=1.0); lower=0.0, upper=Inf)

julia> pdf(d, 0.0001)
0.3989422784067213

julia> pdf(d, 0)
0.5

julia> pdf(d, -0.0001)
0.0

I think a digestible statement is: pdf(..., x) returns the probability density function or zero except when the probability of getting x exactly is larger than 0 (that is, x is an atom)

aplavin · 2024-01-09T12:31:53Z

Yeah, that's what it seems to return.
Then I believe it's indeed impossible to write a generic function that uses Distributions.pdf() and computes something nontrivial...

mschauer · 2024-01-09T12:54:05Z

What do you mean with the last statement? Is this good or bad?

aplavin · 2024-01-09T13:03:39Z

I mean that generic functions that use pdf, like f(dist) = ... pdf(dist, x) ... or f(dist::ContinuousUnivariateDistribution) = ... pdf(dist, x) ..., actually cannot be written. Assuming one wants the results to be correct, of course :)
In particular, integration or MCMC sampling cannot be made to work correctly without dispatching on specific distributions manually. Integral of pdf for a ContinuousUnivariateDistribution isn't always 1 as could be expected by some.

Is this good or bad?

I don't think this is good, but such statements are always subjective.

mschauer · 2024-01-09T13:42:18Z

If you are interested, can you read #1468 which also refers to this discussion: JuliaMath/DensityInterface.jl#4 (comment) and see if that somehow relates to your problem?

aplavin · 2024-01-09T15:37:02Z

Seems like it is related, but doesn't really help much.
If that's indeed the case, that pdf() cannot be used correctly in generic functions for now, should this be stated/warned directly in pdf() docstring?
Together with this definition by @mschauer:

pdf(..., x) returns the probability density function or zero except when the probability of getting x exactly is larger than 0 (that is, x is an atom)

(if this is correct)

mschauer · 2024-01-09T16:31:22Z

It would help if you explain on a concrete example what cannot be done.

aplavin · 2024-01-16T01:11:18Z

Basically any function that uses Distributions.pdf is incorrect for these "mixed-type" distributions. As a more concrete example, take MCMC with the simplest Metropolis sampling: it won't be able to distinguish between eg censored and truncated distributions, while the correct results are clearly different for them.

Actually, I don't know if it is possible to write any function f(dist) or f(dist::ContinuousUnivariateDistribution) that uses pdf() and computes some nontrivial quantity correctly for all dists...

mschauer · 2024-01-16T15:11:42Z

So this is wrong:

using Distributions
function mh(target, proposal; init=0.0, samples=100_000)
    x = init
    xs = [x]
    for iter in 2:samples
        xᵒ = rand(proposal(x))
        l = logpdf(target, xᵒ) - logpdf(target, x) +  logpdf(proposal(xᵒ), x) -  logpdf(proposal(x), xᵒ)
        if -rand(Exponential()) < l
            x = xᵒ
        end
        push!(xs, x)
    end
    return xs
end

xs = mh(Normal(3, 1.2), x->Normal(x, 0.1), init = 3.0) # correct
xs = mh(censored(Normal(3, 1.2), 2.0, Inf), x->Normal(x, 0.1), init = 3.0) # incorrect, samples a truncated normal.

How can we prevent this? @devmotion @sethaxen

aplavin changed the title ~~What Distributions.pdf returns exactly?~~ What Distributions.pdf returns, and when? Jan 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What `Distributions.pdf` returns, and when? #1820

What `Distributions.pdf` returns, and when? #1820

aplavin commented Jan 6, 2024

devmotion commented Jan 6, 2024

aplavin commented Jan 6, 2024 •

edited

Loading

sethaxen commented Jan 6, 2024

aplavin commented Jan 6, 2024 •

edited

Loading

mschauer commented Jan 9, 2024

aplavin commented Jan 9, 2024

mschauer commented Jan 9, 2024

aplavin commented Jan 9, 2024 •

edited

Loading

mschauer commented Jan 9, 2024

aplavin commented Jan 9, 2024

mschauer commented Jan 9, 2024

aplavin commented Jan 16, 2024 •

edited

Loading

mschauer commented Jan 16, 2024 •

edited

Loading

What Distributions.pdf returns, and when? #1820

What Distributions.pdf returns, and when? #1820

Comments

aplavin commented Jan 6, 2024

devmotion commented Jan 6, 2024

aplavin commented Jan 6, 2024 • edited Loading

sethaxen commented Jan 6, 2024

aplavin commented Jan 6, 2024 • edited Loading

mschauer commented Jan 9, 2024

aplavin commented Jan 9, 2024

mschauer commented Jan 9, 2024

aplavin commented Jan 9, 2024 • edited Loading

mschauer commented Jan 9, 2024

aplavin commented Jan 9, 2024

mschauer commented Jan 9, 2024

aplavin commented Jan 16, 2024 • edited Loading

mschauer commented Jan 16, 2024 • edited Loading

What `Distributions.pdf` returns, and when? #1820

What `Distributions.pdf` returns, and when? #1820

aplavin commented Jan 6, 2024 •

edited

Loading

aplavin commented Jan 6, 2024 •

edited

Loading

aplavin commented Jan 9, 2024 •

edited

Loading

aplavin commented Jan 16, 2024 •

edited

Loading

mschauer commented Jan 16, 2024 •

edited

Loading