Fix `(log)(c)cdf` with `Inf`, `-Inf` and `NaN` #1348
Note: The simplified implementation for
Is there something to look out for in the review?
Hmm, I'm not sure. Hopefully I could convey the main ideas in the comment above; beyond that, I guess it depends on which points you want to examine or discuss more carefully. I assume that `src/univariates.jl` shows the main structure. The changes in `src/truncate.jl`, `src/univariate/locationscale.jl` and `src/mixtures/...` show how it simplifies (and fixes) the existing implementation. Finally, the changes in `src/univariate/...` fix and simplify different `(log)(c)cdf` implementations. I extended the tests, and `(log)(c)cdf` are tested for all modified distributions (I only changed these distributions because their tests failed), so I am quite confident that the changes are correct. But it's always better if someone else checks it as well 🙂
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #1348      +/-   ##
==========================================
+ Coverage   82.23%   82.73%   +0.50%
==========================================
  Files         116      116
  Lines        6635     6672      +37
==========================================
+ Hits         5456     5520      +64
+ Misses       1179     1152      -27
```
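The figures in the report are consistent with coverage being the number of hit lines divided by the number of tracked lines, with misses making up the rest. A quick check in Julia, assuming Codecov's usual definition and that no partially covered lines are involved (the report lists none):

```julia
# Coverage in percent: hit lines divided by tracked lines.
coverage(hits, lines) = 100 * hits / lines

master_hits, master_misses, master_lines = 5456, 1179, 6635
pr_hits, pr_misses, pr_lines = 5520, 1152, 6672

# Every tracked line is either a hit or a miss here.
@assert master_hits + master_misses == master_lines
@assert pr_hits + pr_misses == pr_lines

before = coverage(master_hits, master_lines)
after = coverage(pr_hits, pr_lines)
println(round(before; digits = 2), "% -> ", round(after; digits = 2), "%")  # prints 82.23% -> 82.73%
```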
The LogExpFunctions PR was merged and released, so tests pass now.
This PR simplifies the `(log)(c)cdf` implementations and fixes their evaluation at `-Inf`, `Inf`, and `NaN`.

There are some problems with the current implementation:
- Discrete distributions do not handle non-integer inputs, `Inf`, `-Inf`, and `NaN` consistently: e.g., there are fallbacks `cdf(d::DiscreteUnivariateDistribution, x::Real) = cdf(d, floor(Int, x))` etc. and a default implementation for `cdf(::DiscreteUnivariateDistribution, x::Integer)`. This is problematic since
  - both `cdf(d, x::Real)` and `cdf(d, x::Integer)` have to be implemented for, e.g., discrete `LocationScale`, `Truncated`, `MixtureModel`, and `DiscreteNonParametric` to avoid method ambiguity errors;
  - a `cdf` for integers is defined that might not make sense or, in the worst case, silently produces incorrect results (e.g., `logcdf(Dirac(3.4), 3.5)` is forwarded to `logcdf(Dirac(3.4), 3)`, which then calls `log(cdf(Dirac(3.4), 3)) = log(0) = -Inf` instead of returning `log(1) = 0`). These bugs are difficult to discover and to avoid when implementing a discrete distribution (I fixed some of these for `DiscreteNonParametric` in #1286, "Generalize `LocationScale` to discrete distributions").
- With these definitions it is not possible to evaluate `cdf(d, -Inf)`, `cdf(d, Inf)`, or `cdf(d, NaN)`: e.g., for `Poisson` this can be fixed by generalizing the StatsFuns macro to inputs of type `Real`, but this is still an issue for other native implementations such as `Categorical`, for which the same errors are thrown. The evaluation of `cdf(d, -Inf)` or `cdf(d, Inf)` is useful e.g. in the construction of truncated distributions: it allows removing the current type heuristics (not included in this PR).

The PR fixes these problems by
- defining fallbacks for `cdf(d::DiscreteUnivariateDistribution, x::Real)` but not for `cdf(::DiscreteUnivariateDistribution, x::Integer)`, similar to #1171 ("[Breaking] Fix inconsistent fallback behaviour of logpdf and pdf")
- introducing `cdf_int(d, x)`, which assumes integer-valued support but not inputs of type `Int`:
  - `cdf_int(d, x)` handles `Inf`, `-Inf`, and `NaN` and calls `cdf(d, floor(Int, x))` for all other values (`cdf(d, ::Int)` is not implemented by default!)
  - a distribution with integer-valued support only has to implement `cdf(d, ::Int)` and gets support for real values, including `Inf`, `-Inf`, and `NaN`, for free
  - a distribution with non-integer support implements `cdf(d, ::Real)` but not `cdf(d, ::Int)` (and there exists no incorrect default implementation)
- implementing `cdf(d, ::Int)` explicitly, using `integerunitrange_cdf` etc. instead of implementing `cdf(d, ::Int)` from scratch if the distribution has a unit range of integers as support (this is currently the default implementation of `cdf(d, ::Int)`)
- simplifying the implementations of `Truncated`, `LocationScale`, and `MixtureModel`
- implementing `cdf(d, x::Real)` etc. instead of `cdf(d, x::Int)` for discrete distributions whose native implementations handle `NaN`, `Inf`, and `-Inf` correctly and can deal with non-integer inputs (hopefully there will soon be many more native implementations: StatsFuns.jl#113, "Use julia implementations for pdfs and some cdf-like functions")
- fixing the `(log)(c)cdf` of many univariate distributions
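The dispatch pattern described above can be illustrated with a minimal, self-contained sketch in plain Julia (Base only). All names here (`ToyDirac`, `ToyDiscreteUniform`, `old_cdf`, `unitrange_cdf`, and the local `cdf`, `pdf`, and `cdf_int`) are hypothetical stand-ins written for illustration; they are not the Distributions.jl definitions, and the sketch hard-codes `Float64` results for simplicity. It contrasts the old floor-to-`Int` fallback, which silently returns a wrong value for `Dirac`-like support and throws on `Inf`, with a `cdf_int`-style helper that handles `NaN` and `±Inf` before flooring.

```julia
# Hypothetical toy types for illustration; these are NOT the Distributions.jl types.
abstract type ToyDiscrete end

# Point mass at a (possibly non-integer) real value.
struct ToyDirac <: ToyDiscrete
    value::Float64
end

# Discrete uniform distribution on the integers a:b.
struct ToyDiscreteUniform <: ToyDiscrete
    a::Int
    b::Int
end

pdf(d::ToyDiscreteUniform, k::Int) = d.a <= k <= d.b ? 1 / (d.b - d.a + 1) : 0.0

# Hypothetical helper in the spirit of `integerunitrange_cdf`: sums the pmf
# over the unit-range support instead of implementing the cdf from scratch.
function unitrange_cdf(d::ToyDiscreteUniform, x::Int)
    x < d.a && return 0.0
    x >= d.b && return 1.0
    return sum(k -> pdf(d, k), d.a:x)
end

# ---- Old pattern: every Real input is floored to Int before dispatching ----
old_cdf(d::ToyDiscrete, x::Real) = old_cdf(d, floor(Int, x))  # throws InexactError for ±Inf and NaN
old_cdf(d::ToyDirac, x::Int) = x < d.value ? 0.0 : 1.0        # combined with the floor above: wrong for non-integer support
old_cdf(d::ToyDiscreteUniform, x::Int) = unitrange_cdf(d, x)

# ---- New pattern: the Real fallback goes through a cdf_int-style helper ----
function cdf_int(d::ToyDiscrete, x::Real)
    isnan(x) && return NaN                  # NaN cannot be floored to Int
    isinf(x) && return x < 0 ? 0.0 : 1.0    # -Inf => 0, Inf => 1
    # An `Int` arriving here means `d` has no `cdf(d, ::Int)` method
    # (there is deliberately no default one); fail instead of recursing.
    x isa Int && error("cdf(d, ::Int) is not implemented for $(typeof(d))")
    return cdf(d, floor(Int, x))
end

# Generic fallback for discrete distributions with integer-valued support.
cdf(d::ToyDiscrete, x::Real) = cdf_int(d, x)

# Integer support: only the Int method is needed; Real, ±Inf and NaN come for free.
cdf(d::ToyDiscreteUniform, x::Int) = unitrange_cdf(d, x)

# Non-integer support: implement the Real method directly and no Int method.
cdf(d::ToyDirac, x::Real) = isnan(x) ? NaN : (x < d.value ? 0.0 : 1.0)

# Usage
d = ToyDiscreteUniform(1, 4)
println(old_cdf(ToyDirac(3.4), 3.5))  # prints 0.0 (3.5 is floored to 3, which is wrong)
println(cdf(ToyDirac(3.4), 3.5))      # prints 1.0
println(cdf(d, 2.7), " ", cdf(d, -Inf), " ", cdf(d, Inf), " ", cdf(d, NaN))  # prints 0.5 0.0 1.0 NaN
```

In this sketch `ToyDirac` only defines `cdf(d, ::Real)`, so no `Int` method exists that could give a wrong answer, while `ToyDiscreteUniform` only defines `cdf(d, ::Int)` and receives non-integer, infinite, and `NaN` inputs through the generic fallback without any method ambiguity.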