Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support intervals #1809

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

support intervals #1809

wants to merge 4 commits into from

Conversation

aplavin
Copy link
Contributor

@aplavin aplavin commented Dec 9, 2023

Add extension for IntervalSets (the most popular intervals package) to support its intervals where it makes sense in Distributions.
I defined and used this Uniform method a few times myself, others are included here for completeness.

No tests for now, looking for a general confirmation first.

@codecov-commenter
Copy link

codecov-commenter commented Dec 26, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (8e231ed) 85.98% compared to head (21b1d6c) 85.82%.
Report is 9 commits behind head on master.

Files Patch % Lines
ext/DistributionIntervalSetsExt.jl 0.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1809      +/-   ##
==========================================
- Coverage   85.98%   85.82%   -0.17%     
==========================================
  Files         144      145       +1     
  Lines        8648     8654       +6     
==========================================
- Hits         7436     7427       -9     
- Misses       1212     1227      +15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@aplavin
Copy link
Contributor Author

aplavin commented Jan 4, 2024

bump...

@devmotion
Copy link
Member

I wonder if there is general interest for this functionality, the application seems a bit niche to me since most distributions have a fixed support that can't be adjusted based on a given interval. My gut feeling is that it is easy enough to extract the endpoints of an interval and call the standard constructors if one wants to construct e.g. a Uniform distribution on a given interval.

@aplavin
Copy link
Contributor Author

aplavin commented Jan 5, 2024

Many interval operations are "easy enough" to do without a special interval type at all, by working with endpoints directly. The whole point of an interval type (same with many other types, such as Pair or NamedTuple, or even Uniform itself) is that it adds semantics to a bunch of values, and allows to keep this semantic across function calls.

We use these functions not because they are hard to implement without a special type (they're not!), but because they are natural and convenient:

julia> i = 1..3

julia> 2  i
true

julia> range(i, length=5)
1.0:0.5:3.0

Isn't it natural to create a uniform distribution on a given interval?..

Uniform(i)
# instead of
Uniform(leftendpoint(i), rightendpoint(i))

@aplavin
Copy link
Contributor Author

aplavin commented Jan 5, 2024

most distributions have a fixed support that can't be adjusted based on a given interval

Not sure if I understand this part. Of course it doesn't make sense to have a Normal(interval) constructor, but for some distributions such a constructor is a good fit.

@chelate
Copy link

chelate commented Jan 9, 2024

I just want to point out that intervals are somewhat fundamental objects in probability (the Borel algebra), and that prob(interval,dist) = cdf(dist,interval.upper) - cdf(dist,interval.lower) or similar could be pretty neat. Likewise conditional probabilities, empirical samplers, histograms, all kinds of stuff!

@aplavin
Copy link
Contributor Author

aplavin commented Jan 9, 2024

Yes, this kind of probability calculations is convenient sometimes, but orthogonal to this PR. Here, I add intervals support to already existing functions, no new functions added.

For playing with distributions and probabilities, I have a snippet defining an interface similar but more general than your prob(...) suggestion.
Usage:

((0..1), Normal(0, 1))
(<(1), Normal(0, 1))
...
Some function definitions
(f::Base.Fix2{typeof(<=)}, d::UnivariateDistribution) = cdf(d, f.x)
(f::Base.Fix2{typeof(> )}, d::UnivariateDistribution) = ccdf(d, f.x)

function (f::Base.Fix2{typeof(∈), <:Interval}, d::UnivariateDistribution)
    Pr = isrightclosed(f.x) ? (<=(rightendpoint(f.x)), d) : (<(rightendpoint(f.x)), d)
    Pl = isleftclosed(f.x)  ? (<(leftendpoint(f.x)), d)   : (<=(leftendpoint(f.x)), d)
    return Pr - Pl
end
Can potentially make sense to add to DIstributions as well, but not in this PR as these are completely unrelated. Or just make a separate package with the `ℙ` function, it doesn't need to live here in `Distributions`.

@aplavin
Copy link
Contributor Author

aplavin commented Feb 1, 2024

Gentle bump...
This PR adds support for intervals arguments to existing methods where it makes sense and is totally unambiguous. Everything else including @chelate's suggestions is out of scope here and can easily live in a separate package.

@devmotion
Copy link
Member

I guess such definitions might be convenient for someone working with IntervalSets, but my feeling is that these should rather be put in an extension of IntervalSets (similar to existing extensions for Random, Statistics, and StatsBase).

@aplavin
Copy link
Contributor Author

aplavin commented Feb 2, 2024

Oh, sorry, I forgot to explicitly state the motivation to put this into Distributions indeed!
This extension is much more likely to require changes due to changes in Distributions, than due to changes in IntervalSets. For example, when new distributions/functions are added here.
Also, IntervalSets is a much much more lightweight package.

@devmotion
Copy link
Member

My concern is that putting these definitions in an extension in Distributions adds maintenance burden for developers that seemingly so far have never expressed a desire to use IntervalSets for constructing the few univariate distributions whose support is an interval that fully characterizes the parameters of this distribution type. The use case still seems quite limited to me, and hence I think one has not be too worried about new distributions. Since the PR also only uses exported and official APIs, I don't think potential breakages are a problem either.

To me it seems contributors of IntervalSets are much more interested in such a functionality than developers of Distributions, so I guess an extension in IntervalSets would be preferable.

@aplavin
Copy link
Contributor Author

aplavin commented May 6, 2024

In case others come looking for this functionality: the interval support (this PR) is available in https://github.com/JuliaAPlavin/DistributionsExtra.jl. Not sure if I want to register that package yet...

It has more than that, btw – including what @chelate suggested above, stuff like ℙ(>(0), Poisson(1)) or ℙ((@o abs(_) > 2), Normal(0, 1)).

@aplavin
Copy link
Contributor Author

aplavin commented Sep 1, 2024

In the last commit, I tried a complete switch from RealInterval to intervals from IntervalSets – please take a look. It turned out to be very straightforward with few lines modified!

This change is non-breaking afaict, and enables all kinds of operations on distributions supports – IntervalSets are quite widely supported. A few random examples:

support(d1)  support(d2)
clamp(x, support(d))
censored(d1, support(d2))
using Makie
lines(support(d), x -> logpdf(d, x))
using ApproxFun
Fun(x -> pdf(d, x), support(d))

If there's no specific reason to stick to the current Distributions.RealInterval implementation (that only defines few relevant functions and not really integrated with anything), would be nice to do this.

IntervalSets supports open/closed endpoints natively, can utilize it in the future – discussed long ago in #457.
Also ref #1254 (comment).

@devmotion
Copy link
Member

Existing tools in external packages could surely be useful. Note, however, that my comment was about discrete distributions for which AFAICT IntervalSets would be insufficient and we would need DomainSets or something different such as InfiniteArrays.

Regarding the previous changes (constructors with intervals) my opinion remains unchanged (see above).

So if you'd like to replace RealInterval with IntervalSets.Interval I suggest opening a separate PR (or reverting the other changes).

@aplavin
Copy link
Contributor Author

aplavin commented Sep 1, 2024

Note, however, that #1254 (comment) was about discrete distributions for which AFAICT IntervalSets would be insufficient and we would need DomainSets or something different such as InfiniteArrays.

Sure, IntervalSets doesn't solve all problems in the world :) I'm simply suggesting to use those intervals wherever makes sense, not literally everywhere – for one, discrete distributions stay as-is.

So if you'd like to replace RealInterval with IntervalSets.Interval I suggest opening a separate PR (or reverting the other changes).

IMO, going halfway here is weird: use intervalsets in some places, but not everywhere where they make sense. That's why the PR remains coherent with the same goal (as stated in its title) – to support an actual widely-adopted interval type throughout Distributions.

I've personally been using DistributionsExtra for convenience, see https://github.com/JuliaAPlavin/DistributionsExtra.jl/blob/master/src/intervals_integrations.jl for intervals integration specifically. Unfortunately, this way support() still returns RealInterval, but at least it is now convertible to an IntervalSet's one.
Just wanted to make this convenience and interoperability available to all Distributions users. Ofc, if keeping the interface fully frozen is a higher priority for such a stable package, I totally understand (Distributions2.jl anyone? :) )

@devmotion
Copy link
Member

use intervalsets in some places, but not everywhere where they make sense.

My point is that I think it doesn't make sense to use them to construct distributions but it does make sense to use them as an improvement of RealInterval.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants