Dropping Requires-based extensions for Distributions, SpecialFunctions and StatsFuns? #11
Comments
@cjdoris maybe a solution would be to switch to the lighter https://github.com/JuliaMath/DensityInterface.jl ? |
That would still lead to a LogarithmicNumber-specific API that is inconsistent with the original API in DensityInterface. IMO it also seems easier to just let users call |
I agree - it removes dependencies, is clear to read and just as compact. |
On the other hand I like the fact that |
AFAICT this is different from the issue here. If |
Alright, my bad :) |
You don't mean |
Sure, I meant stuff like |
Oh, right, of course! |
Thanks @oschulz, it would be great to have this package loading faster! For some additional context, the core of the dependencies seems to be this:
const DISTRIBUTIONS_OVERLOADS = [f => Symbol(:log, f) for f in [:cdf, :ccdf, :pdf]]
const STATSFUNS_DISTS = [:srdist, :nchisq, :hyper, :ntdist, :tdist, :binom, :pois, :fdist, :norm, :beta, :nfdist, :chisq, :gamma, :nbeta, :nbinom]
const STATSFUNS_OVERLOADS = [Symbol(d, f) => Symbol(d, :log, f) for d in STATSFUNS_DISTS for f in [:cdf, :ccdf, :pdf]]
const SPECIALFUNCTIONS_OVERLOADS = [f => Symbol(:log, f) for f in [:gamma, :factorial, :beta, :erfc, :erfcx]]
This maps each function name
So one possibility would be for this package to depend explicitly on StatsFuns and SpecialFunctions. But if the goal is to get rid of the
One possibility of avoiding this would even allow this package to ignore StatsFuns and SpecialFunctions, and things could still "just work". Maybe there could be a very lightweight interface package [yet again, these just seem to solve everything] specifying higher-order functions
For example, if these are called
logcirc(::typeof(pdf)) = logpdf
logcirc(::typeof(cdf)) = logcdf
logcirc(::typeof(ccdf)) = logccdf
expcirc(::typeof(logpdf)) = pdf
expcirc(::typeof(logcdf)) = cdf
expcirc(::typeof(logccdf)) = ccdf |
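A minimal sketch of how the logcirc/expcirc idea above could be exercised. It assumes toy stand-ins for pdf and logpdf (a standard normal density defined locally, not the Distributions.jl functions), and in_logspace is a hypothetical helper name used only for illustration:

```julia
# Toy stand-ins (assumption): a standard normal density and its logarithm.
pdf(x::Real) = exp(-x^2 / 2) / sqrt(2π)
logpdf(x::Real) = -x^2 / 2 - log(2π) / 2

# The mapping proposed above: each function knows its log-space counterpart.
logcirc(::typeof(pdf)) = logpdf
expcirc(::typeof(logpdf)) = pdf

# Hypothetical helper: evaluate f in log space without naming logpdf explicitly.
in_logspace(f, x) = logcirc(f)(x)

# pdf(40.0) underflows to 0.0, but the log-space value stays finite.
in_logspace(pdf, 40.0)
```

Generic code would only need to know about logcirc and expcirc, so a package like LogarithmicNumbers could dispatch on them without loading any of the heavy packages.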
But why would that be so unfortunate? From what I understand, we'd only lose convenience definitions like |
They are not light-weight, really - compare
julia> using InverseFunctions # Load a super-lightweight package to get some initial Pkg costs out of the way
julia> @time_imports using SpecialFunctions, StatsFuns, Distributions, LogarithmicNumbers, MeasureBase
[...]
226.6 ms SpecialFunctions
[...]
92.7 ms StatsFuns
[...]
470.4 ms Distributions
[...]
379.7 ms LogarithmicNumbers
[...]
74.8 ms MeasureBase
with
julia> using InverseFunctions
julia> @time_imports using LogarithmicNumbers, MeasureBase
0.4 ms ┌ Requires
37.1 ms LogarithmicNumbers
[...]
287.2 ms MeasureBase
So the load time of LogarithmicNumbers without requires is just about 40 ms, and the cost of MeasureBase on top of LogarithmicNumbers and its other deps would just be 74.8 ms. SpecialFunctions and Distributions are much heavier, and StatsFuns isn't exactly lightweight either - timed relatively on top of each other. If LogarithmicNumbers were to depend on SpecialFunctions and StatsFuns, its load time would be over 300 ms, while without
These times may not seem long, but larger use cases have many dependencies that add up, and suddenly you end up with load times of 10 seconds for a use case (and then I have to listen to people telling me "but in Python it would already be finished" :-) ). |
Having read this discussion and the one over on Distributions, I think I agree that just removing all those overloads is the right thing to do. No registered packages use the behaviour. |
This is a little off topic, but I'm generally opposed to making real design decisions for the benefit of people who think using JIT compilation for a short-running script is a good idea. That use case can benefit from static compiler work, otherwise it's a terrible cost model.
Good point, I had missed this. So, maybe the requires-es can just go away? Or maybe there could be a way for a
I guess I'll wait for @cjdoris to weigh in here, since it's his package and all :)
EDIT: Oops, I was typing and missed his comment ☝️ |
Btw |
I agree that the ttfx issue can be overstated sometimes, but in this case the overloads aren't doing much. It's not hard for someone to define
if they need it. |
Speaking as a future user of LogarithmicNumbers, an official shortcut for |
I fully agree, short-running scripts are certainly not a target use case for Julia. But I think it's still worth trying to keep package load time down as much as possible in general. For one, impressions matter, and in my experience package load times immediately become a topic when advocating for Julia (or introducing people to it), no matter how often you tell people that they won't really start a new session that often. More importantly though, we currently have a bit of a plague of |
@oschulz Couldn't agree more, especially the semver part. I'm happy to add a shorthand, but the perfect name isn't jumping out.
|
I agree I think |
I like |
The "laziness" may actually be a useful part of it, in addition to preserving numerical precision and not exceeding the floating-point dynamic range. After all, in many cases there will be a |
Here's a neat use case for LogarithmicNumbers: Nested sampling produces an integral estimate - unfortunately the uncertainty on that estimate is Gaussian only in log-space. But LogarithmicNumbers allows us to write (using
julia> using LogarithmicNumbers, Measurements
julia> lazyuexp(x) = exp(ULogarithmic, x)
julia> log_result = 4.2
julia> log_result_uncertainty = 0.1
julia> lazyuexp(log_result ± log_result_uncertainty)
exp(4.2 ± 0.1)
If
This would look nice, I think:
julia> lazyuexp(log_result ± log_result_uncertainty)
lazyuexp(4.2 ± 0.1) |
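Beyond the display, a small sketch of what the lazy exponential buys numerically. It assumes LogarithmicNumbers is installed and reuses the lazyuexp helper from the session above; the input values are made up for illustration:

```julia
using LogarithmicNumbers  # assumes the package is installed

# Same helper as in the REPL session above.
lazyuexp(x) = exp(ULogarithmic, x)

a = lazyuexp(-1000.0)
b = lazyuexp(-2000.0)

# Plain floats underflow: both factors are already 0.0.
naive = exp(-1000.0) * exp(-2000.0)

# Multiplying ULogarithmic values adds the stored logarithms instead,
# so the logarithm of the product is still available.
product_log = log(a * b)
```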
Very nice, @oschulz! FWIW, here's my use of this in MeasureBase.jl:
@inline function density_def(s::SuperpositionMeasure{Tuple{A,B}}, x) where {A,B}
    (μ, ν) = s.components
    # If x lies outside the support of one component, only the other contributes.
    insupport(μ, x) || return exp(ULogarithmic, logdensity_def(ν, x))
    insupport(ν, x) || return exp(ULogarithmic, logdensity_def(μ, x))
    α = basemeasure(μ)
    β = basemeasure(ν)
    # Lift the log-densities into ULogarithmic so they can be combined like densities.
    dμ_dα = exp(ULogarithmic, logdensity_def(μ, x))
    dν_dβ = exp(ULogarithmic, logdensity_def(ν, x))
    dα_dβ = exp(ULogarithmic, logdensity_rel(α, β, x))
    dβ_dα = inv(dα_dβ)
    return dμ_dα / oneplus(dβ_dα) + dν_dβ / oneplus(dα_dβ)
end
We generally work in terms of log-densities instead of densities, in order to avoid underflow. For superpositions, though, the density is comparatively simple, while the log-density is very awkward to express. LogarithmicNumbers gives us the best of both! MeasureBase is intended as a relatively low-level dependency, so avoiding |
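To make the "awkward in log space" point concrete, here is a small dependency-free sketch. It assumes the simplest case, where both components share one base measure so that the superposition density is just the sum of the two densities; logaddexp_sketch is a hypothetical helper name, not part of MeasureBase or LogarithmicNumbers:

```julia
# Log of (exp(l1) + exp(l2)), computed without leaving log space.
# Sketch only: infinite inputs are not handled.
function logaddexp_sketch(l1::Real, l2::Real)
    hi, lo = max(l1, l2), min(l1, l2)
    return hi + log1p(exp(lo - hi))
end

l1, l2 = -1000.0, -1001.0          # log-densities of the two components

naive  = log(exp(l1) + exp(l2))    # both exponentials underflow, so this is -Inf
stable = logaddexp_sketch(l1, l2)  # close to -999.69
```

Every sum of densities needs this kind of special-casing when written by hand in log space, whereas with ULogarithmic values the same sum can be written with an ordinary +.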
Alright, LogarithmicNumbers is now dependency free! v1.2.0 is making its way through the package registration machine as we speak so will be available shortly. |
I think I'm going to not add |
Really enjoying the examples @oschulz and @cscherrer! |
Thanks a lot @cjdoris !
One advantage of having it in the package (maybe using another name) would be that we could have a |
And now for the reward ... :-)
Before:
julia> using InverseFunctions # Load a super-lightweight package to get some initial Pkg costs out of the way
julia> @time_imports using SpecialFunctions, StatsFuns, Distributions, LogarithmicNumbers
[...]
226.6 ms SpecialFunctions
[...]
92.7 ms StatsFuns
[...]
470.4 ms Distributions
[...]
379.7 ms LogarithmicNumbers
After:
julia> using InverseFunctions # Load a super-lightweight package to get some initial Pkg costs out of the way
julia> @time_imports using SpecialFunctions, StatsFuns, Distributions, LogarithmicNumbers
204.0 ms SpecialFunctions
[...]
79.7 ms StatsFuns
[...]
483.4 ms Distributions
[...]
9.4 ms LogarithmicNumbers |
The @requires for Distributions, SpecialFunctions and StatsFuns turn LogarithmicNumbers from a lightweight package into an effectively quite heavy package, increasing the load time of packages depending on it (like @cscherrer's MeasureBase) to a degree that makes them unattractive as dependencies themselves (I'd like to use MeasureBase.jl in BATBase.jl, for example, as part of bat/BAT.jl#351, but currently it's too heavy because of LogarithmicNumbers' requires).
It looks like Distributions & friends will not support LogarithmicNumbers directly (JuliaStats/Distributions.jl#1545):
@cjdoris, would you consider dropping the @requires, as suggested by @devmotion?