DiscreteDistribution object should store reference to continuous distribution it was approximated from #949

sbenthall · 2021-02-11T17:43:22Z

So that it can be inspected and its underlying mu, sigma, count values (for example) can be compared to new configurations.

sbenthall · 2021-02-12T15:43:34Z

An implementation:

when a continuous variable is approxed to a DiscreteDistribution, have the latter:
- keep a pointer to the CV
- keep the arguments given to approx
In the case where combineIndepDstns is used, the above step isn't possible...
Ultimately, combineIndepDstns should be replaced; we should have a continuous multivariate distribution class analogous to the other continuous distributions.
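
A minimal sketch of the idea above, not HARK's actual code. The class names, the `parent` and `approx_args` attributes, and the median-based equiprobable rule are all assumptions made for illustration:

```python
from dataclasses import dataclass, field
from statistics import NormalDist
from typing import Any, Dict, List, Optional


@dataclass
class DiscreteDistribution:
    """Hypothetical stand-in for HARK's DiscreteDistribution."""
    pmf: List[float]
    X: List[float]
    parent: Optional[Any] = None  # continuous distribution it was approximated from
    approx_args: Dict[str, Any] = field(default_factory=dict)  # arguments given to approx


@dataclass
class Normal:
    """Hypothetical continuous distribution with an approx method."""
    mu: float = 0.0
    sigma: float = 1.0

    def approx(self, N: int) -> DiscreteDistribution:
        dist = NormalDist(self.mu, self.sigma)
        # Simplification: one point per equal-probability bin, placed at the bin median.
        points = [dist.inv_cdf((i + 0.5) / N) for i in range(N)]
        return DiscreteDistribution(
            pmf=[1.0 / N] * N, X=points, parent=self, approx_args={"N": N}
        )


shock = Normal(mu=0.0, sigma=0.1)
discrete = shock.approx(N=7)

# The stored parent and arguments can be compared to a new configuration.
needs_update = discrete.parent != Normal(0.0, 0.2) or discrete.approx_args != {"N": 7}
```

Because the dataclasses compare field by field, checking whether a discretization is stale reduces to an equality test on the parent and on the stored arguments.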


llorracc · 2021-02-17T20:13:03Z

Note that there are many possible ways to discretize any particular continuous distribution, each with different parameters, so we will need to keep track of the method and its parameters. For example, for a continuous distribution, you could have equiprobable, Hermite, bounded equiprobable, truncated equiprobable, etc.

sbenthall · 2021-02-17T21:02:18Z

that is exactly what is done in PR #954

sbenthall · 2021-02-18T18:25:44Z

To be clear: PR #954 assumes the approx method picks the discretization procedure based on a keyword argument.
The DiscreteDistribution tracks the keyword arguments of approx.

Currently, we do not have multiple discretization procedures for a single distribution type. If there were an example of an alternative discretization procedure implemented in code, it would be easier to demonstrate how this would work, and to discuss alternative architectures.

sbenthall · 2021-02-18T18:26:28Z

See #804

llorracc · 2021-02-18T23:16:59Z

This was discussed at least as long ago as #121, and a number of examples were scattered through the diffuse discussions that were concentrated and distilled here.

The concrete instance we should probably start with is the case of the normal (and lognormal) distributions. Two distinct choices are important:

- How to handle any "tail mass" (at the top and the bottom) that cannot be captured because any discretization will have a bottommost and a topmost point, but the continuous distribution may go to positive or negative infinity.
- How to distribute mass points and probability weights for the probability mass between the topmost and bottommost points.

For distributing the "tail mass" there are two common choices: ["truncation"](https://reference.wolfram.com/language/ref/CensoredDistribution.html) and "censoring". With "censoring", all of the mass of points beyond the limit is assigned to a point AT the limit.
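
To make the difference concrete, here is a small sketch for a standard normal with limits at three standard deviations. It assumes the usual meaning of truncation (discard the tail mass and rescale what remains), and the variable names are illustrative only:

```python
from statistics import NormalDist

std_normal = NormalDist(0.0, 1.0)
lower, upper = -3.0, 3.0

# Probability mass lying beyond each limit of the grid.
lower_tail = std_normal.cdf(lower)
upper_tail = 1.0 - std_normal.cdf(upper)
interior = 1.0 - lower_tail - upper_tail

# Censoring: the tail mass is kept as atoms located AT the limits.
censored_atoms = {lower: lower_tail, upper: upper_tail}
censored_total = interior + sum(censored_atoms.values())

# Truncation: the tail mass is discarded, so interior weights are scaled up to sum to one.
truncation_scale = 1.0 / interior
```

Under censoring the total probability is preserved without rescaling; under truncation every interior weight is multiplied by `truncation_scale`.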

For distributing the points inside the interval, there are many possible choices. The most popular two are Gauss-Hermite and equiprobable, but many other choices are possible.

Mathematica handles all of this by having a set of predefined objects that it knows are discrete distributions, and to which it accordingly knows it can apply certain operations (mean, variance, etc). Similarly for continuous distributions.

I'm not proposing anything remotely so fancy.

Suppose I want to define the "true" model as one in which the distribution of income shocks is a censored lognormal with the censoring taking place at three standard deviations.

Then we might have, say, three or four different "approximated" models:

- Equiprobable between [-3 sigma, +3 sigma], with 7 equiprobable points
- Equiprobable between [-3 sigma, +3 sigma], with 21 equiprobable points
- Censored with the correct censored probability masses at exactly -3 sigma and +3 sigma, and then equiprobable between, with 7 points
- Censored with the correct censored probability masses at exactly -3 sigma and +3 sigma, and then Gauss-Hermite between, with 9 points

I'm not proposing that we build a tool to which we can give all of these and many other various possibilities, and it does all the work. Instead, I'm proposing that we tell the user: "You write a function which has a set of arguments. The function, when called with those arguments, generates the discrete distribution you want. What you pass to our Solve and our Simulate routines is your function, and the specific parameter values you want it evaluated at if we get to the point where we need it. (Which we surely will unless there's a bug in the code before that point)."
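
A rough sketch of that contract, under several assumptions: the function names are hypothetical, `expected_value` merely stands in for a Solve or Simulate routine, the shock is discretized in logs (so the underlying distribution is normal), the 7 points are read as interior points in addition to the two censoring atoms, and interior points sit at bin medians for simplicity:

```python
from statistics import NormalDist
from typing import Callable, Dict, List, Tuple

Discretization = Tuple[List[float], List[float]]  # (probabilities, points)


def censored_equiprobable(mu: float, sigma: float, n: int, n_sd: float) -> Discretization:
    """User-written rule: atoms at mu +/- n_sd*sigma, n equiprobable points between."""
    dist = NormalDist(mu, sigma)
    lo, hi = mu - n_sd * sigma, mu + n_sd * sigma
    p_lo, p_hi = dist.cdf(lo), dist.cdf(hi)
    width = (p_hi - p_lo) / n  # probability of each interior bin
    interior = [dist.inv_cdf(p_lo + (i + 0.5) * width) for i in range(n)]
    probs = [p_lo] + [width] * n + [1.0 - p_hi]
    return probs, [lo] + interior + [hi]


def expected_value(discretizer: Callable[..., Discretization], params: Dict[str, float]) -> float:
    """Stand-in for a library routine: calls the user's function only when needed."""
    probs, points = discretizer(**params)
    return sum(p * x for p, x in zip(probs, points))


# The user hands over the function and the parameter values, not a finished grid.
params = {"mu": 0.0, "sigma": 0.1, "n": 7, "n_sd": 3.0}
mean = expected_value(censored_equiprobable, params)
```

Since the routine receives both the function and its arguments, it can also record them, which is what the original request in this issue asks for.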

Dolo kind of works this way -- at least if I recall correctly the substance of a conversation with Pablo a while ago when I asked "what if the user doesn't want to use one of your standard built in discretization routines" and he said "there's a hook in the code where they can build in whatever alternative they want."

sbenthall · 2021-02-19T00:46:00Z

I was unclear. I meant an example of multiple discretization procedures for the same distribution written in Python code.

I would not feel comfortable, myself, implementing what you propose without being able to demonstrate its functionality with at least two such functions.

Can you point to the Dolo code that does this?

sbenthall · 2021-02-22T14:44:22Z

Ah, it looks like we do have code for both equiprobable and Gauss-Hermite approximations to lognormal in HARK, sort of...

This starts with the parameters of the continuous distribution, and returns a discrete distribution.

HARK/HARK/distribution.py

Line 600 in e2fe6cf

def approxLognormalGaussHermite(N, mu=0.0, sigma=1.0, seed=0):

Contrast this with the current implementation of approx which is a method on the continuous distribution:

HARK/HARK/distribution.py

Line 98 in e2fe6cf

def approx(self, N, tail_N=0, tail_bound=None, tail_order=np.e):

I think what you are proposing is that these two modes of discretization should exist as standalone functions. Maybe that's the current architecture of approxLognormalGaussHermite. But you'll notice that the resulting DiscreteDistribution does not, in that case, understand that it's derived from a continuous lognormal distribution. In fact, this design undercuts the object-oriented way we've been moving towards for Distributions.

Have I misunderstood you?

llorracc · 2021-03-06T20:44:01Z

Took a closer look at how this is currently handled in the case of our widely used mean-one approximation to the lognormal, which is via the MeanOneLognormal class, and the approx method applied to it.

At some point I'd propose we make the following changes:

- Call it Lognormal_MeanOne (my preference is for names to go from the general to the specific).
- The approx method needs to take several different options, each of which has different parameters: (a) Equiprobable (just the number of points); (b) Gauss-Hermite (just the number of points); (c) Censored-Equiprobable (number of points, and upper and lower censoring points); (d) Censored-Gauss-Hermite (number of points, and upper and lower censoring points).
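
One way an approx with several options could be organized is a dispatch on a method name, with each method taking its own parameters. This is only a sketch: the `METHODS` registry, the method names, and the function signatures are assumptions rather than HARK's API, and only the two equiprobable variants are filled in (points at bin medians for simplicity):

```python
from statistics import NormalDist
from typing import Callable, Dict, List, Tuple

Discretization = Tuple[List[float], List[float]]  # (probabilities, points)


def equiprobable(dist: NormalDist, n: int) -> Discretization:
    points = [dist.inv_cdf((i + 0.5) / n) for i in range(n)]
    return [1.0 / n] * n, points


def censored_equiprobable(dist: NormalDist, n: int, lower: float, upper: float) -> Discretization:
    p_lo, p_hi = dist.cdf(lower), dist.cdf(upper)
    width = (p_hi - p_lo) / n  # probability of each interior bin
    interior = [dist.inv_cdf(p_lo + (i + 0.5) * width) for i in range(n)]
    return [p_lo] + [width] * n + [1.0 - p_hi], [lower] + interior + [upper]


# Hypothetical registry; Gauss-Hermite variants would be registered the same way.
METHODS: Dict[str, Callable[..., Discretization]] = {
    "equiprobable": equiprobable,
    "censored-equiprobable": censored_equiprobable,
}


def approx(dist: NormalDist, method: str, **params) -> Discretization:
    """Dispatch on the method name; each method accepts its own parameters."""
    if method not in METHODS:
        raise ValueError(f"unknown discretization method: {method!r}")
    return METHODS[method](dist, **params)


log_shock = NormalDist(0.0, 0.1)
probs_a, points_a = approx(log_shock, "equiprobable", n=7)
probs_c, points_c = approx(log_shock, "censored-equiprobable", n=7, lower=-0.3, upper=0.3)
```

The method name plus its keyword arguments fully describe the discretization, so they are the natural things to store alongside the parent distribution.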

As I remarked somewhere else, I very much like the architecture of how Mathematica handles this. Doing something similar may be too heavy a lift for our purposes, but it might not be so hard -- I'm really not sure.

sbenthall · 2021-03-06T21:19:39Z

No underscores in class names.
Otherwise, I'm following what you're saying.

llorracc · 2021-03-06T22:24:09Z

OK; you're the PEP8 jedi master. But, yes, I think the right way to think about it is that a discrete approximation is a thing that is done to a continuous distribution. And there are multiple different ways to do it, which can be combined (censoring and equiprobable, censoring and Hermite, truncated and Hermite, etc).

sbenthall added this to the 1.0.0 milestone Feb 11, 2021

sbenthall self-assigned this Feb 14, 2021

sbenthall added a commit to sbenthall/HARK that referenced this issue Feb 17, 2021

DiscreteDistribution keeps link to parent ContinuousDistribution, app…

7bcdc15

…rox args, fixes econ-ark#949

sbenthall linked a pull request Feb 17, 2021 that will close this issue

DiscreteDistribution keeps link to parent ContinuousDistribution #954

Open

3 tasks

sbenthall modified the milestones: 1.0.0, 1.1.0 Feb 18, 2021

sbenthall added the Function: Distributions label Feb 18, 2021

sbenthall removed their assignment May 6, 2021

sbenthall mentioned this issue Dec 16, 2021

Discretization methods as a general type of object/function that takes Continuous Distributions #1091

Open

sbenthall mentioned this issue Jan 20, 2022

functional simulation tests to enable external distribution classes #1105

Closed
