distribution truncation #804

sbenthall · 2020-08-18T22:05:21Z

Reopening just because this seems like the natural place to elaborate on a future improvement that should be on the agenda:

For distributions that are unbounded, we should require an explicit truncation parameter.

For normal and lognormal distributions, this should be expressed as a number of standard deviations.

If the user does not specify a truncation point, we should use one by default (say, 5 standard deviations).
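A minimal sketch of what "the distribution carries its own truncation point" could look like. The class names `TruncatedNormal` and `TruncatedLognormal`, the field `n_sd`, and the constant `DEFAULT_N_SD` are hypothetical, not HARK's API. For the lognormal, it assumes `mu` and `sigma` describe the underlying normal log(X), so truncation is applied in logs and mapped back to levels.

```python
import math
from dataclasses import dataclass

# Hypothetical default truncation point, in standard deviations (the "say, 5" above).
DEFAULT_N_SD = 5.0


@dataclass(frozen=True)
class TruncatedNormal:
    """Hypothetical normal distribution that always stores its truncation point."""
    mu: float
    sigma: float
    n_sd: float = DEFAULT_N_SD  # truncate at mu +/- n_sd * sigma

    def __post_init__(self):
        if self.sigma <= 0 or self.n_sd <= 0:
            raise ValueError("sigma and n_sd must be positive")

    @property
    def bounds(self):
        """Lower and upper truncation points in levels."""
        return (self.mu - self.n_sd * self.sigma, self.mu + self.n_sd * self.sigma)


@dataclass(frozen=True)
class TruncatedLognormal:
    """Hypothetical lognormal; mu and sigma are assumed to refer to log(X)."""
    mu: float
    sigma: float
    n_sd: float = DEFAULT_N_SD

    def __post_init__(self):
        if self.sigma <= 0 or self.n_sd <= 0:
            raise ValueError("sigma and n_sd must be positive")

    @property
    def bounds(self):
        # Truncate log(X) at mu +/- n_sd * sigma, then map back to levels.
        return (math.exp(self.mu - self.n_sd * self.sigma),
                math.exp(self.mu + self.n_sd * self.sigma))
```

Because `n_sd` is a field with a default, omitting it silently falls back to 5 standard deviations, while the chosen value stays visible on the object itself.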

There are a number of ways to truncate. If you're discretizing anyway, a natural choice is simply to calculate the mean of the variable in the truncated region and use that as a mass point. Having thought about it carefully, I've decided this is not optimal: it means that the locations of the lowest and highest mass points are not transparent to the person writing the code, and it requires a bit of extra accounting to keep track of them.
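To make the transparency problem concrete, here is a sketch of that alternative for a normal variable: each excluded tail is represented by its conditional mean, E[X | X > μ + kσ] = μ + σ φ(k) / (1 − Φ(k)). The function name `tail_mean_points` is hypothetical; the formula is the standard inverse Mills ratio.

```python
from statistics import NormalDist


def tail_mean_points(mu, sigma, n_sd):
    """Conditional means of the two tails of N(mu, sigma^2) beyond mu +/- n_sd*sigma.

    Upper tail: E[X | X > mu + n_sd*sigma] = mu + sigma * phi(n_sd) / (1 - Phi(n_sd)).
    The lower tail is its mirror image.
    """
    z = NormalDist()  # standard normal
    # Use Phi(-k) instead of 1 - Phi(k): same value, less cancellation in the tail.
    offset = sigma * z.pdf(n_sd) / z.cdf(-n_sd)
    return mu - offset, mu + offset
```

With a 4 standard deviation cutoff, the outermost mass points land at roughly ±4.226σ rather than at the round ±4σ the user chose, and they move whenever the cutoff changes.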

Our default choice (I think; further discussion may change my mind) should be to place the mass of the excluded region exactly at the truncation boundary. That is, if the choice is to truncate a normal distribution at 4 standard deviations, then there should be two symmetric mass points, at plus and minus 4 standard deviations from the mean, whose masses equal, respectively, the probability that the unbounded variable's realization falls more than 4 standard deviations below the mean and the probability that it falls more than 4 standard deviations above the mean.
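A sketch of one discretization that follows this rule for a normal variable, assuming an equiprobable interior (the interior scheme is our choice for illustration; only the boundary treatment comes from the proposal). The truncated region [μ − kσ, μ + kσ] is split into n bins of equal probability, each represented by its conditional mean, and each tail's probability is placed exactly on the corresponding boundary. The function name `discretize_normal_boundary_atoms` is hypothetical.

```python
from statistics import NormalDist


def discretize_normal_boundary_atoms(mu, sigma, n, n_sd=5.0):
    """Hypothetical (n + 2)-point approximation of N(mu, sigma^2) truncated at mu +/- n_sd*sigma.

    Interior: n equiprobable bins of the truncated region, each at its conditional mean.
    Tails: the excluded probability is placed exactly on the truncation boundaries.
    """
    if n < 1 or n_sd <= 0:
        raise ValueError("need n >= 1 and n_sd > 0")
    z = NormalDist()  # work with the standard normal, rescale at the end
    tail = z.cdf(-n_sd)  # probability in each tail (symmetric)
    inner = 1.0 - 2.0 * tail  # probability of the truncated region
    # Cut points splitting [-n_sd, n_sd] into n bins of equal probability.
    cuts = [-n_sd] + [z.inv_cdf(tail + inner * i / n) for i in range(1, n)] + [n_sd]
    points, probs = [-n_sd], [tail]  # lower boundary atom
    for a, b in zip(cuts[:-1], cuts[1:]):
        p = z.cdf(b) - z.cdf(a)
        points.append((z.pdf(a) - z.pdf(b)) / p)  # E[Z | a < Z < b]
        probs.append(p)
    points.append(n_sd)  # upper boundary atom
    probs.append(tail)
    return [mu + sigma * x for x in points], probs
```

Calling it with n = 3, 7, or 25 and `n_sd=4.0` always yields extreme points at exactly μ ± 4σ; only the interior points and their weights change with n.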

A mildly undesirable consequence of this choice:

The standard deviation (and possibly the mean, for asymmetrical distributions like lognormal) of the truncated distribution will be slightly different from the standard deviation of the unbounded (untruncated) distribution.
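
For the normal case the size of this effect can be worked out in closed form. A sketch, assuming X_k denotes the limiting distribution in which each tail's probability sits exactly at μ ± kσ (the notation X_k is ours), with φ and Φ the standard normal density and CDF, and using ∫ z²φ(z) dz over [−k, k] = 2Φ(k) − 1 − 2kφ(k):

```latex
\frac{\operatorname{Var}(X_k)}{\sigma^2}
  = \int_{-k}^{k} z^2 \phi(z)\,dz + 2k^2\bigl[1-\Phi(k)\bigr]
  = \bigl[2\Phi(k) - 1 - 2k\,\phi(k)\bigr] + 2k^2\bigl[1-\Phi(k)\bigr]
  = 1 - 2\Bigl[k\,\phi(k) - (k^2-1)\bigl(1-\Phi(k)\bigr)\Bigr]
```

Since 1 − Φ(k) < φ(k)/k, the bracketed term is positive for k ≥ 1, so the variance is always slightly smaller than σ². At k = 4 the reduction is about 1.2 × 10⁻⁴ of σ², so the standard deviation falls by roughly 0.006%; the mean is unchanged by symmetry.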

But on the positive side:

It is transparent (as discussed above).

No extra variables (like the means of the truncated region mentioned above) need to be calculated.

Crucially, changes in the number of points used in the discretization will have no effect on the location of the minimum point, which is something to which numerical solutions can be exquisitely sensitive.

The obvious "true" (Platonic) version of the model is its limit as the number of discretization points goes to infinity. (This is useful in adjudicating which of several competing numerical solutions is "closer" to the "truth", which can be non-obvious for some discretization schemes.)
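
Under the boundary-mass rule proposed above, one way to state that limiting target for a normal variable truncated at k standard deviations (our formalization; the name F_k is hypothetical) is the CDF of the normal distribution censored at μ ± kσ:

```latex
F_k(x) =
\begin{cases}
0, & x < \mu - k\sigma, \\
\Phi\!\left(\dfrac{x-\mu}{\sigma}\right), & \mu - k\sigma \le x < \mu + k\sigma, \\
1, & x \ge \mu + k\sigma.
\end{cases}
```

The jumps at μ − kσ and μ + kσ have sizes Φ(−k) and 1 − Φ(k), the two tail probabilities. Discretizations that keep those boundary atoms fixed would be expected to approach F_k as the number of points grows, rather than the untruncated normal.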

mnwhite · 2024-07-03T15:17:39Z

This remains important. Our equiprobable discrete lognormal approximation method incorporates truncation, but there's no current way to indicate that the distribution itself is truncated, not just its approximation. This is part of the HARK 1 roadmap, so archiving with that tag.

sbenthall added this to the 2.x.y milestone Aug 18, 2020

sbenthall mentioned this issue Aug 18, 2020: Python class representation for discretized probability distributions #519 (Closed)

MridulS mentioned this issue Aug 24, 2020: Income process overhaul: Truncated lognormal #121 (Closed)

sbenthall mentioned this issue Feb 9, 2021: generalize configuration of income process #673 (Closed)

sbenthall mentioned this issue Feb 18, 2021: DiscreteDistribution object should store reference to continuous distribution it was approximated from #949 (Open)

mnwhite added the "Tag: 1.0" label (About the v1 release of HARK.) Jul 3, 2024

mnwhite closed this as completed Jul 3, 2024

github-project-automation bot added this to Issues & PRs Aug 28, 2024

github-project-automation bot moved this to Done in Issues & PRs Aug 28, 2024
