HDI naming #2306

sboukortt · 2024-01-13T09:46:50Z

Hi,

I’m opening this because it’s not really clear why, in ArviZ’s output, the bounds for, say, a 94% HDI are named hdi_3% and hdi_97%. Such names seem to imply that they correspond respectively to the 3rd and 97th percentiles of the posterior distribution, which would make them equal-tailed credible intervals if that were the case, but it isn’t necessarily. (HDIs and equal-tailed credible intervals don’t have to coincide.) Indeed, the following program shows a counterexample:

import arviz as az
import pymc as pm
import scipy.stats

with pm.Model() as model:
    pm.Exponential('expon', scale=100)
    trace = pm.sample(10000)

summary = az.summary(trace, hdi_prob=.94)
print(summary)

hdi_lower = summary['hdi_3%'].expon
hdi_upper = summary['hdi_97%'].expon

dist = scipy.stats.expon(scale=100)

print(f"\nThe 94% HDI ([{hdi_lower:.3f}, {hdi_upper:.3f}]) contains {dist.cdf(hdi_upper) - dist.cdf(hdi_lower):.3%} of the probability.")
print(f"{dist.cdf(hdi_lower):.3%} of the probability is to the left of hdi_3%.")
print(f"{dist.sf(hdi_upper):.3%} of the probability is to the right of hdi_97%. ({dist.cdf(hdi_upper):.3%} to its left.)\n")

print(f"The equal-tailed 94% credible interval is [{dist.ppf(0.03):.3f}, {dist.ppf(0.97):.3f}].")

Here is its output:

          mean       sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
expon  100.309  100.502   0.001  281.632       0.79    0.559   10051.0    8522.0    1.0

The 94% HDI ([0.001, 281.632]) contains 94.016% of the probability.
0.001% of the probability is to the left of hdi_3%.
5.983% of the probability is to the right of hdi_97%. (94.017% to its left.)

The equal-tailed 94% credible interval is [3.046, 350.656].

Is there maybe another reason for the naming? Or should it perhaps be reconsidered?

Thanks.

The text was updated successfully, but these errors were encountered:

aloctavodia · 2024-01-16T13:27:47Z

Thanks for reporting this. Probably a better name would be hdi_lb (lower bound) and hdi_up (upper bound). Do you want to submit a PR?

sethaxen · 2024-02-01T22:29:29Z

A downside to the proposal of hdi_lb and hdi_ub is that the column names no longer inform which HDI was included. Would it be crazy if we have a single hdi_94% column that contains a 2-tuple or length 2 list with the bounds?

aloctavodia · 2024-02-02T00:47:14Z

And what about something like hdi_94_lb? The downside is that is maybe too long. Having tuples save horizontal space.

sethaxen · 2024-02-02T08:48:05Z

In arviz, I think numbers would be shown to 3 decimal points, so if they're on unit scale, we have 4 digits plus decimal plus maybe a sign plus whitespace for 12-14 total characters for the column contents. If using a tuple, this is 14-16 characters. While hdi_94% would just take 7 characters, using column names hdi_94_lb and hdi_94_ub would take 20 characters. hdi94_lb would be better, but still, this is 2-4 characters more than the tuple.

In PosteriorStats.jl, we show 3 significant figures, so the tuple takes 12-14 characters, so I'm more strongly in favor of tuples there. Plus, table display in Julia truncates columns from the right instead of the middle like pandas, so I'm more strongly in favor of tuples in PosteriorStats. Here, I'm less certain.

tomicapretto · 2024-02-18T17:35:35Z

I'm in general not in favor of storing sequences in a cell of a data frame. However, I think this case is a good exception, so I vote in favor of the tuple. Having the column name be something like hdi_{percent}% is better than the other alternatives I can think of.

sandstromviktor · 2024-12-02T08:14:34Z

I think storing a tuple with the interval is the most clean way to do it. Then there will be no confusion at all what this represents.

aloctavodia · 2024-12-02T10:50:02Z

Storing a tuple is the most favored option. We should go with that.

sethaxen mentioned this issue Feb 1, 2024

Nicer representations of intervals arviz-devs/PosteriorStats.jl#25

Closed

sethaxen mentioned this issue Dec 2, 2024

HDI column names incorrect #2399

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HDI naming #2306

HDI naming #2306

sboukortt commented Jan 13, 2024 •

edited

Loading

aloctavodia commented Jan 16, 2024

sethaxen commented Feb 1, 2024

aloctavodia commented Feb 2, 2024 •

edited

Loading

sethaxen commented Feb 2, 2024

tomicapretto commented Feb 18, 2024

sandstromviktor commented Dec 2, 2024

aloctavodia commented Dec 2, 2024

HDI naming #2306

HDI naming #2306

Comments

sboukortt commented Jan 13, 2024 • edited Loading

aloctavodia commented Jan 16, 2024

sethaxen commented Feb 1, 2024

aloctavodia commented Feb 2, 2024 • edited Loading

sethaxen commented Feb 2, 2024

tomicapretto commented Feb 18, 2024

sandstromviktor commented Dec 2, 2024

aloctavodia commented Dec 2, 2024

sboukortt commented Jan 13, 2024 •

edited

Loading

aloctavodia commented Feb 2, 2024 •

edited

Loading