
Wrong test values in Categorical distribution #3156

Closed
plison opened this issue Aug 15, 2018 · 3 comments
plison commented Aug 15, 2018

The test values for the Categorical distribution are incorrect:

import numpy as np
import pymc3 as pm
import theano

probs_X = theano.shared(np.array([0.5, 0.5]))
probs_Y = theano.shared(np.array([[0.33, 0.33, 0.34], [0.25, 0.25, 0.5]]))

values_X = np.array([0, 0, 1])
with pm.Model() as model:
    X = pm.Categorical("X", p=probs_X, observed=values_X) 
    Y = pm.Categorical("Y", p=probs_Y[X])
    print("Default value:", Y.distribution.default())

gives, for some strange reason, the value:

Default value: 8

This incorrect value breaks the model, giving an infinite log-probability for the Y variable:

model.check_test_point()
Y        -inf
X   -2.080000
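
As a sanity check (editor's arithmetic, not from the thread): X's log-probability is that of three observed draws from a fair two-way Categorical, which matches the -2.08 printed above, while Y's default value of 8 lies outside the support {0, 1, 2} of a three-category variable, hence the -inf.

```python
import numpy as np

# Three observed values, each with probability 0.5 under p = [0.5, 0.5]
logp_X = 3 * np.log(0.5)
print(round(logp_X, 6))  # -2.079442, matching the -2.08 above
```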

I'm using the latest development version of PyMC3 from GitHub.

I checked the code but I don't quite understand what is going on. The default value seems to be derived from the mode, which, in the case of the Categorical, is the argmax of the probability vector. But this argmax is not actually computed; what is retrieved is the tag.test_value attribute of the mode tensor variable. I have absolutely no clue as to why this test value gives 8 in this case...

@junpenglao junpenglao added the bug label Aug 15, 2018
@gBokiau
Contributor

gBokiau commented Aug 20, 2018

At least two things are going on here; I don't think they're all related.

Some observations:

  • X might as well (should?) have been a Bernoulli here; worth checking whether this makes any difference w.r.t. check_test_point.
  • Y.p gets assigned:
# probs_Y[values_X].eval()
array([[0.33, 0.33, 0.34],
       [0.33, 0.33, 0.34],
       [0.25, 0.25, 0.5 ]])

Because no shape argument is given to the distribution, this is flattened, so that it appears that there are 9 categories, and softmaxed so that it all sums to 1. Hence, 8 does make sense as a default value.
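
A quick NumPy sketch of that flattening (editor's illustration, using the probs_Y and values_X from the original report):

```python
import numpy as np

probs_Y = np.array([[0.33, 0.33, 0.34], [0.25, 0.25, 0.5]])
values_X = np.array([0, 0, 1])

p = probs_Y[values_X]  # shape (3, 3): one row of probabilities per value of X
flat = p.ravel()       # without a shape argument, treated as a single vector
print(flat.size)       # 9 -> the distribution sees 9 categories, values 0..8
```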

@gBokiau
Contributor

gBokiau commented Aug 20, 2018

ie, you can try with

with pm.Model() as model:
    X = pm.Bernoulli("X", p=.5, observed=values_X) 
    Y = pm.Categorical("Y", p=probs_Y[X], shape=3)

One would hope that Y.distribution.default() then returns array([2, 2, 2]).
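
For reference, the row-wise mode that shape=3 should recover can be checked in plain NumPy (editor's check, not part of the original comment):

```python
import numpy as np

probs_Y = np.array([[0.33, 0.33, 0.34], [0.25, 0.25, 0.5]])
values_X = np.array([0, 0, 1])

# With shape=3, the mode is the argmax over the last axis, one entry per row
mode = np.argmax(probs_Y[values_X], axis=-1)
print(mode)  # [2 2 2]
```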

@twiecki
Member

twiecki commented Feb 28, 2019

Closed by #3386.

@twiecki twiecki closed this as completed Feb 28, 2019