Posterior sampling from pm.Categorical results in strange "ValueError: Need at least 1 and at most 32 array objects" #3567

aakhmetz · 2019-07-25T11:14:21Z

While doing posterior sampling from Categorical, I get a strange Value error. The same code works on previous version of PyMC3 3.7.rc1, but not working with the latest version 3.7.

Description of your problem

Please provide a minimal, self-contained, and reproducible example.
Here, I will omit most of the code, but mainly I create a matrix p that consists of values between zero and one. I add the column to the right to make sure that every row contains at least one value greater than zero.

def makeModel(dbar_val):
    dbar.set_value(dbar_val)
    with pm.Model() as model:
        Δdata = (observed_data['end']-observed_data['start']).astype('timedelta64[D]').astype('int').values

        Δmean = pm.HalfNormal('Δmean',np.mean(Δdata))
        Δsd = pm.HalfNormal('Δsd',np.std(Δdata))
        pm.Gamma('Δobs',mu=Δmean,sd=Δsd,observed=Δdata)

        trace_delay = pm.sample(1000,tune=400)

        # Here I omit some code that is not really relevant

        K = Df['t'].shape[0]
        # M is some matrix of size (K,K)      
        M = tt.switch(tt.ge(M_times_end,-inc_time),tt.switch(tt.ge(M_times,inc_time),M_distance,0),0)

        # sometimes there are no infectors to connect with, so we assign 1 to such cases
        failed_to_connect = tt.switch(tt.gt(M.max(axis=1),0),0,1)
        # and add as the last column to the matrix M
        p = pm.Deterministic('p',tt.concatenate([M,tt.shape_padright(failed_to_connect)],axis=1))
        # then, whenever we can't connect, the last element will be chosen
        connectivity_ = pm.Categorical('connectivity_',p=p,shape=K)

        trace = pm.sample_posterior_predictive(trace_delay, vars=[p,connectivity_]) 
        
    display(pm.summary(trace_delay))

    return trace

trace = makeModel(dbar.eval())

Please provide the full traceback.

ValueError                                Traceback (most recent call last)
<ipython-input-267-b89828945496> in <module>
----> 1 trace = makeModel(dbar.eval())

<ipython-input-266-8c319ae29dfc> in makeModel(dbar_val)
     31         p = pm.Deterministic('p',tt.concatenate([M,tt.shape_padright(failed_to_connect)],axis=1))
     32         # then, whenever we can't connect, the last element will be chosen
---> 33         connectivity_ = pm.Categorical('connectivity_',p=p[:3],shape=3)
     34         # for simplicity we denote those last elements by -1
     35 #         connectivity = pm.Deterministic('connectivity',tt.switch(tt.lt(connectivity_,K),connectivity_,-1))

~/anaconda/lib/python3.7/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
     45             total_size = kwargs.pop('total_size', None)
     46             dist = cls.dist(*args, **kwargs)
---> 47             return model.Var(name, dist, data, total_size)
     48         else:
     49             raise TypeError("Name needs to be a string but got: {}".format(name))

~/anaconda/lib/python3.7/site-packages/pymc3/model.py in Var(self, name, dist, data, total_size)
    824                 with self:
    825                     var = FreeRV(name=name, distribution=dist,
--> 826                                  total_size=total_size, model=self)
    827                 self.free_RVs.append(var)
    828             else:

~/anaconda/lib/python3.7/site-packages/pymc3/model.py in __init__(self, type, owner, index, name, distribution, total_size, model)
   1272             self.tag.test_value = np.ones(
   1273                 distribution.shape, distribution.dtype) * distribution.default()
-> 1274             self.logp_elemwiset = distribution.logp(self)
   1275             # The logp might need scaling in minibatches.
   1276             # This is done in `Factor`.

~/anaconda/lib/python3.7/site-packages/pymc3/distributions/discrete.py in logp(self, value)
   1002                 tt.choose(
   1003                     value_clip,
-> 1004                     p.dimshuffle(pattern),
   1005                 )
   1006             )

~/anaconda/lib/python3.7/site-packages/theano/tensor/basic.py in choose(a, choices, out, mode)
   6826     # This is done to keep the same function signature then NumPy.
   6827     assert out is None
-> 6828     return Choose(mode)(a, choices)
   6829 
   6830 

~/anaconda/lib/python3.7/site-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    672                 thunk.outputs = [storage_map[v] for v in node.outputs]
    673 
--> 674                 required = thunk()
    675                 assert not required  # We provided all inputs
    676 

~/anaconda/lib/python3.7/site-packages/theano/gof/op.py in rval(p, i, o, n)
    890             # default arguments are stored in the closure of `rval`
    891             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 892                 r = p(n, [x[0] for x in i], o)
    893                 for o in node.outputs:
    894                     compute_map[o][0] = True

~/anaconda/lib/python3.7/site-packages/theano/tensor/basic.py in perform(self, node, inputs, outputs)
   6910         choice = inputs[1]
   6911         # TODO reuse out?
-> 6912         z[0] = np.choose(a, choice, mode=self.mode)
   6913 
   6914 

~/anaconda/lib/python3.7/site-packages/numpy/core/fromnumeric.py in choose(a, choices, out, mode)
    420 
    421     """
--> 422     return _wrapfunc(a, 'choose', choices, out=out, mode=mode)
    423 
    424 

~/anaconda/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     54 def _wrapfunc(obj, method, *args, **kwds):
     55     try:
---> 56         return getattr(obj, method)(*args, **kwds)
     57 
     58     # An AttributeError occurs if the object does not have

ValueError: Need at least 1 and at most 32 array objects.

Please provide any additional information below.

I have seen a similar complaint on discorse https://discourse.pymc.io/t/random-method-for-interpolated-rvs/3205/2, but I was unable to find if that reported bug was fixed. Maybe @lucianopaz would be kind to give some update. Thanks a lot in advance.

Versions and main components

PyMC3 Version: 3.7
Theano Version: 1.0.4
Python Version: 3.7.3
Operating system: archlinux
How did you install PyMC3: (conda/pip) conda latest

lucianopaz · 2019-07-25T12:59:53Z

Oh no, PR #3563 brought some unforseen problems with it... This was an incredibly obscure error!

It looks like theano uses numpy's implementation of choose to perform the calculations.

Taken verbatim from the numpy docs:

choices : sequence of arrays
Choice arrays. a and all of the choices must be broadcastable to
the same shape. If choices is itself an array (not recommended),
then its outermost dimension (i.e., the one corresponding to
choices.shape[0]) is taken as defining the “sequence”
Notes

To reduce the chance of misinterpretation, even though the following
“abuse” is nominally supported, choices should neither be, nor be thought
of as, a single array, i.e., the outermost sequence-like container should be
either a list or a tuple.

So the choices parameter should be interpreted as a sequence of candidate choice arrays. This sequence is then used to get a multiindex and at that point, there's a hard limit on the number of inputs that function can accept (the 32 you see). So in the end, we can choose from a list of more than 32 options!!

The conclusion is that switching to choose was a bad decision. @twiecki, maybe we should revert #3563, and try to write or own indexing function to do away with this issue and #3564.

junpenglao · 2019-07-25T13:02:37Z

reverting in #3568

lucianopaz · 2019-07-25T13:02:43Z

@twiecki, Or maybe we could bring something like this up at numpy as an issue and hope they fix it? There shouldn't be a problem for them to get a multiindex out of a single array.

lucianopaz · 2019-07-25T13:25:34Z

I opened an issue at numpy asking for them to lift the limit for arrays.

aakhmetz · 2019-07-25T15:47:32Z

Thank you especially for quick response - my code works again!

lucianopaz · 2019-07-25T16:35:23Z

Be aware that with #3535 open again, multidimensional categoricals don't only use up more memory than they should, they also produce incorrect logp evaluations, leading to incorrect inference. It's really unfortunate that numpy limited choose this much, because it was exactly what we needed to get the logp right.

aakhmetz · 2019-07-29T05:44:14Z

@lucianopaz Thank you for worrying

As for my situation, my code for sampling from a categorical distribution was not involved in inference. So I assumed I was safe. However, I will keep in mind your notion for the future

lucianopaz mentioned this issue Jul 25, 2019

DOC: Mention take_along_axis in choose numpy/numpy#14117

Closed

aakhmetz closed this as completed Jul 25, 2019

lucianopaz mentioned this issue Jul 26, 2019

Memory Error: OrderedLogistic #3535

Closed

lucianopaz mentioned this issue Jul 30, 2019

Fix Categorical.logp with take_along_axis #3572

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Posterior sampling from pm.Categorical results in strange "ValueError: Need at least 1 and at most 32 array objects" #3567

Posterior sampling from pm.Categorical results in strange "ValueError: Need at least 1 and at most 32 array objects" #3567

aakhmetz commented Jul 25, 2019 •

edited

Loading

lucianopaz commented Jul 25, 2019

junpenglao commented Jul 25, 2019

lucianopaz commented Jul 25, 2019

lucianopaz commented Jul 25, 2019

aakhmetz commented Jul 25, 2019

lucianopaz commented Jul 25, 2019

aakhmetz commented Jul 29, 2019

Posterior sampling from pm.Categorical results in strange "ValueError: Need at least 1 and at most 32 array objects" #3567

Posterior sampling from pm.Categorical results in strange "ValueError: Need at least 1 and at most 32 array objects" #3567

Comments

aakhmetz commented Jul 25, 2019 • edited Loading

Description of your problem

Versions and main components

lucianopaz commented Jul 25, 2019

junpenglao commented Jul 25, 2019

lucianopaz commented Jul 25, 2019

lucianopaz commented Jul 25, 2019

aakhmetz commented Jul 25, 2019

lucianopaz commented Jul 25, 2019

aakhmetz commented Jul 29, 2019

aakhmetz commented Jul 25, 2019 •

edited

Loading