DOC: Mention take_along_axis in choose #14117

Closed
lucianopaz opened this issue Jul 25, 2019 · 4 comments

@lucianopaz

Preamble

In the pymc3 package we implemented a series of distributions. One such distribution is the Categorical distribution, where we have an ndarray, p, that holds the probabilities of getting each of K categories (where the category index goes from 0 to p.shape[-1]-1).

Under some circumstances, it's useful to stack many independent categorical distributions in the same distribution instance (i.e. to write down a multidimensional categorical distribution). In these cases, p.ndim is larger than 1.

PyMC3 focuses on doing MCMC, and to accomplish that, we write down a distribution's log probability. To fix our implementation of multidimensional categorical distributions, we recently switched to using choose instead of advanced indexing (we actually use theano, but theano.tensor.choose later dispatches to numpy.choose). However, we encountered the following issue, which can be reproduced with this simple code:

Reproducing code example:

>>> import numpy as np
>>> a = np.random.randint(0, 1000, size=(8, 3, 4))
>>> b = np.random.rand(1000, 3, 4) 
>>> np.choose(a, b)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-f1c91ffb39bc> in <module>()
----> 1 np.choose(b, a)

~/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py in choose(a, choices, out, mode)
    420 
    421     """
--> 422     return _wrapfunc(a, 'choose', choices, out=out, mode=mode)
    423 
    424 

~/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     54 def _wrapfunc(obj, method, *args, **kwds):
     55     try:
---> 56         return getattr(obj, method)(*args, **kwds)
     57 
     58     # An AttributeError occurs if the object does not have

ValueError: Need at least 1 and at most 32 array objects.

In [2]: a = np.random.rand(1000, 3, 4)

In [3]: b = np.random.randint(0, 1000, size=(8, 3, 4))

In [4]: np.choose(a, b)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-a1dc619e7b2a> in <module>()
----> 1 np.choose(a, b)

~/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py in choose(a, choices, out, mode)
    420 
    421     """
--> 422     return _wrapfunc(a, 'choose', choices, out=out, mode=mode)
    423 
    424 

~/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     54 def _wrapfunc(obj, method, *args, **kwds):
     55     try:
---> 56         return getattr(obj, method)(*args, **kwds)
     57 
     58     # An AttributeError occurs if the object does not have

ValueError: Need at least 1 and at most 32 array objects.

Feature request

It would be really great if this limit of 32 array objects were lifted when the supplied choices parameter is an ndarray instance with a non-object dtype (e.g. float64, int64 or similar).

Numpy/Python version information:

python version: 3.6.7 | packaged by conda-forge | (default, Feb 26 2019, 03:50:56) [GCC 7.3.0]
numpy version: 1.16.1

@seberg
Member

seberg commented Jul 25, 2019

@lucianopaz choose really works on multiple array objects, I am not sure we will change that (although, I admit one could add a different branch/fast path for it).

I think your need here should be served very well by np.take_along_axis, as in:

import numpy as np
a = np.random.randint(0, 1000, size=(8, 3, 4))
b = np.random.rand(1000, 3, 4)
res = np.take_along_axis(b, a, axis=0)

I think it would be good to put take_along_axis in the "See Also" section of choose and mention it prominently. choose discourages array inputs for choices, so take_along_axis should probably be mentioned explicitly there.
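As a rough check of the suggestion above, the sketch below compares np.choose and np.take_along_axis on a small case (5 choices, well under the limit) and then runs take_along_axis with 1000 choices. The names idx, choices, big_idx and big_choices are made up for this illustration, and np.take_along_axis requires NumPy 1.15 or later.

```python
import numpy as np

rng = np.random.RandomState(0)

# Small case: few enough choices that np.choose accepts them.
K = 5
idx = rng.randint(0, K, size=(8, 3, 4))
choices = rng.rand(K, 3, 4)

via_choose = np.choose(idx, choices)
via_take = np.take_along_axis(choices, idx, axis=0)

# Both pick choices[idx[i, j, k], j, k] for every position (i, j, k).
assert np.array_equal(via_choose, via_take)

# Large case: 1000 choices, which np.choose rejected in the report above.
big_idx = rng.randint(0, 1000, size=(8, 3, 4))
big_choices = rng.rand(1000, 3, 4)
big = np.take_along_axis(big_choices, big_idx, axis=0)
print(big.shape)
```

The index array and the choices array must have the same number of dimensions, and their shapes must match (or broadcast) on every axis other than the one being indexed.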

EDIT: If you need this in a version which is compatible with much older numpy versions, you can write it as advanced indexing.
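For the specific shapes in this report, the advanced-indexing form might look like the sketch below. It is only an illustration for a 3-D array indexed along axis 0; the helper index arrays j and k are names introduced here, not part of any NumPy API.

```python
import numpy as np

rng = np.random.RandomState(0)
a = rng.randint(0, 1000, size=(8, 3, 4))   # indices into axis 0 of b
b = rng.rand(1000, 3, 4)                   # the 1000 "choices"

# Index arrays for the remaining axes, shaped so they broadcast against a.
j = np.arange(3)[:, None]   # shape (3, 1)
k = np.arange(4)[None, :]   # shape (1, 4)

# res[i, j, k] == b[a[i, j, k], j, k]; no Python loops involved.
res = b[a, j, k]
print(res.shape)
```

Because a, j and k broadcast to a common shape of (8, 3, 4), the result has the same shape as a.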

@lucianopaz
Author

Thanks a lot @seberg! take_along_axis seems perfect. When you say that it could be implemented with advanced indexing, do you mean that it could be done without relying on for loops? Just reshapes and indexing? If yes, could you give me a small pointer?

@seberg
Member

seberg commented Jul 25, 2019

Yes, if you do not mind diving into the code a bit: in numpy/lib/shape_base.py you can find the code for take_along_axis, which does exactly this. You would really only need it if you support NumPy 1.14 or earlier.
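A minimal sketch of that idea, for readers who do not want to open the source: build one index array per axis, using the supplied indices on the target axis and a reshaped arange on every other axis. The function name take_along_axis_sketch is hypothetical, and this is written in the spirit of NumPy's implementation rather than copied from it; it skips most of the validation the real function performs.

```python
import numpy as np

def take_along_axis_sketch(arr, indices, axis):
    """Loop-free gather along `axis` using only advanced indexing."""
    arr = np.asarray(arr)
    indices = np.asarray(indices)
    if arr.ndim != indices.ndim:
        raise ValueError("arr and indices must have the same number of dimensions")
    axis = axis % arr.ndim  # normalise negative axes (assumes a valid axis)

    fancy_index = []
    for dim in range(arr.ndim):
        if dim == axis:
            # On the target axis, use the caller's indices directly.
            fancy_index.append(indices)
        else:
            # On other axes, use 0..n-1 shaped to broadcast along `dim` only.
            shape = [1] * arr.ndim
            shape[dim] = -1
            fancy_index.append(np.arange(arr.shape[dim]).reshape(shape))
    return arr[tuple(fancy_index)]

rng = np.random.RandomState(0)
a = rng.randint(0, 1000, size=(8, 3, 4))
b = rng.rand(1000, 3, 4)
res = take_along_axis_sketch(b, a, axis=0)

# Categorical-style lookup on the last axis: one category per distribution.
p = rng.rand(3, 4, 6)
value = rng.randint(0, 6, size=(3, 4, 1))
picked = take_along_axis_sketch(p, value, axis=-1)
```

Since the body uses only arange, reshape and indexing, the same construction should carry over to other array libraries that support NumPy-style advanced indexing, though that is an assumption to verify for each library.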

@seberg seberg changed the title [Feature request] Choose on arrays DOC: Mention take_along_axis in choose Jul 26, 2019
@lucianopaz
Author

Thanks again @seberg! take_along_axis is exactly what we need, but we can't use numpy's implementation directly because we deal with theano tensors. The advanced indexing implementation you pointed me to seems perfectly portable to theano though.
