Fix different / alternative parametrizations between RandomOps and Logp methods #4548
Conversation
Force-pushed from 8413dd5 to 105ed4a
Looks like you have the right idea, and it should be as simple as that! We should add a simple test for this, though. For now, we can add a single, simple `beta` parameter test to the existing HalfCauchy test method in `test_distributions`.
Otherwise, we should consider moving this new `HalfCauchyRV` and the `MultinomialRV` to Aesara. These changes shouldn't break anything over there, and it would be better to have a single, good implementation of these distributions.
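For what it's worth, a sketch of what such a test could look like, following the existing `check_logp` pattern in `test_distributions.py` (the `Rplus`/`Rplusbig` domains are the module's existing helpers; the exact body is an assumption):

```python
import scipy.stats as st

def test_half_cauchy(self):
    # Compare the logp against scipy for a non-default beta, so the scale
    # parameter is exercised explicitly rather than left at its default.
    self.check_logp(
        HalfCauchy,
        Rplus,  # domain of the value
        {"beta": Rplusbig},  # domain of the scale parameter
        lambda value, beta: st.halfcauchy.logpdf(value, scale=beta),
    )
```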
pymc3/distributions/continuous.py
Outdated
class HalfCauchyRV(RandomVariable):
    name = "halfcauchy"
    ndim_supp = 0
    ndims_params = [0]
    dtype = "floatX"
    _print_name = ("C**+", "\\operatorname{C^{+}}")
It seems like you should be able to subclass the existing `HalfCauchyRV`, just like `MultinomialRV` does. That should prevent the need to duplicate these class-level variables, and preserve our ability to do things like `isinstance(..., HalfCauchyRV)` (using `aesara.tensor.random.basic.HalfCauchyRV`) on instances of this new class.
I tried to do that first, but I didn't find a way to deal with the `__call__` method, as it invokes `super().__call__`, which I think is strongly linked to the two-parameter logic of the original `HalfCauchyRV`?
The current `test_distributions.py::test_half_cauchy` already tests the `beta` parameter. In terms of logp, everything is fine. I am checking how to refactor `test_distributions_random.py` to test the random method there.
Sure, that's also fine.
Before doing that, take a look at the conversation about
I was thinking this one is still useful: https://github.com/pymc-devs/pymc3/blob/9ce64f5a50498fd8db7f2065d2b2427dfc01bc09/pymc3/tests/test_distributions_random.py#L477 It tests that the pymc3 parameters and the scipy random reference give similar values, and it's not concerned with shapes. I refactored the main method to something like this:

def pymc3_random(
    dist,
    paramdomains,
    ref_rand,
    valuedomain=Domain([0]),
    size=10000,
    alpha=0.05,
    fails=10,
    extra_args=None,
    model_args=None,
):
    if model_args is None:
        model_args = {}

    model, param_vars = build_model(dist, valuedomain, paramdomains, extra_args)
    model_dist = change_rv_size(model.named_vars["value"], size, expand=True)
    pymc_rand = aesara.function([], model_dist)

    domains = paramdomains.copy()
    for pt in product(domains, n_samples=100):
        pt = pm.Point(pt, model=model)
        pt.update(model_args)

        # Update the shared parameter variables in `param_vars`
        for k, v in pt.items():
            nv = param_vars.get(k, model.named_vars.get(k))
            if nv.name in param_vars:
                param_vars[nv.name].set_value(v)

        p = alpha
        # Allow KS test to fail (i.e., the samples be different)
        # a certain number of times. Crude, but necessary.
        f = fails
        while p <= alpha and f > 0:
            s0 = pymc_rand()
            s1 = ref_rand(size=size, **pt)
            _, p = st.ks_2samp(np.atleast_1d(s0).flatten(), np.atleast_1d(s1).flatten())
            f -= 1
        assert p > alpha, str(pt)
That approach to testing only makes sense when you've implemented a sampler from scratch; otherwise, you're testing the numerical correctness of the underlying NumPy/SciPy samplers. Worse yet, it might actually be confirming that the NumPy/SciPy samplers are correct according to those same NumPy/SciPy samplers. When it comes to checking the mappings between our parametrizations and NumPy/SciPy's, it's very costly overkill.
I agree it is overkill. What should we do instead?
Standard unit testing. We just need to test the logic we've implemented (e.g. the conversion/remapping of parameters and the calls to SciPy/NumPy). If there is no conversion/remapping, then we really only need to make sure that a single call to the underlying
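As an illustration of that kind of unit test, here is a sketch under the assumption that the `rng` keyword seen elsewhere in this thread seeds the draw deterministically; one seeded draw pins down the parameter remapping without re-testing SciPy:

```python
import aesara
import numpy as np
import pymc3 as pm
import scipy.stats as st

def test_half_cauchy_beta_remap():
    # Same seed on both sides: if beta -> scale is mapped correctly, the
    # single draws must coincide; this tests our remapping, not the sampler.
    rv = pm.HalfCauchy.dist(beta=2.5, rng=aesara.shared(np.random.RandomState(123)))
    expected = st.halfcauchy.rvs(scale=2.5, random_state=np.random.RandomState(123))
    assert np.isclose(rv.eval(), expected)
```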
I found other issues in the mapping of arguments between pymc3 and aesara random ops. The following already-refactored distributions still have issues:
I did not refactor the test for discrete variables. We also need to establish a method for mapping optional parametrizations (such as tau/sigma in the HalfNormal). The changes here were just for unused / different arguments, but maybe both can be done in a single sweep.
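For the tau/sigma case, the remapping logic already exists in `pymc3.distributions.continuous.get_tau_sigma`; reproduced here from memory as a sketch, since `dist` could call it before handing a single `sigma` to the `RandomVariable`:

```python
def get_tau_sigma(tau=None, sigma=None):
    # Exactly one of tau and sigma may be given; each determines the other
    # through the identity tau = sigma**-2.
    if tau is None:
        sigma = 1.0 if sigma is None else sigma
        tau = sigma**-2.0
    else:
        if sigma is not None:
            raise ValueError("Can't pass both tau and sigma")
        sigma = tau**-0.5
    return tau, sigma
```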
Force-pushed from 77259a3 to e77bed8
This PR will probably just be discarded in the end, but hopefully it will help in figuring out how to refactor the old distributions to V4...
pymc3/tests/test_distributions.py
Outdated
@@ -1354,7 +1352,7 @@ def test_fun(value, mu, sigma):
     Rplus,
     {"mu": Rplus, "sigma": Rplus},
     test_fun,
-    decimal=select_by_precision(float64=5, float32=3),
+    decimal=select_by_precision(float64=3, float32=3),
We seem to have lost some (test) precision here by using `InverseGamma.get_alpha_beta`.
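For context, the mu/sigma to alpha/beta conversion is plain moment matching; the sketch below (a hypothetical standalone version of the helper, not its exact code) shows the float round-trip that plausibly costs the two decimals:

```python
def get_alpha_beta_from_mu_sigma(mu, sigma):
    # InverseGamma moment matching: mean = beta / (alpha - 1) and
    # var = mu**2 / (alpha - 2), solved for alpha and beta.
    alpha = mu**2 / sigma**2 + 2.0
    beta = mu * (alpha - 1.0)
    return alpha, beta
```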
HalfNormal is now passing both logp/cdf and random tests.
Okay, so every "already-refactored" continuous distribution seems to be in order except for the MvNormal. @brandonwillard do the failing tests say anything to you? It seems the automatic distribution refactoring is not working for this one...

test_distributions.py::TestMatchesScipy::test_mvnormal[1] FAILED [ 33%]
pymc3/tests/test_distributions.py:1631 (TestMatchesScipy.test_mvnormal[1])
self = <pymc3.tests.test_distributions.TestMatchesScipy object at 0x7f750b257850>
n = 1
@pytest.mark.parametrize("n", [1, 2, 3])
def test_mvnormal(self, n):
>       self.check_logp(
            MvNormal,
            RealMatrix(5, n),
            {"mu": Vector(R, n), "tau": PdMatrix(n)},
            normal_logpdf_tau,
        )
test_distributions.py:1634:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_distributions.py:604: in check_logp
model, param_vars = build_model(pymc3_dist, domain, paramdomains, extra_args)
test_distributions.py:232: in build_model
distfam("value", **param_vars, transform=None)
../distributions/distribution.py:166: in __new__
rv_out = cls.dist(*args, rng=rng, **kwargs)
../distributions/multivariate.py:225: in dist
return super().__init__([mu, cov], **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = [mu, dot.0], shape = (), dtype = 'float64'
defaults = ('median', 'mean', 'mode'), args = ()
kwargs = {'rng': RandomStateSharedVariable(<RandomState(MT19937) at 0x7F750B304D40>)}
def __init__(self, shape=(), dtype=None, defaults=("median", "mean", "mode"), *args, **kwargs):
    if dtype is None:
        dtype = aesara.config.floatX
>       super().__init__(shape, dtype, defaults=defaults, *args, **kwargs)
E       TypeError: super(type, obj): obj must be an instance or subtype of type
Force-pushed from 2fc525b to 1996ed3
On the discrete distributions, this part of the NegativeBinomial logp currently raises an error:

>       return aet.switch(aet.gt(alpha, 1e10), Poisson.dist(mu).logp(value), negbinom)
E       AttributeError: 'TensorVariable' object has no attribute 'logp'
That's odd, because I just re-enabled tests for those distributions (e.g. in
Yes, that's just a syntax change. We no longer have a `logp` method on the returned `TensorVariable`s.
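One way to express that line without a `.logp` method (a sketch of the idea, not necessarily the PR's exact fix) is to build the Poisson logp expression inline:

```python
import aesara.tensor as aet

def negbinom_logp_tail(value, mu, alpha, negbinom):
    # Hypothetical helper: log Poisson(value | mu) written out directly,
    # value * log(mu) - mu - log(value!), instead of Poisson.dist(mu).logp(value).
    pois_logp = value * aet.log(mu) - mu - aet.gammaln(value + 1)
    # Fall back to the Poisson limit when alpha is effectively infinite.
    return aet.switch(aet.gt(alpha, 1e10), pois_logp, negbinom)
```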
Changes needed to be made in order to handle all the variations of
It looks like those changes might need to be further refined and tested.
Just to be clear, this is for the random part, not the logp. It was just that the aesara RandomOp works on
The random test for the Categorical is failing. Error details:

/home/ricardo/Documents/Projects/pymc3-venv/bin/python3 /opt/pycharm-2020.2.3/plugins/python/helpers/pycharm/_jb_pytest_runner.py --target test_distributions_random.py::TestScalarParameterSamples.test_categorical_random
Testing started at 18:35 ...
Launching pytest with arguments test_distributions_random.py::TestScalarParameterSamples::test_categorical_random in /home/ricardo/Documents/Projects/pymc3/pymc3/tests
============================= test session starts ==============================
platform linux -- Python 3.8.5, pytest-6.1.2, py-1.9.0, pluggy-0.13.1 -- /home/ricardo/Documents/Projects/pymc3-venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/ricardo/Documents/Projects/pymc3, configfile: setup.cfg
plugins: cov-2.10.1
collecting ... collected 1 item
test_distributions_random.py::TestScalarParameterSamples::test_categorical_random[2] FAILED [100%]
pymc3/tests/test_distributions_random.py:789 (TestScalarParameterSamples.test_categorical_random[2])
self = <aesara.compile.function.types.Function object at 0x7fd8277c8a90>
args = (), kwargs = {}
restore_defaults = <function Function.__call__.<locals>.restore_defaults at 0x7fd8278cff70>
profile = None, t0 = 1616002542.7315042, output_subset = None, i = 0
c = <array([0.001, 0.999])>
def __call__(self, *args, **kwargs):
"""
Evaluates value of a function on given arguments.
Parameters
----------
args : list
List of inputs to the function. All inputs are required, even when
some of them are not necessary to calculate requested subset of
outputs.
kwargs : dict
The function inputs can be passed as keyword argument. For this, use
the name of the input or the input instance as the key.
Keyword argument ``output_subset`` is a list of either indices of the
function's outputs or the keys belonging to the `output_keys` dict
and represent outputs that are requested to be calculated. Regardless
of the presence of ``output_subset``, the updates are always calculated
and processed. To disable the updates, you should use the ``copy``
method with ``delete_updates=True``.
Returns
-------
list
List of outputs on indices/keys from ``output_subset`` or all of them,
if ``output_subset`` is not passed.
"""
def restore_defaults():
for i, (required, refeed, value) in enumerate(self.defaults):
if refeed:
if isinstance(value, Container):
value = value.storage[0]
self[i] = value
profile = self.profile
t0 = time.time()
output_subset = kwargs.pop("output_subset", None)
if output_subset is not None and self.output_keys is not None:
output_subset = [self.output_keys.index(key) for key in output_subset]
# Reinitialize each container's 'provided' counter
if self.trust_input:
i = 0
for arg in args:
s = self.input_storage[i]
s.storage[0] = arg
i += 1
else:
for c in self.input_storage:
c.provided = 0
if len(args) + len(kwargs) > len(self.input_storage):
raise TypeError("Too many parameter passed to aesara function")
# Set positional arguments
i = 0
for arg in args:
# TODO: provide a Param option for skipping the filter if we
# really want speed.
s = self.input_storage[i]
# see this emails for a discuation about None as input
# https://groups.google.com/group/theano-dev/browse_thread/thread/920a5e904e8a8525/4f1b311a28fc27e5
if arg is None:
s.storage[0] = arg
else:
try:
s.storage[0] = s.type.filter(
arg, strict=s.strict, allow_downcast=s.allow_downcast
)
except Exception as e:
function_name = "aesara function"
argument_name = "argument"
if self.name:
function_name += ' with name "' + self.name + '"'
if hasattr(arg, "name") and arg.name:
argument_name += ' with name "' + arg.name + '"'
where = get_variable_trace_string(self.maker.inputs[i].variable)
if len(e.args) == 1:
e.args = (
"Bad input "
+ argument_name
+ " to "
+ function_name
+ f" at index {int(i)} (0-based). {where}"
+ e.args[0],
)
else:
e.args = (
"Bad input "
+ argument_name
+ " to "
+ function_name
+ f" at index {int(i)} (0-based). {where}"
) + e.args
restore_defaults()
raise
s.provided += 1
i += 1
# Set keyword arguments
if kwargs: # for speed, skip the items for empty kwargs
for k, arg in kwargs.items():
self[k] = arg
if (
not self.trust_input
and
# The getattr is only needed for old pickle
getattr(self, "_check_for_aliased_inputs", True)
):
# Collect aliased inputs among the storage space
args_share_memory = []
for i in range(len(self.input_storage)):
i_var = self.maker.inputs[i].variable
i_val = self.input_storage[i].storage[0]
if hasattr(i_var.type, "may_share_memory"):
is_aliased = False
for j in range(len(args_share_memory)):
group_j = zip(
[
self.maker.inputs[k].variable
for k in args_share_memory[j]
],
[
self.input_storage[k].storage[0]
for k in args_share_memory[j]
],
)
if any(
[
(
var.type is i_var.type
and var.type.may_share_memory(val, i_val)
)
for (var, val) in group_j
]
):
is_aliased = True
args_share_memory[j].append(i)
break
if not is_aliased:
args_share_memory.append([i])
# Check for groups of more than one argument that share memory
for group in args_share_memory:
if len(group) > 1:
# copy all but the first
for j in group[1:]:
self.input_storage[j].storage[0] = copy.copy(
self.input_storage[j].storage[0]
)
# Check if inputs are missing, or if inputs were set more than once, or
# if we tried to provide inputs that are supposed to be implicit.
if not self.trust_input:
for c in self.input_storage:
if c.required and not c.provided:
restore_defaults()
raise TypeError(
f"Missing required input: {getattr(self.inv_finder[c], 'variable', self.inv_finder[c])}"
)
if c.provided > 1:
restore_defaults()
raise TypeError(
f"Multiple values for input: {getattr(self.inv_finder[c], 'variable', self.inv_finder[c])}"
)
if c.implicit and c.provided > 0:
restore_defaults()
raise TypeError(
f"Tried to provide value for implicit input: {getattr(self.inv_finder[c], 'variable', self.inv_finder[c])}"
)
# Do the actual work
t0_fn = time.time()
try:
outputs = (
> self.fn()
if output_subset is None
else self.fn(output_subset=output_subset)
)
../../../pymc3-venv/lib/python3.8/site-packages/aesara/compile/function/types.py:974:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
p = <bound method RandomVariable.perform of <aesara.tensor.random.basic.CategoricalRV object at 0x7fd827c32fd0>>
i = [[RandomState(MT19937) at 0x7FD82755F040], [array([100000])], [array(4)], [array([0.001, 0.999])]]
o = [[RandomState(MT19937) at 0x7FD82755F040], [None]]
n = categorical_rv(RandomStateSharedVariable(<RandomState(MT19937) at 0x7FD82755F040>), TensorConstant{(1,) of 100000}, TensorConstant{4}, p)
def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
> r = p(n, [x[0] for x in i], o)
../../../pymc3-venv/lib/python3.8/site-packages/aesara/graph/op.py:473:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <aesara.tensor.random.basic.CategoricalRV object at 0x7fd827c32fd0>
node = categorical_rv(RandomStateSharedVariable(<RandomState(MT19937) at 0x7FD82755F040>), TensorConstant{(1,) of 100000}, TensorConstant{4}, p)
inputs = [RandomState(MT19937) at 0x7FD82755F040, array([100000]), array(4), array([0.001, 0.999])]
outputs = [[RandomState(MT19937) at 0x7FD82755F040], [None]]
def perform(self, node, inputs, outputs):
rng_var_out, smpl_out = outputs
rng, size, dtype, *args = inputs
out_var = node.outputs[1]
# If `size == []`, that means no size is enforced, and NumPy is trusted
# to draw the appropriate number of samples, NumPy uses `size=None` to
# represent that. Otherwise, NumPy expects a tuple.
if np.size(size) == 0:
size = None
else:
size = tuple(size)
# Draw from `rng` if `self.inplace` is `True`, and from a copy of `rng`
# otherwise.
if not self.inplace:
rng = copy(rng)
rng_var_out[0] = rng
> smpl_val = self.rng_fn(rng, *(args + [size]))
../../../pymc3-venv/lib/python3.8/site-packages/aesara/tensor/random/op.py:417:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cls = <class 'aesara.tensor.random.basic.CategoricalRV'>
rng = RandomState(MT19937) at 0x7FD82755F040, p = array([0.001, 0.999])
size = (100000,)
    @classmethod
    def rng_fn(cls, rng, p, size):
        if size is None:
            size = ()
        size = tuple(np.atleast_1d(size))
        ind_shape = p.shape[:-1]
        if len(size) > 0 and size[-len(ind_shape) :] != ind_shape:
>           raise ValueError("Parameters shape and size do not match.")
E           ValueError: Parameters shape and size do not match.
../../../pymc3-venv/lib/python3.8/site-packages/aesara/tensor/random/basic.py:375: ValueError
During handling of the above exception, another exception occurred:
self = <pymc3.tests.test_distributions_random.TestScalarParameterSamples object at 0x7fd8277c1d60>
s = 2
@pytest.mark.parametrize("s", [2,])
def test_categorical_random(self, s):
def ref_rand(size, p):
return nr.choice(np.arange(p.shape[0]), p=p, size=size)
> pymc3_random_discrete(pm.Categorical, {"p": Simplex(s)}, ref_rand=ref_rand)
test_distributions_random.py:795:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_distributions_random.py:132: in pymc3_random_discrete
o = pymc_rand()
../../../pymc3-venv/lib/python3.8/site-packages/aesara/compile/function/types.py:987: in __call__
raise_with_op(
../../../pymc3-venv/lib/python3.8/site-packages/aesara/link/utils.py:508: in raise_with_op
raise exc_value.with_traceback(exc_trace)
../../../pymc3-venv/lib/python3.8/site-packages/aesara/compile/function/types.py:974: in __call__
self.fn()
../../../pymc3-venv/lib/python3.8/site-packages/aesara/graph/op.py:473: in rval
r = p(n, [x[0] for x in i], o)
../../../pymc3-venv/lib/python3.8/site-packages/aesara/tensor/random/op.py:417: in perform
smpl_val = self.rng_fn(rng, *(args + [size]))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cls = <class 'aesara.tensor.random.basic.CategoricalRV'>
rng = RandomState(MT19937) at 0x7FD82755F040, p = array([0.001, 0.999])
size = (100000,)
@classmethod
def rng_fn(cls, rng, p, size):
if size is None:
size = ()
size = tuple(np.atleast_1d(size))
ind_shape = p.shape[:-1]
if len(size) > 0 and size[-len(ind_shape) :] != ind_shape:
> raise ValueError("Parameters shape and size do not match.")
E ValueError: Parameters shape and size do not match.
E Apply node that caused the error: categorical_rv(RandomStateSharedVariable(<RandomState(MT19937) at 0x7FD82755F040>), TensorConstant{(1,) of 100000}, TensorConstant{4}, p)
E Toposort index: 0
E Inputs types: [RandomStateType, TensorType(int64, (True,)), TensorType(int64, scalar), TensorType(float64, vector)]
E Inputs shapes: ['No shapes', (1,), (), (2,)]
E Inputs strides: ['No strides', (8,), (), (8,)]
E Inputs values: [RandomState(MT19937) at 0x7FD82755F040, array([100000]), array(4), array([0.001, 0.999])]
E Inputs type_num: ['', 7, 7, 12]
E Outputs clients: [[], ['output']]
E
E Backtrace when the node is created(use Aesara flag traceback__limit=N to make it longer):
E File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/hooks.py", line 286, in __call__
E return self._hookexec(self, self.get_hookimpls(), kwargs)
E File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/manager.py", line 93, in _hookexec
E return self._inner_hookexec(hook, methods, kwargs)
E File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/manager.py", line 84, in <lambda>
E self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
E File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/callers.py", line 187, in _multicall
E res = hook_impl.function(*args)
E File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/python.py", line 184, in pytest_pyfunc_call
E result = testfunction(**testargs)
E File "/home/ricardo/Documents/Projects/pymc3/pymc3/tests/test_distributions_random.py", line 795, in test_categorical_random
E pymc3_random_discrete(pm.Categorical, {"p": Simplex(s)}, ref_rand=ref_rand)
E File "/home/ricardo/Documents/Projects/pymc3/pymc3/tests/test_distributions_random.py", line 114, in pymc3_random_discrete
E model_dist = change_rv_size(model.named_vars["value"], size, expand=True)
E File "/home/ricardo/Documents/Projects/pymc3/pymc3/distributions/__init__.py", line 130, in change_rv_size
E new_rv_node = rv_node.op.make_node(rng, new_size, dtype, *dist_params)
E
E Debugprint of the apply node:
E categorical_rv.0 [id A] <RandomStateType> ''
E |RandomStateSharedVariable(<RandomState(MT19937) at 0x7FD82755F040>) [id B] <RandomStateType>
E |TensorConstant{(1,) of 100000} [id C] <TensorType(int64, (True,))>
E |TensorConstant{4} [id D] <TensorType(int64, scalar)>
E |p [id E] <TensorType(float64, vector)>
E categorical_rv.1 [id A] <TensorType(int64, vector)> 'value'
E
E Storage map footprint:
E - p, Shared Input, Shape: (2,), ElemSize: 8 Byte(s), TotalSize: 16 Byte(s)
E - TensorConstant{(1,) of 100000}, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
E - TensorConstant{4}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
E TotalSize: 32.0 Byte(s) 0.000 GB
E TotalSize inputs: 32.0 Byte(s) 0.000 GB
../../../pymc3-venv/lib/python3.8/site-packages/aesara/tensor/random/basic.py:375: ValueError
=========================== short test summary info ============================
FAILED test_distributions_random.py::TestScalarParameterSamples::test_categorical_random[2]
============================== 1 failed in 0.52s ===============================
Process finished with exit code 1
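For what it's worth, the failure can be reproduced from the quoted `rng_fn` alone; my reading (an observation, not an official diagnosis) is that a 1-D `p` makes `ind_shape` the empty tuple, and `size[-0:]` then slices the whole `size` tuple, so the guard trips for any non-empty `size`:

```python
size = (100000,)
ind_shape = ()  # p.shape[:-1] for a p of shape (2,)

# -len(ind_shape) == -0 == 0, so the slice keeps the entire tuple:
assert size[-len(ind_shape):] == (100000,)
# (100000,) != (), hence "ValueError: Parameters shape and size do not match."
assert size[-len(ind_shape):] != ind_shape
```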
Overall, it looks like these changes are trying to make `RandomVariable`s fit PyMC3's current interface. This isn't the right approach, especially since we can remap whatever we want via `Distribution.dist`.
pymc3/distributions/continuous.py
Outdated
    exponential,
    gamma,
    halfcauchy,
    halfnormal,
I have a hard time believing that we need to recreate all of these distributions. Something's wrong here.
The tests pass now (which was my goal). We can now rework from here and see what is needed or not. The original code before this PR was certainly not working yet.
Yes, we're building toward complete conversion of the code. It isn't going to happen in a single PR/commit.

It's very important that we not introduce new design issues just because we want to make things "work"/tests pass. Remember, a big part of these `v4` changes involves fixing old design issues that most likely arose under similar circumstances.
pymc3/distributions/continuous.py
Outdated
@@ -722,6 +722,24 @@ def _distr_parameters_for_repr(self):
         return ["mu", "sigma", "lower", "upper"]


+class PyMC3HalfNormalRV(RandomVariable):
This `PyMC3*` naming scheme isn't good. The `RandomVariable`s defined in PyMC3 should be generally applicable. Furthermore, we should expect that other libraries will use these as the canonical Aesara `RandomVariable`s, at least when these variables aren't already in Aesara.
Agreed. This was just to distinguish the Aesara/PyMC variations when they were incompatible or extended.
pymc3/distributions/continuous.py
Outdated
class PyMC3HalfCauchyRV(RandomVariable):
    name = "halfcauchy"
    ndim_supp = 0
    ndims_params = [0]
    dtype = "floatX"
    _print_name = ("C**+", "\\operatorname{C^{+}}")

    def __call__(self, beta=1.0, size=None, **kwargs):
        return super().__call__(beta, size=size, **kwargs)

    @classmethod
    def rng_fn(cls, rng, beta, size):
        return stats.halfcauchy.rvs(scale=beta, random_state=rng, size=size)


halfcauchy = PyMC3HalfCauchyRV()
Again, creating a new class for a reduced parameter space doesn't make sense unless it's absolutely necessary. Here, it looks like a default `loc` of zero will work, and not require a new class.
The problem is that the original Aesara class expects two parameters (or at least that's what I figured out from trying to just inherit from it). Am I mistaken?
Then we make the `HalfCauchy.dist` interface take only the used parameter, `scale`/`beta`, and set the unused `loc` parameter to zero.
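Concretely, that suggestion could look something like this sketch (reusing Aesara's two-parameter `halfcauchy` Op; `PositiveContinuous` and `floatX` are the surrounding PyMC3 names, and the exact body is an assumption):

```python
import aesara.tensor as aet
from aesara.tensor.random.basic import halfcauchy

class HalfCauchy(PositiveContinuous):
    rv_op = halfcauchy

    @classmethod
    def dist(cls, beta, *args, **kwargs):
        beta = aet.as_tensor_variable(floatX(beta))
        # loc is not exposed to users; it is fixed at zero here.
        return super().dist([0.0, beta], **kwargs)
```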
Don't we still need to at least invert the order of the arguments in `rng_fn`? Sounds like a messier solution, but I'm happy to try.

Original Aesara HalfCauchyRV:

class HalfCauchyRV(RandomVariable):
    name = "cauchy"
    ndim_supp = 0
    ndims_params = [0, 0]
    dtype = "floatX"
    _print_name = ("C**+", "\\operatorname{C^{+}}")

    def __call__(self, loc=0.0, scale=1.0, size=None, **kwargs):
        return super().__call__(loc, scale, size=size, **kwargs)

    @classmethod
    def rng_fn(cls, rng, loc, scale, size):
        return stats.halfcauchy.rvs(loc=loc, scale=scale, random_state=rng, size=size)
Remember, users are not creating `RandomVariable`s directly; they're using `Distribution.dist`, so that's where we need to focus.
I think I just need to see an example where the loc parameter is ignored (set to zero by default) and the relevant scale argument is used properly. Before this refactoring, the scale argument was mistaken for the loc during random sampling (with the scale defaulting to 1). That's why it failed in the reintroduced `test_distributions_random.py` test.
Yeah, it sounds like there was a bug underlying that issue. I'll try to push the changes I'm alluding to in a minute.
Will need to also refactor NegativeBinomial to be in terms of
Again, we only need to change
Yes, that's exactly what I had in mind. My assumption was that since most RandomOps will be created in PyMC3 anyway, we might as well have all of them work with the old pymc3 parametrizations and just tweak those that we want to recycle from Aesara. If the final split is like 80% brand-new RandomOps created in PyMC3 and 20% adapted from Aesara with the parameters converted to match them, is it worth adding the extra boilerplate to the logp/logcdf/dist methods instead of writing those 20% from scratch (inheriting from RandomOp)? Whatever the final solution, I got the hang of how to code the parameter conversions. I am still unsure how to do it for parameter reduction, but I'll try to figure it out.
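For the NegativeBinomial case, the conversion in `dist` is short either way; a sketch of the mu/alpha to n/p remapping (an assumption, using the standard identities n = alpha and p = alpha / (mu + alpha)):

```python
class NegativeBinomial(Discrete):
    rv_op = nbinom  # Aesara's (n, p)-parametrized Op

    @classmethod
    def dist(cls, mu=None, alpha=None, *args, **kwargs):
        # PyMC3's mean/dispersion form (mu, alpha) maps onto NumPy's
        # (n, p) parametrization via n = alpha, p = alpha / (mu + alpha).
        n = alpha
        p = alpha / (mu + alpha)
        return super().dist([n, p], **kwargs)
```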
pymc3/distributions/continuous.py
Outdated
""" | ||
rv_op = exponential | ||
|
||
@classmethod | ||
def dist(cls, lam, *args, **kwargs): | ||
lam = aet.as_tensor_variable(floatX(lam)) | ||
def dist(cls, lam=None, mu=None, *args, **kwargs): |
Introducing the new parametrization `mu`. This is not strictly needed, but since we have to reparametrize behind the scenes for the exponential `Op` anyway, we might as well add it as an option for the logp/logcdf.
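A sketch of how the dual parametrization could be handled inside `Exponential` (an assumption: Aesara's exponential `Op` takes the scale, i.e. the mean mu = 1/lam):

```python
@classmethod
def dist(cls, lam=None, mu=None, *args, **kwargs):
    if lam is not None and mu is not None:
        raise ValueError("Specify at most one of lam and mu.")
    if mu is None:
        mu = 1.0 / lam  # the Op's scale parameter is the mean, 1/lam
    mu = aet.as_tensor_variable(floatX(mu))
    return super().dist([mu], **kwargs)
```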
Reintroduced xfails for:
There are still temporary hacks in place for:
Still to be done:
After this (and review) I think we will be in a position to quickly refactor the remaining standard distributions.
Force-pushed from 2f6c0a9 to 3ca2162
Force-pushed from 2a76568 to fbca286
Sometimes getting a failing test related to the Dirichlet: https://github.com/pymc-devs/pymc3/runs/2147720067?check_suite_focus=true

alpha = array([1., 1., 1.])
x = array([0.83469003, 0.02537253, 0.13993742], dtype=float32)

    def _dirichlet_check_input(alpha, x):
        x = np.asarray(x)

        if x.shape[0] + 1 != alpha.shape[0] and x.shape[0] != alpha.shape[0]:
            raise ValueError("Vector 'x' must have either the same number "
                             "of entries as, or one entry fewer than, "
                             "parameter vector 'a', but alpha.shape = %s "
                             "and x.shape = %s." % (alpha.shape, x.shape))

        if x.shape[0] != alpha.shape[0]:
            xk = np.array([1 - np.sum(x, 0)])
            if xk.ndim == 1:
                x = np.append(x, xk)
            elif xk.ndim == 2:
                x = np.vstack((x, xk))
            else:
                raise ValueError("The input must be one dimensional or a two "
                                 "dimensional matrix containing the entries.")

        if np.min(x) < 0:
            raise ValueError("Each entry in 'x' must be greater than or equal "
                             "to zero.")

        if np.max(x) > 1:
            raise ValueError("Each entry in 'x' must be smaller or equal one.")

        # Check x_i > 0 or alpha_i > 1
        xeq0 = (x == 0)
        alphalt1 = (alpha < 1)
        if x.shape != alpha.shape:
            alphalt1 = np.repeat(alphalt1, x.shape[-1], axis=-1).reshape(x.shape)
        chk = np.logical_and(xeq0, alphalt1)

        if np.sum(chk):
            raise ValueError("Each entry in 'x' must be greater than zero if its "
                             "alpha is less than one.")

        if (np.abs(np.sum(x, 0) - 1.0) > 10e-10).any():
            raise ValueError("The input vector 'x' must lie within the normal "
>                            "simplex. but np.sum(x, 0) = %s." % np.sum(x, 0))
E   ValueError: The input vector 'x' must lie within the normal simplex. but np.sum(x, 0) = 0.99999994.
Force-pushed from fbca286 to 6ba2c19
Force-pushed from aebdfcf to 136dc07
Force-pushed from 136dc07 to 1c5bc07
Force-pushed from 1c5bc07 to da457f2
Force-pushed from 94c5980 to 49f6d6f
Updated state of this PR in #4548 (comment)