
Test model logp before starting any MCMC chains #4116

Closed
michaelosthege opened this issue Sep 18, 2020 · 14 comments

@michaelosthege
Member

Description of your problem

When a model contains NaN observations, or observations outside the support of the prior, the model's logp evaluates to -inf.

Neither find_MAP nor sample checks for this, and each fails in a way that does not make the underlying problem obvious:

  • pm.find_MAP does two iterations, showing -inf for the logp, but doesn't raise an error
  • pm.sample typically fails with Bad initial energy
  • pm.sample on Windows with multiprocessing crashes the child processes with uninformative error escalation (output similar to that shown in Remove default multiprocessing when on Windows? #3403). When run from a Jupyter notebook, it can even crash the entire notebook server

Related issues:
enhancement #4107 would use the same information, but would NOT raise

Proposed solution

At the beginning of find_MAP/sample, and BEFORE creating child processes, we should evaluate the model logp and raise an informative error if it is not finite.

Possibly even telling the user which variable was nan/inf?
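
A rough sketch of what such a check could look like (hypothetical helper name and error text, not the final implementation; it assumes model.logp accepts a point dict and returns a scalar):

    import numpy as np

    def _check_start_point(model, start):
        # Hypothetical helper: evaluate the model logp at the start point and
        # fail loudly before any chains or child processes are started.
        logp0 = model.logp(start)
        if not np.isfinite(logp0):
            raise ValueError(
                "Initial evaluation of model at starting point {} failed: logp = {}.\n"
                "Check for NaN observations or observations outside the support "
                "of the prior.".format(start, logp0)
            )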

@michaelosthege michaelosthege changed the title Test model likelihood before starting any MCMC chains Test model logp before starting any MCMC chains Sep 18, 2020
@StephenHogg
Contributor

Hi, I'm happy to take this on, as long as someone can give me pointers on how to implement the proposed solution.

@michaelosthege
Member Author

michaelosthege commented Nov 5, 2020

Both sampling and find_MAP use a start kwarg/variable to initialize their iterations.

For example, it is assigned to model.test_point here: https://github.com/pymc-devs/pymc3/blob/9eb69fc6f06eb1f507a29eff070cfc47b07275a4/pymc3/tuning/starting.py#L88

The Model has properties for the log-posterior and gradient. I always forget what they are called - model.logp and model.dlogp or so.
You can stick the start value (defaulting to model.test_point) into that and check that it evaluates to a number (not inf or nan).

The General API Quickstart notebook has some cells using model.logp.

Note that the start kwarg to pm.sample can be of a few different types.
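
For example, with a toy model where an observation lies outside the support of its likelihood (a sketch using PyMC3 3.x attribute names):

    import numpy as np
    import pymc3 as pm

    with pm.Model() as model:
        sigma = pm.HalfNormal("sigma", sigma=1.0)
        # -1.0 is outside the support of the HalfNormal likelihood
        pm.HalfNormal("obs", sigma=sigma, observed=np.array([-1.0]))

    start = model.test_point               # default start point (dict of free variables)
    print(model.logp(start))               # -inf
    print(np.isfinite(model.logp(start)))  # False -> this is the condition to raise on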

@StephenHogg
Contributor

So am I right in thinking that this is as simple as updating the start of find_MAP to be something like this?

    model = modelcontext(model)
    if start is None:
        start = model.test_point
    else:
        update_start_vals(start, model.test_point, model)

    if not set(start.keys()).issubset(model.named_vars.keys()):
        extra_keys = ", ".join(set(start.keys()) - set(model.named_vars.keys()))
        valid_keys = ", ".join(model.named_vars.keys())
        raise KeyError("Some start parameters do not appear in the model!\n"
                       "Valid keys are: {}, but {} was supplied".format(valid_keys, extra_keys))

    # the logp at the start point must be finite, otherwise optimization/sampling cannot work
    initial_eval = model.logp(start)
    if (~np.isfinite(initial_eval)).any():
        raise ValueError("Initial evaluation of model at starting point {} failed:\n"
                         "Variables appear to have NaN or Inf values".format(start))

Additionally, how do I get it to spit out which variables have the problem?

@michaelosthege
Member Author

Something like that, yes. I wouldn't be surprised if some of the tests break.

Your if does not check against NaN yet. Technically it may be possible to tell which variables are responsible, but I don't know if there's an easy solution.

@StephenHogg
Contributor

Something like that, yes. I wouldn't be surprised if some of the tests break.

Your if does not check against NaN yet. Technically it may be possible to tell which variables are responsible, but I don't know if there's an easy solution.

Thanks! Actually, NaN checking is accomplished with np.isfinite - here's the relevant doc: https://numpy.org/doc/stable/reference/generated/numpy.isfinite.html
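
For reference, np.isfinite flags NaN as well as inf and -inf:

    import numpy as np

    np.isfinite([1.0, np.nan, np.inf, -np.inf])
    # -> array([ True, False, False, False])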

As for working out which variable is responsible, I guess I'll drop that for now then.

@lucianopaz
Contributor

You should look at how check_test_point does this.
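
Roughly, check_test_point evaluates the logp of each basic random variable separately at a given point and collects the results in a pandas Series, something like this simplified sketch (not the actual implementation):

    import pandas as pd

    def check_test_point_sketch(model, test_point=None):
        # Simplified sketch: one logp value per basic RV, evaluated at the given point.
        if test_point is None:
            test_point = model.test_point
        return pd.Series({rv.name: float(rv.logp(test_point)) for rv in model.basic_RVs})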

@StephenHogg
Contributor

Thanks, unless there's anything outright wrong with that approach, I'll probably just inspect the output of that method.

I'm having a lot of trouble getting the tests to work, unfortunately. As it stands, this is the code in sampling.sample():

    if start is None:
        start = [model.test_point] * (chains if chains else 1)
    else:
        if isinstance(start, dict):
            start = [start] * (chains if chains else 1)

    for chain_start_vals in start:
        update_start_vals(chain_start_vals, model.test_point, model)

        if not set(chain_start_vals.keys()).issubset(model.named_vars.keys()):
            extra_keys = ", ".join(set(chain_start_vals.keys()) - set(model.named_vars.keys()))
            valid_keys = ", ".join(model.named_vars.keys())
            raise KeyError("Some start parameters do not appear in the model!\n"
                           "Valid keys are: {}, but {} was supplied".format(valid_keys, extra_keys))

        initial_eval = model.check_test_point(test_point=chain_start_vals)

        if not np.all(np.isfinite(initial_eval)):
            raise ValueError("Initial evaluation of model at starting point {} failed:\n"
                             "Variables appear to have NaN or Inf values: {}".format(
                                 chain_start_vals, str(initial_eval.index[~np.isfinite(initial_eval)].to_list())))

This generates errors that look like the following:

______________________________________________________________________________________________________________________________ TestSample.test_sample_init ______________________________________________________________________________________________________________________________

self = <pymc3.tests.test_sampling.TestSample object at 0x1155c12d0>

    def test_sample_init(self):
        with self.model:
            for init in ("advi", "advi_map", "map"):
                pm.sample(
                    init=init,
                    tune=0,
                    n_init=1000,
                    draws=50,
>                   random_seed=self.random_seed,
                )

pymc3/tests/test_sampling.py:95: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pymc3/sampling.py:539: in sample
    trace = _mp_sample(**sample_args, **parallel_args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

draws = 50, tune = 0, step = <pymc3.step_methods.hmc.nuts.NUTS object at 0x139b3a6d0>, chains = 4, cores = 4, chain = 0, random_seed = [163206115, 507505786, 368981262, 154707469], start = [{'x': array([0.1, 0.1])}], progressbar = True, trace = None
model = <pymc3.model.Model object at 0x138690950>, callback = None, discard_tuned_samples = True, mp_ctx = None, pickle_backend = 'pickle', kwargs = {}, ps = <module 'pymc3.parallel_sampling' from '/Users/shogg/git/pymc3/pymc3/parallel_sampling.py'>
traces = [<pymc3.backends.ndarray.NDArray object at 0x139c48650>], idx = 1, strace = <pymc3.backends.ndarray.NDArray object at 0x139c48610>

    def _mp_sample(
        draws: int,
        tune: int,
        step,
        chains: int,
        cores: int,
        chain: int,
        random_seed: list,
        start: list,
        progressbar=True,
        trace=None,
        model=None,
        callback=None,
        discard_tuned_samples=True,
        mp_ctx=None,
        pickle_backend="pickle",
        **kwargs,
    ):
        """Main iteration for multiprocess sampling.
    
        Parameters
        ----------
        draws : int
            The number of samples to draw
        tune : int, optional
            Number of iterations to tune, if applicable (defaults to None)
        step : function
            Step function
        chains : int
            The number of chains to sample.
        cores : int
            The number of chains to run in parallel.
        chain : int
            Number of the first chain.
        random_seed : list of ints
            Random seeds for each chain.
        start : list
            Starting points for each chain.
        progressbar : bool
            Whether or not to display a progress bar in the command line.
        trace : backend, list, MultiTrace or None
            This should be a backend instance, a list of variables to track, or a MultiTrace object
            with past values. If a MultiTrace object is given, it must contain samples for the chain
            number ``chain``. If None or a list of variables, the NDArray backend is used.
        model : Model (optional if in ``with`` context)
        callback : Callable
            A function which gets called for every sample from the trace of a chain. The function is
            called with the trace and the current draw and will contain all samples for a single trace.
            the ``draw.chain`` argument can be used to determine which of the active chains the sample
            is drawn from.
            Sampling can be interrupted by throwing a ``KeyboardInterrupt`` in the callback.
    
        Returns
        -------
        trace : pymc3.backends.base.MultiTrace
            A ``MultiTrace`` object that contains the samples for all chains.
        """
        import pymc3.parallel_sampling as ps
    
        # We did draws += tune in pm.sample
        draws -= tune
    
        traces = []
        for idx in range(chain, chain + chains):
            if trace is not None:
                strace = _choose_backend(copy(trace), idx, model=model)
            else:
                strace = _choose_backend(None, idx, model=model)
            # for user supply start value, fill-in missing value if the supplied
            # dict does not contain all parameters
>           update_start_vals(start[idx - chain], model.test_point, model)
E           IndexError: list index out of range

pymc3/sampling.py:1426: IndexError

Any ideas on how to proceed would be gratefully received.

@michaelosthege
Member Author

The indexing of start[idx - chain] seems to be the problem. I assume you already changed something?

It may help to insert a print(start) in the line before. The print will be captured by the test logger.

@StephenHogg
Contributor

OK, I've got that problem fixed. I now have two tests that fail with errors like this:

__________________________________________________________________ TestSamplePriorPredictive.test_shared __________________________________________________________________

self = <pymc3.tests.test_sampling.TestSamplePriorPredictive object at 0x13e31b4d0>

    def test_shared(self):
        n1 = 10
        obs = shared(np.random.rand(n1) < 0.5)
        draws = 50
    
        with pm.Model() as m:
            p = pm.Beta("p", 1.0, 1.0)
>           y = pm.Bernoulli("y", p, observed=obs)

pymc3/tests/test_sampling.py:860: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pymc3/distributions/distribution.py:98: in __new__
    return model.Var(name, dist, data, total_size, dims=dims)
pymc3/model.py:1166: in Var
    model=self,
pymc3/model.py:1800: in __init__
    self.tag.test_value = theano.compile.view_op(data).tag.test_value
../../.pyenv/versions/3.7.5/envs/pymc3/lib/python3.7/site-packages/theano/gof/utils.py:279: in __setattr__
    obj = self.attr_filter(obj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = TensorType(bool, vector), data = array([1, 1, 1, 0, 0, 0, 1, 1, 1, 0]), strict = False, allow_downcast = None

    def filter(self, data, strict=False, allow_downcast=None):
        """
        Convert `data` to something which can be associated to a
        `TensorVariable`.
    
        This function is not meant to be called in user code. It is for
        `Linker` instances to use when running a compiled graph.
    
        """
        # Explicit error message when one accidentally uses a Variable as
        # input (typical mistake, especially with shared variables).
        if isinstance(data, Variable):
            raise TypeError(
                "Expected an array-like object, but found a Variable: "
                "maybe you are trying to call a function on a (possibly "
                "shared) variable instead of a numeric array?"
            )
    
        if (type(data) is np.ndarray) and (data.dtype == self.numpy_dtype):
            if data.dtype.num != self.numpy_dtype.num:
                data = theano._asarray(data, dtype=self.dtype)
            # -- now fall through to ndim check
        elif (type(data) is np.memmap) and (data.dtype == self.numpy_dtype):
            # numpy.memmap is a "safe" subclass of ndarray,
            # so we can use it wherever we expect a base ndarray.
            # however, casting it would defeat the purpose of not
            # loading the whole data into memory
            pass
        elif strict:
            # If any of the two conditions above was not met,
            # we raise a meaningful TypeError.
            if not (type(data) is np.ndarray):
                raise TypeError(
                    "%s expected a ndarray object." % self, data, type(data)
                )
            if data.dtype != self.numpy_dtype:
                raise TypeError(
                    ("%s expected a ndarray object with " "dtype = %s (got %s).")
                    % (self, self.numpy_dtype, data.dtype)
                )
            raise AssertionError("This point should never be reached.")
        else:
            if allow_downcast:
                # Convert to self.dtype, regardless of the type of data
                data = theano._asarray(data, dtype=self.dtype)
                # TODO: consider to pad shape with ones to make it consistent
                # with self.broadcastable... like vector->row type thing
            else:
                if isinstance(data, np.ndarray):
                    # Check if self.dtype can accurately represent data
                    # (do not try to convert the data)
                    up_dtype = scal.upcast(self.dtype, data.dtype)
                    if up_dtype == self.dtype:
                        # Bug in the following line when data is a
                        # scalar array, see
                        # http://projects.scipy.org/numpy/ticket/1611
                        # data = data.astype(self.dtype)
                        data = theano._asarray(data, dtype=self.dtype)
                    if up_dtype != self.dtype:
                        err_msg = (
                            "%s cannot store a value of dtype %s without "
                            "risking loss of precision. If you do not mind "
                            "this loss, you can: "
                            "1) explicitly cast your data to %s, or "
                            '2) set "allow_input_downcast=True" when calling '
                            '"function". Value: "%s"'
                            % (self, data.dtype, self.dtype, repr(data))
                        )
>                       raise TypeError(err_msg)
E                       TypeError: TensorType(bool, vector) cannot store a value of dtype int64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to bool, or 2) set "allow_input_downcast=True" when calling "function". Value: "array([1, 1, 1, 0, 0, 0, 1, 1, 1, 0])"

Any ideas as to what this one is about? It seems like some sort of environment variable problem, but I'm not sure how it's arising in this context.

@michaelosthege
Member Author

I see no connection, but it's hard to tell without the diff. Can you open a Draft PR?

@StephenHogg
Contributor

Sure, PR is open.

@michaelosthege
Member Author

@StephenHogg you can now merge the latest changes from master into your branch (4116) to fix that unrelated (locally) failing test:

git remote add pymcdevs https://github.com/pymc-devs/pymc3
git fetch pymcdevs master
git checkout 4116
git merge pymcdevs/master

@StephenHogg
Contributor

Suggest closing the issue now that #4211 is merged

@twiecki twiecki closed this as completed Nov 27, 2020
@twiecki
Member

twiecki commented Nov 27, 2020

@StephenHogg Small tip: if you had written "Closes #4116" in your PR, this would have happened automatically.
