
Test model logp before starting any MCMC chains #4116

Closed
michaelosthege opened this issue Sep 18, 2020 · 14 comments

@michaelosthege
Member

Description of your problem

When a model contains NaN observations, or observations outside the support of the prior, the model's logp evaluates to -inf.

Neither find_MAP nor sample checks for this, and each fails in a way that does not make the underlying problem obvious:

  • pm.find_MAP does two iterations, showing -inf for the logp, but doesn't raise an error
  • pm.sample typically fails with Bad initial energy
  • pm.sample on Windows with multiprocessing crashes the child processes with uninformative error escalation (output similar to that shown in Remove default multiprocessing when on Windows? #3403). When run from a Jupyter notebook, it can even crash the entire notebook server

Related issues:
enhancement #4107 would use the same information, but would NOT raise

Proposed solution

At the beginning of find_MAP/sample, and BEFORE creating child processes, we should evaluate the model logp and raise an informative error if it is not finite.

Possibly even telling the user which variable was nan/inf?
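
A rough sketch of what such a check could look like (hypothetical helper name and error text, not the final implementation; it assumes model.logp accepts a point dict and returns a scalar):

    import numpy as np

    def _check_start_point(model, start):
        # Hypothetical helper: evaluate the model logp at the start point and
        # fail loudly before any chains or child processes are started.
        logp0 = model.logp(start)
        if not np.isfinite(logp0):
            raise ValueError(
                "Initial evaluation of model at starting point {} failed: logp = {}.\n"
                "Check for NaN observations or observations outside the support "
                "of the prior.".format(start, logp0)
            )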

@michaelosthege michaelosthege changed the title Test model likelihood before starting any MCMC chains Test model logp before starting any MCMC chains Sep 18, 2020
@StephenHogg
Contributor

Hi, I'm happy to take this on, as long as someone can give me pointers on how to implement the proposed solution.

@michaelosthege
Member Author

michaelosthege commented Nov 5, 2020

Both sampling and find_MAP use a start kwarg/variable to initialize their iterations.

For example, it is assigned to model.test_point here: https://github.com/pymc-devs/pymc3/blob/9eb69fc6f06eb1f507a29eff070cfc47b07275a4/pymc3/tuning/starting.py#L88

The Model has properties for the log-posterior and gradient. I always forget what they are called - model.logp and model.dlogp or so.
You can stick the start value (defaulting to model.test_point) into that and check that it evaluates to a number (not inf or nan).

The General API Quickstart notebook has some cells using model.logp.

Note that the start kwarg to pm.sample can be of a few different types.
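
For example, with a toy model where an observation lies outside the support of its likelihood (a sketch using PyMC3 3.x attribute names):

    import numpy as np
    import pymc3 as pm

    with pm.Model() as model:
        sigma = pm.HalfNormal("sigma", sigma=1.0)
        # -1.0 is outside the support of the HalfNormal likelihood
        pm.HalfNormal("obs", sigma=sigma, observed=np.array([-1.0]))

    start = model.test_point               # default start point (dict of free variables)
    print(model.logp(start))               # -inf
    print(np.isfinite(model.logp(start)))  # False -> this is the condition to raise on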

@StephenHogg
Contributor

So am I right in thinking that this is as simple as updating the start of find_MAP to be something like this?

    model = modelcontext(model)
    if start is None:
        start = model.test_point
    else:
        update_start_vals(start, model.test_point, model)

    if not set(start.keys()).issubset(model.named_vars.keys()):
        extra_keys = ", ".join(set(start.keys()) - set(model.named_vars.keys()))
        valid_keys = ", ".join(model.named_vars.keys())
        raise KeyError("Some start parameters do not appear in the model!\n"
                       "Valid keys are: {}, but {} was supplied".format(valid_keys, extra_keys))

    # the logp at the start point must be finite, otherwise optimization/sampling cannot work
    initial_eval = model.logp(start)
    if (~np.isfinite(initial_eval)).any():
        raise ValueError("Initial evaluation of model at starting point {} failed:\n"
                         "Variables appear to have NaN or Inf values".format(start))

Additionally, how do I get it to spit out which variables have the problem?

@michaelosthege
Member Author

Something like that, yes. I wouldn't be surprised if some of the tests break.

Your if does not check against NaN yet. Technically it may be possible to tell which variables are responsible, but I don't know if there's an easy solution.

@StephenHogg
Contributor

Something like that, yes. I wouldn't be surprised if some of the tests break.

Your if does not check against NaN yet. Technically it may be possible to tell which variables are responsible, but I don't know if there's an easy solution.

Thanks! Actually, NaN checking is accomplished with np.isfinite - here's the relevant doc: https://numpy.org/doc/stable/reference/generated/numpy.isfinite.html
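
For reference, np.isfinite flags NaN as well as inf and -inf:

    import numpy as np

    np.isfinite([1.0, np.nan, np.inf, -np.inf])
    # -> array([ True, False, False, False])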

As for working out which variable is responsible, I guess I'll drop that for now then.

@lucianopaz
Contributor

You should look at how check_test_point does this.
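
Roughly, check_test_point evaluates the logp of each basic random variable separately at a given point and collects the results in a pandas Series, something like this simplified sketch (not the actual implementation):

    import pandas as pd

    def check_test_point_sketch(model, test_point=None):
        # Simplified sketch: one logp value per basic RV, evaluated at the given point.
        if test_point is None:
            test_point = model.test_point
        return pd.Series({rv.name: float(rv.logp(test_point)) for rv in model.basic_RVs})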

@StephenHogg
Contributor

Thanks, unless there's anything outright wrong with that approach, I'll probably just inspect the output of that method.

I'm having a lot of trouble getting the tests to work, unfortunately. As it stands, this is the code in sampling.sample():

    if start is None:
        start = [model.test_point] * (chains if chains else 1)
    else:
        if isinstance(start, dict):
            start = [start] * (chains if chains else 1)

    for chain_start_vals in start:
        update_start_vals(chain_start_vals, model.test_point, model)

        if not set(chain_start_vals.keys()).issubset(model.named_vars.keys()):
            extra_keys = ", ".join(set(chain_start_vals.keys()) - set(model.named_vars.keys()))
            valid_keys = ", ".join(model.named_vars.keys())
            raise KeyError("Some start parameters do not appear in the model!\n"
                           "Valid keys are: {}, but {} was supplied".format(valid_keys, extra_keys))

        initial_eval = model.check_test_point(test_point=chain_start_vals)

        if not np.all(np.isfinite(initial_eval)):
            raise ValueError("Initial evaluation of model at starting point {} failed:\n"
                             "Variables appear to have NaN or Inf values: {}".format(
                                 chain_start_vals, str(initial_eval.index[~np.isfinite(initial_eval)].to_list())))

This generates errors that look like the following:

______________________________________________________________________________________________________________________________ TestSample.test_sample_init ______________________________________________________________________________________________________________________________

self = <pymc3.tests.test_sampling.TestSample object at 0x1155c12d0>

    def test_sample_init(self):
        with self.model:
            for init in ("advi", "advi_map", "map"):
                pm.sample(
                    init=init,
                    tune=0,
                    n_init=1000,
                    draws=50,
>                   random_seed=self.random_seed,
                )

pymc3/tests/test_sampling.py:95: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pymc3/sampling.py:539: in sample
    trace = _mp_sample(**sample_args, **parallel_args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

draws = 50, tune = 0, step = <pymc3.step_methods.hmc.nuts.NUTS object at 0x139b3a6d0>, chains = 4, cores = 4, chain = 0, random_seed = [163206115, 507505786, 368981262, 154707469], start = [{'x': array([0.1, 0.1])}], progressbar = True, trace = None
model = <pymc3.model.Model object at 0x138690950>, callback = None, discard_tuned_samples = True, mp_ctx = None, pickle_backend = 'pickle', kwargs = {}, ps = <module 'pymc3.parallel_sampling' from '/Users/shogg/git/pymc3/pymc3/parallel_sampling.py'>
traces = [<pymc3.backends.ndarray.NDArray object at 0x139c48650>], idx = 1, strace = <pymc3.backends.ndarray.NDArray object at 0x139c48610>

    def _mp_sample(
        draws: int,
        tune: int,
        step,
        chains: int,
        cores: int,
        chain: int,
        random_seed: list,
        start: list,
        progressbar=True,
        trace=None,
        model=None,
        callback=None,
        discard_tuned_samples=True,
        mp_ctx=None,
        pickle_backend="pickle",
        **kwargs,
    ):
        """Main iteration for multiprocess sampling.
    
        Parameters
        ----------
        draws : int
            The number of samples to draw
        tune : int, optional
            Number of iterations to tune, if applicable (defaults to None)
        step : function
            Step function
        chains : int
            The number of chains to sample.
        cores : int
            The number of chains to run in parallel.
        chain : int
            Number of the first chain.
        random_seed : list of ints
            Random seeds for each chain.
        start : list
            Starting points for each chain.
        progressbar : bool
            Whether or not to display a progress bar in the command line.
        trace : backend, list, MultiTrace or None
            This should be a backend instance, a list of variables to track, or a MultiTrace object
            with past values. If a MultiTrace object is given, it must contain samples for the chain
            number ``chain``. If None or a list of variables, the NDArray backend is used.
        model : Model (optional if in ``with`` context)
        callback : Callable
            A function which gets called for every sample from the trace of a chain. The function is
            called with the trace and the current draw and will contain all samples for a single trace.
            the ``draw.chain`` argument can be used to determine which of the active chains the sample
            is drawn from.
            Sampling can be interrupted by throwing a ``KeyboardInterrupt`` in the callback.
    
        Returns
        -------
        trace : pymc3.backends.base.MultiTrace
            A ``MultiTrace`` object that contains the samples for all chains.
        """
        import pymc3.parallel_sampling as ps
    
        # We did draws += tune in pm.sample
        draws -= tune
    
        traces = []
        for idx in range(chain, chain + chains):
            if trace is not None:
                strace = _choose_backend(copy(trace), idx, model=model)
            else:
                strace = _choose_backend(None, idx, model=model)
            # for user supply start value, fill-in missing value if the supplied
            # dict does not contain all parameters
>           update_start_vals(start[idx - chain], model.test_point, model)
E           IndexError: list index out of range

pymc3/sampling.py:1426: IndexError

Any ideas on how to proceed would be gratefully received.

@michaelosthege
Member Author

The indexing of start[idx - chain] seems to be the problem. I assume you already changed something?

It may help to insert a print(start) in the line before. The print will be captured by the test logger.

@StephenHogg
Contributor

OK, I've got that problem fixed. I now have two tests that fail with errors like this:

__________________________________________________________________ TestSamplePriorPredictive.test_shared __________________________________________________________________

self = <pymc3.tests.test_sampling.TestSamplePriorPredictive object at 0x13e31b4d0>

    def test_shared(self):
        n1 = 10
        obs = shared(np.random.rand(n1) < 0.5)
        draws = 50
    
        with pm.Model() as m:
            p = pm.Beta("p", 1.0, 1.0)
>           y = pm.Bernoulli("y", p, observed=obs)

pymc3/tests/test_sampling.py:860: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pymc3/distributions/distribution.py:98: in __new__
    return model.Var(name, dist, data, total_size, dims=dims)
pymc3/model.py:1166: in Var
    model=self,
pymc3/model.py:1800: in __init__
    self.tag.test_value = theano.compile.view_op(data).tag.test_value
../../.pyenv/versions/3.7.5/envs/pymc3/lib/python3.7/site-packages/theano/gof/utils.py:279: in __setattr__
    obj = self.attr_filter(obj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = TensorType(bool, vector), data = array([1, 1, 1, 0, 0, 0, 1, 1, 1, 0]), strict = False, allow_downcast = None

    def filter(self, data, strict=False, allow_downcast=None):
        """
        Convert `data` to something which can be associated to a
        `TensorVariable`.
    
        This function is not meant to be called in user code. It is for
        `Linker` instances to use when running a compiled graph.
    
        """
        # Explicit error message when one accidentally uses a Variable as
        # input (typical mistake, especially with shared variables).
        if isinstance(data, Variable):
            raise TypeError(
                "Expected an array-like object, but found a Variable: "
                "maybe you are trying to call a function on a (possibly "
                "shared) variable instead of a numeric array?"
            )
    
        if (type(data) is np.ndarray) and (data.dtype == self.numpy_dtype):
            if data.dtype.num != self.numpy_dtype.num:
                data = theano._asarray(data, dtype=self.dtype)
            # -- now fall through to ndim check
        elif (type(data) is np.memmap) and (data.dtype == self.numpy_dtype):
            # numpy.memmap is a "safe" subclass of ndarray,
            # so we can use it wherever we expect a base ndarray.
            # however, casting it would defeat the purpose of not
            # loading the whole data into memory
            pass
        elif strict:
            # If any of the two conditions above was not met,
            # we raise a meaningful TypeError.
            if not (type(data) is np.ndarray):
                raise TypeError(
                    "%s expected a ndarray object." % self, data, type(data)
                )
            if data.dtype != self.numpy_dtype:
                raise TypeError(
                    ("%s expected a ndarray object with " "dtype = %s (got %s).")
                    % (self, self.numpy_dtype, data.dtype)
                )
            raise AssertionError("This point should never be reached.")
        else:
            if allow_downcast:
                # Convert to self.dtype, regardless of the type of data
                data = theano._asarray(data, dtype=self.dtype)
                # TODO: consider to pad shape with ones to make it consistent
                # with self.broadcastable... like vector->row type thing
            else:
                if isinstance(data, np.ndarray):
                    # Check if self.dtype can accurately represent data
                    # (do not try to convert the data)
                    up_dtype = scal.upcast(self.dtype, data.dtype)
                    if up_dtype == self.dtype:
                        # Bug in the following line when data is a
                        # scalar array, see
                        # http://projects.scipy.org/numpy/ticket/1611
                        # data = data.astype(self.dtype)
                        data = theano._asarray(data, dtype=self.dtype)
                    if up_dtype != self.dtype:
                        err_msg = (
                            "%s cannot store a value of dtype %s without "
                            "risking loss of precision. If you do not mind "
                            "this loss, you can: "
                            "1) explicitly cast your data to %s, or "
                            '2) set "allow_input_downcast=True" when calling '
                            '"function". Value: "%s"'
                            % (self, data.dtype, self.dtype, repr(data))
                        )
>                       raise TypeError(err_msg)
E                       TypeError: TensorType(bool, vector) cannot store a value of dtype int64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to bool, or 2) set "allow_input_downcast=True" when calling "function". Value: "array([1, 1, 1, 0, 0, 0, 1, 1, 1, 0])"

Any ideas as to what this one is about? It seems like some sort of environment variable problem, but I'm not sure how it's arising in this context.

@michaelosthege
Member Author

I see no connection, but it's hard to tell without the diff. Can you open a Draft PR?

@StephenHogg
Contributor

Sure, PR is open.

@michaelosthege
Member Author

@StephenHogg you can now merge the latest changes from master into your branch (4116) to fix that unrelated (locally) failing test:

git remote add pymcdevs https://github.com/pymc-devs/pymc3
git fetch pymcdevs master
git checkout 4116
git merge pymcdevs/master

@StephenHogg
Contributor

Suggest closing the issue now that #4211 is merged

@twiecki twiecki closed this as completed Nov 27, 2020
@twiecki
Member

twiecki commented Nov 27, 2020

@StephenHogg Small tip: if you had written "Closes #4116" in your PR, this would have happened automatically.
