Allow partial imputation with pm.observe #7204

Open
ricardoV94 opened this issue Mar 20, 2024 · 1 comment

ricardoV94 (Member) commented Mar 20, 2024

One tricky part will be making this work together with #6932.

Partial imputation is a model transformation that usually happens in `model.register_rv`. It creates two model RVs (the observed and the unobserved components), which are then joined together in a Deterministic with the original name. That way they look like a single entity in case the variable is used downstream elsewhere (or just so it shows up in the trace).

We could detect `nan` in the constant values passed to `pm.observe` and apply the same automatic imputation that the `observed` kwarg already does.
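Detecting which entries need imputation could be as simple as the helper below. `missing_mask` is a hypothetical name, not a PyMC function. It treats both NaN and entries masked in a `numpy.ma.MaskedArray` as missing; my understanding is that `observed=` accepts both forms, but that part is an assumption.

```python
import numpy as np


def missing_mask(values):
    """Hypothetical helper: boolean array that is True where a value is missing.

    NaN entries and entries masked in a numpy.ma.MaskedArray both count as missing.
    """
    # getmaskarray returns an all-False array for plain (unmasked) inputs.
    masked = np.ma.getmaskarray(values)
    # getdata returns the underlying values, ignoring any mask.
    raw = np.ma.getdata(values)
    return masked | np.isnan(raw)


values = np.ma.masked_array([1.0, np.nan, 3.0, 4.0], mask=[False, False, True, False])
mask = missing_mask(values)  # array([False,  True,  True, False])
```

If `mask.any()` is true, the observe transformation would take the same split-and-join path that `observed=` takes.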

Besides this, and what I think is a better API, we could add a `mask` kwarg that specifies which subset of the variable's entries is observed, and then trigger the same model transformation that `observed=[x, np.nan]` does. This could be done without a warning, because the imputation is explicit.
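To make the explicit-mask idea concrete, here is a sketch of the preprocessing a `mask` kwarg might do. `split_by_mask` is hypothetical and not part of PyMC. The point is that the mask, not the presence of NaN, decides what gets imputed, so the `0.0` placeholder below is imputed rather than treated as data.

```python
import numpy as np


def split_by_mask(values, observed_mask):
    """Hypothetical helper: split `values` according to an explicit mask.

    True in `observed_mask` means "observed"; entries where it is False are
    imputed, whatever value they currently hold (NaN or not).
    """
    values = np.asarray(values)
    observed_mask = np.asarray(observed_mask, dtype=bool)
    if observed_mask.shape != values.shape:
        raise ValueError("mask must have the same shape as the values")
    # Flat indices, in the same (C) order as values[observed_mask].
    obs_idx = np.flatnonzero(observed_mask)
    miss_idx = np.flatnonzero(~observed_mask)
    return values[observed_mask], obs_idx, miss_idx


values = np.array([4.8, 0.0, 5.1, 5.3])
observed, obs_idx, miss_idx = split_by_mask(values, [True, False, True, True])
```

The index arrays are what a split-and-join transformation needs to place the imputed entries back in position.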

The second approach has the benefit that the mask can be a shared variable (i.e., `pm.Data`) that can be updated later. See #6626.

ricardoV94 (Member, Author) commented:

Snippet from @wd60622 that reveals the missing functionality:

```python
import numpy as np
import pymc as pm


def normal_declaration(data):
    """Pass the data via observed=: NaNs are imputed automatically."""
    coords = {
        "idx": range(len(data)),
    }
    with pm.Model(coords=coords) as model:
        pm.Normal(
            "obs",
            mu=pm.Normal("mu"),
            sigma=pm.HalfNormal("sigma"),
            observed=data,
            dims="idx",
        )

    return model


def work_around(data):
    """Build a generative model first, then attach the data with pm.observe."""
    coords = {
        "idx": range(len(data)),
    }
    with pm.Model(coords=coords) as generative_model:
        pm.Normal(
            "obs",
            mu=pm.Normal("mu"),
            sigma=pm.HalfNormal("sigma"),
            dims="idx",
        )

    return pm.observe(generative_model, {"obs": data})


seed = sum(map(ord, "impute observe bug"))
rng = np.random.default_rng(seed)

mu = 5
sigma = 0.25

data = rng.normal(mu, sigma, size=250)

# Knock out roughly a quarter of the entries.
missing_idx = rng.choice([True, False, False, False], size=data.shape)
data[missing_idx] = np.nan

# observed= with NaNs triggers automatic imputation, so this samples.
with normal_declaration(data):
    idata = pm.sample()

with work_around(data):
    # SamplingError: Initial evaluation of model at starting point failed!
    idata_workaround = pm.sample()
```
