Allow partial imputation with `pm.observe` #7204

ricardoV94 · 2024-03-20T17:57:51Z

One tricky thing will be to work in conjunction with #6932

Partial imputation is a model transformation that usually happens in `model.register_rv`. It creates two model RVs (the observed and unobserved components), which are then joined together in a Deterministic (with the original name) so that they look like a single entity in case the variable is used elsewhere downstream (or just so that it shows up in the trace).

We could read NaNs in the constant values and do the same automatic imputation that `observed` does.

Besides this, and what I think is a better API, we could add a `mask` kwarg that specifies which subset of the variable's dimensions is to be observed, and then trigger the same kind of model transformation that `observed=[x, np.nan]` does. This could be done without a warning because it is explicit.

The second approach has the benefit that the mask can be a shared variable (i.e., `pm.Data`) that can be updated later. See #6626.
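To make the two proposals concrete, here is a minimal NumPy-only sketch of how a NaN-derived mask and an explicit mask would select the same partition of a 1-D array. `split_by_mask` is a hypothetical helper written for this illustration; it is not part of PyMC, and the proposed `mask` kwarg does not exist yet.

```python
import numpy as np


def split_by_mask(values, observed_mask):
    """Hypothetical helper: return the observed entries and the positions
    of the unobserved ones for a 1-D array."""
    values = np.asarray(values, dtype=float)
    observed_mask = np.asarray(observed_mask, dtype=bool)
    if observed_mask.shape != values.shape:
        raise ValueError("mask and values must have the same shape")
    return values[observed_mask], np.flatnonzero(~observed_mask)


data = np.array([0.5, np.nan, 1.5, np.nan])

# Implicit route: derive the mask from the NaN entries in the data.
implicit_values, implicit_missing = split_by_mask(data, ~np.isnan(data))

# Explicit route: the caller states which entries are observed.
explicit_mask = np.array([True, False, True, False])
explicit_values, explicit_missing = split_by_mask(data, explicit_mask)
```

With an explicit mask the data array itself would not need to contain NaNs, which is part of why the explicit route would not require a warning.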

ricardoV94 · 2024-07-26T10:33:55Z

Snippet from @wd60622 for how to reveal the missing functionality:

import pymc as pm
import numpy as np

import matplotlib.pyplot as plt

import arviz as az


def normal_declaration(data):
    coords = {
        "idx": range(len(data)),
    }
    with pm.Model(coords=coords) as model:
        pm.Normal(
            "obs",
            mu=pm.Normal("mu"),
            sigma=pm.HalfNormal("sigma"),
            observed=data,
            dims="idx",
        )

    return model


def work_around(data):
    coords = {
        "idx": range(len(data)),
    }
    with pm.Model(coords=coords) as generative_model:
        pm.Normal(
            "obs",
            mu=pm.Normal("mu"),
            sigma=pm.HalfNormal("sigma"),
            dims="idx",
        )

    return pm.observe(generative_model, {"obs": data})

seed = sum(map(ord, "impute observe bug"))
rng = np.random.default_rng(seed)

mu = 5
sigma = 0.25

data = rng.normal(mu, sigma, size=250)

missing_idx = rng.choice([True, False, False, False], size=data.shape)
data[missing_idx] = np.nan

with normal_declaration(data):
    idata = pm.sample()

with work_around(data):
    # SamplingError: Initial evaluation of model at starting point failed!
    idata_workaround = pm.sample()

ricardoV94 mentioned this issue Jul 26, 2024

BUG: automatic imputation with pm.observe #7430

Closed

ricardoV94 added enhancements model labels Nov 7, 2024

jonsedar mentioned this issue Nov 9, 2024

New example for auto-imputation aka handle missing values with a simple dataset and full workflow pymc-devs/pymc-examples#721

Closed
