BUG: pytensor.shared does not respect masked missing values #258

Closed
@kamicollo

Description

Describe the issue:

This is related to pymc-devs/pymc#6626. It seems that pytensor.shared() (and thus pm.MutableData()) does not respect masked missing values: the mask is dropped in the process, which is especially problematic when the missing data is encoded as an actual number.

Reproducible code example:

import numpy as np
import pymc as pm
import pytensor as pt
import arviz as az

# Basic example: the last element (4) is masked
X = np.ma.masked_greater(np.array([1, 2, 3, 4]), 3)
print(pt.shared(X).container.value)  # returns [1, 2, 3, 4]: the mask is lost

# Example where inference in PyMC is wrong as a result:

real_X = np.random.default_rng().normal(size=1000)
Y = np.random.default_rng().normal(loc=3 * real_X, scale=0.1)
X = real_X.copy()
X[0:10] = 999
masked_X = np.ma.masked_where(X == 999, X)


with pm.Model() as m:
    β = pm.Normal("β", 0, 1)
    σ = pm.Exponential("σ", 1)
    X = pm.Normal("X", 0, 1, observed=pm.MutableData("masked_X", masked_X))
    pm.Normal("Y", pm.math.dot(X, β), σ, observed=Y)
    trace = pm.sample()


az.summary(trace)
# yields β ≈ 0, which is incorrect (Y was generated with a coefficient of 3)

Error message:

No response

PyTensor version information:

2.10.1

Context for the issue:

This issue fails silently and can lead to incorrect inference results for PyMC users.
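
For illustration, here is a NumPy-only sketch of how a mask can be lost silently. It assumes, without having checked the PyTensor source, that the shared variable ends up holding a plain ndarray built from the input. The helper to_nan_filled is hypothetical and not part of PyTensor or PyMC. It only makes the missing entries visible as NaN instead of letting a sentinel such as 999 pass through as real data, and it is not claimed to restore imputation in pm.MutableData.

```python
import numpy as np

# Same masked array as in the basic example: the value 4 is masked.
X = np.ma.masked_greater(np.array([1, 2, 3, 4]), 3)

# Converting a masked array to a plain ndarray discards the mask
# and exposes the underlying data, including the masked entry.
plain = np.asarray(X)
print(plain)  # [1 2 3 4]


def to_nan_filled(values):
    """Hypothetical helper: return a float ndarray with masked entries set to NaN."""
    if isinstance(values, np.ma.MaskedArray):
        # Cast to float first, because NaN cannot be stored in an integer array.
        return values.astype(float).filled(np.nan)
    return np.asarray(values, dtype=float)


filled = to_nan_filled(X)
print(np.isnan(filled))  # True only where the original entry was masked
```

Filling explicitly before handing data to a shared variable at least turns the silent failure into a visible one, since NaN in the data tends to surface as an error or a NaN result rather than as a biased estimate.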
