Closed
Description
Describe the issue:
This is related to pymc-devs/pymc#6626. It seems that pytensor.shared() (and thus pm.MutableData()) does not respect masked missing values. The values are unmasked in the process, which is especially problematic if the missing data was encoded as an actual number.
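The effect can be illustrated with NumPy alone. The sketch below assumes the conversion behaves like a plain np.asarray call, which is an assumption about the mechanism and not a confirmed description of PyTensor internals. It shows a sentinel value (999, chosen for illustration) reappearing as ordinary data once the mask is gone.

```python
import numpy as np

# Sentinel-encoded missing value, hidden behind a mask.
X = np.array([0.5, 999.0, -1.2])
masked_X = np.ma.masked_where(X == 999, X)

# Assumption: the conversion is comparable to np.asarray, which returns
# the underlying data as a plain ndarray and discards the mask.
plain = np.asarray(masked_X)

mask_lost = not np.ma.isMaskedArray(plain)
sentinel_visible = bool((plain == 999).any())

# The masked mean skips the hidden entry; the plain mean includes 999.
masked_mean = float(masked_X.mean())
plain_mean = float(plain.mean())
print(mask_lost, sentinel_visible, masked_mean, plain_mean)
```

Any statistic computed on the plain array then treats the sentinel as a real observation.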
Reproducible code example:
import numpy as np
import pymc as pm
import pytensor as pt
import arviz as az
# basic example:
X = np.ma.masked_greater(np.array([1, 2, 3, 4]), 3)
print(pt.shared(X).container.value)  # (returns [1,2,3,4]; the mask on 4 is lost)
# example where inference is wrong in PyMC as a result:
real_X = np.random.default_rng().normal(size=1000)
Y = np.random.default_rng().normal(loc=3 * real_X, scale=0.1)
X = real_X.copy()
X[0:10] = 999  # encode the first ten values as missing with a sentinel
masked_X = np.ma.masked_where(X == 999, X)
with pm.Model() as m:
    β = pm.Normal("β", 0, 1)
    σ = pm.Exponential("σ", 1)
    X = pm.Normal("X", 0, 1, observed=pm.MutableData("masked_X", masked_X))
    pm.Normal("Y", pm.math.dot(X, β), σ, observed=Y)
    trace = pm.sample()
az.summary(trace)
# yields β ≈ 0, which is incorrect (Y was generated with a slope of 3)
Error message:
No response
PyTensor version information:
2.10.1
Context for the issue:
This issue fails silently and can lead to incorrect inference results for PyMC users.
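One way to turn the silent failure into a loud one is a guard applied before data is handed to a shared variable. The helper below, reject_masked, is hypothetical and not part of PyTensor or PyMC; it is only a minimal sketch of the idea using NumPy's masked-array checks.

```python
import numpy as np

def reject_masked(values):
    """Hypothetical guard: raise instead of silently dropping a mask."""
    if np.ma.isMaskedArray(values) and np.ma.is_masked(values):
        raise TypeError(
            "masked entries would be lost; fill or drop them explicitly first"
        )
    # No masked entries: converting to a plain ndarray loses nothing.
    return np.asarray(values)

# A masked array with nothing actually masked passes through.
ok = reject_masked(np.ma.masked_greater(np.array([1, 2, 3]), 5))

# An array with a masked entry is rejected.
try:
    reject_masked(np.ma.masked_greater(np.array([1, 2, 3, 4]), 3))
    raised = False
except TypeError:
    raised = True
```

Whether to raise, warn, or convert masked entries to another missing-value representation is a design choice for the maintainers; the sketch only shows that the condition is cheap to detect.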