
Do not reference model variables in GP internals #6883

Open
ricardoV94 opened this issue Aug 30, 2023 · 8 comments
Labels
GP Gaussian Process

Comments

@ricardoV94
Member

ricardoV94 commented Aug 30, 2023

Description

The model fgraph functionality (used by do and observe) doesn't work with GP variables, because GPs are not represented as pure PyTensor objects and instead hold references to the original model's variables in their instance scope.

@ricardoV94 ricardoV94 added the GP Gaussian Process label Aug 30, 2023
@ferrine
Member

ferrine commented Sep 12, 2023

Can we refactor the code so that it works correctly?

@ricardoV94
Member Author

It's not easy; we need to figure out how to represent GPs as PyTensor variables/Ops.

Some very rough (and likely bad) ideas in https://github.com/pymc-devs/pymc-experimental/blob/2084e167486a42b313abcec77958d3a24e001ae3/pymc_experimental/gp/GPs%20from%20scratch.ipynb

@ricardoV94
Member Author

ricardoV94 commented Oct 5, 2023

There's actually no GP object saved in the model, so no good way to raise an error.

The problem is that GP objects hold references to model variables, so if you try to do a gp.conditional after cloning a model (which happens whenever you use do/observe) you re-introduce the old model variables instead of referring to the cloned ones.

import pymc as pm
import numpy as np
from pytensor.graph import ancestors
from pymc.model.fgraph import clone_model

with pm.Model() as model:
    ell = pm.Gamma("ell", alpha=2, beta=1)
    cov = 2 * pm.gp.cov.ExpQuad(1, ell)
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=np.arange(10)[:, None])
    pm.Normal("y", f * 2)

assert ell in ancestors(f)

cloned_model = clone_model(model)
with cloned_model:
    f_cloned = gp.prior("f_cloned", X=np.arange(10)[:, None])
    pm.Normal("y_cloned", f_cloned * 2)

assert model["ell"] in ancestors(f_cloned)  # bad
assert cloned_model["ell"] not in ancestors(f_cloned)  # bad
cloned_model.logp()  # ValueError: Random variables detected in the logp graph: {ell}.

I see two ways of fixing this:

  1. Use an approach more like the statespace module in pymc-experimental, which retrieves all variables by name from the model context when those are needed (is that correct @jessegrabowski?)
  2. Try to define GPs as PyTensor objects/operations (this is not trivial, but could be interesting in the long term). See link above.

@ricardoV94 ricardoV94 changed the title Raise error when converting model with GPs to fgraph Do not reference model variables in GPs internals Oct 5, 2023
@ricardoV94 ricardoV94 changed the title Do not reference model variables in GPs internals Do not reference model variables in GP internals Oct 5, 2023
@jessegrabowski
Member

Yes, the state space models have a list of required parameters, and they do name matching to model RVs when you "build" the model.

The challenge with that approach is that the statespace model needs to know ahead of time what variables to look for. It ends up being really inflexible, because users have to conform to the demands of the model, rather than the (preferable) other way around.

The core of the GP is just the conditioning operation. I think what could be interesting is to define something like pm.condition(X, y), which would ask the RV X to return a conditional version of itself, given data y. I guess this isn't defined in most cases, so it could just raise on those. But when called on a MvNormal(mean_func, cov_func), it would give back the GP as a symbolic random variable that could be worked with "as usual"?
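
For concreteness, here is a minimal sketch of the conditioning math such an operation could encapsulate for a zero-mean MvNormal prior. pm.condition does not exist, and conditional_mvn is a hypothetical helper invented for this example; it only spells out the standard Gaussian conditioning formulas with PyTensor:

import numpy as np
import pymc as pm
import pytensor.tensor as pt
from pytensor.tensor.slinalg import cholesky, solve_triangular

def conditional_mvn(cov_func, X, Xnew, f, jitter=1e-6):
    """Conditional mean/cov at Xnew of a zero-mean GP, given values f at X."""
    Kxx = cov_func(X) + jitter * pt.eye(X.shape[0])
    Kxs = cov_func(X, Xnew)
    Kss = cov_func(Xnew)
    L = cholesky(Kxx)
    A = solve_triangular(L, Kxs, lower=True)  # L^{-1} Kxs
    v = solve_triangular(L, f, lower=True)    # L^{-1} f
    return A.T @ v, Kss - A.T @ A             # conditional mean and covariance

X = np.arange(10)[:, None]
Xnew = np.linspace(0, 9, 25)[:, None]

with pm.Model() as m:
    ell = pm.Gamma("ell", alpha=2, beta=1)
    cov = 2 * pm.gp.cov.ExpQuad(1, ell)
    # "prior": a plain MvNormal built from the covariance function
    f = pm.MvNormal("f", mu=pt.zeros(X.shape[0]), cov=cov(X))
    # "conditional": what a pm.condition-like operation would produce symbolically
    mu_new, cov_new = conditional_mvn(cov, X, Xnew, f)
    fnew = pm.MvNormal("fnew", mu=mu_new, cov=cov_new + 1e-6 * pt.eye(Xnew.shape[0]))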

@bwengals
Contributor

Breaking the do-operator puts a bit more urgency on this idea now

@bwengals
Contributor

I'm wondering if rewriting the GP lib as pytensor ops is the only way forward here. It's probably easier to make do and observe work with existing GPs than the other way around (just going off lines of code). Though there are other interesting things that I think you'd gain from GPs in PyTensor.

> which retrieves all variables by name from the model context

It wouldn't be hard to store GP objects in the model. It's not as slick, but could they be registered and then reproduced the same way random variables and deterministics are?

I think designing a hierarchical normal model Op would be a simpler, univariate version of what a GP Op would need to do:

  • gp.conditional -> predicting on new groups
  • gp.prior -> Choosing centered or non-centered

I dunno what an Op-ified version of a GP would look like, but a hierarchical normal model that follows the current GP API would look like this:

mu = pm.Normal("mu", ...)
sigma = pm.HalfNormal("sigma", ...)
hier = pm.Hierarchical("hier", mu=mu, sigma=sigma)  # like pm.gp.Latent
x = hier.prior("x", dims="group")   # (optionally) non-center under the hood, `gp.prior` does that

and then for prediction

xnew = hier.new_group("xnew") # like gp.conditional

I'm not proposing we implement it exactly like that, just showing the similarity to explain why I think a PyTensor version of a hierarchical normal model might be a good first step towards a PyTensor version of a GP.

@ricardoV94
Member Author

ricardoV94 commented Dec 10, 2023

> It wouldn't be hard to store GP objects in the model. It's not as slick, but could they be registered and then reproduced the same way random variables and deterministics are?

They could, but that adds another layer of complexity for all Model methods and fgraph manipulation functions, which would now have to think about yet another kind of Model variable.

This is the existing logic, which is already pretty complex:

pymc/pymc/model/fgraph.py

Lines 116 to 277 in 01ddcb8

def fgraph_from_model(
    model: Model, inlined_views=False
) -> Tuple[FunctionGraph, Dict[Variable, Variable]]:
    """Convert Model to FunctionGraph.

    See: model_from_fgraph

    Parameters
    ----------
    model: PyMC model
    inlined_views: bool, default False
        Whether "view" variables (Deterministics and Data) should be inlined among RVs in the fgraph,
        or show up as separate branches.

    Returns
    -------
    fgraph: FunctionGraph
        FunctionGraph that includes a copy of model variables, wrapped in dummy `ModelVar` Ops.
        It should be possible to reconstruct a valid PyMC model using `model_from_fgraph`.
    memo: Dict
        A dictionary mapping original model variables to the equivalent nodes in the fgraph.
    """
    if any(v is not None for v in model.rvs_to_initial_values.values()):
        raise NotImplementedError("Cannot convert models with non-default initial_values")

    if model.parent is not None:
        raise ValueError(
            "Nested sub-models cannot be converted to fgraph. Convert the parent model instead"
        )

    # Collect PyTensor variables
    rvs_to_values = model.rvs_to_values
    rvs = list(rvs_to_values.keys())
    free_rvs = model.free_RVs
    observed_rvs = model.observed_RVs
    potentials = model.potentials
    named_vars = model.named_vars.values()

    # We copy Deterministics (Identity Op) so that they don't show in between "main" variables
    # We later remove these Identity Ops when we have a Deterministic ModelVar Op as a separator
    old_deterministics = model.deterministics
    deterministics = [det if inlined_views else det.copy(det.name) for det in old_deterministics]

    # Value variables (we also have to decide whether to inline named ones)
    old_value_vars = list(rvs_to_values.values())
    unnamed_value_vars = [val for val in old_value_vars if val not in named_vars]
    named_value_vars = [
        val if inlined_views else val.copy(val.name) for val in old_value_vars if val in named_vars
    ]
    value_vars = old_value_vars.copy()
    if inlined_views:
        # In this case we want to use the named_value_vars as the value_vars in RVs
        for named_val in named_value_vars:
            idx = value_vars.index(named_val)
            value_vars[idx] = named_val

    # Other variables that are in named_vars but are not any of the categories above
    # E.g., MutableData, ConstantData, _dim_lengths
    # We use the same trick as deterministics!
    accounted_for = set(free_rvs + observed_rvs + potentials + old_deterministics + old_value_vars)
    other_named_vars = [
        var if inlined_views else var.copy(var.name)
        for var in named_vars
        if var not in accounted_for
    ]

    model_vars = (
        rvs + potentials + deterministics + other_named_vars + named_value_vars + unnamed_value_vars
    )

    memo = {}

    # Replace the following shared variables in the model:
    # 1. RNGs
    # 2. MutableData (could increase memory usage significantly)
    # 3. Mutable coords dim lengths
    shared_vars_to_copy = find_rng_nodes(model_vars)
    shared_vars_to_copy += [v for v in model.dim_lengths.values() if isinstance(v, SharedVariable)]
    shared_vars_to_copy += [v for v in model.named_vars.values() if isinstance(v, SharedVariable)]
    for var in shared_vars_to_copy:
        # FIXME: ScalarSharedVariables are converted to 0d numpy arrays internally,
        # so calling shared(shared(5).get_value()) returns a different type: TensorSharedVariables!
        # Furthermore, PyMC silently ignores mutable dim changes that are SharedTensorVariables...
        # https://github.com/pymc-devs/pytensor/issues/396
        if isinstance(var, ScalarSharedVariable):
            new_var = shared(var.get_value(borrow=False).item())
        else:
            new_var = shared(var.get_value(borrow=False))

        assert new_var.type == var.type
        new_var.name = var.name
        new_var.tag = copy(var.tag)
        # We can replace input variables by placing them in the memo
        memo[var] = new_var

    fgraph = FunctionGraph(
        outputs=model_vars,
        clone=True,
        memo=memo,
        copy_orphans=True,
        copy_inputs=True,
    )
    # Copy model meta-info to fgraph
    fgraph._coords = model._coords.copy()
    fgraph._dim_lengths = {k: memo.get(v, v) for k, v in model._dim_lengths.items()}

    rvs_to_transforms = model.rvs_to_transforms
    named_vars_to_dims = model.named_vars_to_dims

    # Introduce dummy `ModelVar` Ops
    free_rvs_to_transforms = {memo[k]: tr for k, tr in rvs_to_transforms.items()}
    free_rvs_to_values = {memo[k]: memo[v] for k, v in zip(rvs, value_vars) if k in free_rvs}
    observed_rvs_to_values = {
        memo[k]: memo[v] for k, v in zip(rvs, value_vars) if k in observed_rvs
    }
    potentials = [memo[k] for k in potentials]
    deterministics = [memo[k] for k in deterministics]
    named_vars = [memo[k] for k in other_named_vars + named_value_vars]

    vars = fgraph.outputs
    new_vars = []
    for var in vars:
        dims = named_vars_to_dims.get(var.name, ())
        if var in free_rvs_to_values:
            new_var = model_free_rv(
                var, free_rvs_to_values[var], free_rvs_to_transforms[var], *dims
            )
        elif var in observed_rvs_to_values:
            new_var = model_observed_rv(var, observed_rvs_to_values[var], *dims)
        elif var in potentials:
            new_var = model_potential(var, *dims)
        elif var in deterministics:
            new_var = model_deterministic(var, *dims)
        elif var in named_vars:
            new_var = model_named(var, *dims)
        else:
            # Unnamed value variables
            new_var = var
        new_vars.append(new_var)

    replacements = tuple(zip(vars, new_vars))
    toposort_replace(fgraph, replacements, reverse=True)

    # Reference model vars in memo
    inverse_memo = {v: k for k, v in memo.items()}
    for var, model_var in replacements:
        if not inlined_views and (
            model_var.owner and isinstance(model_var.owner.op, (ModelDeterministic, ModelNamed))
        ):
            # Ignore extra identity that will be removed at the end
            var = var.owner.inputs[0]
        original_var = inverse_memo[var]
        memo[original_var] = model_var

    # Remove the last outputs corresponding to unnamed value variables, now that they are graph inputs
    first_idx_to_remove = len(fgraph.outputs) - len(unnamed_value_vars)
    for _ in unnamed_value_vars:
        fgraph.remove_output(first_idx_to_remove)

    # Now that we have Deterministic dummy Ops, we remove the noisy `Identity`s from the graph
    remove_identity_rewrite.apply(fgraph)

    return fgraph, memo

We also used to have special rules for Minibatch variables, and it was much cleaner to just represent those as vanilla PyTensor objects: #6523

As a first approach I would just do it with the name approach like the statespace module does, i.e. store the name of x/mu/sigma instead of the variable itself and retrieve it from the model context via model[name] (to ensure they are always there, they need to be model variables, data, or deterministics).
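
For illustration, a minimal sketch of what that could look like; NameBasedExpQuad is a toy class invented for this example (not existing PyMC API), while pm.modelcontext and model[name] are the real lookups described above:

import numpy as np
import pymc as pm

class NameBasedExpQuad:
    """Toy covariance that stores the *name* of its lengthscale instead of the variable."""

    def __init__(self, input_dim, ls_name):
        self.input_dim = input_dim
        self.ls_name = ls_name  # e.g. "ell"

    def __call__(self, X, Xs=None):
        # Resolve the lengthscale against whatever model is on the context stack *now*,
        # so a cloned model supplies its own `ell` rather than the original one.
        model = pm.modelcontext(None)
        ls = model[self.ls_name]
        return pm.gp.cov.ExpQuad(self.input_dim, ls)(X, Xs)

with pm.Model() as model:
    ell = pm.Gamma("ell", alpha=2, beta=1)
    cov = NameBasedExpQuad(1, "ell")
    K = cov(np.arange(10)[:, None])  # built from *this* model's ell

After clone_model, evaluating the covariance inside the cloned model's context would then pick up the cloned ell automatically.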

Representing stuff as PyTensor objects shouldn't be that hard, but it definitely sounds like it will take a while longer to find the right representation, and it would be nice to fix the incompatible behavior we have right now.

@ricardoV94
Member Author

@bwengals adding a model.gps to the model doesn't solve the problem. You still need to figure out how to replace any references to the old model variables (such as parameters of the covariance) with references to the new model's variables.

It still boils down to the same problem: GP methods are stateful, not pure input/output functions of the model variables.
