Use RandomVariables in PyMC models #4440
Conversation
Force-pushed from eb1e01f to 8938add.
The general goal behind these initial changes was to start tracking […]. The entry point for […]. Changes to […].
Hi @brandonwillard […]
That's a tough question, because the real issue is the underlying design changes. Basically, I don't want to go through a lot of refactoring only to have it scrapped. At some point soon, we should be able to say whether or not we want to stick with some of these changes (e.g. the […]).

Also, I should probably put this branch in the […]
I don't know if this helps at all in deciding how to implement things in the end, but is there a straightforward way to adapt other PyMC3 "meta distribution" constructors (for lack of a better word) under this framework? Stuff like Truncated and Censored distributions, Distributions for missing observations, Mixtures, RandomWalk, DensityDist, Potentials... Are any of these particularly challenging to adapt? Could these influence the final design for the RandomVariables? Just trying to help get the discussion going.
Those are exactly the kinds of things we need to consider. Right now, I don't see any reason why the current approach wouldn't work in those cases; however, like many other […]

This work is in line with the general theme of these PyMC3 changes: i.e. move more PyMC3 logic into […]
Force-pushed from 700322b to 3d4e6b4.
The most recent commit starts to address one of the biggest issues in PyMC3: its current dependence on concrete (i.e. non-symbolic) shapes. More specifically, the […]

After some review, it seems as though most of the logic that relies on […]

The last commit effectively removes the need for […]

The basics of the new `DictToArrayBijection` are demonstrated below:

```python
import numpy as np

from pymc3.blocking import DictToArrayBijection

dpoint = {"v1": np.random.normal(size=(4, 3)).astype("float32"),
          "v2": np.random.normal(size=(2, 4)).astype("float64"),
          "v3": np.random.normal(size=(2, 4)).astype("float64")}

dtab = DictToArrayBijection([k for k in dpoint.keys()],
                            [v.shape for v in dpoint.values()],
                            [v.dtype for v in dpoint.values()])

# This simply ravels/flattens all the values in the dictionary and concatenates them
apoint = dtab.map(dpoint)

assert np.array_equal(np.concatenate([v.ravel() for v in dpoint.values()]), apoint)

# This reverts the concatenated values back to a dictionary
# (the result should be equal to `dpoint`)
dpoint_rmap = dtab.rmap(apoint)

assert dpoint_rmap.keys() == dpoint.keys()
assert all(np.array_equal(a, b) for a, b in zip(dpoint.values(), dpoint_rmap.values()))
```

I've replaced most uses of […]. If there are any reasons why that shape information is needed outside of a sampling context, please tell me.

In the meantime, I'll finish making the changes necessary to sample models. After that, I'll look into variable dimension sampling (e.g. Dirichlet processes). This is a situation that's likely to conflict with other parts of PyMC3 that assume fixed dimensions (e.g. the containers holding sample results, specific step methods, etc.), but, after these changes, it's something that's easy to do in terms of sampling and Aesara mechanics.
Force-pushed from ffb1976 to a8a24bb.
We should make sure to benchmark this properly so as not to introduce any perf regressions.
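A minimal sketch of what such a benchmark could look like, assuming an asv-style suite like the ones already in the repo's `benchmarks/` directory (the class, model, and sizes here are hypothetical, purely for illustration):

```python
import numpy as np
import pymc3 as pm


class RandomVariableSuite:
    """Hypothetical asv-style benchmark; times sampling on a small model."""

    def setup(self):
        np.random.seed(123)
        self.y = np.random.normal(20.2, 0.5, size=(100,))

    def time_sample_simple_normal(self):
        # asv reports the wall-clock time of this method across commits
        with pm.Model():
            mu = pm.Normal("mu", 0.0, 10.0)
            pm.Normal("y", mu, 1.0, observed=self.y)
            pm.sample(
                draws=500, tune=500, chains=1, cores=1,
                progressbar=False, compute_convergence_checks=False,
            )
```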
```diff
-    kinetic = pot.velocity_energy(p_new, v_new)
+    kinetic = pot.velocity_energy(p_new.data, v_new)
     energy = kinetic - logp

     return State(q_new, p_new, v_new, q_new_grad, energy, logp)
```
`q_new` and `p_new` might need to be packed back into `RaveledData` here, right?
They should be `RaveledData` instances on that line (i.e. 117).

Note: I originally used an `ndarray` subclass to track `point_map_info`, instead of this `RaveledData` type. It was more convenient to use, because these lines didn't need to change, but it would also hold onto the mapping information for longer than desired, and that didn't seem good. I was able to make it return normal `ndarray`s whenever an operation changed its shape or dtype, but that wouldn't cover everything, so I scrapped it.
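A minimal sketch of the datatype being discussed — the field names mirror the `RaveledVars` type from the later commit message, but the details here are illustrative, not this PR's actual definition:

```python
from collections import namedtuple

import numpy as np

# Illustrative: the raveled/concatenated values plus the (name, shape, dtype)
# metadata needed to map the vector back to per-variable arrays
RaveledData = namedtuple("RaveledData", ["data", "point_map_info"])

point = RaveledData(
    data=np.zeros(12, dtype="float64"),
    point_map_info=[("v1", (4, 3), "float64")],
)

# Unlike the old `ndarray` subclass, numeric code (e.g. the integrator above)
# must now unwrap the plain `ndarray` explicitly--hence `p_new.data` in the diff
assert isinstance(point.data, np.ndarray)
```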
With the most recent update, it should now be possible to sample a simple model with NUTS:

```python
import numpy as np

import theano
import theano.tensor as tt

import pymc3 as pm

from theano.printing import debugprint as tt_dprint

theano.config.compute_test_value = "warn"
theano.config.gcc = ""

np.random.seed(2344)

y_sample = np.random.normal(20.2, 0.5, size=(3,))

with pm.Model() as model:
    B_rv = pm.Normal("B", 0.0, 10.0)
    B_rv.tag.test_value = np.array(0.0)
    Y_rv = pm.Normal("Y", B_rv, 1.0, observed=y_sample)

with model:
    trace = pm.sample(
        draws=100,
        chains=1,
        cores=1,
        tune=10,
        compute_convergence_checks=False,
        return_inferencedata=False,
    )
```

```
Only 100 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [B]
Sampling 1 chain for 10 tune and 100 draw iterations (10 + 100 draws total) took 0 seconds.
The acceptance probability does not match the target. It is 0.92719043799477, but should be close to 0.8. Try to increase the number of tuning steps.
```

```python
>>> trace.get_values("B").mean()
20.153435618022563
```
Also, there's a reason for those […].

Next, I'll implement posterior predictive sampling.
@brandonwillard so is this going to be merged to master as the PR currently indicates, or to a v4 branch? I can devote some time today to helping with your checklist.
```python
            return self.function(*params)
        else:
            return np.array([self.function(*params) for _ in range(size[0])])
        # size = to_tuple(size)
```
Does this need refactoring then, or does `random` just go away?
`Distribution.random` is gone forever; however, some of the code in our `Distribution.random` implementations can be repurposed in a `RandomVariable.perform`, especially for `Distribution`s that do not already have corresponding `RandomVariable` subclasses in Aesara.
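As a rough illustration of that repurposing — the `TriangularRV` name is made up, and the `rng_fn` hook reflects the Theano-PyMC/Aesara `RandomVariable` subclassing API rather than code from this PR:

```python
import numpy as np

from theano.tensor.random.op import RandomVariable


class TriangularRV(RandomVariable):
    """Hypothetical `RandomVariable` carrying over a `Distribution.random` body."""

    name = "triangular"
    ndim_supp = 0             # scalar support
    ndims_params = [0, 0, 0]  # lower, mode, and upper are scalar parameters
    dtype = "floatX"

    @classmethod
    def rng_fn(cls, rng, lower, mode, upper, size):
        # This is where the sampling logic from the old
        # `Distribution.random` implementation would go
        return rng.triangular(lower, mode, upper, size=size)


triangular = TriangularRV()
```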
They're actually about the use of symbolic variables as a means of obtaining concrete shapes (e.g. through test values), and the way those values permeate our logic. Those are the fundamental problems that need to be addressed in order to move forward. In other words, we need to get concrete shape values from concrete samples of the variables and only those. We obtain those values from our Python-based sampling processes; unfortunately, those sampling processes need initial values with which to start, and those have been provided by—or stored in—[…].

I've phrased most things in terms of "dynamic" shapes because that's one of the easiest ways to describe how this entanglement is bad. Put another way, properties like […]

In summary, we should essentially outlaw the use of test values for anything except debugging.
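To make the distinction concrete, here is a small sketch — using Theano-PyMC's `RandomStream`; illustrative only — of getting concrete shapes from actual samples rather than from test values:

```python
import theano
from theano.tensor.random.utils import RandomStream

srng = RandomStream(seed=2344)

# A sample-space variable; no test value attached or needed
X_rv = srng.normal(0, 1, size=(3,))

# Concrete values--and hence concrete shapes--come from actually sampling
sample_fn = theano.function([], X_rv)
initial_value = sample_fn()
initial_value.shape  # (3,)
```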
Dynamic shape handling isn't just about sampling fancy models—although that is extremely important in itself—it's also about model (object) reuse and the use of out-of-sample data. Our current implicit reliance on test values and fixed-shape logic has led to inflexible code that can barely handle changes to the data in a model. For instance, we currently need to recreate models when the dimensions of their input data change. This is very costly when the models are non-trivial and require extensive C compilation. The worst part is that these limitations are not due to Theano/Aesara; they're due to PyMC3's inflexibility.

No, we don't need to make all of our samplers capable of handling dynamic shapes, especially for cases in which the underlying maths/methods need to be carefully reconsidered; however, we do need to relocate that responsibility to the samplers themselves and update the internal API so that it works in a "shape agnostic" and "shape isolated" fashion. In general, our API (e.g. the […])
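A toy illustration of the model-reuse point (plain Theano, nothing PyMC3-specific): nothing in the graph below commits to a concrete data length, so one compiled function serves inputs of any size.

```python
import numpy as np
import theano
import theano.tensor as tt

X = tt.vector("X")
loglik = -0.5 * (X ** 2).sum()
loglik_fn = theano.function([X], loglik)

# No graph recreation--and no recompilation--when the data size changes
loglik_fn(np.ones(3, dtype=X.dtype))
loglik_fn(np.ones(1000, dtype=X.dtype))
```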
That is a real limitation, but it shouldn't be a limitation of PyMC3. If someone writes a sampler that can handle infinite dimensional distributions or anything like that, PyMC3 should be able to handle it. There's absolutely no reason that the core PyMC3 interface and functions can't handle sample results in the form of ragged arrays/[…].

Stochastic processes with these kinds of dimensional properties are too important to ignore unnecessarily.
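For instance — purely a sketch, not an API in this PR — draws whose dimension changes between iterations can still be collected in an object-dtype container:

```python
import numpy as np

# e.g. the number of active components in a Dirichlet-process mixture
# can differ from draw to draw
draws = [np.random.normal(size=k) for k in (2, 5, 3)]

# A ragged container for sample results, instead of a rectangular array
ragged_draws = np.empty(len(draws), dtype=object)
ragged_draws[:] = draws
```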
Some parts of it, sure, but the core refactoring work that we're talking about in this PR needs to be done here and now.
I must admit that I don't fully understand all the intricacies of the transition to […]
Since test values are essentially an attribute of a particular model-fitting run, perhaps they should be passed as a dict to […]
@fonnesbeck, good point; I did make the PR target […].

I can put the above bullet points into an issue, and we can start putting in PRs to the now updated […]
Exactly, and, in both cases, I believe that would serve as an initial values parameter.
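The existing `start` argument to `pm.sample` is already close to this shape; a sketch of the idea (naming and scope of the eventual parameter aside):

```python
import numpy as np
import pymc3 as pm

with pm.Model() as model:
    B_rv = pm.Normal("B", 0.0, 10.0)

with model:
    # Initial values supplied per fitting run, not stored on the model
    trace = pm.sample(
        draws=100, tune=10, chains=1, cores=1,
        start={"B": np.array(0.0)},
        compute_convergence_checks=False,
        return_inferencedata=False,
    )
```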
Force-pushed from fe6d72d to 1c11f11.
This value was not representative of its name.
@brandonwillard I have a late question. You mentioned we need to convert the missing random methods to […]
In PyMC. I don't see us adding any more […]
…d basic dists

These changes can be summarized as follows:

- `Model` objects now track fully functional Theano graphs that represent all relationships between random and "deterministic" variables. These graphs are called "sample-space" graphs. `Model.unobserved_RVs`, `Model.basic_RVs`, `Model.free_RVs`, and `Model.observed_RVs` contain these graphs (i.e. `TensorVariable`s), which are generated by `RandomVariable` `Op`s.
- For each random variable, there is now a corresponding "measure-space" variable (i.e. a `TensorVariable` that corresponds to said variable in a log-likelihood graph). These variables are available as `rv_var.tag.value_var`, for each random variable `rv_var`, or via `Model.vars`.
- Log-likelihood (i.e. measure-space) graphs are now created for individual random variables by way of the generic functions `logpt`, `logcdf`, `logp_nojac`, and `logpt_sum` in `pymc3.distributions`.
- Numerous uses of concrete shape information stemming from `Model` objects (e.g. `Model.size`) have been removed/refactored.
- Use of `FreeRV`, `ObservedRV`, `MultiObservedRV`, and `TransformedRV` has been deprecated. The information previously stored in these classes is now tracked using `TensorVariable.tag`, and log-likelihoods are generated using the aforementioned `log*` generic functions.
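A quick sketch of how the generic functions described above are meant to be used — the exact `logpt` signature on this branch may differ, so treat this as illustrative:

```python
import pymc3 as pm
from pymc3.distributions import logpt

with pm.Model() as model:
    X_rv = pm.Normal("X", 0.0, 1.0)

# The measure-space counterpart of the sample-space variable `X_rv`
X_value = X_rv.tag.value_var

# A log-likelihood graph for `X_rv`, evaluated at `X_value`
X_logp = logpt(X_rv, X_value)
```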
This commit changes `DictToArrayBijection` so that it returns a `RaveledVars` datatype that contains the original raveled and concatenated vector, along with the information needed to revert it back to dictionary/variables form.

Simply put, the variables-to-single-vector mapping steps have been pushed away from the model object and its symbolic terms and closer to the (sampling) processes that produce and work with `ndarray` values for said terms. In doing so, we can operate under fewer unnecessarily strong assumptions (e.g. that the shapes of each term are static and equal to the initial test points), and let the sampling processes that require vector-only steps deal with any changes in the mappings.
The approach currently being used is rather inefficient. Instead, we should change the `size` parameters for `RandomVariable` terms in the sample-space graph(s) so that they match arrays of the inputs in the trace and the desired number of output samples. This would allow the compiled graph to vectorize operations (when it can) and sample variables more efficiently in large batches.
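Roughly, the batched approach amounts to the following (illustrative; broadcasting the trace's parameter arrays against the new leading dimension is the part that needs care):

```python
import theano
from theano.tensor.random.basic import normal

n_samples, n_obs = 1000, 3

# One compiled call draws all posterior-predictive samples at once by
# folding the desired number of samples into `size`
Y_batched = normal(0.0, 1.0, size=(n_samples, n_obs))

sample_fn = theano.function([], Y_batched)
samples = sample_fn()  # shape: (n_samples, n_obs)
```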
Classes and functions removed:

- PyMC3Variable
- ObservedRV
- FreeRV
- MultiObservedRV
- TransformedRV
- ArrayOrdering
- VarMap
- DataMap
- _DrawValuesContext
- _DrawValuesContextBlocker
- is_fast_drawable
- _compile_theano_function
- vectorize_theano_function
- get_vectorize_signature
- _draw_value
- draw_values
- generate_samples
- fast_sample_posterior_predictive

Modules removed:

- pymc3.distributions.posterior_predictive
- pymc3.tests.test_random
```diff
@@ -1285,7 +1294,7 @@ def __init__(self, groups, model=None):
         self._scale_cost_to_minibatch = theano.shared(np.int8(1))
         model = modelcontext(model)
         if not model.free_RVs:
-            raise TypeError("Model does not have FreeRVs")
+            raise TypeError("Model does not have an free RVs")
```
raise TypeError("Model does not have an free RVs") | |
raise TypeError("Model does not have any free RVs") |
"""Convenience attribute to return tag.test_value""" | ||
return self.tag.test_value | ||
|
||
|
||
def pandas_to_array(data): |
Not related to this PR, but this function could probably benefit from a new name.
All right, I'm closing this. Work will continue on the `v4` branch.
@brandonwillard Do you want to open a PR for that too? That way it's easier to track progress.
This PR prototypes large-scale changes that introduce tighter integration with Theano-PyMC/Aesara and—more specifically—the use of `theano.tensor.random.op.RandomVariable`s. The original PyMC3 tests will likely not pass any time soon, due to the nature and extent of the changes involved.

This PR is currently for discussion and demonstration purposes only. It is not a place to discuss changes unrelated to the use of `RandomVariable`s and Theano-PyMC/Aesara.

- `Distribution`s with `RandomVariable`s and `logp` dispatch functions
- `Model.size` (was previously `Model.ndim`)
- `ArrayOrdering`
- `DictToArrayBijection`