Commit

Rename Theano to Aesara
brandonwillard committed Feb 13, 2021
1 parent 0fd439f commit f7b6b00
Showing 107 changed files with 2,515 additions and 2,461 deletions.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -18,7 +18,7 @@ If you have questions about a specific use case, or you are not sure whether thi
## Versions and main components

* PyMC3 Version:
* Theano Version:
* Aesara Version:
* Python Version:
* Operating system:
* How did you install PyMC3: (conda/pip)
8 changes: 4 additions & 4 deletions README.rst
@@ -15,13 +15,13 @@ Check out the `getting started guide <http://docs.pymc.io/notebooks/getting_star
using Binder!
For questions on PyMC3, head on over to our `PyMC Discourse <https://discourse.pymc.io/>`__ forum.

The future of PyMC3 & Theano
The future of PyMC3 & Aesara
============================

There have been many questions and uncertainty around the future of PyMC3 since Theano
There have been many questions and uncertainty around the future of PyMC3 since Aesara
stopped getting developed by the original authors, and we started experiments with PyMC4.

We are happy to announce that PyMC3 on Theano (which we are `developing further <https://github.com/pymc-devs/Theano-PyMC>`__)
We are happy to announce that PyMC3 on Aesara (which we are `developing further <https://github.com/pymc-devs/aesara>`__)
with a new JAX backend is the future. PyMC4 will not be developed further.

See the `full announcement <https://pymc-devs.medium.com/the-future-of-pymc3-or-theano-is-dead-long-live-theano-d8005f8a0e9b>`__
@@ -39,7 +39,7 @@ Features
- **Variational inference**: `ADVI <http://www.jmlr.org/papers/v18/16-107.html>`__
for fast approximate posterior estimation as well as mini-batch ADVI
for large data sets.
- Relies on `Theano-PyMC <https://theano-pymc.readthedocs.io/en/latest/>`__ which provides:
- Relies on `Aesara <https://aesara.readthedocs.io/en/latest/>`__ which provides:
* Computation optimization and dynamic C or JAX compilation
* Numpy broadcasting and advanced indexing
* Linear algebra operators
10 changes: 5 additions & 5 deletions benchmarks/benchmarks/benchmarks.py
@@ -14,11 +14,11 @@
import time
import timeit

import aesara
import aesara.tensor as aet
import arviz as az
import numpy as np
import pandas as pd
import theano
import theano.tensor as tt

import pymc3 as pm

@@ -27,7 +27,7 @@ def glm_hierarchical_model(random_seed=123):
"""Sample glm hierarchical model to use in benchmarks"""
np.random.seed(random_seed)
data = pd.read_csv(pm.get_data("radon.csv"))
data["log_radon"] = data["log_radon"].astype(theano.config.floatX)
data["log_radon"] = data["log_radon"].astype(aesara.config.floatX)
county_idx = data.county_code.values

n_counties = len(data.county.unique())
@@ -61,8 +61,8 @@ def mixture_model(random_seed=1234):
mu = pm.Normal("mu", mu=0.0, sd=10.0, shape=w_true.shape)
enforce_order = pm.Potential(
"enforce_order",
tt.switch(mu[0] - mu[1] <= 0, 0.0, -np.inf)
+ tt.switch(mu[1] - mu[2] <= 0, 0.0, -np.inf),
aet.switch(mu[0] - mu[1] <= 0, 0.0, -np.inf)
+ aet.switch(mu[1] - mu[2] <= 0, 0.0, -np.inf),
)
tau = pm.Gamma("tau", alpha=1.0, beta=1.0, shape=w_true.shape)
pm.NormalMixture("x_obs", w=w, mu=mu, tau=tau, observed=x)
54 changes: 27 additions & 27 deletions docs/source/Advanced_usage_of_Theano_in_PyMC3.rst
@@ -4,20 +4,20 @@
_referenced in docs/source/notebooks/table_of_contents_tutorials.js
=================================
Advanced usage of Theano in PyMC3
Advanced usage of Aesara in PyMC3
=================================

Using shared variables
======================

Shared variables allow us to use values in theano functions that are
Shared variables allow us to use values in aesara functions that are
not considered an input to the function, but can still be changed
later. They are very similar to global variables in many ways::

a = tt.scalar('a')
a = aet.scalar('a')
# Create a new shared variable with initial value of 0.1
b = theano.shared(0.1)
func = theano.function([a], a * b)
b = aesara.shared(0.1)
func = aesara.function([a], a * b)
assert func(2.) == 0.2

b.set_value(10.)
@@ -34,7 +34,7 @@ be time consuming if the number of datasets is large)::
true_mu = [np.random.randn() for _ in range(10)]
observed_data = [mu + np.random.randn(20) for mu in true_mu]

data = theano.shared(observed_data[0])
data = aesara.shared(observed_data[0])
with pm.Model() as model:
mu = pm.Normal('mu', 0, 10)
pm.Normal('y', mu=mu, sigma=1, observed=data)
@@ -55,7 +55,7 @@ variable for our observations::
x = np.random.randn(100)
y = x > 0

x_shared = theano.shared(x)
x_shared = aesara.shared(x)

with pm.Model() as model:
coeff = pm.Normal('x', mu=0, sigma=1)
@@ -74,10 +74,10 @@ not possible to change the shape of a shared variable if that would
also change the shape of one of the variables.


Writing custom Theano Ops
Writing custom Aesara Ops
=========================

While Theano includes a wide range of operations, there are cases where
While Aesara includes a wide range of operations, there are cases where
it makes sense to write your own. But before doing this it is a good
idea to think hard if it is actually necessary. Especially if you want
to use algorithms that need gradient information — this includes NUTS and
@@ -87,22 +87,22 @@ debugging skills for the gradients.

Good reasons for defining a custom Op might be the following:

- You require an operation that is not available in Theano and can't
be build up out of existing Theano operations. This could for example
- You require an operation that is not available in Aesara and can't
be build up out of existing Aesara operations. This could for example
include models where you need to solve differential equations or
integrals, or find a root or minimum of a function that depends
on your parameters.
- You want to connect your PyMC3 model to some existing external code.
- After carefully considering different parametrizations and a lot
of profiling your model is still too slow, but you know of a faster
way to compute the gradient than what theano is doing. This faster
way to compute the gradient than what aesara is doing. This faster
way might be anything from clever maths to using more hardware.
There is nothing stopping anyone from using a cluster via MPI in
a custom node, if a part of the gradient computation is slow enough
and sufficiently parallelizable to make the cost worth it.
We would definitely like to hear about any such examples.

Theano has extensive `documentation, <http://deeplearning.net/software/theano/extending/index.html>`_
Aesara has extensive `documentation, <https://aesara.readthedocs.io/en/latest/extending/index.html>`_
about how to write new Ops.


@@ -158,7 +158,7 @@ We can now use `scipy.optimize.newton` to find the root::
def mu_from_theta(theta):
return optimize.newton(func, 1, fprime=jac, args=(theta,))

We could wrap `mu_from_theta` with `theano.compile.ops.as_op` and use gradient-free
We could wrap `mu_from_theta` with `aesara.compile.ops.as_op` and use gradient-free
methods like Metropolis, but to get NUTS and ADVI working, we also
need to define the derivative of `mu_from_theta`. We can find this
derivative using the implicit function theorem, or equivalently we
@@ -181,16 +181,16 @@ We get
\frac{d}{d\theta}\mu(\theta)
= - \frac{\mu(\theta)^2}{1 + \theta\mu(\theta) + e^{-\theta\mu(\theta)}}
Now, we use this to define a theano op, that also computes the gradient::
Now, we use this to define a aesara op, that also computes the gradient::

import theano
import theano.tensor as tt
import theano.tests.unittest_tools
from theano.graph.op import Op
import aesara
import aesara.tensor as aet
import aesara.tests.unittest_tools
from aesara.graph.op import Op

class MuFromTheta(Op):
itypes = [tt.dscalar]
otypes = [tt.dscalar]
itypes = [aet.dscalar]
otypes = [aet.dscalar]

def perform(self, node, inputs, outputs):
theta, = inputs
@@ -201,23 +201,23 @@ Now, we use this to define a theano op, that also computes the gradient::
theta, = inputs
mu = self(theta)
thetamu = theta * mu
return [- g[0] * mu ** 2 / (1 + thetamu + tt.exp(-thetamu))]
return [- g[0] * mu ** 2 / (1 + thetamu + aet.exp(-thetamu))]

If you value your sanity, always check that the gradient is ok::

theano.tests.unittest_tools.verify_grad(MuFromTheta(), [np.array(0.2)])
theano.tests.unittest_tools.verify_grad(MuFromTheta(), [np.array(1e-5)])
theano.tests.unittest_tools.verify_grad(MuFromTheta(), [np.array(1e5)])
aesara.gradient.verify_grad(MuFromTheta(), [np.array(0.2)])
aesara.gradient.verify_grad(MuFromTheta(), [np.array(1e-5)])
aesara.gradient.verify_grad(MuFromTheta(), [np.array(1e5)])
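
As a rough, library-free illustration of what such a gradient check does, the sketch below compares the closed-form derivative from the formula above against a central finite difference. It rests on two assumptions: the root equation is taken to be ``mu + mu * exp(theta * mu) = 1`` (its definition lies outside the hunks shown here, but this form is consistent with the derivative formula), and ``solve_mu``, ``dmu_dtheta`` and ``finite_difference`` are hypothetical helper names, not Aesara APIs. It is not how ``verify_grad`` is implemented.

```python
import math


def solve_mu(theta, tol=1e-14):
    """Solve mu + mu*exp(theta*mu) = 1 for mu by bisection.

    The root equation is an assumption (see the lead-in). For theta >= 0
    the left-hand side is increasing in mu, negative at mu=0 and positive
    at mu=1, so the root lies in [0, 1].
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid + mid * math.exp(theta * mid) - 1.0 > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def dmu_dtheta(theta):
    """Closed-form derivative: -mu^2 / (1 + theta*mu + exp(-theta*mu))."""
    mu = solve_mu(theta)
    thetamu = theta * mu
    return -mu ** 2 / (1.0 + thetamu + math.exp(-thetamu))


def finite_difference(theta, eps=1e-5):
    """Central finite-difference approximation of d mu / d theta."""
    return (solve_mu(theta + eps) - solve_mu(theta - eps)) / (2.0 * eps)


if __name__ == "__main__":
    # Print both estimates side by side for a few moderate theta values.
    for theta in (0.2, 1.0, 3.0):
        print(theta, dmu_dtheta(theta), finite_difference(theta))
```

The two columns should agree to several decimal places for moderate ``theta``; very large values would overflow ``math.exp`` in this naive sketch.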

We can now define our model using this new op::

import pymc3 as pm

tt_mu_from_theta = MuFromTheta()
aet_mu_from_theta = MuFromTheta()

with pm.Model() as model:
theta = pm.HalfNormal('theta', sigma=1)
mu = pm.Deterministic('mu', tt_mu_from_theta(theta))
mu = pm.Deterministic('mu', aet_mu_from_theta(theta))
pm.Normal('y', mu=mu, sigma=0.1, observed=[0.2, 0.21, 0.3])

trace = pm.sample()
4 changes: 2 additions & 2 deletions docs/source/Gaussian_Processes.rst
@@ -113,7 +113,7 @@ which allows users to combine covariance functions into new ones, for example:

After the covariance function is defined, it is now a function that is
evaluated by calling :code:`cov_func(x, x)` (or :code:`mean_func(x)`). Since
PyMC3 is built on top of Theano, it is relatively easy to define and experiment
PyMC3 is built on top of Aesara, it is relatively easy to define and experiment
with non-standard covariance and mean functions. For more information check out
the tutorial on covariance functions.

@@ -158,7 +158,7 @@ other type of random variable. The first argument is the name of the random
variable representing the function we are placing the prior over.
The second argument is the inputs to the function that the prior is over,
:code:`X`. The inputs are usually known and present in the data, but they can
also be PyMC3 random variables. If the inputs are a Theano tensor or a
also be PyMC3 random variables. If the inputs are a Aesara tensor or a
PyMC3 random variable, the :code:`shape` needs to be given.

Usually at this point, inference is performed on the model. The
2 changes: 1 addition & 1 deletion docs/source/Probability_Distributions.rst
@@ -27,7 +27,7 @@ A variable requires at least a ``name`` argument, and zero or more model paramet

p = pm.Beta('p', 1, 1, shape=(3, 3))

Probability distributions are all subclasses of ``Distribution``, which in turn has two major subclasses: ``Discrete`` and ``Continuous``. In terms of data types, a ``Continuous`` random variable is given whichever floating point type is defined by ``theano.config.floatX``, while ``Discrete`` variables are given ``int16`` types when ``theano.config.floatX`` is ``float32``, and ``int64`` otherwise.
Probability distributions are all subclasses of ``Distribution``, which in turn has two major subclasses: ``Discrete`` and ``Continuous``. In terms of data types, a ``Continuous`` random variable is given whichever floating point type is defined by ``aesara.config.floatX``, while ``Discrete`` variables are given ``int16`` types when ``aesara.config.floatX`` is ``float32``, and ``int64`` otherwise.
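
The dtype rule in the paragraph above can be restated as a tiny lookup. This is only a sketch of the rule as described: ``default_dtype`` is a hypothetical helper, not a PyMC3 or Aesara function, and ``floatX`` is passed in as a plain string instead of being read from ``aesara.config.floatX``.

```python
def default_dtype(kind, floatX):
    """Return the dtype name the paragraph above describes.

    kind   -- "continuous" or "discrete" (hypothetical labels)
    floatX -- the configured float type as a string, e.g. "float32"
    """
    if kind == "continuous":
        # Continuous variables follow the configured floating point type.
        return floatX
    if kind == "discrete":
        # Discrete variables use int16 under float32, and int64 otherwise.
        return "int16" if floatX == "float32" else "int64"
    raise ValueError(f"unknown kind: {kind!r}")


if __name__ == "__main__":
    for floatX in ("float32", "float64"):
        print(floatX, default_dtype("continuous", floatX), default_dtype("discrete", floatX))
```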

All distributions in ``pm.distributions`` will have two important methods: ``random()`` and ``logp()`` with the following signatures:


0 comments on commit f7b6b00