Rename Theano to Aesara #4471

Merged
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE.md
@@ -18,7 +18,7 @@ If you have questions about a specific use case, or you are not sure whether thi
## Versions and main components

* PyMC3 Version:
* Theano Version:
* Aesara Version:
* Python Version:
* Operating system:
* How did you install PyMC3: (conda/pip)
2 changes: 1 addition & 1 deletion .github/workflows/arviz_compat.yml
@@ -19,7 +19,7 @@ jobs:
runs-on: ${{ matrix.os }}
env:
TEST_SUBSET: ${{ matrix.test-subset }}
THEANO_FLAGS: floatX=${{ matrix.floatx }},gcc__cxxflags='-march=native'
AESARA_FLAGS: floatX=${{ matrix.floatx }},gcc__cxxflags='-march=native'
defaults:
run:
shell: bash -l {0}
2 changes: 1 addition & 1 deletion .github/workflows/jaxtests.yml
@@ -17,7 +17,7 @@ jobs:
runs-on: ${{ matrix.os }}
env:
TEST_SUBSET: ${{ matrix.test-subset }}
THEANO_FLAGS: floatX=${{ matrix.floatx }},gcc__cxxflags='-march=native'
AESARA_FLAGS: floatX=${{ matrix.floatx }},gcc__cxxflags='-march=native'
defaults:
run:
shell: bash -l {0}
2 changes: 1 addition & 1 deletion .github/workflows/pytest.yml
@@ -62,7 +62,7 @@ jobs:
runs-on: ${{ matrix.os }}
env:
TEST_SUBSET: ${{ matrix.test-subset }}
THEANO_FLAGS: floatX=${{ matrix.floatx }},gcc__cxxflags='-march=native'
AESARA_FLAGS: floatX=${{ matrix.floatx }},gcc__cxxflags='-march=native'
defaults:
run:
shell: bash -l {0}
2 changes: 1 addition & 1 deletion .github/workflows/windows.yml
@@ -17,7 +17,7 @@ jobs:
runs-on: ${{ matrix.os }}
env:
TEST_SUBSET: ${{ matrix.test-subset }}
THEANO_FLAGS: floatX=${{ matrix.floatx }},gcc__cxxflags='-march=core2'
AESARA_FLAGS: floatX=${{ matrix.floatx }},gcc__cxxflags='-march=core2'
defaults:
run:
shell: bash -l {0}
14 changes: 1 addition & 13 deletions README.rst
@@ -15,18 +15,6 @@ Check out the `getting started guide <http://docs.pymc.io/notebooks/getting_star
using Binder!
For questions on PyMC3, head on over to our `PyMC Discourse <https://discourse.pymc.io/>`__ forum.

The future of PyMC3 & Theano
============================

There have been many questions and uncertainty around the future of PyMC3 since Theano
stopped getting developed by the original authors, and we started experiments with PyMC4.

We are happy to announce that PyMC3 on Theano (which we are `developing further <https://github.com/pymc-devs/Theano-PyMC>`__)
with a new JAX backend is the future. PyMC4 will not be developed further.

See the `full announcement <https://pymc-devs.medium.com/the-future-of-pymc3-or-theano-is-dead-long-live-theano-d8005f8a0e9b>`__
for more details.

Features
========

@@ -39,7 +27,7 @@ Features
- **Variational inference**: `ADVI <http://www.jmlr.org/papers/v18/16-107.html>`__
for fast approximate posterior estimation as well as mini-batch ADVI
for large data sets.
- Relies on `Theano-PyMC <https://theano-pymc.readthedocs.io/en/latest/>`__ which provides:
- Relies on `Aesara <https://aesara.readthedocs.io/en/latest/>`__ which provides:
* Computation optimization and dynamic C or JAX compilation
* Numpy broadcasting and advanced indexing
* Linear algebra operators
2 changes: 1 addition & 1 deletion RELEASE-NOTES.md
@@ -2,7 +2,7 @@

## PyMC3 vNext (TBD)
### Breaking Changes
+ ...
- ⚠ Theano-PyMC has been replaced with Aesara, so all external references to `theano`, `tt`, and `pymc3.theanof` need to be replaced with `aesara`, `aet`, and `pymc3.aesaraf` (see [4471](https://github.com/pymc-devs/pymc3/pull/4471)).
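  For reference, a minimal sketch of what the rename means for user code (illustrative only; `floatX` is just one example of a helper that moved):

  ```python
  # Before (Theano-PyMC):
  # import theano
  # import theano.tensor as tt
  # from pymc3.theanof import floatX

  # After (Aesara):
  import aesara
  import aesara.tensor as aet
  from pymc3.aesaraf import floatX  # assumes aesaraf mirrors the old theanof helpers
  ```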

### New Features
+ ...
10 changes: 5 additions & 5 deletions benchmarks/benchmarks/benchmarks.py
@@ -14,11 +14,11 @@
import time
import timeit

import aesara
import aesara.tensor as aet
import arviz as az
import numpy as np
import pandas as pd
import theano
import theano.tensor as tt

import pymc3 as pm

@@ -27,7 +27,7 @@ def glm_hierarchical_model(random_seed=123):
"""Sample glm hierarchical model to use in benchmarks"""
np.random.seed(random_seed)
data = pd.read_csv(pm.get_data("radon.csv"))
data["log_radon"] = data["log_radon"].astype(theano.config.floatX)
data["log_radon"] = data["log_radon"].astype(aesara.config.floatX)
county_idx = data.county_code.values

n_counties = len(data.county.unique())
@@ -61,8 +61,8 @@ def mixture_model(random_seed=1234):
mu = pm.Normal("mu", mu=0.0, sd=10.0, shape=w_true.shape)
enforce_order = pm.Potential(
"enforce_order",
tt.switch(mu[0] - mu[1] <= 0, 0.0, -np.inf)
+ tt.switch(mu[1] - mu[2] <= 0, 0.0, -np.inf),
aet.switch(mu[0] - mu[1] <= 0, 0.0, -np.inf)
+ aet.switch(mu[1] - mu[2] <= 0, 0.0, -np.inf),
)
tau = pm.Gamma("tau", alpha=1.0, beta=1.0, shape=w_true.shape)
pm.NormalMixture("x_obs", w=w, mu=mu, tau=tau, observed=x)
@@ -4,20 +4,20 @@
_referenced in docs/source/notebooks/table_of_contents_tutorials.js

=================================
Advanced usage of Theano in PyMC3
Advanced usage of Aesara in PyMC3
=================================

Using shared variables
======================

Shared variables allow us to use values in theano functions that are
Shared variables allow us to use values in Aesara functions that are
not considered an input to the function, but can still be changed
later. They are very similar to global variables in many ways::

a = tt.scalar('a')
a = aet.scalar('a')
# Create a new shared variable with initial value of 0.1
b = theano.shared(0.1)
func = theano.function([a], a * b)
b = aesara.shared(0.1)
func = aesara.function([a], a * b)
assert func(2.) == 0.2

b.set_value(10.)
@@ -34,7 +34,7 @@ be time consuming if the number of datasets is large)::
true_mu = [np.random.randn() for _ in range(10)]
observed_data = [mu + np.random.randn(20) for mu in true_mu]

data = theano.shared(observed_data[0])
data = aesara.shared(observed_data[0])
with pm.Model() as model:
mu = pm.Normal('mu', 0, 10)
pm.Normal('y', mu=mu, sigma=1, observed=data)
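
A minimal sketch of the pattern this enables, fitting the model above to each dataset by swapping values into the shared variable::

    traces = []
    for data_vals in observed_data:
        # Swap the observations without rebuilding or recompiling the model
        data.set_value(data_vals)
        with model:
            traces.append(pm.sample())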
@@ -55,7 +55,7 @@ variable for our observations::
x = np.random.randn(100)
y = x > 0

x_shared = theano.shared(x)
x_shared = aesara.shared(x)

with pm.Model() as model:
coeff = pm.Normal('x', mu=0, sigma=1)
@@ -74,35 +74,35 @@ not possible to change the shape of a shared variable if that would
also change the shape of one of the variables.
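
The folded part of this example then swaps in new predictor values and draws posterior predictive samples, roughly as follows (a sketch with hypothetical values; `trace` comes from the folded `pm.sample()` call)::

    x_shared.set_value([-1., 0., 1.])
    with model:
        post_pred = pm.sample_posterior_predictive(trace, samples=500)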


Writing custom Theano Ops
Writing custom Aesara Ops
=========================

While Theano includes a wide range of operations, there are cases where
While Aesara includes a wide range of operations, there are cases where
it makes sense to write your own. But before doing this, it is a good
idea to think hard about whether it is actually necessary. Especially if you want
to use algorithms that need gradient information — this includes NUTS and
all variational methods, and you probably *should* want to use those —
this is often quite a bit of work and also requires some math and
debugging skills for the gradients.

Good reasons for defining a custom Op might be the following:
Good reasons for defining a custom `Op` might be the following:

- You require an operation that is not available in Theano and can't
be build up out of existing Theano operations. This could for example
- You require an operation that is not available in Aesara and can't
be built up out of existing Aesara operations. This could, for example,
include models where you need to solve differential equations or
integrals, or find a root or minimum of a function that depends
on your parameters.
- You want to connect your PyMC3 model to some existing external code.
- After carefully considering different parametrizations and a lot
of profiling, your model is still too slow, but you know of a faster
way to compute the gradient than what theano is doing. This faster
way to compute the gradient than what Aesara is doing. This faster
way might be anything from clever maths to using more hardware.
There is nothing stopping anyone from using a cluster via MPI in
a custom node, if a part of the gradient computation is slow enough
and sufficiently parallelizable to make the cost worth it.
We would definitely like to hear about any such examples.

Theano has extensive `documentation, <http://deeplearning.net/software/theano/extending/index.html>`_
Aesara has extensive `documentation <https://aesara.readthedocs.io/en/latest/extending/index.html>`_
about how to write new Ops.


@@ -158,7 +158,7 @@ We can now use `scipy.optimize.newton` to find the root::
def mu_from_theta(theta):
return optimize.newton(func, 1, fprime=jac, args=(theta,))

We could wrap `mu_from_theta` with `theano.compile.ops.as_op` and use gradient-free
We could wrap `mu_from_theta` with `aesara.compile.ops.as_op` and use gradient-free
methods like Metropolis, but to get NUTS and ADVI working, we also
need to define the derivative of `mu_from_theta`. We can find this
derivative using the implicit function theorem, or equivalently we
@@ -181,16 +181,16 @@ We get
\frac{d}{d\theta}\mu(\theta)
= - \frac{\mu(\theta)^2}{1 + \theta\mu(\theta) + e^{-\theta\mu(\theta)}}
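
(This is the implicit function theorem in action: writing the folded root equation as :math:`R(\mu(\theta), \theta) = 0`, where :math:`R` is our hypothetical name for it, the theorem gives :math:`\frac{d\mu}{d\theta} = -\frac{\partial R / \partial \theta}{\partial R / \partial \mu}`, which simplifies to the expression above.)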

Now, we use this to define a theano op, that also computes the gradient::
Now, we use this to define an Aesara `Op` that also computes the gradient::

import theano
import theano.tensor as tt
import theano.tests.unittest_tools
from theano.graph.op import Op
import aesara
import aesara.tensor as aet
from aesara.graph.op import Op

class MuFromTheta(Op):
itypes = [tt.dscalar]
otypes = [tt.dscalar]
itypes = [aet.dscalar]
otypes = [aet.dscalar]

def perform(self, node, inputs, outputs):
theta, = inputs
@@ -201,23 +201,23 @@
theta, = inputs
mu = self(theta)
thetamu = theta * mu
return [- g[0] * mu ** 2 / (1 + thetamu + tt.exp(-thetamu))]
return [- g[0] * mu ** 2 / (1 + thetamu + aet.exp(-thetamu))]

If you value your sanity, always check that the gradient is ok::

theano.tests.unittest_tools.verify_grad(MuFromTheta(), [np.array(0.2)])
theano.tests.unittest_tools.verify_grad(MuFromTheta(), [np.array(1e-5)])
theano.tests.unittest_tools.verify_grad(MuFromTheta(), [np.array(1e5)])
aesara.gradient.verify_grad(MuFromTheta(), [np.array(0.2)])
aesara.gradient.verify_grad(MuFromTheta(), [np.array(1e-5)])
aesara.gradient.verify_grad(MuFromTheta(), [np.array(1e5)])

We can now define our model using this new op::
We can now define our model using this new `Op`::

import pymc3 as pm

tt_mu_from_theta = MuFromTheta()
aet_mu_from_theta = MuFromTheta()

with pm.Model() as model:
theta = pm.HalfNormal('theta', sigma=1)
mu = pm.Deterministic('mu', tt_mu_from_theta(theta))
mu = pm.Deterministic('mu', aet_mu_from_theta(theta))
pm.Normal('y', mu=mu, sigma=0.1, observed=[0.2, 0.21, 0.3])

trace = pm.sample()
4 changes: 2 additions & 2 deletions docs/source/Gaussian_Processes.rst
@@ -113,7 +113,7 @@ which allows users to combine covariance functions into new ones, for example:

After the covariance function is defined, it is now a function that is
evaluated by calling :code:`cov_func(x, x)` (or :code:`mean_func(x)`). Since
PyMC3 is built on top of Theano, it is relatively easy to define and experiment
PyMC3 is built on top of Aesara, it is relatively easy to define and experiment
with non-standard covariance and mean functions. For more information, check out
the tutorial on covariance functions.
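
For instance, a covariance matrix can be built and inspected directly; a minimal sketch with hypothetical inputs::

    import numpy as np
    import pymc3 as pm

    X = np.linspace(0, 1, 5)[:, None]
    cov_func = pm.gp.cov.ExpQuad(1, ls=0.1)
    K = cov_func(X, X).eval()  # an Aesara tensor, evaluated to a NumPy array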

@@ -158,7 +158,7 @@ other type of random variable. The first argument is the name of the random
variable representing the function we are placing the prior over.
The second argument is the inputs to the function that the prior is over,
:code:`X`. The inputs are usually known and present in the data, but they can
also be PyMC3 random variables. If the inputs are a Theano tensor or a
also be PyMC3 random variables. If the inputs are an Aesara tensor or a
PyMC3 random variable, the :code:`shape` needs to be given.

Usually at this point, inference is performed on the model. The
2 changes: 1 addition & 1 deletion docs/source/Probability_Distributions.rst
@@ -27,7 +27,7 @@ A variable requires at least a ``name`` argument, and zero or more model paramet

p = pm.Beta('p', 1, 1, shape=(3, 3))

Probability distributions are all subclasses of ``Distribution``, which in turn has two major subclasses: ``Discrete`` and ``Continuous``. In terms of data types, a ``Continuous`` random variable is given whichever floating point type is defined by ``theano.config.floatX``, while ``Discrete`` variables are given ``int16`` types when ``theano.config.floatX`` is ``float32``, and ``int64`` otherwise.
Probability distributions are all subclasses of ``Distribution``, which in turn has two major subclasses: ``Discrete`` and ``Continuous``. In terms of data types, a ``Continuous`` random variable is given whichever floating point type is defined by ``aesara.config.floatX``, while ``Discrete`` variables are given ``int16`` types when ``aesara.config.floatX`` is ``float32``, and ``int64`` otherwise.
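
A quick way to check these defaults (a sketch, assuming a standard ``float64`` setup)::

    import aesara
    import pymc3 as pm

    with pm.Model():
        x = pm.Normal('x', 0, 1)   # Continuous
        k = pm.Poisson('k', 3.0)   # Discrete

    print(aesara.config.floatX)    # e.g. 'float64'
    print(x.dtype)                 # matches aesara.config.floatX
    print(k.dtype)                 # 'int64' here; 'int16' when floatX is 'float32'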

All distributions in ``pm.distributions`` will have two important methods: ``random()`` and ``logp()`` with the following signatures:
