diff --git a/notebooks/L96-two-scale-description.ipynb b/notebooks/L96-two-scale-description.ipynb
index 841a1c7c..c5dad380 100644
--- a/notebooks/L96-two-scale-description.ipynb
+++ b/notebooks/L96-two-scale-description.ipynb
@@ -14,7 +14,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "In 1996, Lorenz proposed a two-time scale dynamical system, referred to as the Lorenz-96 model (L96), whose behavior mimics the non-linear dynamics of the extratropical atmosphere with simplified representation of multiscale interactions and nonlinear advection [Lorenz (1996)](https://www.ecmwf.int/en/elibrary/10829-predictability-problem-partly-solved). The L96 model consists of two sets of equations coupling two sets of variables ($X_k$ and $Y_j,k$), which evolve over two (slow and fast) timescales and are arranged around a latitude circle as shown in Fig. 1. The equations comprising L96 are:\n",
+    "In 1996, Lorenz proposed a two-time scale dynamical system, referred to as the Lorenz-96 model (L96), whose behavior mimics the non-linear dynamics of the extratropical atmosphere with simplified representation of multiscale interactions and nonlinear advection [Lorenz (1996)](https://www.ecmwf.int/en/elibrary/10829-predictability-problem-partly-solved). The L96 model consists of two sets of equations coupling two sets of variables ($X_k$ and $Y_{j,k}$), which evolve over two (slow and fast) timescales and are arranged around a latitude circle as shown in Fig. 1. The equations comprising L96 are:\n",
     "\\begin{align}\n",
     "\\frac{d}{dt} X_k\n",
     "&= - X_{k-1} \\left( X_{k-2} - X_{k+1} \\right) - X_k + F - \\left( \\frac{hc}{b} \\right) \\sum_{j=0}^{J-1} Y_{j,k},\n",
@@ -23,7 +23,7 @@
     "&= - cbY_{j+1,k} \\left( Y_{j+2,k} - Y_{j-1,k} \\right) - c Y_{j,k} + \\frac{hc}{b} X_k\n",
     "\\end{align}\n",
     "\n",
-    "where $X_k$, $k=1,\\ldots,K$, denotes $K$ slow (or low-frequency) variables and $Y_{j,k}$ , $j=1,\\ldots,J$ denotes $J*K$ fast (or high-frequency) variables. Each slow $X_k$ is coupled with $J$ fast $Y_{j,k}$ variables. The slow equations are coupled to the fast equations via a coupling term, $\\sum_{j=0}^{J-1} Y_{j,k}$, which sums over the $J$ fast variables corresponding to a particular $k$. On the other hand, each fast equation is forced by the a coupling term, \\frac{hc}{b} X_k, that depends on the slow variable corresponding that particular $k$. The coupling and the fast time-scale equations depends on three key parameters: $b$, $c$ and $h$. Here $b$ determines the magnitude of the nonlinear interactions among the fast variables, $c$ controls how rapidly the fast variables fluctuate compared to the slow variables and, and $h$ governs the strength of the coupling between the slow and fast variables. Also, the slow time-scale equation is forced by $F$, e.g. [Wilks (2005)](http://rmets.onlinelibrary.wiley.com/doi/abs/10.1256/qj.04.03).\n",
+    "where $X_k$, $k=1,\\ldots,K$, denotes $K$ slow (or low-frequency) variables and $Y_{j,k}$ , $j=1,\\ldots,J$ denotes $J*K$ fast (or high-frequency) variables. Each slow $X_k$ is coupled with $J$ fast $Y_{j,k}$ variables. The slow equations are coupled to the fast equations via a coupling term, $\\sum_{j=0}^{J-1} Y_{j,k}$, which sums over the $J$ fast variables corresponding to a particular $k$. On the other hand, each fast equation is forced by a coupling term, $\\frac{hc}{b} X_k$, that depends on the slow variable corresponding that particular $k$. The coupling term and the fast time-scale equation depend on three key parameters: $b$, $c$ and $h$. Here $b$ determines the magnitude of the nonlinear interactions among the fast variables, $c$ controls how rapidly the fast variables fluctuate compared to the slow variables and, $h$ governs the strength of the coupling between the slow and fast variables. Moreover, the slow time-scale equation is forced by the parameter $F$, whose value determines the chaotic behaviour of the system. E.g. [Wilks (2005)](http://rmets.onlinelibrary.wiley.com/doi/abs/10.1256/qj.04.03).\n",
     "\n",
     "The chaotic dynamical system L96 is very useful for testing different numerical methods in atmospheric modeling thanks to its transparency, low computational cost and simplicity compared to Global Climate Models (GCM). The interaction between variables of different scales makes the L96 model of particular interest when evaluating new parameterization methodologies. As such, it was used in assessing different techniques that were later incorporated into GCMs ([Crommelin (2008)](https://journals.ametsoc.org/view/journals/atsc/65/8/2008jas2566.1.xml), [Dorrestijn (2013)](https://royalsocietypublishing.org/doi/10.1098/rsta.2012.0374)).\n",
     "\n",
@@ -105,7 +105,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "An example of how to use `integrate_L96_2t` is shown below. Here we choose $K=36$ and $J=10$ (i.e. 36 values of $X$ and 10 values of $Y$ per value of $X$). Also, in accordance with previous studies, we set $h=1$, $c=10$, and $b=10$. The initial condition for $X$ is setup as $X(s)=s(s-1)(s+1)$, for $s=-1\\ldots 1$, and the $Y$ is initialized with zeros. The value of $F$ is set to $10$, which is sufficient to obtain chaotic behavior. *Note: if you increase $F$, you may need to reduce $dt$ for numerical stability.* "
+    "An example of how to use `integrate_L96_2t` is shown below. Here we choose $K=36$ and $J=10$ (i.e. 36 values of $X$ and 10 values of $Y$ per value of $X$). Also, in accordance with previous studies, we set $h=1$, $c=10$, and $b=10$. The initial condition for $X$ is setup as $X(s)=s(s-1)(s+1)$, for $s=-1\\ldots 1$, and the $Y$ is initialized with zeros. The value of $F$ is set to $10$, which is sufficient to obtain chaotic behavior.\n",
+    "\n",
+    "*Note: if you increase $F$, you may need to reduce $dt$ for numerical stability.* "
    ]
   },
   {
@@ -166,15 +168,15 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "plt.figure(figsize=(12, 10))\n",
+    "plt.figure(figsize=(12, 10), dpi=150)\n",
     "plt.subplot(221)\n",
     "\n",
     "# Snapshot of X[k]\n",
     "plt.plot(k, X[-1], label=\"$X_k(t=n_t dt)$\")\n",
     "plt.plot(j / J, Y[-1], label=\"$Y_{j,k}(t=n_t dt)$\")\n",
-    "plt.plot(k, Xinit, \"k:\", label=\"$X_k(t=0)$\")\n",
-    "plt.plot(j / J, Yinit, \"k:\", label=\"$Y_{j,k}(t=0)$\")\n",
-    "plt.legend()\n",
+    "plt.plot(k, X_init, \"k:\", label=\"$X_k(t=0)$\")\n",
+    "plt.plot(j / J, Y_init, \"k:\", label=\"$Y_{j,k}(t=0)$\")\n",
+    "plt.legend(fontsize=7)\n",
     "plt.xlabel(\"k, k + j/J\")\n",
     "plt.title(\"$X_k, Y_{j,k}$\")\n",
     "plt.subplot(222)\n",
@@ -182,7 +184,7 @@
     "# Sample time-series X[0](t), Y[0](t)\n",
     "plt.plot(t, X[:, 0], label=\"$X_0(t)$\")\n",
     "plt.plot(t, Y[:, 0], label=\"$Y_{0,0}(t)$\")\n",
-    "plt.legend()\n",
+    "plt.legend(fontsize=7)\n",
     "plt.xlabel(\"t\")\n",
     "plt.subplot(223)\n",
     "\n",
@@ -336,7 +338,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.16"
+   "version": "3.8.13"
   }
  },
  "nbformat": 4,
diff --git a/notebooks/gcm-parameterization-problem.ipynb b/notebooks/gcm-parameterization-problem.ipynb
index a9892e6e..134df5bf 100644
--- a/notebooks/gcm-parameterization-problem.ipynb
+++ b/notebooks/gcm-parameterization-problem.ipynb
@@ -29,7 +29,7 @@
    "id": "f4905750",
    "metadata": {},
    "source": [
-    "Starting from where we stopped last time, let's assume from now on that we are all familiar  with the {cite}`Lorenz1995` two-time scale model and its pratical numerical implementation in the `L96_model` module.  "
+    "Starting from where we stopped last time, let's assume from now on that we are all familiar  with the {cite}`Lorenz1995` two-time scale model and its pratical numerical implementation in the `L96_model` module."
    ]
   },
   {
@@ -42,10 +42,17 @@
     "import matplotlib.pyplot as plt\n",
     "import numpy as np\n",
     "\n",
-    "# L96 provides the \"real world\", L96_eq1_xdot is the beginning of rhs of X tendency\n",
     "from L96_model import L96, RK2, RK4, EulerFwd, L96_eq1_xdot"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "338aff9c-2891-4583-9efa-b1a56669fe2f",
+   "metadata": {},
+   "source": [
+    "Here, `L96` serves as the **\"real world\"** whereas `L96_eq1_xdot` represents the **beginning of rhs of X tendency**."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -56,47 +63,43 @@
     "# Setting the seed gives us reproducible results\n",
     "np.random.seed(13)\n",
     "\n",
-    "# Create a \"real world\" with K=8 and J=32\n",
-    "W = L96(8, 32, F=18)\n",
-    "\n",
-    "# Run \"real world\" for 3 days to forget initial conditons\n",
-    "# (store=True save the final state as an initial condition for the next run)\n",
-    "W.run(0.05, 3.0, store=True);\n",
-    "\n",
-    "# From here on we can use W.X as perfect initial conditions for a model\n",
-    "# and sample the real world using W.run(dt,T)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8e1c62ff",
-   "metadata": {},
-   "source": [
-    "Let's call $Z(t)$ the trajectory of the full complexity physical system (say planet earth). Because in practice, for computational or observational reasons, we cannot afford describing and predicting $Z(t)$, we will only focus on a projection of $Z(t)$ in some lower dimension space. Let's call this reduced dimension state $X(t)$."
+    "# Create a \"real world\" with K = 8 and J = 32\n",
+    "W = L96(8, 32, F=18)"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "7e04ccf0",
+   "id": "4b4b5606-be7c-4fd3-8f3d-fa7fbdbec74f",
    "metadata": {},
    "source": [
-    "In our toy model,  $Z(t)=(X(t),Y(t))$ is the full complexity physical system while $X(t)$ is the lower dimension reduction. In real case situations, the lower dimension representation of the real system is usually a coarse-grained or a subsampled description of the full-scale system but this is not exclusive. "
+    "We *run the \"real world\" for 3 days in order to forget the initial conditons*."
    ]
   },
   {
-   "cell_type": "markdown",
-   "id": "d66dd9fa",
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c52019dd-0d33-45d2-a10a-08429c6374a2",
    "metadata": {},
+   "outputs": [],
    "source": [
-    "Now, a GCM is simply a numerical machine which intends to predict the trajectory $X(t)$ from knowledge of $X(t=0)$ only. A GCM is generally built from first principle physical laws, by discretizing partial differential equations. "
+    "# `store=True` saves the final state as an initial condition for the next run.\n",
+    "W.run(0.05, 3.0, store=True);"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "0364ca7d",
+   "id": "8e1c62ff",
    "metadata": {},
    "source": [
-    "In what follows, we therefore assume that we know a fraction of the terms that govern the evolution of $X$. We also assume that we do not know what governs the evolution of $Y$ nor how $Y$ may affect $X$. "
+    "From here on we can use `W.X` as perfect initial conditions for a model and sample the real world using `W.run(dt, T)`\n",
+    "\n",
+    "Let's call $Z(t)$ the trajectory of the full complexity physical system (say planet earth). Because in practice, for computational or observational reasons, we cannot afford describing and predicting $Z(t)$, we will only focus on a projection of $Z(t)$ in some lower dimension space. Let's call this reduced dimension state $X(t)$.\n",
+    "\n",
+    "In our toy model,  $Z(t)=(X(t),Y(t))$ is the full complexity physical system while $X(t)$ is the lower dimension reduction. In real case situations, the lower dimension representation of the real system is usually a coarse-grained or a subsampled description of the full-scale system but this is not exclusive.\n",
+    "\n",
+    "Now, a GCM is simply a numerical machine which intends to predict the trajectory $X(t)$ from knowledge of $X(t=0)$ only. A GCM is generally built from first principle physical laws, by discretizing partial differential equations.\n",
+    "\n",
+    "In what follows, we therefore assume that we know a fraction of the terms that govern the evolution of $X$. We also assume that we do not know what governs the evolution of $Y$ nor how $Y$ may affect $X$."
    ]
   },
   {
@@ -106,8 +109,14 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# X0 - initial conditions, dt - time increment, nt - number of forward steps to take\n",
-    "def GCMnoparam(X0, F, dt, nt):\n",
+    "def GCM_no_parameterization(X0, F, dt, nt):\n",
+    "    \"\"\"GCM without parameterization\n",
+    "\n",
+    "    Args:\n",
+    "        X0: initial conditions\n",
+    "        dt: time increment\n",
+    "        nt: number of forward steps to take\n",
+    "    \"\"\"\n",
     "    time, hist, X = (\n",
     "        dt * np.arange(nt + 1),\n",
     "        np.zeros((nt + 1, len(X0))) * np.nan,\n",
@@ -121,6 +130,14 @@
     "    return hist, time"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "0a620522-8a1a-4eda-9b0a-04e304b49255",
+   "metadata": {},
+   "source": [
+    "This GCM is unstable due to Euler forward time stepping scheme, so we don't integrate it for too long and compare it to the real world with the same time interval as `dt` used by the model."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -128,10 +145,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# This GCM is unstable due to Euler forward time stepping scheme,\n",
-    "# so we integrate for not too long...\n",
     "F, dt, T = 18, 0.01, 3.0\n",
-    "X, t = GCMnoparam(W.X, F, dt, int(T / dt))"
+    "X, t = GCM_no_parameterization(W.X, F, dt, int(T / dt))"
    ]
   },
   {
@@ -141,14 +156,15 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# ... and compare to the real world with the same time interval as \"dt\" used by the model\n",
-    "Xtrue, _, _ = W.run(dt, T)\n",
+    "# Compare with the real world\n",
+    "X_true, _, _ = W.run(dt, T)\n",
     "\n",
-    "plt.plot(t, Xtrue[:, 4], label=\"Truth ie L96\")\n",
-    "plt.plot(t, X[:, 4], label=\"Model ie GCM\")\n",
+    "plt.figure(dpi=150)\n",
+    "plt.plot(t, X_true[:, 4], label=\"Truth (L96)\")\n",
+    "plt.plot(t, X[:, 4], label=\"Model (GCM)\")\n",
     "plt.xlabel(\"$t$\")\n",
     "plt.ylabel(\"$X_4(t)$\")\n",
-    "plt.legend();"
+    "plt.legend(fontsize=7);"
    ]
   },
   {
@@ -156,11 +172,11 @@
    "id": "dad9cbd6",
    "metadata": {},
    "source": [
-    "There are several reasons why the above Model (ie GCM) differs from Truth (ie L96), cf section 5 of this notebook. \n",
+    "There are several reasons why the above Model (i.e. GCM) differs from Truth (i.e. L96), see [section 5](#how-other-sources-of-error-affect-the-parameterization-problem) of this notebook. \n",
     "\n",
-    "One possibility to reduce differences in between Model and Truth, is to add a *parameterization* : an extra term to the rhs of the Model evolution operator in order to reduce the Model error as compared to Truth. It may account for missing processes, acting in Truth but not included in Model, eventually relating to subgrid physics, which is the case here, but not automatically. *Parameterizations* are also commonly refered to as *closures*, in particular when they encode explicit physical assumptions on how non-represented variables (e.g. $Y$) impact represented variables (e.g. $X$). \n",
+    "One way to reduce the differences between the Model and the Truth, is to add *parameterization*: an extra term to the rhs of the Model evolution operator in order to reduce the Model error as compared to the Truth. It may account for missing processes, acting in Truth but not included in Model, eventually relating to subgrid physics, which is the case here, but not automatically. *Parameterizations* are also commonly refered to as *closures*, in particular when they encode explicit physical assumptions on how non-represented variables (e.g. $Y$) impact represented variables (e.g. $X$). \n",
     "\n",
-    "Parameterizations usually involve free parameters that need to be adjusted. The form of the parameterization may be dictated by physical laws, but generally is unknown as well."
+    "Parameterizations usually involve free parameters that need to be adjusted. The form of the parameterization may be dictated by physical laws, but generally it is unknown as well."
    ]
   },
   {
@@ -178,8 +194,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# - a GCM class including a parameterization in rhs of equation for tendency\n",
     "class GCM:\n",
+    "    \"\"\"GCM with parameterization in rhs of equation for tendency\"\"\"\n",
+    "\n",
     "    def __init__(self, F, parameterization, time_stepping=EulerFwd):\n",
     "        self.F = F\n",
     "        self.parameterization = parameterization\n",
@@ -189,8 +206,13 @@
     "        return L96_eq1_xdot(X, self.F) - self.parameterization(param, X)\n",
     "\n",
     "    def __call__(self, X0, dt, nt, param=[0]):\n",
-    "        # X0 - initial conditions, dt - time increment, nt - number of forward steps to take\n",
-    "        # param - parameters of our closure\n",
+    "        \"\"\"\n",
+    "        Args:\n",
+    "            X0: initial conditions\n",
+    "            dt: time increment\n",
+    "            nt: number of forward steps to take\n",
+    "            param: parameters of our closure\n",
+    "        \"\"\"\n",
     "        time, hist, X = (\n",
     "            dt * np.arange(nt + 1),\n",
     "            np.zeros((nt + 1, len(X0))) * np.nan,\n",
@@ -204,6 +226,14 @@
     "        return hist, time"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "5f6fd74a-3427-4dd3-b8b4-013acf2c49ff",
+   "metadata": {},
+   "source": [
+    "As a first step, we illustrate introducing a polynomial parameterization to GCM and then compare the model to the true trajectories obtained from the real world with the same time interval as `dt` used by the model."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -211,7 +241,6 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# As a first step, we illustrate introducing a polynomial parameterization to GCM\n",
     "naive_parameterization = lambda param, X: np.polyval(param, X)\n",
     "F, dt, T = 18, 0.01, 5.0\n",
     "gcm = GCM(F, naive_parameterization)\n",
@@ -225,16 +254,15 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Comparing model and true trajectories.\n",
-    "\n",
-    "# This samples the real world with the same time interval as \"dt\" used by the model\n",
-    "Xtrue, _, _ = W.run(dt, T)\n",
+    "# Compare with the real world\n",
+    "X_true, _, _ = W.run(dt, T)\n",
     "\n",
-    "plt.plot(t, Xtrue[:, 4], label=\"Truth\")\n",
+    "plt.figure(dpi=150)\n",
+    "plt.plot(t, X_true[:, 4], label=\"Truth\")\n",
     "plt.plot(t, X[:, 4], label=\"Model\")\n",
     "plt.xlabel(\"$t$\")\n",
     "plt.ylabel(\"$X_4(t)$\")\n",
-    "plt.legend();"
+    "plt.legend(fontsize=7);"
    ]
   },
   {
@@ -246,7 +274,7 @@
     "\n",
     "The parameterization problem boils down to defining the formulation and finding the best parameters in order to minimize the distance between the true trajectory and the model trajectory.\n",
     "\n",
-    "Within M2LINES we are approaching this problem as a Machine Learning problem, namely we want to learn parameterizations from objective measures of their skills through an optimization procedure. \n",
+    "**With M2LINES, we are approaching this problem as a Machine Learning problem. We want to learn parameterizations from objective measures of their skills through an optimization procedure.**\n",
     "\n",
     "But we are not only interested in learning the parameters of existing formulations. More generally, we would like to learn the formulation too.   "
    ]
@@ -272,7 +300,7 @@
    "id": "a6083799",
    "metadata": {},
    "source": [
-    "But this is not a good assumption because two identical reduced dimension state ($X$, macro-state) can be associated with very different fine scale states ($Y$, micro-state). Given the non-linearity of the evolution equation for $Z$, the two large scale trajectories will diverge at some point. Let's illustrate that with L96 alone. "
+    "But this is not a good assumption because the two identical reduced dimension states ($X$, macro-state) can be associated with very different fine scale states ($Y$, micro-state). Given the non-linearity of the evolution equation for $Z$, the two large scale trajectories will diverge at some point. Let's illustrate that with L96 alone. "
    ]
   },
   {
@@ -282,7 +310,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# - randomising the initial Ys\n",
+    "# Randomising the initial Ys\n",
     "np.random.seed(13)\n",
     "\n",
     "# Duplicating L96 to create perturbed versions that include random perturbations in Y\n",
@@ -303,15 +331,25 @@
    "outputs": [],
    "source": [
     "# Running L96 and perturbed versions to compare results\n",
-    "Xtrue, _, _ = W.run(dt, T)\n",
-    "Xpert1, _, _ = Wp1.run(dt, T)\n",
-    "Xpert2, _, _ = Wp2.run(dt, T)\n",
-    "plt.plot(t, Xtrue[:, 4], label=\"Truth\")\n",
-    "plt.plot(t, Xpert1[:, 4], label=\"Perturbed\")\n",
-    "plt.plot(t, Xpert2[:, 4], label=\"Perturbed\")\n",
+    "X_true, _, _ = W.run(dt, T)\n",
+    "X_pert1, _, _ = Wp1.run(dt, T)\n",
+    "X_pert2, _, _ = Wp2.run(dt, T)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "04d98879-6013-4161-b6b6-12f45ee7227e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plt.figure(dpi=150)\n",
+    "plt.plot(t, X_true[:, 4], label=\"Truth\")\n",
+    "plt.plot(t, X_pert1[:, 4], label=\"Perturbed\")\n",
+    "plt.plot(t, X_pert2[:, 4], label=\"Perturbed\")\n",
     "plt.xlabel(\"$t$\")\n",
     "plt.ylabel(\"$X_4(t)$\")\n",
-    "plt.legend();"
+    "plt.legend(fontsize=7);"
    ]
   },
   {
@@ -319,7 +357,7 @@
    "id": "81b73260",
    "metadata": {},
    "source": [
-    "So even very small uncertainties in the micro-state ($Y$) of L96 can lead to large scale changes (ie of the variable $X$) over short time... \n",
+    "So even very small uncertainties in the micro-state ($Y$) of L96 can lead to large scale changes (i.e. of the variable $X$) over short time.\n",
     "\n",
     "In a Model that does not know anything about micro-state $Y$, it is possible to introduce this uncertainty through a stochastic form in the parameterization.\n",
     "\n",
@@ -349,20 +387,37 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# - Let's define again our GCM\n",
+    "# Defining our GCM\n",
     "F, dt, T = 18, 0.01, 100.0\n",
     "gcm = GCM(F, naive_parameterization)\n",
-    "Xgcm, t = gcm(W.X, dt, int(T / dt), param=[0.85439536, 1.75218026])\n",
-    "\n",
-    "# - ... the true state\n",
-    "Xtrue, _, _ = W.run(dt, T)\n",
-    "\n",
-    "## and plot the results\n",
-    "plt.plot(t[:500], Xtrue[:500, 4], label=\"Truth\")\n",
-    "plt.plot(t[:500], Xgcm[:500, 4], label=\"GCM\")\n",
+    "X_gcm, t = gcm(W.X, dt, int(T / dt), param=[0.85439536, 1.75218026])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9f7079af-4c99-4f13-83a6-00144979c654",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Setting the true state\n",
+    "X_true, _, _ = W.run(dt, T)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "354b8190-af9b-4bc2-a4c4-de6a9266cc56",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Plotting the results\n",
+    "plt.figure(dpi=150)\n",
+    "plt.plot(t[:500], X_true[:500, 4], label=\"Truth\")\n",
+    "plt.plot(t[:500], X_gcm[:500, 4], label=\"GCM\")\n",
     "plt.xlabel(\"$t$\")\n",
     "plt.ylabel(\"$X_4(t)$\")\n",
-    "plt.legend();"
+    "plt.legend(fontsize=7);"
    ]
   },
   {
@@ -380,8 +435,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# - A first simple choice :\n",
-    "def pointwise(X1, X2, L=1.0):  # computed over some window t<L.\n",
+    "def pointwise_msq(X1, X2, L=1.0):\n",
+    "    \"\"\"Pointwise mean square error computed over some window t < L.\"\"\"\n",
     "    D = (X1 - X2)[np.where(t < L)]\n",
     "    return np.sqrt(D**2).mean(axis=0)"
    ]
@@ -393,8 +448,11 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "measure_dist = lambda l: pointwise(Xtrue, Xgcm, L=l)\n",
-    "dist = np.array([measure_dist(l) for l in t])"
+    "measure_distance = lambda l: pointwise_msq(X_true, X_gcm, L=l)\n",
+    "\n",
+    "# Since t is sorted, t[0] will give an empty distance in the `pointwise_msq` function.\n",
+    "# So we calculate the mean square error starting from the 2nd element i.e. t[1]\n",
+    "dist = np.array([measure_distance(l) for l in t[1:]])"
    ]
   },
   {
@@ -404,12 +462,12 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# - plotting how this distance grows with the length of the window\n",
-    "#  for all the components of X\n",
-    "plt.plot(t, dist)\n",
+    "# Plotting how this distance grows with the length of the window\n",
+    "# for all the components of X\n",
+    "plt.figure(dpi=150)\n",
+    "plt.plot(t[1:], dist)\n",
     "plt.xlabel(\"$t$\")\n",
-    "plt.ylabel(\"$||X_{true}-X_{gcm}||_2$\")\n",
-    "plt.show()"
+    "plt.ylabel(\"$||X_{true}-X_{gcm}||_2$\");"
    ]
   },
   {
@@ -427,8 +485,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# - mean state metric :\n",
     "def dist_mean(X1, X2, L=1.0):\n",
+    "    \"\"\"Mean state metric.\"\"\"\n",
     "    _X1 = X1[np.where(t < L)]\n",
     "    _X2 = X2[np.where(t < L)]\n",
     "    return np.sqrt((_X1.mean(axis=0) - _X2.mean(axis=0)) ** 2)"
@@ -441,7 +499,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "measure_dist = lambda l: dist_mean(Xtrue, Xgcm, L=l)\n",
+    "measure_dist = lambda l: dist_mean(X_true, X_gcm, L=l)\n",
     "dist = np.array([measure_dist(l) for l in t])"
    ]
   },
@@ -452,12 +510,12 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# - plotting how this distance grows with the length of the window\n",
-    "#  for all the components of X\n",
+    "# Plotting how this distance grows with the length of the window\n",
+    "# for all the components of X\n",
+    "plt.figure(dpi=150)\n",
     "plt.plot(t, dist)\n",
     "plt.xlabel(\"$t$\")\n",
-    "plt.ylabel(\"$||\\overline{X_{true}}-\\overline{X_{gcm}}||_2$\")\n",
-    "plt.show()"
+    "plt.ylabel(\"$||\\overline{X_{true}}-\\overline{X_{gcm}}||_2$\");"
    ]
   },
   {
@@ -465,31 +523,13 @@
    "id": "c769ec8e",
    "metadata": {},
    "source": [
-    "After some time, this metric seems to converge to some non-trivial values which are probably indicative of how well our model captures the \"climate\" of the system. "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "155d68b1",
-   "metadata": {},
-   "source": [
-    "It is also very common to formulate closures based on databases of initial tendencies.  In the Large Eddy Simulation  community, this is refered to as *a priori* skill, because you don't need to run the full model to compute it. "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1cdb760c",
-   "metadata": {},
-   "source": [
-    "This is the sort of game that several of us have been playing, trying for instance to estimate subgrid fluxes from knowledge of the large scale quantities  $$ \\nabla\\cdot \\mathbf{s} =   \\nabla\\cdot\\big(\\overline{\\mathbf{u}\\,\\Phi} - \\overline{\\mathbf{u}}\\,\\overline{\\Phi}\\big) \\simeq f(\\overline{\\mathbf{u}},\\overline{\\Phi})$$"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4c6fd549",
-   "metadata": {},
-   "source": [
-    "(Note that this is not exactly the same problem as the *a priori* LES problem, because of the interplay with time-discretization. Let's neglect that for the moment.) "
+    "After some time, this metric seems to converge to some non-trivial values which are probably indicative of how well our model captures the \"climate\" of the system.\n",
+    "\n",
+    "It is also very common to formulate closures based on databases of initial tendencies.  In the Large Eddy Simulation  community, this is refered to as *a priori* skill, because you don't need to run the full model to compute it.\n",
+    "\n",
+    "This is the sort of game that several of us have been playing, trying for instance to estimate subgrid fluxes from knowledge of the large scale quantities  $$ \\nabla\\cdot \\mathbf{s} =   \\nabla\\cdot\\big(\\overline{\\mathbf{u}\\,\\Phi} - \\overline{\\mathbf{u}}\\,\\overline{\\Phi}\\big) \\simeq f(\\overline{\\mathbf{u}},\\overline{\\Phi})$$\n",
+    "\n",
+    "*Note that this is not exactly the same problem as the `a priori` LES problem, because of the interplay with time-discretization. Let's neglect that for the moment.*"
    ]
   },
   {
@@ -522,12 +562,10 @@
    "source": [
     "F, dt, T = 18, 0.01, 0.01\n",
     "\n",
+    "# Let's define again the true state\n",
+    "X_true, _, _ = W.run(dt, T)\n",
     "\n",
-    "# - Let's define again the true state\n",
-    "Xtrue, _, _ = W.run(dt, T)\n",
-    "\n",
-    "#  and an ensemble of trajectories :\n",
-    "\n",
+    "# and an ensemble of trajectories\n",
     "gcm = GCM(F, naive_parameterization)\n",
     "\n",
     "n = 100\n",
@@ -540,8 +578,8 @@
     "\n",
     "for i, p1 in enumerate(_p1):\n",
     "    for j, p2 in enumerate(_p2):\n",
-    "        Xgcm, t = gcm(W.X, dt, int(T / dt), param=[p1, p2])\n",
-    "        score[i, j] = norm_initial_tendency(Xtrue, Xgcm)"
+    "        X_gcm, t = gcm(W.X, dt, int(T / dt), param=[p1, p2])\n",
+    "        score[i, j] = norm_initial_tendency(X_true, X_gcm)"
    ]
   },
   {
@@ -551,11 +589,12 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "plt.figure(dpi=150)\n",
     "plt.pcolormesh(xp1, yp2, score)\n",
     "plt.colorbar()\n",
     "plt.xlabel(\"$p_2$\")\n",
     "plt.ylabel(\"$p_1$\")\n",
-    "plt.title(\"Score\")"
+    "plt.title(\"Score\");"
    ]
   },
   {
@@ -589,14 +628,14 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# - Generic GCM parameters\n",
+    "# Generic GCM parameters\n",
     "F, dt, T = 18, 0.01, 20.0\n",
     "X0 = W.X\n",
     "parameterization = naive_parameterization\n",
     "params = [0.85439536, 1.75218026]\n",
     "\n",
-    "# - sampling real world over a longer period of time\n",
-    "Xtrue, _, _ = W.run(dt, T)"
+    "# Sampling real world over a longer period of time\n",
+    "X_true, _, _ = W.run(dt, T)"
    ]
   },
   {
@@ -607,17 +646,6 @@
     "### Comparing GCMs with small errors in the forcing F"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d15e5e2d",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "max_relative_error = 0.01\n",
-    "Fs = F + max_relative_error * (np.random.rand(40) - 0.5)  # array of perturbed forcings"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -625,8 +653,13 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "max_relative_error = 0.01\n",
+    "Fs = F + max_relative_error * (np.random.rand(40) - 0.5)  # array of perturbed forcings\n",
+    "\n",
     "GCMs = [GCM(Fp, parameterization) for Fp in Fs]\n",
+    "\n",
     "Xs = []  # list of trajectories for individual perturbed forcings\n",
+    "\n",
     "for gcm in GCMs:\n",
     "    X, t = gcm(X0, dt, int(T / dt), param=params)\n",
     "    Xs.append(X)"
@@ -639,13 +672,14 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "plt.figure(dpi=150)\n",
     "plt.plot(t, Xs[0][:, 4], label=\"Model with perturbed forcings\", color=\"grey\", lw=1)\n",
     "for X in Xs[1:]:\n",
     "    plt.plot(t, X[:, 4], color=\"grey\", lw=1)\n",
-    "plt.plot(t, Xtrue[:, 4], label=\"Truth\", lw=2)\n",
+    "plt.plot(t, X_true[:, 4], label=\"Truth\", lw=2)\n",
     "plt.xlabel(\"$t$\")\n",
     "plt.ylabel(\"$X_4(t)$\")\n",
-    "plt.legend();"
+    "plt.legend(fontsize=7);"
    ]
   },
   {
@@ -664,11 +698,15 @@
    "outputs": [],
    "source": [
     "relative_error = 0.01\n",
+    "\n",
     "X0s = X0 + max_relative_error * (\n",
     "    np.random.rand(50, 8) - 0.5\n",
     ")  # array of perturbed initial conditions\n",
+    "\n",
     "gcm = GCM(F, parameterization)\n",
+    "\n",
     "Xs = []  # list of trajectories for individual preturbed initial conditions\n",
+    "\n",
     "for X0 in X0s:\n",
     "    X, t = gcm(X0, dt, int(T / dt), param=params)\n",
     "    Xs.append(X)"
@@ -681,13 +719,14 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "plt.figure(dpi=150)\n",
     "plt.plot(t, Xs[0][:, 4], label=\"Model with perturbed ICs\", color=\"grey\", lw=1)\n",
     "for X in Xs[1:]:\n",
     "    plt.plot(t, X[:, 4], color=\"grey\", lw=1)\n",
-    "plt.plot(t, Xtrue[:, 4], label=\"Truth\", lw=2)\n",
+    "plt.plot(t, X_true[:, 4], label=\"Truth\", lw=2)\n",
     "plt.xlabel(\"$t$\")\n",
     "plt.ylabel(\"$X_4(t)$\")\n",
-    "plt.legend();"
+    "plt.legend(fontsize=7);"
    ]
   },
   {
@@ -695,7 +734,10 @@
    "id": "d8ebe8ba",
    "metadata": {},
    "source": [
-    "The above two experiments illustrate that the definition of our distance metrics should take into account the other sources of errors in our GCMs : errors on the parameters, on the forcing and on the initial conditions. "
+    "The above two experiments illustrate that the definition of our distance metrics should take into account the other sources of errors in our GCMs:\n",
+    "- errors on the *parameters*\n",
+    "- errors on the *forcing*\n",
+    "- errors on the *initial conditions*"
    ]
   },
   {
@@ -715,11 +757,11 @@
    "source": [
     "F, dt, T = 18, 0.01, 10.0\n",
     "X0 = W.X\n",
-    "#\n",
+    "\n",
     "euler_GCM = GCM(F, naive_parameterization, time_stepping=EulerFwd)\n",
     "rk2_GCM = GCM(F, naive_parameterization, time_stepping=RK2)\n",
     "rk4_GCM = GCM(F, naive_parameterization, time_stepping=RK4)\n",
-    "#\n",
+    "\n",
     "euler_X, t = euler_GCM(X0, dt, int(T / dt), param=[0.85439536, 1.75218026])\n",
     "rk2_X, _ = rk2_GCM(X0, dt, int(T / dt), param=[0.85439536, 1.75218026])\n",
     "rk4_X, _ = rk4_GCM(X0, dt, int(T / dt), param=[0.85439536, 1.75218026])"
@@ -732,13 +774,13 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "plt.figure(dpi=150)\n",
     "plt.plot(t, euler_X[:, 3], label=\"euler\")\n",
     "plt.plot(t, rk2_X[:, 3], label=\"RK2\")\n",
     "plt.plot(t, rk4_X[:, 3], label=\"RK4\")\n",
-    "plt.legend()\n",
     "plt.xlabel(\"$t$\")\n",
     "plt.ylabel(\"$X_3(t)$\")\n",
-    "plt.show()"
+    "plt.legend(fontsize=7);"
    ]
   },
   {
@@ -746,18 +788,10 @@
    "id": "69c2b3ac",
    "metadata": {},
    "source": [
-    "This last experiment illustrates that depending in the numerical schemes of the GCM, the distance between the model prediction and the true state can be substantial. There again, we have an additional constraint on the definition of our distance metrics.\n",
+    "This last experiment illustrates that depending on the numerical schemes of the GCM, the distance between the model prediction and the true state can be substantial. There again, we have an additional constraint on the definition of our distance metrics.\n",
     "\n",
-    "More generally, this also suggests that the optimal closures we will find will probably depend on the numerical schemes that are used in our GCMs, or more generally on the structural uncertainties of our GCMs, which are difficult to quantify..."
+    "More generally, this also suggests that the optimal closures that we find will probably depend on the numerical schemes that are used in our GCMs, or more generally, on the structural uncertainties of our GCMs, which are difficult to quantify."
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "61c29072-bb48-4903-907a-833e54a92b8c",
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {
@@ -776,7 +810,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.12"
+   "version": "3.9.16"
   }
  },
  "nbformat": 4,