From 9a9403ff655d164dd3570e999b8db5f29e42e7fc Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Wed, 16 Feb 2022 14:10:58 +0100
Subject: [PATCH 01/17] Add documentation on faster NMODL.

---
 doc/fileformat/nmodl.rst  | 117 ++++++++++++++++++++++++++++++++++++++
 mechanisms/default/hh.mod |   9 ++-
 2 files changed, 121 insertions(+), 5 deletions(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index 8c62bf930b..36436f7f63 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -191,3 +191,120 @@ contain full ``segments``).
 
 Modelers are encouraged to verify the expected behavior of the reversal potentials of ions
 as it can lead to vastly different model behavior.
+
+Tips for Faster NMODL
+======================
+
+NMODL is a quirky language and not well specified, which results in NMODL files
+being treated as un-understandable and untouchable. This in turn leads to
+sub-optimal performance, especially since mechanisms take up a large amount of
+the simulations' runtime budget. With some understanding of the subject matter,
+however, it is quite straightforward to obtain clean and performant NMODL files.
+We regularly have seen speed-ups factors from tuning NMODL of around three.
+
+First, let us discuss how NMODL becomes part of a simulation. NMODL mechanisms
+are given in ``.mod`` files, whose layout and syntax has been discussed above.
+These are compiled by ``modcc`` into a series of callbacks as specified by the
+mechanisms ABI. These operate on data held in Arbor's internal storage. But,
+``modcc`` does not generate machine code, it goes through C++ (and/or CUDA) as
+an intermediary which is processed by a standard C++ compiler like GCC (or nvcc)
+to produce either a shared object (for external catalogues) and code directly
+linked into Arbor (the built-in catalogues).
+
+Now, we turn to a series of tips we found helpful in producing fast NMODL
+mechanisms.
+
+``RANGE``
+---------
+
+Parameters and ``ASSIGNED`` variables marked as ``RANGE`` will be stored as an
+array with one entry per CV in Arbor. Reading and writing these incurs a memory
+access and thus affects cache and memory utilisation metrics. It is often
+cheaper to use ``LOCAL`` variables instead, even if that means foregoing the
+ability to re-use a comuted value. Compute is so much faster than memory on
+modern hardware that re-use at the expense of memory accesses is seldom
+profitable, except for the most complex terms.
+
+``PROCEDURE``
+-------------
+
+Prefer ``FUNCTION`` over ``PROCEDURE``. The latter *require* ``ASSIGNED RANGE``
+variables to return values and thus stress the memory system for little gain, as
+noted above. Also, they may not be inlined, as opposed to a ``FUNCTION``.
+
+```PARAMETER``
+--------------
+
+``PARAMETER`` should only be used for values that must be set by the simulator.
+All fixed values should be ``CONSTANT`` instead. These can be inlined and folded
+into the computations, allowing for further optimisations.
+
+Sharing Expressions Between ``INITIAL`` and ``BREAKPOINT`` or ``DERIVATIVE``
+----------------------------------------------------------------------------
+
+This is often done using a ``PROCEDURE``, which we know is inefficient. On top,
+this ``PROCEDURE`` will likely compute more outputs than strictly needed to
+accomodate both blocks. DRY code is a good idea nevertheless, so use a series of
+``FUNCTION`` instead to compute common expressions.
+
+This leads naturally to a common optimisation in H-H style ion channels. If you
+heeded the advice above, you will likely see this pattern emerge:
+
+.. code::
+
+   na   = n_alpha()
+   nb   = n_beta()
+	 ntau = 1/(na + nb)
+   ninf = na*ntau
+
+   n' = (ninf - n)/ntau
+
+Written out in this explicit way it becomes obvious that this can be expressed
+in a more compact way
+
+.. code::
+
+   na   = n_alpha()
+   nb   = n_beta()
+	 nrho = na + nb
+
+   n' = (na - n)*nrho
+
+The latter code is much faster and neither ``modcc`` nor the external C++ might
+pick it up as an optimisations, largely due to floating point accuracy
+requirements. This is less easy to see when partially hidden in a ``PROCEDURE``.
+
+The Conductance Trick
+---------------------
+
+``modcc``, Arbor's NMODL compiler, applies symbolic differentiation to the
+current expression to find the conductance as ``g = d U/d I`` which are then
+used to compute the voltage update. ``g`` is thus computed multiple times every
+timestep and if the corresponding expression is inefficient, it will cost more
+time than needed. The differentiation implementation is less than smart and will
+not optimise the resulting code. This is an internal detail of Arbor, but it can
+help to produce well performing code. Here is an example
+
+.. code::
+
+  : BAD, will compute m^4 * h every step
+  i = m^4 * h * (v - e)
+
+  : GOOD, will just use a constant value of g
+  LOCAL g
+  g = m^4 * h
+  i = g * (v - e)
+
+Note that we do not lose accuracy here, since Arbor does not support
+higher-order ODEs and thus will treat ``g(v)`` as if ``v`` is a constant across
+a single timestep.
+
+
+Small Tips and Micro-Optimisations
+----------------------------------
+
+- Divisions cost a bit more than multiplications and additions.
+- ``m * m`` is more efficient than ``m^2``. This holds for higher powers as
+  well; to squeeze out the last bit of performance, use
+  exponentiation-by-squaring (although GCC does this for you most of the
+  time).
diff --git a/mechanisms/default/hh.mod b/mechanisms/default/hh.mod
index 1c26322e1d..5ebf317993 100644
--- a/mechanisms/default/hh.mod
+++ b/mechanisms/default/hh.mod
@@ -25,13 +25,12 @@ ASSIGNED { q10 }
 
 BREAKPOINT {
     SOLVE states METHOD cnexp
-    LOCAL gk, m_, n_, n2
+    LOCAL gk, gna, n2
 
-    n_ = n
-    m_ = m
-    n2 = n_*n_
+    n2 = n*n
     gk = gkbar*n2*n2
-    ina = gnabar*m_*m_*m_*h*(v - ena)
+    gna = gnabar*m*m*m*h*
+    ina = gna*(v - ena)
     ik  = gk*(v - ek)
     il  = gl*(v - el)
 }

From 3582b12a2313821087f76fd29d47d569ab3269fb Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Wed, 16 Feb 2022 14:16:19 +0100
Subject: [PATCH 02/17] Typo.

---
 mechanisms/default/hh.mod | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mechanisms/default/hh.mod b/mechanisms/default/hh.mod
index 5ebf317993..d4a34700cb 100644
--- a/mechanisms/default/hh.mod
+++ b/mechanisms/default/hh.mod
@@ -29,7 +29,7 @@ BREAKPOINT {
 
     n2 = n*n
     gk = gkbar*n2*n2
-    gna = gnabar*m*m*m*h*
+    gna = gnabar*m*m*m*h
     ina = gna*(v - ena)
     ik  = gk*(v - ek)
     il  = gl*(v - el)

From 93e3ab12a6d2aaaa96154e7c278ee073fccc5238 Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Fri, 18 Feb 2022 14:38:10 +0100
Subject: [PATCH 03/17] Update doc/fileformat/nmodl.rst

Co-authored-by: Brent Huisman <brenthuisman@users.noreply.github.com>
---
 doc/fileformat/nmodl.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index 36436f7f63..0d14886b44 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -196,7 +196,7 @@ Tips for Faster NMODL
 ======================
 
 NMODL is a quirky language and not well specified, which results in NMODL files
-being treated as un-understandable and untouchable. This in turn leads to
+existing NMODL being treated as difficult to understand and best left as-is. This in turn leads to
 sub-optimal performance, especially since mechanisms take up a large amount of
 the simulations' runtime budget. With some understanding of the subject matter,
 however, it is quite straightforward to obtain clean and performant NMODL files.

From a29507087b428b176e2bf6dabe38abab4b75e299 Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Fri, 18 Feb 2022 14:44:36 +0100
Subject: [PATCH 04/17] Update doc/fileformat/nmodl.rst

Co-authored-by: Brent Huisman <brenthuisman@users.noreply.github.com>
---
 doc/fileformat/nmodl.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index 0d14886b44..c935d6fd38 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -237,7 +237,7 @@ noted above. Also, they may not be inlined, as opposed to a ``FUNCTION``.
 
 ``PARAMETER`` should only be used for values that must be set by the simulator.
 All fixed values should be ``CONSTANT`` instead. These can be inlined and folded
-into the computations, allowing for further optimisations.
+allowing for further optimisations.
 
 Sharing Expressions Between ``INITIAL`` and ``BREAKPOINT`` or ``DERIVATIVE``
 ----------------------------------------------------------------------------

From 585ad87c33b53bae49366489cfeeebc4f6d5c9d6 Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Fri, 18 Feb 2022 14:48:45 +0100
Subject: [PATCH 05/17] Update doc/fileformat/nmodl.rst

Co-authored-by: Brent Huisman <brenthuisman@users.noreply.github.com>
---
 doc/fileformat/nmodl.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index c935d6fd38..0d6a5e9aeb 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -242,7 +242,7 @@ allowing for further optimisations.
 Sharing Expressions Between ``INITIAL`` and ``BREAKPOINT`` or ``DERIVATIVE``
 ----------------------------------------------------------------------------
 
-This is often done using a ``PROCEDURE``, which we know is inefficient. On top,
+This is often done using a ``PROCEDURE``, which we now know is inefficient. On top,
 this ``PROCEDURE`` will likely compute more outputs than strictly needed to
 accomodate both blocks. DRY code is a good idea nevertheless, so use a series of
 ``FUNCTION`` instead to compute common expressions.

From e438661d9ed3743051e1aac6a700b94533b0be18 Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Fri, 18 Feb 2022 14:52:08 +0100
Subject: [PATCH 06/17] Update doc/fileformat/nmodl.rst

Co-authored-by: Brent Huisman <brenthuisman@users.noreply.github.com>
---
 doc/fileformat/nmodl.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index 0d6a5e9aeb..679d67696e 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -205,7 +205,7 @@ We regularly have seen speed-ups factors from tuning NMODL of around three.
 First, let us discuss how NMODL becomes part of a simulation. NMODL mechanisms
 are given in ``.mod`` files, whose layout and syntax has been discussed above.
 These are compiled by ``modcc`` into a series of callbacks as specified by the
-mechanisms ABI. These operate on data held in Arbor's internal storage. But,
+:ref:`mechanism_abi`. These operate on data held in Arbor's internal storage. But,
 ``modcc`` does not generate machine code, it goes through C++ (and/or CUDA) as
 an intermediary which is processed by a standard C++ compiler like GCC (or nvcc)
 to produce either a shared object (for external catalogues) and code directly

From 20d55ae8f2bc06a33acab0b7704232f579d6c464 Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Fri, 18 Feb 2022 14:55:18 +0100
Subject: [PATCH 07/17] Update doc/fileformat/nmodl.rst

Co-authored-by: Brent Huisman <brenthuisman@users.noreply.github.com>
---
 doc/fileformat/nmodl.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index 679d67696e..6c1ea04e81 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -229,7 +229,7 @@ profitable, except for the most complex terms.
 -------------
 
 Prefer ``FUNCTION`` over ``PROCEDURE``. The latter *require* ``ASSIGNED RANGE``
-variables to return values and thus stress the memory system for little gain, as
+variables to return values and thus stress the memory system, which, as
 noted above. Also, they may not be inlined, as opposed to a ``FUNCTION``.
 
 ```PARAMETER``

From a5b49d2ab82e20e99906911260a874c476179784 Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Fri, 18 Feb 2022 14:56:29 +0100
Subject: [PATCH 08/17] Update doc/fileformat/nmodl.rst

Co-authored-by: Brent Huisman <brenthuisman@users.noreply.github.com>
---
 doc/fileformat/nmodl.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index 6c1ea04e81..12766aa214 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -230,7 +230,7 @@ profitable, except for the most complex terms.
 
 Prefer ``FUNCTION`` over ``PROCEDURE``. The latter *require* ``ASSIGNED RANGE``
 variables to return values and thus stress the memory system, which, as
-noted above. Also, they may not be inlined, as opposed to a ``FUNCTION``.
+noted above, is not most efficient on current hardware. Also, they may not be inlined, as opposed to a ``FUNCTION``.
 
 ```PARAMETER``
 --------------

From a0d08da21f2d4a986ca4592df37506fb7d37d59a Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Fri, 18 Feb 2022 16:10:17 +0100
Subject: [PATCH 09/17] Update doc/fileformat/nmodl.rst

Co-authored-by: Brent Huisman <brenthuisman@users.noreply.github.com>
---
 doc/fileformat/nmodl.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index 12766aa214..fdebc99266 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -195,7 +195,7 @@ as it can lead to vastly different model behavior.
 Tips for Faster NMODL
 ======================
 
-NMODL is a quirky language and not well specified, which results in NMODL files
+NMODL is a quirky language and not formally specified, which results in NMODL files
 existing NMODL being treated as difficult to understand and best left as-is. This in turn leads to
 sub-optimal performance, especially since mechanisms take up a large amount of
 the simulations' runtime budget. With some understanding of the subject matter,

From 13d5dd75e74fb4e499abfa8e12a61afd7e1f5c65 Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Fri, 18 Feb 2022 17:21:08 +0100
Subject: [PATCH 10/17] Review.

---
 doc/fileformat/nmodl.rst | 72 ++++++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 33 deletions(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index fdebc99266..7acef395a5 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -195,21 +195,22 @@ as it can lead to vastly different model behavior.
 Tips for Faster NMODL
 ======================
 
-NMODL is a quirky language and not formally specified, which results in NMODL files
-existing NMODL being treated as difficult to understand and best left as-is. This in turn leads to
-sub-optimal performance, especially since mechanisms take up a large amount of
-the simulations' runtime budget. With some understanding of the subject matter,
-however, it is quite straightforward to obtain clean and performant NMODL files.
-We regularly have seen speed-ups factors from tuning NMODL of around three.
-
-First, let us discuss how NMODL becomes part of a simulation. NMODL mechanisms
-are given in ``.mod`` files, whose layout and syntax has been discussed above.
-These are compiled by ``modcc`` into a series of callbacks as specified by the
-:ref:`mechanism_abi`. These operate on data held in Arbor's internal storage. But,
-``modcc`` does not generate machine code, it goes through C++ (and/or CUDA) as
-an intermediary which is processed by a standard C++ compiler like GCC (or nvcc)
-to produce either a shared object (for external catalogues) and code directly
-linked into Arbor (the built-in catalogues).
+NMODL is a quirky language and not formally specified, which results in NMODL
+files existing NMODL being treated as difficult to understand and best left
+as-is. This in turn leads to sub-optimal performance, especially since
+mechanisms take up a large amount of the simulations' runtime budget. With some
+understanding of the subject matter, however, it is quite straightforward to
+obtain clean and performant NMODL files. We regularly have seen speed-ups
+by a factor of roughly three from optimising NMODL.
+
+First, let us discuss how NMODL becomes part of an Arbor simulation. NMODL
+mechanisms are given in ``.mod`` files, whose layout and syntax have been
+discussed above. These are compiled by ``modcc`` into a series of callbacks as
+specified by the :ref:`mechanism_abi`. These operate on data held in Arbor's
+internal storage. However, ``modcc`` does not generate machine code; it emits
+C++ (and/or CUDA) as an intermediary, which is processed by a standard C++
+compiler like GCC (or nvcc) to produce a shared object (for external
+catalogues) and code directly linked into Arbor (the built-in catalogues).
 
 Now, we turn to a series of tips we found helpful in producing fast NMODL
 mechanisms.
@@ -219,25 +220,29 @@ mechanisms.
 
 Parameters and ``ASSIGNED`` variables marked as ``RANGE`` will be stored as an
 array with one entry per CV in Arbor. Reading and writing these incurs a memory
-access and thus affects cache and memory utilisation metrics. It is often
-cheaper to use ``LOCAL`` variables instead, even if that means foregoing the
-ability to re-use a comuted value. Compute is so much faster than memory on
+access and thus affects cache and memory utilisation metrics. It is often more
+efficient to use ``LOCAL`` variables instead, even if that means foregoing the
+ability to re-use a computed value. Compute is so much faster than memory on
 modern hardware that re-use at the expense of memory accesses is seldom
-profitable, except for the most complex terms.
+profitable, except for the most complex terms. ``LOCAL`` variables become just
+that in the generated code: a local variable that is likely residing in a
+register and used only as long as needed.
 
 ``PROCEDURE``
 -------------
 
 Prefer ``FUNCTION`` over ``PROCEDURE``. The latter *require* ``ASSIGNED RANGE``
-variables to return values and thus stress the memory system, which, as
-noted above, is not most efficient on current hardware. Also, they may not be inlined, as opposed to a ``FUNCTION``.
+variables to return values and thus stress the memory system, which, as noted
+above, is inefficient on current hardware. Also, they may not be inlined,
+as opposed to a ``FUNCTION``.
 
 ```PARAMETER``
 --------------
 
 ``PARAMETER`` should only be used for values that must be set by the simulator.
-All fixed values should be ``CONSTANT`` instead. These can be inlined and folded
-allowing for further optimisations.
+All fixed values should be ``CONSTANT`` instead. These will be inlined by
+``modcc`` and propagated through the computations which can uncover more
+optimisation potential.
 
 Sharing Expressions Between ``INITIAL`` and ``BREAKPOINT`` or ``DERIVATIVE``
 ----------------------------------------------------------------------------
@@ -270,20 +275,21 @@ in a more compact way
 
    n' = (na - n)*nrho
 
-The latter code is much faster and neither ``modcc`` nor the external C++ might
-pick it up as an optimisations, largely due to floating point accuracy
+The latter code is faster. Neither ``modcc`` nor the external C++ are likely
+pick it up this optimisation, largely due to floating point accuracy
 requirements. This is less easy to see when partially hidden in a ``PROCEDURE``.
 
-The Conductance Trick
----------------------
+Complex Expressions in Current Computation
+------------------------------------------
 
 ``modcc``, Arbor's NMODL compiler, applies symbolic differentiation to the
-current expression to find the conductance as ``g = d U/d I`` which are then
+current expression to find the conductance as ``g = d I/d U`` which is then
 used to compute the voltage update. ``g`` is thus computed multiple times every
 timestep and if the corresponding expression is inefficient, it will cost more
-time than needed. The differentiation implementation is less than smart and will
-not optimise the resulting code. This is an internal detail of Arbor, but it can
-help to produce well performing code. Here is an example
+time than needed. The differentiation implementation quite naive and will not
+optimise the resulting expressions. This is an internal detail of Arbor and
+might change in the future, but for this particular optimisation can help to
+produce better performing code. Here is an example
 
 .. code::
 
@@ -296,8 +302,8 @@ help to produce well performing code. Here is an example
   i = g * (v - e)
 
 Note that we do not lose accuracy here, since Arbor does not support
-higher-order ODEs and thus will treat ``g(v)`` as if ``v`` is a constant across
-a single timestep.
+higher-order ODEs and thus will treat ``g`` as a constant across
+a single timestep even if ``g`` actually depends on ``v``.
 
 
 Small Tips and Micro-Optimisations

From 71367f395fddd34ac75f82833be6a8f33589592e Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Fri, 18 Feb 2022 17:23:33 +0100
Subject: [PATCH 11/17] Warn NRN users

---
 doc/fileformat/nmodl.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index 7acef395a5..5baf0e5fd7 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -213,7 +213,8 @@ compiler like GCC (or nvcc) to produce either a shared object (for external
 catalogues) and code directly linked into Arbor (the built-in catalogues).
 
 Now, we turn to a series of tips we found helpful in producing fast NMODL
-mechanisms.
+mechanisms. Note that if you are looking for help with NMODL in the context of
+NEURON, this guide might not help.
 
 ``RANGE``
 ---------

From b7e9b9a9db420c257a8c876932aa2f08d00fcc4f Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Fri, 18 Feb 2022 17:25:09 +0100
Subject: [PATCH 12/17] Remove quirky

---
 doc/fileformat/nmodl.rst | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index 5baf0e5fd7..febe431dba 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -195,9 +195,10 @@ as it can lead to vastly different model behavior.
 Tips for Faster NMODL
 ======================
 
-NMODL is a quirky language and not formally specified, which results in NMODL
-files existing NMODL being treated as difficult to understand and best left
-as-is. This in turn leads to sub-optimal performance, especially since
+NMODL is a language without formal specification and many unexpected
+characteristics (many of which are not supported in Arbor), which results in
+NMODL files existing NMODL being treated as difficult to understand and best
+left as-is. This in turn leads to sub-optimal performance, especially since
 mechanisms take up a large amount of the simulations' runtime budget. With some
 understanding of the subject matter, however, it is quite straightforward to
 obtain clean and performant NMODL files. We regularly have seen speed-ups

From 7388c164822bedf85bdf1c7a7d433a4529f7e24e Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Fri, 18 Feb 2022 17:44:35 +0100
Subject: [PATCH 13/17] Fix borked suggestion commit.

---
 doc/fileformat/nmodl.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index febe431dba..53b85a87ab 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -197,8 +197,8 @@ Tips for Faster NMODL
 
 NMODL is a language without formal specification and many unexpected
 characteristics (many of which are not supported in Arbor), which results in
-NMODL files existing NMODL being treated as difficult to understand and best
-left as-is. This in turn leads to sub-optimal performance, especially since
+existing NMODL files being treated as difficult to understand and best left
+as-is. This in turn leads to sub-optimal performance, especially since
 mechanisms take up a large amount of the simulations' runtime budget. With some
 understanding of the subject matter, however, it is quite straightforward to
 obtain clean and performant NMODL files. We regularly have seen speed-ups

From ac76826e6fb90b45325a3b95c64eeb79339a0265 Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Mon, 21 Feb 2022 07:43:56 +0100
Subject: [PATCH 14/17] add exprelr.

---
 doc/fileformat/nmodl.rst | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index 53b85a87ab..01e327265b 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -290,7 +290,7 @@ used to compute the voltage update. ``g`` is thus computed multiple times every
 timestep and if the corresponding expression is inefficient, it will cost more
 time than needed. The differentiation implementation quite naive and will not
 optimise the resulting expressions. This is an internal detail of Arbor and
-might change in the future, but for this particular optimisation can help to
+might change in the future, but for now this particular optimisation can help to
 produce better performing code. Here is an example
 
 .. code::
@@ -307,6 +307,29 @@ Note that we do not lose accuracy here, since Arbor does not support
 higher-order ODEs and thus will treat ``g`` as a constant across
 a single timestep even if ``g`` actually depends on ``v``.
 
+Specialised Functions
+---------------------
+
+Another common pattern is the use of a guarded exponential of the form
+
+.. code::
+
+   if (x != 0) {
+     r = x/(exp(x) - 1)
+   } else {
+     r = 1
+   }
+
+This incurs some extra cost on most platforms. However, it can be written in
+Arbor's NMODL dialect as
+
+.. code::
+
+   exprelr(x)
+
+which is more efficient and has the same guarantees. NMODL files originating
+from NEURON often use this or related functions, e.g. ``vtrap(x, y) =
+y*exprelr(x/y)``.
 
 Small Tips and Micro-Optimisations
 ----------------------------------

From 893f2f9e6b21452ae3aca1a20e0a2bceb3d5bc5a Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Mon, 21 Feb 2022 10:49:13 +0100
Subject: [PATCH 15/17] Clarify optimisation comment.

---
 doc/fileformat/nmodl.rst | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index 01e327265b..8a017c32cd 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -277,9 +277,11 @@ in a more compact way
 
    n' = (na - n)*nrho
 
-The latter code is faster. Neither ``modcc`` nor the external C++ are likely
-pick it up this optimisation, largely due to floating point accuracy
-requirements. This is less easy to see when partially hidden in a ``PROCEDURE``.
+The latter code is faster, but neither ``modcc`` nor the external C++ compiler
+will perform this optimisation (GCC/Clang *might* attempt it if asked to relax
+floating point accuracy with ``-ffast-math`` or ``-Ofast``. However, Arbor
+refrains from using this when compiling mechanism code.). This is less easy to
+see when partially hidden in a ``PROCEDURE``.
 
 Complex Expressions in Current Computation
 ------------------------------------------

From 8d843103ff4cb5471ba47f96767985e342744fb7 Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Mon, 21 Feb 2022 10:53:06 +0100
Subject: [PATCH 16/17] Footnote.

---
 doc/fileformat/nmodl.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index 8a017c32cd..f2670de882 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -267,7 +267,7 @@ heeded the advice above, you will likely see this patter emerge:
    n' = (ninf - n)/ntau
 
 Written out in this explicit way it becomes obvious that this can be expressed
-in a more compact way
+compactly as
 
 .. code::
 
@@ -278,11 +278,13 @@ in a more compact way
    n' = (na - n)*nrho
 
 The latter code is faster, but neither ``modcc`` nor the external C++ compiler
-will perform this optimisation (GCC/Clang *might* attempt it if asked to relax
-floating point accuracy with ``-ffast-math`` or ``-Ofast``. However, Arbor
-refrains from using this when compiling mechanism code.). This is less easy to
+will perform this optimisation [#]_. This is less easy to
 see when partially hidden in a ``PROCEDURE``.
 
+.. [#] GCC/Clang *might* attempt it if asked to relax floating point accuracy
+       with ``-ffast-math`` or ``-Ofast``. However, Arbor refrains from using
+       this option when compiling mechanism code.
+
 Complex Expressions in Current Computation
 ------------------------------------------
 

From 1b3d16cfb8fabe13c060deab8c3fbdf10086a97b Mon Sep 17 00:00:00 2001
From: Thorsten Hater <24411438+thorstenhater@users.noreply.github.com>
Date: Mon, 21 Feb 2022 15:18:42 +0100
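Subject: [PATCH 17/17] Tabs and Typos.

Replace tabs with spaces in the code examples and correct the compact
form of the gating ODE: with ntau = 1/nrho and ninf = na*ntau,
(ninf - n)/ntau = (na*ntau - n)*nrho = na - n*nrho.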

---
 doc/fileformat/nmodl.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/fileformat/nmodl.rst b/doc/fileformat/nmodl.rst
index f2670de882..f1f795ce37 100644
--- a/doc/fileformat/nmodl.rst
+++ b/doc/fileformat/nmodl.rst
@@ -261,7 +261,7 @@ heeded the advice above, you will likely see this patter emerge:
 
    na   = n_alpha()
    nb   = n_beta()
-	 ntau = 1/(na + nb)
+   ntau = 1/(na + nb)
    ninf = na*ntau
 
    n' = (ninf - n)/ntau
@@ -273,9 +273,9 @@ compactly as
 
    na   = n_alpha()
    nb   = n_beta()
-	 nrho = na + nb
+   nrho = na + nb
 
-   n' = (na - n)*nrho
+   n' = na - n*nrho
 
 The latter code is faster, but neither ``modcc`` nor the external C++ compiler
 will perform this optimisation [#]_. This is less easy to