Add documentation on faster NMODL. #1840

Merged
merged 17 commits into from
Feb 24, 2022
Changes from 2 commits
117 changes: 117 additions & 0 deletions doc/fileformat/nmodl.rst
@@ -191,3 +191,120 @@ contain full ``segments``).

Modelers are encouraged to verify the expected behavior of the reversal potentials of ions
as it can lead to vastly different model behavior.

Tips for Faster NMODL
======================

NMODL is a quirky language and not well specified, which results in NMODL files
being treated as un-understandable and untouchable. This in turn leads to
sub-optimal performance, especially since mechanisms take up a large share of
a simulation's runtime budget. With some understanding of the subject matter,
however, it is quite straightforward to obtain clean and performant NMODL files.
We have regularly seen speed-ups of around a factor of three from tuning NMODL.

First, let us discuss how NMODL becomes part of a simulation. NMODL mechanisms
are given in ``.mod`` files, whose layout and syntax have been discussed above.
These are compiled by ``modcc`` into a series of callbacks as specified by the
mechanism ABI. These callbacks operate on data held in Arbor's internal storage.
However, ``modcc`` does not generate machine code directly; it goes through C++
(and/or CUDA) as an intermediary, which is processed by a standard C++ compiler
like GCC (or nvcc) to produce either a shared object (for external catalogues)
or code directly linked into Arbor (the built-in catalogues).

Now, we turn to a series of tips we found helpful in producing fast NMODL
mechanisms.

``RANGE``
---------

Parameters and ``ASSIGNED`` variables marked as ``RANGE`` will be stored as an
array with one entry per CV in Arbor. Reading and writing these incurs a memory
access and thus affects cache and memory utilisation metrics. It is often
cheaper to use ``LOCAL`` variables instead, even if that means foregoing the
Contributor

Suggested change
- cheaper to use ``LOCAL`` variables instead, even if that means foregoing the
+ computationally cheaper to use ``LOCAL`` variables instead, even if that means foregoing the

Contributor Author

It's not 'computationally' cheaper per se. Actually it's worse, but cache/memory use more than makes up for it.
That's basically the core tenet of this guide.

Contributor

'cheaper' is a bit jargonish, hence attempt at making it more specific.

ability to re-use a comuted value. Compute is so much faster than memory on
Contributor

@brenthuisman, Feb 18, 2022

Suggested change
- ability to re-use a comuted value. Compute is so much faster than memory on
+ ability to re-use a computed value. Compute is faster than memory on

Contributor Author

The 'so' here is part of the 'X is so Y that Z' structure.

Contributor

Updated suggestion.

Contributor Author

'Computed'

Contributor

I have too much faith in my spell checker.

modern hardware that re-use at the expense of memory accesses is seldom
profitable, except for the most complex terms.
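
As a rough sketch of the difference, consider a potassium current along the
lines of the ``hh.mod`` mechanism touched by this change. The two blocks below
are alternatives, not parts of one file. The names ``gkbar``, ``n``, ``ek`` and
``ik`` are taken from that mechanism; making ``gk`` a ``RANGE`` variable in the
first variant is an assumption made purely for illustration.

.. code::

   : Variant A: gk is an ASSIGNED RANGE variable (declared elsewhere), so
   : each evaluation writes it to and reads it back from a per-CV array.
   BREAKPOINT {
       SOLVE states METHOD cnexp
       gk = gkbar*n*n*n*n
       ik = gk*(v - ek)
   }

   : Variant B: gk is LOCAL, so it only exists for the duration of the block
   : and needs no per-CV storage.
   BREAKPOINT {
       SOLVE states METHOD cnexp
       LOCAL gk
       gk = gkbar*n*n*n*n
       ik = gk*(v - ek)
   }

Variant B loses the ability to inspect ``gk`` from outside the mechanism, which
is the trade-off described above.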

``PROCEDURE``
-------------

Prefer ``FUNCTION`` over ``PROCEDURE``. The latter *requires* ``ASSIGNED RANGE``
variables to return values and thus stresses the memory system for little gain, as
noted above. Also, a ``PROCEDURE`` may not be inlined, as opposed to a ``FUNCTION``.
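
The following sketch contrasts the two styles for a single gating variable. The
names ``rates``, ``minf``, ``mtau``, ``m_inf`` and ``m_tau`` as well as all
numeric values are hypothetical and chosen only to make the example complete.

.. code::

   : PROCEDURE: results leave the block through the ASSIGNED RANGE
   : variables minf and mtau, which must be declared elsewhere.
   PROCEDURE rates(v) {
       minf = 1/(1 + exp(-(v + 40)/10))
       mtau = 0.1 + 1/(1 + exp((v + 40)/10))
   }

   : FUNCTION: the result is returned by assigning to the function's own
   : name and can be kept in a LOCAL variable at the call site.
   FUNCTION m_inf(v) {
       m_inf = 1/(1 + exp(-(v + 40)/10))
   }

   FUNCTION m_tau(v) {
       m_tau = 0.1 + 1/(1 + exp((v + 40)/10))
   }

A caller would then write ``LOCAL minf`` followed by ``minf = m_inf(v)`` instead
of calling ``rates(v)`` and reading ``minf`` back from memory.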

``PARAMETER``
-------------

``PARAMETER`` should only be used for values that must be set by the simulator.
All fixed values should be ``CONSTANT`` instead. These can be inlined and folded
Contributor

Suggested change
- All fixed values should be ``CONSTANT`` instead. These can be inlined and folded
+ All fixed values should be ``CONSTANT`` instead. These can be inlined,

Contributor Author

Constant Folding is a thing.

Contributor

Jargon. Can we clarify or link to an explanation?

into the computations, allowing for further optimisations.
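
A minimal sketch of the split, with hypothetical names and values: ``gbar`` is
something a user would plausibly want to set per region, while ``vhalf`` and
``slope`` are assumed to be fixed properties of the model.

.. code::

   PARAMETER {
       gbar = 0.01    : hypothetical default, intended to be set from outside
   }

   CONSTANT {
       vhalf = -40    : hypothetical fixed value, never set from outside
       slope = 10     : hypothetical fixed value, never set from outside
   }

   FUNCTION m_inf(v) {
       : vhalf and slope are known at compile time, so the generated code
       : can use the literal values instead of loading them from memory.
       m_inf = 1/(1 + exp(-(v - vhalf)/slope))
   }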

Sharing Expressions Between ``INITIAL`` and ``BREAKPOINT`` or ``DERIVATIVE``
----------------------------------------------------------------------------

This is often done using a ``PROCEDURE``, which we know is inefficient. On top of
that, this ``PROCEDURE`` will likely compute more outputs than strictly needed to
accommodate both blocks. DRY code is a good idea nevertheless, so use a series of
``FUNCTION`` blocks instead to compute common expressions.

This leads naturally to a common optimisation in H-H style ion channels. If you
heeded the advice above, you will likely see this pattern emerge:

.. code::

   na = n_alpha()
   nb = n_beta()
   ntau = 1/(na + nb)
   ninf = na*ntau

   n' = (ninf - n)/ntau

Written out in this explicit way, it becomes obvious that this can be expressed
in a more compact way:

.. code::

   na = n_alpha()
   nb = n_beta()
   nrho = na + nb

   n' = na - n*nrho

The latter code is much faster, but neither ``modcc`` nor the external C++
compiler is likely to perform this optimisation on its own, largely due to
floating point accuracy requirements. The pattern is also harder to spot when it
is partially hidden in a ``PROCEDURE``.
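
The compact form follows by substituting the definitions from the first snippet
into the ODE. Writing :math:`\alpha_n`, :math:`\beta_n` and
:math:`\rho_n = \alpha_n + \beta_n` for ``na``, ``nb`` and ``nrho`` (notation
introduced here only for this derivation), we have
:math:`\tau_n = 1/\rho_n` and :math:`n_\infty = \alpha_n/\rho_n`, and therefore

.. math::

   n' = \frac{n_\infty - n}{\tau_n}
      = \left(\frac{\alpha_n}{\rho_n} - n\right)\rho_n
      = \alpha_n - n\,\rho_n

The right-hand side needs one multiplication and one subtraction, whereas the
explicit form computes a reciprocal, a product and a further division.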

The Conductance Trick
---------------------

``modcc``, Arbor's NMODL compiler, applies symbolic differentiation to the
current expression to find the conductance as ``g = d I/d U``, which is then
used to compute the voltage update. ``g`` is thus computed multiple times every
timestep and if the corresponding expression is inefficient, it will cost more
time than needed. The differentiation implementation is less than smart and will
not optimise the resulting code. This is an internal detail of Arbor, but knowing
it can help to produce well performing code. Here is an example:

.. code::

   : BAD, will compute m^4 * h every step
   i = m^4 * h * (v - e)

   : GOOD, will just use a constant value of g
   LOCAL g
   g = m^4 * h
   i = g * (v - e)

Note that we do not lose accuracy here, since Arbor does not support
higher-order ODEs and thus will treat ``g(v)`` as if ``v`` is a constant across
a single timestep.


Small Tips and Micro-Optimisations
----------------------------------

- Divisions cost a bit more than multiplications and additions.
- ``m * m`` is more efficient than ``m^2``. This holds for higher powers as well
  and if you want to squeeze out the utmost of performance use
  exponentiation-by-squaring. (Although GCC does this for you. Most of the
  time.)
9 changes: 4 additions & 5 deletions mechanisms/default/hh.mod
@@ -25,13 +25,12 @@ ASSIGNED { q10 }

 BREAKPOINT {
     SOLVE states METHOD cnexp
-    LOCAL gk, m_, n_, n2
+    LOCAL gk, gna, n2

-    n_ = n
-    m_ = m
-    n2 = n_*n_
+    n2 = n*n
     gk = gkbar*n2*n2
-    ina = gnabar*m_*m_*m_*h*(v - ena)
+    gna = gnabar*m*m*m*h
+    ina = gna*(v - ena)
     ik = gk*(v - ek)
     il = gl*(v - el)
 }