callbacks.tex

\chapter{Callbacks}\label{s:callbacks}

JavaScript relies heavily on \gref{g:callback-function}{callback functions}:
Instead of a function giving us a result immediately,
we give it another function that tells it what to do next.
Many other languages use them as well,
but JavaScript is often the first place that programmers with data science backgrounds encounter them.
In order to understand how they work and how to use them,
we must first understand what actually happens when functions are defined and called.

\section{The Call Stack}\label{s:callbacks-callstack}

When JavaScript \gref{g:parse}{parses} the expression \texttt{let\ name\ =\ "text"},
it allocates a block of memory big enough for four characters
and stores a reference to that block of characters in the variable \texttt{name}.
We can show this by drawing a \gref{g:memory-diagram}{memory diagram}
like the one in \figref{f:callbacks-name-value}.

\figpdf{figures/callbacks-name-value.pdf}{Name and Value}{f:callbacks-name-value}

When we write:

\begin{minted}{js}
oneMore = (x) => {
  return x + 1
}
\end{minted}

JavaScript allocates a block of memory big enough to store several instructions,
translates the text of the function into instructions,
and stores a reference to those instructions in the variable \texttt{oneMore}
(\figref{f:callbacks-one-more}).

\figpdf{figures/callbacks-one-more.pdf}{Functions in Memory}{f:callbacks-one-more}

The only difference between these two cases is what's on the other end of the reference:
four characters or a bunch of instructions that add one to a number.
This means that we can assign the function to another variable,
just as we would assign a number:

\begin{minted}{js}
const anotherName = oneMore
console.log(anotherName(5))
\end{minted}

\begin{minted}{text}
6
\end{minted}

Doing this does \emph{not} call the function:
as \figref{f:callbacks-alias-function} shows,
it creates a second name that refers to the same block of instructions.

\figpdf{figures/callbacks-alias-function.pdf}{Aliasing a Function}{f:callbacks-alias-function}

As explained in \chapref{s:basics},
when JavaScript calls a function it assigns the arguments in the call to the function's parameters.
In order for this to be safe,
we need to ensure that there are no \gref{g:name-collision}{name collisions},
i.e.,
that if there is a variable called \texttt{something} and one of the function's parameters is also called \texttt{something},
the function will use the right one.
The way every modern language implements this is to use a \gref{g:call-stack}{call stack}.
Instead of putting all our variables in one big table,
we have one table for global variables
and one extra table for each function call.
This means that if we assign 100 to \texttt{x},
call \texttt{oneMore(2\ *\ x\ +\ 1)},
and look at memory in the middle of that call,
we will see what's in \figref{f:callbacks-call-stack}.

\figpdf{figures/callbacks-call-stack.pdf}{The Call Stack}{f:callbacks-call-stack}

\section{Functions of Functions}\label{s:callbacks-func}

The call stack allows us to write and call functions
without worrying about whether we're accidentally going to refer to the wrong variable.
And since functions are just another kind of data,
we can pass one function into another.
For example,
we can write a function called \texttt{doTwice} that calls some other function two times:

\begin{minted}{js}
const doTwice = (action) => {
  action()
  action()
}

const hello = () => {
  console.log('hello')
}

doTwice(hello)
\end{minted}

\begin{minted}{text}
hello
hello
\end{minted}

Again,
this is clearer when we look at the state of memory while \texttt{doTwice} is running
(\figref{f:callbacks-do-twice}).

\figpdf{figures/callbacks-do-twice.pdf}{Functions of Functions}{f:callbacks-do-twice}

This becomes more useful when the function or functions passed in have parameters of their own.
For example,
the function \texttt{pipeline} passes a value to one function,
then takes that function's result and passes it to a second,
and returns the final result:

\begin{minted}{js}
const pipeline = (initial, first, second) => {
  return second(first(initial))
}
\end{minted}

Let's use this to combine
a function that trims blanks off the starts and ends of strings
and another function that replaces spaces with dots:

\begin{minted}{js}
const trim = (text) => { return text.trim() }
const dot = (text) => { return text.replace(/ /g, '.') }

const original = '  this example uses text  '

const trimThenDot = pipeline(original, trim, dot)
console.log(`trim then dot: |${trimThenDot}|`)
\end{minted}

\begin{minted}{text}
trim then dot: |this.example.uses.text|
\end{minted}

During the call to \texttt{temp\ =\ first(initial)},
but before a value has been returned to be assigned to \texttt{temp},
memory looks like \figref{f:callbacks-pipeline}.

\figpdf{figures/callbacks-pipeline.pdf}{Implementing a Pipeline}{f:callbacks-pipeline}

Reversing the order of the functions changes the result:

\begin{minted}{js}
const dotThenTrim = pipeline(original, dot, trim)
console.log(`dot then trim: |${dotThenTrim}|`)
\end{minted}

\begin{minted}{text}
dot then trim: |..this.example.uses.text..|
\end{minted}

We can make a more general pipeline by passing an array of functions:

\begin{minted}{js}
const pipeline = (initial, operations) => {
  let current = initial
  for (let op of operations) {
    current = op(current)
  }
  return current
}
\end{minted}

Let's add a function \texttt{double} to our suite of text manglers:

\begin{minted}{js}
const double = (text) => { return text + text }
\end{minted}

and then try it out:

\begin{minted}{js}
const original = ' some text '
const final = pipeline(original, [double, trim, dot])
console.log(`|${original}| -> |${final}|`)
\end{minted}

\begin{minted}{text}
| some text | -> |some.text..some.text|
\end{minted}

\section{Anonymous Functions}\label{s:callbacks-anonymous}

Remember the function \texttt{oneMore}?
We can pass it a value that we have calculated on the fly:

\begin{minted}{js}
oneMore = (x) => {
  return x + 1
}

console.log(oneMore(3 * 2))
\end{minted}

\begin{minted}{text}
7
\end{minted}

Behind the scenes,
JavaScript allocates a nameless temporary variable to hold the value of \texttt{3\ *\ 2},
then passes a reference to that temporary variable into \texttt{oneMore}.
We can do the same thing with functions,
i.e., create one on the fly without giving it a name as we're passing it into some other function.
For example,
suppose that instead of pushing one value through a pipeline of functions,
we want to call a function once for each value in an array:

\begin{minted}{js}
const transform = (values, operation) => {
  let result = []
  for (let v of values) {
    result.push(operation(v))
  }
  return result
}

const data = ['one', 'two', 'three']
const upper = transform(data, (x) => { return x.toUpperCase() })
console.log(`upper: ${upper}`)
\end{minted}

\begin{minted}{text}
upper: ONE,TWO,THREE
\end{minted}

Taking the first letter of a word is so simple that it's hardly worth giving the function a name,
so let's define it on the fly:

\begin{minted}{js}
const first = transform(data, (x) => { return x[0] })
console.log(`first: ${first}`)
\end{minted}

\begin{minted}{text}
first: o,t,t
\end{minted}

A function that is created this way is sometimes called an \gref{g:anonymous-function}{anonymous function},
since its creator doesn't give it a name.
When JavaScript programmers use the term ``callback function'',
they usually mean a function defined and used like this.

\section{Functional Programming}\label{s:callbacks-functional}

\gref{g:functional-programming}{Functional programming} is a style of programming
that relies heavily on \gref{g:higher-order-function}{higher-order functions} like \texttt{pipeline}
that take other functions as parameters.
In addition,
functional programming expects that functions won't modify data in place,
but will instead create new data from old.
For example,
a true believer in functional programming would be saddened by this:

\begin{minted}{js}
const impure = (values) => {
  for (let i in values) {
    values[i] += 1
  }
}
\end{minted}

\noindent
and would politely, even patiently, suggest that it be rewritten like this:

\begin{minted}{js}
const pure = (values) -> {
  result = []
  for (let v of values) {
    result.push(v + 1)
  }
  return result
}
\end{minted}

JavaScript arrays provide several methods to support functional programming.
For example,
\texttt{Array.some} returns \texttt{true} if \emph{any} element in an array passes a test,
while \texttt{Array.every} returns \texttt{true} if \emph{all} elements in an array pass a test.

Here's how they work:

\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
console.log('some longer than 3:',
            data.some((x) => { return x.length > 3 }))
console.log('all longer than 3:',
            data.every((x) => { return x.length > 3 }))
\end{minted}

\begin{minted}{text}
some longer than 3: true
all longer than 3: false
\end{minted}

\texttt{Array.filter} creates a new array containing only values that pass a test:

\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
console.log('those longer than 3:',
            data.filter((x) => { return x.length > 3 }))
\end{minted}

\begin{minted}{text}
those longer than 3: [ 'this', 'test' ]
\end{minted}

So do all of the elements with more than 3 characters start with a `t'?

\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
const result = data
               .filter((x) => { return x.length > 3 })
               .every((x) => { return x[0] === 't' })
console.log(`all longer than 3 start with t: ${result}`)
\end{minted}

\begin{minted}{text}
all longer than 3 start with t: true
\end{minted}

\texttt{Array.map} creates a new array by calling a function for each element of an existing array:

\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
console.log('shortened', data.map((x) => { return x.slice(0, 2) }))
\end{minted}

\begin{minted}{text}
shortened [ 'th', 'is', 'a', 'te' ]
\end{minted}

And finally,
\texttt{Array.reduce} reduces an array to a single value
using a combining function and a starting value.
The combining function must take two values,
which are the current running total and the next value from the array;
if the array is empty,
\texttt{Array.reduce} returns the starting value.

\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']

const concatFirst = (accumulator, nextValue) => {
  return accumulator + nextValue[0]
}
let acronym = data.reduce(concatFirst, '')
console.log(`acronym of ${data} is ${acronym}`)

// In one step.
acronym = data.reduce((accum, next) => {
  return accum + next[0]
}, '')
console.log('all in one step:', acronym)
\end{minted}

\begin{minted}{text}
acronym of this,is,a,test is tiat
all in one step: tiat
\end{minted}

The indentation of the ``in one step'' call may look a little odd,
but this is the style the JavaScript community has settled on.

\section{Closures}\label{s:callbacks-closures}

The last tool we need to introduce is an extremely useful side-effect of the way memory is handled
called a \gref{g:closure}{closure}.
The easiest way to explain it is by example.
We have already defined a function called \texttt{pipeline} that chains any number of other functions together:

\begin{minted}{js}
const pipeline = (initial, operations) => {
  let current = initial
  for (let op of operations) {
    current = op(current)
  }
  return current
}
\end{minted}

However,
\texttt{pipeline} only works if each function in the array \texttt{operations} has a single parameter.
If we want to be able to add 1,
add 2,
and so on,
we have to write separate functions,
which is annoying.

A better option is to write a function that creates the function we want:

\begin{minted}{js}
const adder = (increment) => {
  const f = (value) => {
    return value + increment
  }
  return f
}

const add_1 = adder(1)
const add_2 = adder(2)
console.log(`add_1(100) is ${add_1(100)}, add_2(100) is ${add_2(100)}`)
\end{minted}

\begin{minted}{text}
add_1(100) is 101, add_2(100) is 102
\end{minted}

The best way to understand what's going on is to draw a step-by-step memory diagram.
In step 1, we call \texttt{adder(1)}
(\figref{f:callbacks-adder-1}).
\texttt{adder} creates a new function that includes a reference to that 1 we just passed in
(\figref{f:callbacks-adder-2}).
In step 3,
\texttt{adder} returns that function, which is assigned to \texttt{add\_1}
(\figref{f:callbacks-adder-3}).
Crucially,
the function that \texttt{add\_1} refers to still has a reference to the value 1,
even though that value isn't referred to any longer by anyone else.

\figpdf{figures/callbacks-adder-1.pdf}{Creating an Adder (Step 1)}{f:callbacks-adder-1}

\figpdf{figures/callbacks-adder-2.pdf}{Creating an Adder (Step 2)}{f:callbacks-adder-2}

\figpdf{figures/callbacks-adder-3.pdf}{Creating an Adder (Step 3)}{f:callbacks-adder-3}

In steps 4-6,
we repeat these three steps to create another function that has a reference to the value 2,
and assign that function to \texttt{add\_2}
(\figref{f:callbacks-adder-4}).

\figpdf{figures/callbacks-adder-4.pdf}{Creating an Adder (Steps 4-6)}{f:callbacks-adder-4}

When we now call \texttt{add\_1} or \texttt{add\_2},
they add the value passed in and the value they've kept a reference to.

This trick of capturing a reference to a value inside something else
is called a \gref{g:closure}{closure}.
It works because JavaScript holds on to values as long as anything,
anywhere,
still refers to them.
Closures solve our pipeline problem by letting us define little functions
on the fly
and give them extra data to work with:

\begin{minted}{js}
const result = pipeline(100, [adder(1), adder(2)])
console.log(`adding 1 and 2 to 100 -> ${result}`)
\end{minted}

\begin{minted}{text}
adding 1 and 2 to 100 -> 103
\end{minted}

Again, \texttt{adder(1)} and \texttt{adder(2)} do not add anything to anything:
they define new (unnamed) functions that add 1 and 2 respectively when called.

Programmers often go one step further and define little functions like this inline:

\begin{minted}{js}
const result = pipeline(100, [(x) => x + 1, (x) => x + 2])
console.log(`adding 1 and 2 to 100 -> ${result}`)
\end{minted}

\begin{minted}{text}
adding 1 and 2 to 100 -> 103
\end{minted}

As this example shows,
if the body of a function is a single expression,
it doesn't have to be enclosed in \texttt{\{...\}} and \texttt{return} doesn't need to be used.

\section{Exercises}\label{s:callbacks-exercises}

\exercise{Side Effects With \texttt{forEach}}

JavaScript arrays have a method called \texttt{forEach},
which calls a callback function once for each element of the array.
Unlike \texttt{map},
\texttt{forEach} does \emph{not} save the values returned by these calls
or return an array of results.
The full syntax is:

\begin{minted}{js}
someArray.forEach((value, location, container) => {
  // 'value' is the value in 'someArray'
  // 'location' is the index of that value
  // 'container' is the containing array (in this case, 'someArray')
})
\end{minted}

If you only need the value,
you can provide a callback that only takes one parameter;
if you only need the value and its location,
you can provide a callback that takes two.

Use this to write a function \texttt{doubleInPlace}
that doubles all the values in an array in place:

\begin{minted}{js}
const vals = [1, 2, 3]
doubleInPlace(vals)
console.log(`vals after change: ${vals}`)
\end{minted}

\begin{minted}{text}
vals after change: 2,4,6
\end{minted}

\exercise{Annotating Data}

Given an array of objects representing observations of wild animals:

\begin{minted}{js}
data = [
  {'date': '1977-7-16', 'sex': 'M', 'species': 'NL'},
  {'date': '1977-7-16', 'sex': 'M', 'species': 'NL'},
  {'date': '1977-7-16', 'sex': 'F', 'species': 'DM'},
  {'date': '1977-7-16', 'sex': 'M', 'species': 'DM'},
  {'date': '1977-7-16', 'sex': 'M', 'species': 'DM'},
  {'date': '1977-7-16', 'sex': 'M', 'species': 'PF'},
  {'date': '1977-7-16', 'sex': 'F', 'species': 'PE'},
  {'date': '1977-7-16', 'sex': 'M', 'species': 'DM'}
]
\end{minted}

\noindent
write a function that returns a new array of objects like this:

\begin{minted}{js}
newData = [
  {'seq': 3, 'year': '1977', 'sex': 'F', 'species': 'DM'},
  {'seq': 7, 'year': '1977', 'sex': 'F', 'species': 'PE'}
]
\end{minted}

\emph{without} using any loops.
The changes are:

\begin{itemize}
\item
  The \texttt{date} field is replaced with just the `year.
\item
  Only observations of female animals are retained.
\item
  The retained records are given sequence numbers to relate them back to the original data.
  (These sequence numbers are 1-based rather than 0-based.)
\end{itemize}

You will probably want to use \texttt{Array.reduce} to generate the sequence numbers.

\section*{Key Points}

\input{keypoints/callbacks}