\chapter{Callbacks}\label{s:callbacks}
JavaScript relies heavily on \gref{g:callback-function}{callback functions}:
Instead of a function giving us a result immediately,
we give it another function that tells it what to do next.
Many other languages use them as well,
but JavaScript is often the first place that programmers with data science backgrounds encounter them.
In order to understand how they work and how to use them,
we must first understand what actually happens when functions are defined and called.
\section{The Call Stack}\label{s:callbacks-callstack}
When JavaScript \gref{g:parse}{parses} the expression \texttt{let\ name\ =\ "text"},
it allocates a block of memory big enough for four characters
and stores a reference to that block of characters in the variable \texttt{name}.
We can show this by drawing a \gref{g:memory-diagram}{memory diagram}
like the one in \figref{f:callbacks-name-value}.
\figpdf{figures/callbacks-name-value.pdf}{Name and Value}{f:callbacks-name-value}
When we write:
\begin{minted}{js}
const oneMore = (x) => {
  return x + 1
}
\end{minted}
JavaScript allocates a block of memory big enough to store several instructions,
translates the text of the function into instructions,
and stores a reference to those instructions in the variable \texttt{oneMore}
(\figref{f:callbacks-one-more}).
\figpdf{figures/callbacks-one-more.pdf}{Functions in Memory}{f:callbacks-one-more}
The only difference between these two cases is what's on the other end of the reference:
four characters or a bunch of instructions that add one to a number.
This means that we can assign the function to another variable,
just as we would assign a number:
\begin{minted}{js}
const anotherName = oneMore
console.log(anotherName(5))
\end{minted}
\begin{minted}{text}
6
\end{minted}
Doing this does \emph{not} call the function:
as \figref{f:callbacks-alias-function} shows,
it creates a second name that refers to the same block of instructions.
\figpdf{figures/callbacks-alias-function.pdf}{Aliasing a Function}{f:callbacks-alias-function}
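One way to confirm that assignment creates an alias rather than a copy is to compare the two names with \texttt{===},
which is \texttt{true} for two functions only when both names refer to the very same object.
A minimal sketch:
\begin{minted}{js}
// A function value stored under one name...
const oneMore = (x) => {
  return x + 1
}
// ...and a second name for the same function (no call happens here)
const anotherName = oneMore

// Both names refer to the same function object
console.log(anotherName === oneMore) // true
// Calling through either name runs the same instructions
console.log(oneMore(5), anotherName(5)) // 6 6
// Functions are values with their own type
console.log(typeof oneMore) // function
\end{minted}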
As explained in \chapref{s:basics},
when JavaScript calls a function it assigns the arguments in the call to the function's parameters.
In order for this to be safe,
we need to ensure that there are no \gref{g:name-collision}{name collisions},
i.e.,
that if there is a variable called \texttt{something} and one of the function's parameters is also called \texttt{something},
the function will use the right one.
Almost every modern language handles this by using a \gref{g:call-stack}{call stack}.
Instead of putting all our variables in one big table,
we have one table for global variables
and one extra table for each function call.
This means that if we assign 100 to \texttt{x},
call \texttt{oneMore(2\ *\ x\ +\ 1)},
and look at memory in the middle of that call,
we will see what's in \figref{f:callbacks-call-stack}.
\figpdf{figures/callbacks-call-stack.pdf}{The Call Stack}{f:callbacks-call-stack}
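The scenario above can be sketched in a few lines.
Inside the call,
the name \texttt{x} refers to the parameter in that call's own table,
so the global \texttt{x} keeps its value.
(How memory is actually laid out is an implementation detail;
this sketch only shows the behavior we can observe.)
\begin{minted}{js}
// A global variable called x
let x = 100

// A function whose parameter is also called x
const oneMore = (x) => {
  // Here 'x' means this call's parameter, not the global
  return x + 1
}

const result = oneMore(2 * x + 1) // the argument is 201
console.log(`result is ${result}, global x is still ${x}`)
// result is 202, global x is still 100
\end{minted}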
\section{Functions of Functions}\label{s:callbacks-func}
The call stack allows us to write and call functions
without worrying about whether we're accidentally going to refer to the wrong variable.
And since functions are just another kind of data,
we can pass one function into another.
For example,
we can write a function called \texttt{doTwice} that calls some other function two times:
\begin{minted}{js}
const doTwice = (action) => {
  action()
  action()
}
const hello = () => {
  console.log('hello')
}
doTwice(hello)
\end{minted}
\begin{minted}{text}
hello
hello
\end{minted}
Again,
this is clearer when we look at the state of memory while \texttt{doTwice} is running
(\figref{f:callbacks-do-twice}).
\figpdf{figures/callbacks-do-twice.pdf}{Functions of Functions}{f:callbacks-do-twice}
This becomes more useful when the function or functions passed in have parameters of their own.
For example,
the function \texttt{pipeline} passes a value to one function,
then takes that function's result and passes it to a second,
and returns the final result:
\begin{minted}{js}
const pipeline = (initial, first, second) => {
  return second(first(initial))
}
\end{minted}
Let's use this to combine
a function that trims blanks off the starts and ends of strings
and another function that replaces spaces with dots:
\begin{minted}{js}
const trim = (text) => { return text.trim() }
const dot = (text) => { return text.replace(/ /g, '.') }
const original = '  this example uses text  '
const trimThenDot = pipeline(original, trim, dot)
console.log(`trim then dot: |${trimThenDot}|`)
\end{minted}
\begin{minted}{text}
trim then dot: |this.example.uses.text|
\end{minted}
During the call to \texttt{first(initial)},
but before its result has been returned and passed to \texttt{second},
memory looks like \figref{f:callbacks-pipeline}.
\figpdf{figures/callbacks-pipeline.pdf}{Implementing a Pipeline}{f:callbacks-pipeline}
Reversing the order of the functions changes the result:
\begin{minted}{js}
const dotThenTrim = pipeline(original, dot, trim)
console.log(`dot then trim: |${dotThenTrim}|`)
\end{minted}
\begin{minted}{text}
dot then trim: |..this.example.uses.text..|
\end{minted}
We can make a more general pipeline by passing an array of functions:
\begin{minted}{js}
const pipeline = (initial, operations) => {
  let current = initial
  for (let op of operations) {
    current = op(current)
  }
  return current
}
\end{minted}
Let's add a function \texttt{double} to our suite of text manglers:
\begin{minted}{js}
const double = (text) => { return text + text }
\end{minted}
and then try it out:
\begin{minted}{js}
const original = ' some text '
const final = pipeline(original, [double, trim, dot])
console.log(`|${original}| -> |${final}|`)
\end{minted}
\begin{minted}{text}
| some text | -> |some.text..some.text|
\end{minted}
\section{Anonymous Functions}\label{s:callbacks-anonymous}
Remember the function \texttt{oneMore}?
We can pass it a value that we have calculated on the fly:
\begin{minted}{js}
const oneMore = (x) => {
  return x + 1
}
console.log(oneMore(3 * 2))
\end{minted}
\begin{minted}{text}
7
\end{minted}
Behind the scenes,
JavaScript evaluates \texttt{3\ *\ 2} to get a temporary, nameless value,
then passes that value into \texttt{oneMore}.
We can do the same thing with functions,
i.e., create one on the fly without giving it a name as we're passing it into some other function.
For example,
suppose that instead of pushing one value through a pipeline of functions,
we want to call a function once for each value in an array:
\begin{minted}{js}
const transform = (values, operation) => {
  let result = []
  for (let v of values) {
    result.push(operation(v))
  }
  return result
}
const data = ['one', 'two', 'three']
const upper = transform(data, (x) => { return x.toUpperCase() })
console.log(`upper: ${upper}`)
\end{minted}
\begin{minted}{text}
upper: ONE,TWO,THREE
\end{minted}
Taking the first letter of a word is so simple that it's hardly worth giving the function a name,
so let's define it on the fly:
\begin{minted}{js}
const first = transform(data, (x) => { return x[0] })
console.log(`first: ${first}`)
\end{minted}
\begin{minted}{text}
first: o,t,t
\end{minted}
A function that is created this way is sometimes called an \gref{g:anonymous-function}{anonymous function},
since its creator doesn't give it a name.
When JavaScript programmers use the term ``callback function'',
they usually mean a function defined and used like this.
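Callbacks are not limited to functions we write ourselves:
many built-in methods take them too.
As one illustration (not part of this chapter's running examples),
\texttt{Array.prototype.sort} accepts an optional comparison function.
Without one,
it converts elements to strings and compares those,
which gives surprising results for numbers:
\begin{minted}{js}
const numbers = [10, 9, 1, 100]

// Default sort compares the elements as strings
const asStrings = [...numbers].sort()
console.log(asStrings) // [ 1, 10, 100, 9 ]

// An anonymous comparison callback: a negative result puts a before b
const ascending = [...numbers].sort((a, b) => { return a - b })
console.log(ascending) // [ 1, 9, 10, 100 ]
\end{minted}
We copy the array with \texttt{[...numbers]} before sorting
because \texttt{sort} rearranges the array it is called on.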
\section{Functional Programming}\label{s:callbacks-functional}
\gref{g:functional-programming}{Functional programming} is a style of programming
that relies heavily on \gref{g:higher-order-function}{higher-order functions} like \texttt{pipeline}
that take other functions as parameters.
In addition,
functional programming expects that functions won't modify data in place,
but will instead create new data from old.
For example,
a true believer in functional programming would be saddened by this:
\begin{minted}{js}
const impure = (values) => {
  for (let i in values) {
    values[i] += 1
  }
}
\end{minted}
\noindent
and would politely, even patiently, suggest that it be rewritten like this:
\begin{minted}{js}
const pure = (values) => {
  const result = []
  for (let v of values) {
    result.push(v + 1)
  }
  return result
}
\end{minted}
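To see the difference in practice,
here is a small sketch that calls both versions on the same array:
\texttt{pure} leaves its input alone,
while \texttt{impure} changes the caller's data.
\begin{minted}{js}
// Modifies its argument in place
const impure = (values) => {
  for (let i in values) {
    values[i] += 1
  }
}

// Builds and returns a new array, leaving its argument alone
const pure = (values) => {
  const result = []
  for (let v of values) {
    result.push(v + 1)
  }
  return result
}

const original = [1, 2, 3]
const copy = pure(original)
console.log(original, copy) // [ 1, 2, 3 ] [ 2, 3, 4 ]

impure(original)
console.log(original) // [ 2, 3, 4 ]
\end{minted}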
JavaScript arrays provide several methods to support functional programming.
For example,
\texttt{Array.some} returns \texttt{true} if \emph{any} element in an array passes a test,
while \texttt{Array.every} returns \texttt{true} if \emph{all} elements in an array pass a test.
Here's how they work:
\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
console.log('some longer than 3:',
            data.some((x) => { return x.length > 3 }))
console.log('all longer than 3:',
            data.every((x) => { return x.length > 3 }))
\end{minted}
\begin{minted}{text}
some longer than 3: true
all longer than 3: false
\end{minted}
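Two details of these methods are worth knowing.
On an empty array,
\texttt{some} returns \texttt{false} and \texttt{every} returns \texttt{true},
since no element fails the test.
And \texttt{some} stops calling its callback as soon as one element passes
(\texttt{every} likewise stops at the first failure).
In this sketch the counter \texttt{calls} exists only to make that early stop visible:
\begin{minted}{js}
const empty = []
// No element passes the test, so 'some' is false...
const anyLong = empty.some((x) => { return x.length > 3 })
// ...and no element fails it, so 'every' is true
const allLong = empty.every((x) => { return x.length > 3 })
console.log('some on empty:', anyLong) // false
console.log('every on empty:', allLong) // true

// 'some' stops as soon as one element passes
let calls = 0
const data = ['this', 'is', 'a', 'test']
data.some((x) => {
  calls += 1
  return x.length > 3
})
console.log('calls made by some:', calls) // 1, because 'this' passes
\end{minted}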
\texttt{Array.filter} creates a new array containing only values that pass a test:
\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
console.log('those longer than 3:',
            data.filter((x) => { return x.length > 3 }))
\end{minted}
\begin{minted}{text}
those longer than 3: [ 'this', 'test' ]
\end{minted}
So do all of the elements with more than 3 characters start with a `t'?
\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
const result = data
  .filter((x) => { return x.length > 3 })
  .every((x) => { return x[0] === 't' })
console.log(`all longer than 3 start with t: ${result}`)
\end{minted}
\begin{minted}{text}
all longer than 3 start with t: true
\end{minted}
\texttt{Array.map} creates a new array by calling a function for each element of an existing array:
\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
console.log('shortened', data.map((x) => { return x.slice(0, 2) }))
\end{minted}
\begin{minted}{text}
shortened [ 'th', 'is', 'a', 'te' ]
\end{minted}
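\texttt{map}
(like \texttt{filter}, \texttt{some}, and \texttt{every})
also passes the index of each element to the callback as a second argument;
callbacks that only take one parameter simply ignore it.
A short sketch:
\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
// The callback's second argument is the index of the current element
const numbered = data.map((x, i) => { return `${i}:${x}` })
console.log(numbered) // [ '0:this', '1:is', '2:a', '3:test' ]
// map builds a new array; the original is unchanged
console.log(data) // [ 'this', 'is', 'a', 'test' ]
\end{minted}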
And finally,
\texttt{Array.reduce} reduces an array to a single value
using a combining function and a starting value.
The combining function must take two values,
which are the current running total and the next value from the array;
if the array is empty,
\texttt{Array.reduce} returns the starting value.
\begin{minted}{js}
const data = ['this', 'is', 'a', 'test']
const concatFirst = (accumulator, nextValue) => {
  return accumulator + nextValue[0]
}
let acronym = data.reduce(concatFirst, '')
console.log(`acronym of ${data} is ${acronym}`)
// In one step.
acronym = data.reduce((accum, next) => {
  return accum + next[0]
}, '')
console.log('all in one step:', acronym)
\end{minted}
\begin{minted}{text}
acronym of this,is,a,test is tiat
all in one step: tiat
\end{minted}
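Since \texttt{reduce} threads a running value through an array,
the general \texttt{pipeline} from earlier can be sketched in terms of it,
with the operations as the array and the initial value as the starting value.
This also shows why the starting value matters:
with an empty array \texttt{reduce} returns it,
but without one it throws a \texttt{TypeError}.
\begin{minted}{js}
const pipeline = (initial, operations) => {
  // Start the running value at 'initial' and apply each operation in turn
  return operations.reduce((current, op) => { return op(current) }, initial)
}

const trim = (text) => { return text.trim() }
const dot = (text) => { return text.replace(/ /g, '.') }
const double = (text) => { return text + text }

const original = ' some text '
console.log(`|${pipeline(original, [double, trim, dot])}|`) // |some.text..some.text|
// With no operations, reduce just returns the starting value
console.log(`|${pipeline(original, [])}|`) // | some text |

// Without a starting value, reduce on an empty array throws a TypeError
let threw = false
try {
  [].reduce((a, b) => { return a + b })
} catch (e) {
  threw = e instanceof TypeError
}
console.log('empty reduce without a starting value throws:', threw) // true
\end{minted}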
The indentation of the ``in one step'' call may look a little odd,
but this is the style the JavaScript community has settled on.
\section{Closures}\label{s:callbacks-closures}
The last tool we need to introduce is the \gref{g:closure}{closure},
an extremely useful side effect of the way JavaScript manages memory.
The easiest way to explain it is by example.
We have already defined a function called \texttt{pipeline} that chains any number of other functions together:
\begin{minted}{js}
const pipeline = (initial, operations) => {
  let current = initial
  for (let op of operations) {
    current = op(current)
  }
  return current
}
\end{minted}
However,
\texttt{pipeline} only works if each function in the array \texttt{operations} has a single parameter.
If we want to be able to add 1,
add 2,
and so on,
we have to write separate functions,
which is annoying.
A better option is to write a function that creates the function we want:
\begin{minted}{js}
const adder = (increment) => {
  const f = (value) => {
    return value + increment
  }
  return f
}
const add_1 = adder(1)
const add_2 = adder(2)
console.log(`add_1(100) is ${add_1(100)}, add_2(100) is ${add_2(100)}`)
\end{minted}
\begin{minted}{text}
add_1(100) is 101, add_2(100) is 102
\end{minted}
The best way to understand what's going on is to draw a step-by-step memory diagram.
In step 1, we call \texttt{adder(1)}
(\figref{f:callbacks-adder-1}).
In step 2, \texttt{adder} creates a new function that includes a reference to the 1 we just passed in
(\figref{f:callbacks-adder-2}).
In step 3,
\texttt{adder} returns that function, which is assigned to \texttt{add\_1}
(\figref{f:callbacks-adder-3}).
Crucially,
the function that \texttt{add\_1} refers to still has a reference to the value 1,
even though that value isn't referred to any longer by anyone else.
\figpdf{figures/callbacks-adder-1.pdf}{Creating an Adder (Step 1)}{f:callbacks-adder-1}
\figpdf{figures/callbacks-adder-2.pdf}{Creating an Adder (Step 2)}{f:callbacks-adder-2}
\figpdf{figures/callbacks-adder-3.pdf}{Creating an Adder (Step 3)}{f:callbacks-adder-3}
In steps 4--6,
we repeat these three steps to create another function that has a reference to the value 2,
and assign that function to \texttt{add\_2}
(\figref{f:callbacks-adder-4}).
\figpdf{figures/callbacks-adder-4.pdf}{Creating an Adder (Steps 4--6)}{f:callbacks-adder-4}
When we now call \texttt{add\_1} or \texttt{add\_2},
they add the value passed in and the value they've kept a reference to.
This trick of capturing a reference to a value inside something else
is called a \gref{g:closure}{closure}.
It works because JavaScript holds on to values as long as anything,
anywhere,
still refers to them.
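A closure captures a variable, not just a value,
so the function it returns can also update what it refers to.
The function \texttt{makeCounter} below is a hypothetical example,
not something defined elsewhere in this chapter:
each call creates a fresh \texttt{count},
and the returned function increments its own copy.
\begin{minted}{js}
// Each call to makeCounter creates a new 'count' variable;
// the returned function keeps a reference to it
const makeCounter = () => {
  let count = 0
  return () => {
    count += 1
    return count
  }
}

const first = makeCounter()
const second = makeCounter()
console.log(first(), first(), first()) // 1 2 3
console.log(second()) // 1, because 'second' has its own count
\end{minted}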
Closures solve our pipeline problem by letting us define little functions
on the fly
and give them extra data to work with:
\begin{minted}{js}
const result = pipeline(100, [adder(1), adder(2)])
console.log(`adding 1 and 2 to 100 -> ${result}`)
\end{minted}
\begin{minted}{text}
adding 1 and 2 to 100 -> 103
\end{minted}
Again, \texttt{adder(1)} and \texttt{adder(2)} do not add anything to anything:
they define new (unnamed) functions that add 1 and 2 respectively when called.
Programmers often go one step further and define little functions like this inline:
\begin{minted}{js}
const result = pipeline(100, [(x) => x + 1, (x) => x + 2])
console.log(`adding 1 and 2 to 100 -> ${result}`)
\end{minted}
\begin{minted}{text}
adding 1 and 2 to 100 -> 103
\end{minted}
As this example shows,
if the body of an arrow function is a single expression,
it doesn't have to be enclosed in \texttt{\{...\}} and \texttt{return} isn't needed:
the value of the expression is returned automatically.
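One catch with single-expression bodies:
if the expression is an object literal,
it must be wrapped in parentheses,
because otherwise JavaScript reads the braces as the start of a function body.
A short sketch:
\begin{minted}{js}
// The braces are read as a function body containing the label 'value',
// so this function returns undefined
const wrong = (x) => { value: x }
// Parentheses make the object literal an expression
const right = (x) => ({ value: x })
console.log(wrong(1)) // undefined
console.log(right(1)) // { value: 1 }
\end{minted}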
\section{Exercises}\label{s:callbacks-exercises}
\exercise{Side Effects With \texttt{forEach}}
JavaScript arrays have a method called \texttt{forEach},
which calls a callback function once for each element of the array.
Unlike \texttt{map},
\texttt{forEach} does \emph{not} save the values returned by these calls
or return an array of results.
The full syntax is:
\begin{minted}{js}
someArray.forEach((value, location, container) => {
  // 'value' is the value in 'someArray'
  // 'location' is the index of that value
  // 'container' is the containing array (in this case, 'someArray')
})
\end{minted}
If you only need the value,
you can provide a callback that only takes one parameter;
if you only need the value and its location,
you can provide a callback that takes two.
Use this to write a function \texttt{doubleInPlace}
that doubles all the values in an array in place:
\begin{minted}{js}
const vals = [1, 2, 3]
doubleInPlace(vals)
console.log(`vals after change: ${vals}`)
\end{minted}
\begin{minted}{text}
vals after change: 2,4,6
\end{minted}
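Before attempting the exercise,
it may help to confirm that \texttt{forEach} really does discard the callback's return values.
In this sketch (the array \texttt{values} is just sample data),
\texttt{forEach} returns \texttt{undefined},
while \texttt{map} collects the results into a new array:
\begin{minted}{js}
const values = [10, 20, 30]
const returned = values.forEach((value, location) => {
  // The callback can use both the value and its index
  console.log(`values[${location}] is ${value}`)
  return value * 2 // this return value is thrown away
})
console.log('forEach returned:', returned) // undefined
console.log('map returned:', values.map((value) => { return value * 2 })) // [ 20, 40, 60 ]
\end{minted}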
\exercise{Annotating Data}
Given an array of objects representing observations of wild animals:
\begin{minted}{js}
const data = [
  {'date': '1977-7-16', 'sex': 'M', 'species': 'NL'},
  {'date': '1977-7-16', 'sex': 'M', 'species': 'NL'},
  {'date': '1977-7-16', 'sex': 'F', 'species': 'DM'},
  {'date': '1977-7-16', 'sex': 'M', 'species': 'DM'},
  {'date': '1977-7-16', 'sex': 'M', 'species': 'DM'},
  {'date': '1977-7-16', 'sex': 'M', 'species': 'PF'},
  {'date': '1977-7-16', 'sex': 'F', 'species': 'PE'},
  {'date': '1977-7-16', 'sex': 'M', 'species': 'DM'}
]
\end{minted}
\noindent
write a function that returns a new array of objects like this:
\begin{minted}{js}
const newData = [
  {'seq': 3, 'year': '1977', 'sex': 'F', 'species': 'DM'},
  {'seq': 7, 'year': '1977', 'sex': 'F', 'species': 'PE'}
]
\end{minted}
\emph{without} using any loops.
The changes are:
\begin{itemize}
\item
The \texttt{date} field is replaced with a \texttt{year} field that holds just the year.
\item
Only observations of female animals are retained.
\item
The retained records are given sequence numbers to relate them back to the original data.
(These sequence numbers are 1-based rather than 0-based.)
\end{itemize}
You will probably want to use \texttt{Array.reduce} to generate the sequence numbers.
\section*{Key Points}
\input{keypoints/callbacks}