\chapter{The Normal Distributions}
\label{chap:normal}
Again, these are the famous ``bell-shaped curves,'' so called because
their densities have that shape.
\section{Density and Properties}
The density for a normal distribution is
\begin{equation}
\label{nointegral}
f_W(t) = \frac{1}{\sqrt{2\pi} \sigma} ~ e^{- 0.5 \left (\frac{t-\mu}{\sigma}
\right )^2}, -\infty < t < \infty
\end{equation}
Again, this is a two-parameter family, indexed by the parameters $\mu$
and $\sigma$, which turn out to be the mean\footnote{Remember, this is a
synonym for expected value.} and standard deviation of the distribution.
The notation for it is $N(\mu,\sigma^2)$ (it is customary to state
the variance $\sigma^2$ rather than the standard deviation).
And we write
\begin{equation}
X ~ \widetilde{ } ~ N(\mu,\sigma^2)
\end{equation}
to mean that the random variable X has the distribution $N(\mu,\sigma^2)$.
(The tilde is read ``is distributed as.'')
{\bf Note:} Saying ``X has a $N(\mu,\sigma^2)$ distribution'' is {\it
more} than simply saying ``X has mean $\mu$ and variance $\sigma^2$.''
The former statement tells us not only the mean and variance of X, but
{\it also} the fact that X has a ``bell-shaped'' density in the
(\ref{nointegral}) family.
\subsection{Closure Under Affine Transformation}
\label{affine}
The family is closed under affine transformations:
\begin{quote}
If
\begin{equation}
X ~ \widetilde{} ~ N(\mu,\sigma^2)
\end{equation}
and we set
\begin{equation}
Y = cX + d
\end{equation}
then
\begin{equation}
Y ~ \widetilde{} ~ N(c\mu+d,c^2\sigma^2)
\end{equation}
\end{quote}
For instance, suppose X is the height of a randomly selected UC Davis
student, measured in inches. Human heights do have approximate normal
distributions; a histogram plot of the student heights would look
bell-shaped. Now let Y be the student's height in centimeters. Then we
have the situation above, with c = 2.54 and d = 0. The claim about
affine transformations of normally distributed random variables would
imply that a histogram of Y would again be bell-shaped.
Consider the above statement carefully.
\begin{quote}
It is saying much more than simply that Y has mean $c\mu + d$ and
variance $c^2\sigma^2$, which would follow from our ``mailing
tubes'' such as (\ref{varcu}) {\it even if X did not have a normal
distribution}. The key point is that this new variable Y is also a
member of the normal family, i.e.\ its density is still given by
(\ref{nointegral}), now with the new mean and variance.
\end{quote}
Let's derive this, using the reasoning of Section \ref{densgofx}.
For convenience, suppose $c > 0$. Then
\begin{eqnarray}
F_Y(t) &=& P(Y \leq t) ~~ (\textrm{definition of } F_Y) \\
&=& P(cX + d \leq t) ~~ (\textrm{definition of Y}) \\
&=& P \left ( X \leq \frac{t-d}{c} \right ) ~~ (\textrm{algebra}) \\
&=& F_X \left (\frac{t-d}{c} \right )~~ (\textrm{definition of } F_X)
\label{xcdf}
\end{eqnarray}
Therefore
\begin{eqnarray}
f_Y(t) &=& \frac{d}{dt} F_Y(t) ~~ (\textrm{definition of } f_Y) \\
&=& \frac{d}{dt} F_X \left (\frac{t-d}{c} \right ) ~~ (\textrm{from }
(\ref{xcdf})) \\
&=& f_X \left ( \frac{t-d}{c} \right ) \cdot \frac{d}{dt} \frac{t-d}{c}
~~ (\textrm{definition of } f_X \textrm{ and the Chain Rule}) \\
&=& \frac{1}{c} \cdot
\frac{1}{\sqrt{2\pi} \sigma} ~ e^{- 0.5
\left (\frac{\frac{t-d}{c}-\mu}{\sigma} \right )^2}
~~ (\textrm{from } (\ref{nointegral})) \\
&=& \frac{1}{\sqrt{2\pi} (c \sigma)} ~
e^{- 0.5 \left (\frac{t-(c\mu +d)}{c\sigma} \right )^2} ~~ (\textrm{algebra})
\end{eqnarray}
That last expression is the $N(c\mu+d,c^2\sigma^2)$ density, so we are
done!
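The closure property can be checked numerically. The sketch below, in Python rather than the text's R (using the standard library's \texttt{statistics.NormalDist}, with the inches-to-centimeters numbers as illustrative values), verifies that $P(cX+d \leq t) = F_X\left(\frac{t-d}{c}\right)$ agrees with the $N(c\mu+d, c^2\sigma^2)$ cdf at several points:

```python
from statistics import NormalDist

# Illustrative values: X ~ N(10, 2^2); c, d from the inches-to-cm example.
mu, sigma = 10.0, 2.0
c, d = 2.54, 0.0

X = NormalDist(mu, sigma)
Y = NormalDist(c * mu + d, c * sigma)   # claimed distribution of Y = cX + d

# P(Y <= t) computed two ways should agree for any t:
for t in (15.0, 25.4, 40.0):
    lhs = X.cdf((t - d) / c)   # P(cX + d <= t) = P(X <= (t-d)/c)
    rhs = Y.cdf(t)             # cdf of N(c*mu + d, (c*sigma)^2)
    assert abs(lhs - rhs) < 1e-12
```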
\subsection{Closure Under Independent Summation}
\label{sumindep}
If X and Y are independent random variables, each having a normal
distribution, then their sum S = X + Y also is normally distributed.
This is a pretty remarkable phenomenon, not true for most other
parametric families. If for instance X and Y each have, say, a U(0,1)
distribution, then the density of S turns out to be triangle-shaped, NOT
another uniform distribution. (This can be derived using the methods of
Section \ref{convolution}.)
Note that if X and Y are independent and normally distributed, then the
two properties above imply that cX + dY will also have a normal
distribution, for any constants c and d.
More generally:
\begin{quote}
For constants $a_1,...,a_k$ and {\it independent} random variables
$X_1,...,X_k$, with
\begin{equation}
X_i ~ \widetilde{ } ~ N(\mu_i, \sigma_i^2)
\end{equation}
form the new random variable $Y = a_1 X_1 +...+ a_k X_k$. Then
\begin{equation}
\label{lincombnormal}
Y ~ \widetilde{ } ~ N(\sum_{i=1}^k a_i \mu_i, \sum_{i=1}^k a_i^2 \sigma_i^2)
\end{equation}
\end{quote}
{\bf Lack of intuition:}
The reader should ponder how remarkable this property of the normal
family is, because there is no intuitive explanation for it.
Imagine random variables $X$ and $Y$, each with a normal distribution.
Say the mean and variance are 10 and 4 for $X$, and 18 and 6 for $Y$.
We repeat our experiment 1000 times for our ``notebook,'' i.e.\ 1000
lines with 2 columns. If we draw a histogram of the $X$ column, we'll
get a bell-shaped curve, and the same will be true for the $Y$ column.
But now add a $Z$ column, for $Z = X + Y$. Why in the world should a
histogram of the $Z$ column also be bell-shaped?
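Though there is no intuitive explanation, we can at least watch the property in action. Here is a small simulation sketch in Python (illustrative, not part of the text's R toolkit), using the means and variances above: the simulated $Z = X + Y$ column matches the claimed $N(28, 10)$ distribution.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

random.seed(2024)
n = 100_000
xs = [random.gauss(10, 2) for _ in range(n)]              # X ~ N(10, 4)
ys = [random.gauss(18, math.sqrt(6)) for _ in range(n)]   # Y ~ N(18, 6)
zs = [x + y for x, y in zip(xs, ys)]                      # the Z column

# Empirical mean/SD of the Z column vs. the claimed N(28, 10):
assert abs(mean(zs) - 28) < 0.05
assert abs(pstdev(zs) - math.sqrt(10)) < 0.05

# A tail proportion also matches the claimed normal cdf:
p_emp = sum(z > 31 for z in zs) / n
p_thy = 1 - NormalDist(28, math.sqrt(10)).cdf(31)
assert abs(p_emp - p_thy) < 0.01
```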
\section{R Functions}
\begin{lstlisting}
dnorm(x, mean = 0, sd = 1)
pnorm(q, mean = 0, sd = 1)
qnorm(p, mean = 0, sd = 1)
rnorm(n, mean = 0, sd = 1)
\end{lstlisting}
Here {\bf mean} and {\bf sd} are of course the mean and standard
deviation of the distribution. The other arguments are as in our
previous examples.
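For readers following along in Python rather than R, the standard library's \texttt{statistics.NormalDist} class (Python 3.8+) provides rough analogues of these four functions; this is an illustrative sketch, not part of the text's R-based toolkit.

```python
from statistics import NormalDist

nd = NormalDist(mu=0, sigma=1)    # mean and sd, as in the R calls

density  = nd.pdf(0)              # analogue of dnorm(0)
cdf_val  = nd.cdf(1.0)            # analogue of pnorm(1)
quantile = nd.inv_cdf(0.975)      # analogue of qnorm(0.975)
draws    = nd.samples(5, seed=1)  # analogue of rnorm(5), seeded

assert abs(density - 1 / (2 * 3.141592653589793) ** 0.5) < 1e-12
assert abs(quantile - 1.959964) < 1e-4
assert len(draws) == 5
```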
\section{The Standard Normal Distribution}
\label{stdnorm}
\begin{definition}
If $Z ~ \widetilde{} ~ N(0,1)$ we say the random variable Z has a {\it
standard normal distribution}.
Note that if $X ~ \widetilde{} ~ N(\mu,\sigma^2)$, and if we set
\begin{equation}
Z = \frac{X - \mu}{\sigma}
\end{equation}
then
\begin{equation}
\label{zisn01}
Z ~ \widetilde{} ~ N(0,1)
\end{equation}
\end{definition}
The above statements follow from the earlier material:
\begin{itemize}
\item Define $Z = \frac{X - \mu}{\sigma}$.
\item Rewrite it as $Z = \frac{1}{\sigma} \cdot X +
(\frac{-\mu}{\sigma})$.
% \item By (\ref{aubv}) we know that
\item Since E(cU + d) = c EU + d for any random variable U and constants
c and d, we have
\begin{equation}
EZ = \frac{1}{\sigma} EX - \frac{\mu}{\sigma} = 0
\end{equation}
and (\ref{affinevar}) and (\ref{varcu}) imply that Var(Z) = 1.
\item OK, so we know that Z has mean 0 and variance 1. But does it have
a normal distribution? Yes, due to our discussion above titled
``Closure Under Affine Transformation.''
\end{itemize}
By the way, the N(0,1) cdf is traditionally denoted by $\Phi$.
\section{Evaluating Normal cdfs}
The function in (\ref{nointegral}) does not have a closed-form
indefinite integral. Thus probabilities involving normal random
variables must be approximated. Traditionally, this is done with a
table for the cdf of N(0,1), which is included as an appendix to almost
any statistics textbook; the table gives the cdf values for that
distribution.
But this raises a question: There are infinitely many distributions in
the normal family. Don't we need a separate table for each? That of
course would not be possible, and in fact it turns out that this one
table---the one for the N(0,1) distribution--- is sufficient for the
entire normal family. Though we of course will use R to gt such
probabilities, it will be quite instructive to see how these table
operations work.
Here's why one table is enough: Say X has an $N(10,2.5^2)$ distribution.
How can we get a probability like, say, $P(X < 12)$ using the N(0,1)
table? Write
\begin{equation}
P(X < 12) = P\left (Z < \frac{12-10}{2.5}\right ) = P(Z < 0.8)
\end{equation}
Since on the right-hand side Z has a standard normal distribution,
we can find the latter probability from the N(0,1) table!
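This standardization identity is easy to confirm numerically. A quick Python check (illustrative; \texttt{statistics.NormalDist} is the standard library's normal-distribution class) with the numbers above:

```python
from statistics import NormalDist

X = NormalDist(10, 2.5)      # X ~ N(10, 2.5^2), as in the text
Phi = NormalDist().cdf       # standard normal cdf

direct  = X.cdf(12)              # P(X < 12) from X's own cdf
via_std = Phi((12 - 10) / 2.5)   # P(Z < 0.8) after standardizing

assert abs(direct - via_std) < 1e-12
```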
As noted, this transformation traditionally played a central role, as
one could convert any probability involving some normal distribution to an
equivalent probability involving N(0,1). One would then use a table of
N(0,1) to find the desired probability.
The transformation $Z = (X - \mu)/\sigma$ will play a big role in other
contexts in future chapters, but for the sole purpose of simply
evaluating normal probabilities, we can be much more direct. Nowadays,
probabilities for any normal distribution, not just N(0,1), are easily
available by computer. In the R statistical package, the normal cdf for
any mean and variance is available via the function {\bf pnorm()}. The
call form is
\begin{Verbatim}[fontsize=\relsize{-2}]
pnorm(q,mean=0,sd=1)
\end{Verbatim}
This returns the value of the cdf evaluated at {\bf q}, for a normal
distribution having the specified mean and standard deviation (default
values of 0 and 1).
We can use {\bf rnorm()} to simulate normally distributed random
variables. The call is
\begin{Verbatim}[fontsize=\relsize{-2}]
rnorm(n,mean=0,sd=1)
\end{Verbatim}
which returns a vector of {\bf n} random variates from the specified
normal distribution.
There are also of course the corresponding density and quantile
functions, {\bf dnorm()} and {\bf qnorm()}.
\section{Example: Network Intrusion}
\label{netintrude}
As an example, let's look at a simple version of the network intrusion
problem. Suppose we have found that in Jill's remote logins to a
certain computer, the number X of disk sectors she reads or writes has
an approximate normal distribution with a mean of 500 and a standard
deviation of 15.
Before we continue, a comment on modeling: Since the number of sectors
is discrete, it could not have an exact normal distribution. But then,
no random variable in practice has an exact normal or other continuous
distribution, as discussed in Section \ref{unicorns}, and the
distribution can indeed be approximately normal.
Now, say our network intrusion monitor finds that Jill---or someone
posing as her---has logged in and has read or written 535 sectors.
Should we be suspicious?
To answer this question, let's find $P(X \geq 535)$: Let $Z =
(X-500)/15$. From our discussion above, we know that Z has a N(0,1)
distribution, so
\begin{equation}
P(X \geq 535) = P \left (Z \geq \frac{535-500}{15} \right )
= 1 - \Phi(35/15) = 0.01
\end{equation}
Again, traditionally we would obtain that 0.01 value from a N(0,1) cdf
table in a book. With R, we would just use the function {\bf pnorm()}:
\begin{Verbatim}[fontsize=\relsize{-2}]
> 1 - pnorm(535,500,15)
[1] 0.009815329
\end{Verbatim}
Anyway, that 0.01 probability makes us suspicious. While it {\it could}
really be Jill, this would be unusual behavior for Jill, so we start to
suspect that it isn't her. It's suspicious enough for us to probe more
deeply, e.g. by looking at which files she (or the impostor)
accessed---were they rare for Jill too?
Now suppose there are two logins to Jill's account, accessing X and Y
sectors, with X+Y = 1088. Is this rare for her, i.e.\ is $P(X+Y >
1088)$ small?
We'll assume X and Y are independent. We'd have to give some thought as
to whether this assumption is reasonable, depending on the details of
how we observed the logins, etc., but let's move ahead on this basis.
From page \pageref{sumindep}, we know that the sum S = X+Y is again normally
distributed. Due to the properties in Chapter \ref{dis}, we know S has
mean $2 \cdot 500$ and variance $2 \cdot 15^2$. The desired
probability is then found via
\begin{lstlisting}
1 - pnorm(1088,1000,sqrt(450))
\end{lstlisting}
which is about 0.00002. That is indeed a small number, and we should be
highly suspicious.
Note again that the normal model (or any other continuous model) can
only be approximate, especially in the tails of the distribution, in
this case the right-hand tail. But it is clear that S is only rarely
larger than 1088, and the matter mandates further investigation.
Of course, this is very crude analysis, and real intrusion detection
systems are much more complex, but you can see the main ideas here.
\section{Example: Class Enrollment Size}
\label{classize}
After years of experience with a certain course, a university has found
that online pre-enrollment in the course is approximately normally
distributed, with mean 28.8 and standard deviation 3.1. Suppose that in
some particular offering, pre-enrollment was capped at 25, and it hit
the cap. Find the probability that the actual demand for the course was
at least 30.
Note that this is a conditional probability! Evaluate it as follows.
Let N be the actual demand. Then the key point is that we are given that
$N \geq 25$, so
\begin{eqnarray}
P(N \geq 30 | N \geq 25)
&=&
\frac
{P(N \geq 30 \textrm{ and } N \geq 25)}
{P(N \geq 25)} ~~~~ ((\ref{genand})) \\
&=&
\frac
{P(N \geq 30)}
{P(N \geq 25)} \\
&=&
\frac
{1 -\Phi \left [ (30-28.8)/3.1 \right ] }
{1 -\Phi \left [ (25-28.8)/3.1 \right ] } \\
&=& 0.39
\end{eqnarray}
Sounds like it may be worth moving the class to a larger room before
school starts.
Since we are approximating a discrete random variable by a continuous
one, it might be more accurate here to use a {\bf correction for
continuity}, described in Section \ref{correctcontin}.
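The conditional-probability ratio above can be checked directly. A Python sketch (illustrative; mirrors the $\Phi$ expressions in the derivation):

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # standard normal cdf

num = 1 - Phi((30 - 28.8) / 3.1)   # P(N >= 30)
den = 1 - Phi((25 - 28.8) / 3.1)   # P(N >= 25)
p = num / den                      # P(N >= 30 | N >= 25)
assert abs(p - 0.39) < 0.01
```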
\section{More on the Jill Example}
Continuing the Jill example, suppose there is never an intrusion, i.e.
all logins are from Jill herself. Say we've set our network intrusion
monitor to notify us every time Jill logs in and accesses 535 or more
disk sectors. In what proportion of all such notifications will Jill
have accessed at least 545 sectors?
This is $P(X \geq 545 ~|~ X \geq 535)$. By an analysis similar
to that in Section \ref{classize}, this probability is
\begin{lstlisting}
(1 - pnorm(545,500,15)) / (1 - pnorm(535,500,15))
\end{lstlisting}
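Evaluating that ratio numerically (a Python cross-check, for illustration) gives a probability of roughly 0.14:

```python
from statistics import NormalDist

X = NormalDist(500, 15)
# P(X >= 545 | X >= 535) = P(X >= 545) / P(X >= 535)
p = (1 - X.cdf(545)) / (1 - X.cdf(535))
assert 0.13 < p < 0.15
```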
\section{Example: River Levels}
Consider a certain river, and L, its level (in feet) relative to its
average. There is a flood whenever L $>$ 8, and it is reported that
2.5\% of days have flooding. Let's assume that the level
L is normally distributed; the above information implies that the mean
is 0.
Suppose the standard deviation of L, $\sigma$, goes up by 10\%. How much
will the percentage of flooding days increase?
To solve this, let's first find $\sigma$. We have that
\begin{equation}
0.025 = P(L > 8) = P \left ( \frac{L-0}{\sigma} > \frac{8-0}{\sigma}
\right )
\end{equation}
Since $(L-0)/\sigma$ has a N(0,1) distribution, we can find the 0.975
point in its cdf:
\begin{lstlisting}
> qnorm(0.975,0,1)
[1] 1.959964
\end{lstlisting}
So,
\begin{equation}
1.96 = \frac{8-0}{\sigma}
\end{equation}
so $\sigma$ is about 4.
If it increases to 4.4, then we can evaluate $P(L > 8)$ by
\begin{lstlisting}
> 1 - pnorm(8,0,4.4)
[1] 0.03451817
\end{lstlisting}
So, a 10\% increase in $\sigma$ would lead in this case to about a 40\%
increase in flood days.
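The two steps of this solution can be sketched in Python (illustrative; it reproduces the \texttt{qnorm}/\texttt{pnorm} calls above, using the text's rounded $\sigma \approx 4$ so that the 10\% increase gives 4.4):

```python
from statistics import NormalDist

z975 = NormalDist().inv_cdf(0.975)   # ~1.96, as from qnorm(0.975,0,1)
sigma = 8 / z975                     # ~4.08; the text rounds this to 4

p_new = 1 - NormalDist(0, 4.4).cdf(8)   # sigma raised 10% from 4 to 4.4
assert abs(p_new - 0.03451817) < 1e-6
```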
\section{Example: Upper Tail of a Light Bulb Distribution}
Suppose we model light bulb lifetimes as having a normal
distribution with mean and standard deviation 500 and 50 hours,
respectively. Give a loop-free R expression for finding the value of d
such that 30\% of all bulbs have lifetime more than d.
You should develop the ability to recognize when we need {\bf p}-series
and {\bf q}-series functions. Here we need
\begin{lstlisting}
qnorm(1-0.30,500,50)
\end{lstlisting}
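In Python, the analogous quantile computation (illustrative; \texttt{inv\_cdf} plays the role of \texttt{qnorm}) gives $d \approx 526.2$ hours:

```python
from statistics import NormalDist

# 30% of bulbs last more than d, i.e. d is the 0.70 quantile.
d = NormalDist(500, 50).inv_cdf(1 - 0.30)
assert abs(d - 526.22) < 0.01
```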
\section{The Central Limit Theorem}
\label{theclt}
The Central Limit Theorem (CLT) says, roughly speaking, that a random
variable which is a sum of many components will have an approximate
normal distribution. So, for instance, human weights are approximately
normally distributed, since a person is made of many components. The
same is true for SAT test scores,\footnote{This refers to the raw
scores, before scaling by the testing company.} as the total score is
the sum of scores on the individual problems.
There are many versions of the CLT. The basic one requires that the
summands be independent and identically distributed:\footnote{A more
mathematically precise statement of the theorem is given in Section
\ref{formalclt}.}
\begin{theorem}
\label{impreciseclt}
Suppose $X_1, X_2, ...$ are independent random variables, all having the
same distribution which has mean m and variance $v^2$. Form the new
random variable $T = X_1+...+X_n$. Then for large n, the distribution
of T is approximately normal with mean nm and variance $nv^2$.
\end{theorem}
The larger n is, the better the approximation, but typically n = 20 or
even n = 10 is enough.
\section{Example: Cumulative Roundoff Error}
Suppose that computer roundoff error in computing the square roots of
numbers in a certain range is distributed uniformly on (-0.5,0.5), and
that we will be computing the sum of n such square roots. Suppose we
compute a sum of 50 square roots. Let's find the approximate
probability that the sum is more than 2.0 higher than it should be.
(Assume that the error in the summing operation is negligible compared
to that of the square root operation.)
Let $U_1,...,U_{50}$ denote the errors on the individual terms in the
sum. Since we are computing a sum, the errors are added too, so our
total error is
\begin{equation}
T = U_1 + ... + U_{50}
\end{equation}
By the Central Limit Theorem, since T is a sum, it has an approximately
normal distribution, with mean 50 EU and variance 50 Var(U), where U is
a random variable having the distribution of the $U_i$. From Section
\ref{unifprops}, we know that
\begin{equation}
EU = (-0.5+0.5) / 2 = 0, ~~ Var(U) = \frac{1}{12} [0.5-(-0.5)]^2 =
\frac{1}{12}
\end{equation}
So, the approximate distribution of T is N(0,50/12). We can then use R to
find our desired probability:
\begin{lstlisting}
> 1 - pnorm(2,mean=0,sd=sqrt(50/12))
[1] 0.1635934
\end{lstlisting}
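The same CLT approximation in Python, as an illustrative cross-check of the R output:

```python
import math
from statistics import NormalDist

T = NormalDist(0, math.sqrt(50 / 12))   # CLT approximation: T ~ N(0, 50/12)
p = 1 - T.cdf(2)                        # P(T > 2)
assert abs(p - 0.1635934) < 1e-6
```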
\section{Example: R Evaluation of a Central Limit Theorem
Approximation}
Say $W = U_1 + ... + U_{50}$, with the $U_i$ being independent and
identically distributed (i.i.d.) with uniform distributions on (0,1).
Give an R expression for the approximate value of $P(W < 23.4)$.
W has an approximate normal distribution, with mean $50 \times
0.5$ and variance $50 \times (1/12)$. So we need
\begin{lstlisting}
pnorm(23.4,25,sqrt(50/12))
\end{lstlisting}
\section{Example: Bug Counts}
As an example, suppose the number of bugs per 1,000 lines of code has a
Poisson distribution with mean 5.2. Let's find the probability of
having more than 106 bugs in 20 sections of code, each 1,000 lines long.
We'll assume the different sections act independently in terms of bugs.
Here $X_i$ is the number of bugs in the i$^{th}$ section of code, and T
is the total number of bugs. This is another clear candidate for using
the CLT.
Since each $X_i$ has a Poisson distribution, $m = v^2 = 5.2$. So, T,
being a sum, is approximately distributed normally with mean and
variance $20 \times 5.2$. So, we can find the approximate probability
of having more than 106 bugs:
\begin{Verbatim}[fontsize=\relsize{-2}]
> 1 - pnorm(106,20*5.2,sqrt(20*5.2))
[1] 0.4222596
\end{Verbatim}
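As a cross-check, the same normal approximation in Python (illustrative sketch):

```python
import math
from statistics import NormalDist

lam = 20 * 5.2                        # T ~ approx N(104, 104) by the CLT
T = NormalDist(lam, math.sqrt(lam))   # mean and variance both 20*5.2
p = 1 - T.cdf(106)                    # P(T > 106)
assert abs(p - 0.4222596) < 1e-6
```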
\section{Example: Coin Tosses}
\label{correctcontin}
Binomially distributed random variables, though discrete, also are
approximately normally distributed. Here's why:
Say T has a binomial distribution with n trials. Then we
can write T as a sum of indicator random variables (Section
\ref{indicator}):
\begin{equation}
T = T_1+...+T_n
\end{equation}
where $T_i$ is 1 for a success and 0 for a failure on the i$^{th}$
trial. Since we have a sum of independent, identically distributed
terms, the CLT applies. Thus we use the CLT if we have binomial
distributions with large n.
For example, let's find the approximate probability of getting more than
12 heads in 20 tosses of a coin. X, the number of heads, has a binomial
distribution with n = 20 and p = 0.5. Its mean and variance are then
np = 10 and np(1-p) = 5. So, let $Z = (X-10)/\sqrt{5}$, and write
\begin{equation}
\label{gt12}
P(X > 12) = P(Z > \frac{12-10}{\sqrt{5}})
\approx 1 - \Phi(0.894) = 0.186
\end{equation}
Or:
\begin{Verbatim}[fontsize=\relsize{-2}]
> 1 - pnorm(12,10,sqrt(5))
[1] 0.1855467
\end{Verbatim}
The exact answer is 0.132, not too close. Why such a big error?
The main reason is n here is rather small. But actually, we can still
improve the approximation quite a bit, as follows.
Remember, the reason we did the above normal calculation was that X is
approximately normal, from the CLT. This is an approximation of the
distribution of a discrete random variable by a continuous one, which
introduces additional error.
We can get better accuracy by using the {\bf correction of continuity},
which can be motivated as follows. As an alternative to (\ref{gt12}),
we might write
\begin{equation}
P(X > 12) = P( X \geq 13) = P(Z > \frac{13-10}{\sqrt{5}})
\approx 1 - \Phi(1.342) = 0.090
\end{equation}
That value of 0.090 is considerably smaller than the 0.186 we got from
(\ref{gt12}). We could ``split the difference'' this way:
\begin{equation}
P(X > 12) = P( X \geq 12.5) = P(Z > \frac{12.5-10}{\sqrt{5}})
\approx 1 - \Phi(1.118) = 0.132
\end{equation}
(Think of the number 13 ``owning'' the region between 12.5 and 13.5, 14
owning the part between 13.5 and 14.5 and so on.) Since the exact answer
to seven decimal places is 0.131588, the strategy has improved accuracy
substantially.
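The comparison can be reproduced exactly, since the binomial probability is computable in closed form. A Python sketch (illustrative) confirms that the corrected approximation lands much nearer the exact answer:

```python
import math
from statistics import NormalDist

n, p = 20, 0.5
Z = NormalDist(n * p, math.sqrt(n * p * (1 - p)))   # N(10, 5) approximation

# Exact binomial: P(X > 12) = P(X >= 13).
exact = sum(math.comb(n, k) for k in range(13, n + 1)) / 2 ** n

plain     = 1 - Z.cdf(12)     # no correction, as in the first attempt
corrected = 1 - Z.cdf(12.5)   # correction for continuity

assert abs(exact - 0.131588) < 1e-6
assert abs(corrected - exact) < abs(plain - exact)   # the correction helps
```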
The term {\it correction for continuity} alludes to the fact that we
are approximating a discrete distribution by a continuous one.
\section{Example: Normal Approximation to Gamma Family}
Recall from above that the gamma distribution, or at least the Erlang,
arises as a sum of independent random variables. Thus the Central Limit
Theorem implies that the gamma distribution should be approximately
normal for large (integer) values of r. We see in Figure \ref{gammas}
that even with r = 10 it is rather close to normal.\footnote{It should
be mentioned that technically, the CLT, which concerns convergence of
cdfs, does not imply convergence of densities. However, under mild
mathematical conditions, convergence of densities occurs too.}
\section{Example: Museum Demonstration}
Many science museums have the following visual demonstration of the CLT.
There are many balls in a chute, with a triangular array of r rows of pins
beneath the chute. Each ball falls through the rows of pins, bouncing
left and right with probability 0.5 each, eventually being collected
into one of r+1 bins, numbered 0 to r. A ball will end up in bin i if it
bounces rightward in i of the r rows of pins, i = 0,1,...,r. Key point:
\begin{quote}
Let X denote the bin number at which a ball ends up. X is the number of
rightward bounces (``successes'') in r rows (``trials''). Therefore X
has a binomial distribution with n = r and p = 0.5.
\end{quote}
Each bin is wide enough for only one ball, so the balls in a bin will
stack up. And since there are many balls, the height of the stack in
bin i will be approximately proportional to P(X = i). And since the
latter will be approximately given by the CLT, the stacks of balls will
roughly look like the famous bell-shaped curve!
There are many online simulations of this museum demonstration, such as
\url{http://www.mathsisfun.com/data/quincunx.html}. By collecting the
balls in bins, the apparatus basically simulates a histogram for $X$,
which will then be approximately bell-shaped.
\section{Importance in Modeling}
\label{normalimp}
Needless to say, there are no random variables in the real world that
are exactly normally distributed. In addition to our comments at the
beginning of this chapter that no real-world random variable has a
continuous distribution, there are no practical applications in which a
random variable is not bounded on both ends. This contrasts with normal
distributions, which extend from $-\infty$ to $\infty$.
Yet, many things in nature do have approximate normal distributions, so
normal distributions play a key role in statistics. Most of the
classical statistical procedures assume that one has sampled from a
population having an approximately normal distribution. In addition, it
will be seen later that the CLT tells us in many of these cases that the
quantities used for statistical estimation are approximately normal,
even if the data they are calculated from are not.