-
Notifications
You must be signed in to change notification settings - Fork 3
/
Lesson10.Rmd
961 lines (735 loc) · 42 KB
/
Lesson10.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
---
title: "Lesson 10: Inference for One Mean with Sigma Known (Confidence Interval)"
output:
html_document:
theme: cerulean
toc: true
toc_float: false
---
<script type="text/javascript">
function showhide(id) {
var e = document.getElementById(id);
e.style.display = (e.style.display == 'block') ? 'none' : 'block';
}
</script>
<div style="width:50%;float:right;">
#### Optional Videos for this Lesson {.tabset .tabset-pills}
##### Part 1
<iframe id="kaltura_player_1644542597" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1644542597&entry_id=1_bmffh72n" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
##### Part 2
<iframe id="kaltura_player_1644542855" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1644542855&entry_id=1_q22ujikq" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
##### Part 3
<iframe id="kaltura_player_1644543101" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1644543101&entry_id=1_gyupa4ce" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
##### Part 4
<iframe id="kaltura_player_1644543358" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1644543358&entry_id=1_ytcv298k" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
##### Part 5
<iframe id="kaltura_player_1644543549" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1644543549&entry_id=1_7n4z1h2g" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
##### Part 6
<iframe id="kaltura_player_1644543853" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1644543853&entry_id=1_x7e49tmb" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
####
</div><div style="clear:both;"></div>
## Lesson Outcomes
By the end of this lesson, you should be able to:
1. Recognize when a one mean (sigma known) confidence interval is appropriate
2. Calculate the sample size required to achieve a specified margin of error and level of confidence
3. Explain the meaning of a level of confidence
4. Create a confidence interval for a single mean with σ known using the following steps:
a. Find the point estimate
b. Calculate the margin of error for the given level of confidence
c. Calculate a confidence interval from the point estimate and the margin of error
d. Interpret the confidence interval
e. Check the requirements for the confidence interval
5. Explain how the margin of error is affected by the sample size and level of confidence
<br>
## Political Polls
<img src="./Images/Step4.png">
During an election in the United States, many polls are conducted to determine the attitudes of likely voters. Poll results are usually reported as percentages. For example, a poll might state that 49% favor the Republican candidate and 51% favor the Democratic candidate.
Polls always include a **margin of error**. The margin of error gives an estimate of the variability in the responses. A common value for the margin of error in political polls is 3%.
When we consider the margin of error, we estimate that the true proportion of people who favor the Republican candidate is 49% ± 3%, or in other words, between 46% and 52%. It is not obvious whether the Republican candidate is favored by more or less than 50% of the voters. In this case, the political race is too close to know who might win.
In this reading, we will explore the margin of error and its role in estimating a parameter.
<br>
## Background
### Point Estimators
We have learned about several statistics. Remember, a statistic is any number computed based on data. The sample statistics we have discussed are used to estimate population parameters.
<center>
<table>
<thead>
<tr class="header">
<th></th>
<th><p>Sample Statistic</p></th>
<th><p>Population Parameter</p></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><p>Mean</p></td>
<td><p>$\bar x$</p></td>
<td><p>$\mu$</p></td>
</tr>
<tr class="even">
<td><p>Standard Deviation</p></td>
<td><p>$s$</p></td>
<td><p>$\sigma$</p></td>
</tr>
<tr class="odd">
<td><p>Variance</p></td>
<td><p>$s^2$</p></td>
<td><p>$\sigma^2$</p></td>
</tr>
<tr class="even">
<td><p>$\vdots$</p></td>
<td><p>$\vdots$</p></td>
<td><p>$\vdots$</p></td>
</tr>
</tbody>
</table>
</center>
<!--
<center>
{| class="wikitable"
!
! Sample Statistic
! Population Parameter
|-
| Mean
| align="center" | $\bar x$
| align="center" | $\mu$
|-
| Standard Deviation
| align="center" | $s$
| align="center" | $\sigma$
|-
| Variance
| align="center" | $s^2$
| align="center" | $\sigma^2$
|-
| align="center" | $\vdots$
| align="center" | $\vdots$
| align="center" | $\vdots$
|}
</center>
-->
The statistics above are called **point estimators** because they are just one number (one point on a number line) that is used to estimate a parameter. Parameters are generally unknown values. Think about the mean. If $\mu$ is unknown, how do we know if $\bar x$ is close to it?
The short answer is that we will never know *for sure* if $\bar x$ is close to $\mu$. This does not mean that we are helpless. The laws of probability and the normal distribution provide a way for us to create a range of plausible values for a parameter (e.g. $\mu$) based on a statistic (e.g. $\bar x$).
### Interval Estimators
A point estimator gives one specific value as an estimate of a parameter. An **interval estimator** is a range of plausible values for a parameter. We can create an interval estimate by starting with a point estimate and adding or subtracting the margin of error.
In the political poll mentioned above, the point estimate for the support of the Republican candidate is 49%. By adding and subtracting the margin of error, we get the interval estimate: 46% to 52%.
A **confidence interval** is a commonly used interval estimator. In this reading, we will explore how to create a confidence interval for the mean when $\sigma$ is known.
### The Margin of Error
#### Properties of Bell-shaped Curves
The following questions will help you review your understanding of the normal distribution.
<div class="QuestionsHeading">Answer the following questions:</div>
<div class="Questions">
1. Fill in the blank in the following sentence.<br> "The 68-95-99.7% rule only applies for distributions that are _________."
<a href="javascript:showhide('Q1')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q1" style="display:none;">
* The 68-95-99.7% rule only applies for distributions that are bell-shaped.
* In the past, some students have answered that the data must be normally distributed. Actually, the 68-95-99.7% rule will work for any distribution that is mound-shaped and symmetrical. As long as the data are unimodal and not skewed left or skewed right, this rule works well.
</div>
<br>
2. Approximately what percentage of data from a bell-shaped distribution will lie within two standard deviations of the mean?
<a href="javascript:showhide('Q2')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q2" style="display:none;">
* Using the 68-95-99.7% rule, about 95% of the observations will lie within two standard deviations of the mean.
</div>
</div>
<br>
#### The Distribution of the Sample Mean
We learned in the reading [Distribution of Sample Means & The Central Limit Theorem](Lesson06.html){target="_blank"} about the characteristics of the sample mean, $\bar x$. Specifically, if the population from which the data are drawn is (i) approximately normal or (ii) if the sample size is large, then $\bar x$ will be approximately normally distributed. Furthermore, if the original population has mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of $\bar x$ will have mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
So, if either condition (i) or (ii) is met, then we can consider the sample mean $\bar x$ as a normal random variable with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. According to the 68-95-99.7% rule for symmetric bell-shaped distributions, about 95% of the time, the sample mean ($\bar x$) will lie within two standard deviations of the population mean ($\mu$).
This is an important concept. Make sure that you understand the logic above before you continue reading.
#### How Far is $\bar x$ from $\mu$, or in other words, How Far is $\mu$ from $\bar x$?
Assuming that $\bar x$ is approximately normally distributed, about 95% of the time, it will be within two standard deviations of $\mu$.
<img src="./Images/SamplingDistOfXbar-Sketch.png">
Remember, the standard deviation of $\bar x$ is $\frac{\sigma}{\sqrt{n}}$. For the variable $\bar x$, two standard deviations would be equal to $2 \frac{\sigma}{\sqrt{n}}$.
If we collect a random sample from a population and $\bar x$ is normally distributed, then about 95% of the time the sample mean $\bar x$ will be less than $2 \frac{\sigma}{\sqrt{n}}$ units away from the population mean $\mu$. Notice that this is true, whether or not we know $\mu$.
We can express this as a probability statement:
$$
P\left( \mu - 2 \frac{\sigma}{\sqrt{n}} < \bar X < \mu + 2 \frac{\sigma}{\sqrt{n}} \right) \approx 0.95
$$
Here is the magic: If $\bar x$ is within 2 standard deviations of $\mu$, then $\mu$ is within 2 standard deviations of $\bar x$. It may seem silly to state it, but this is very important.
<a href="javascript:showhide('math')"><span style="font-size:8pt;">Click here if you love math</span></a>
<div id="math" style="display:none;">
<!--
{| width="100%" bgcolor="#000000" border="0" cellpadding="3px" cellspacing="2px" style="margin:auto;"
|- valign="top" align="left" style="background: #ffffff;"
-->
Starting with
$$\mu - 2 \frac{\sigma}{\sqrt{n}} < \bar X < \mu + 2 \frac{\sigma}{\sqrt{n}}$$
We subtract $\mu$ from all parts of the inequality:
$$\mu - 2 \frac{\sigma}{\sqrt{n}} - \mu < \bar X - \mu < \mu + 2 \frac{\sigma}{\sqrt{n}} - \mu$$
Which reduces to
$$- 2 \frac{\sigma}{\sqrt{n}} < \bar X - \mu < 2 \frac{\sigma}{\sqrt{n}}$$
Now, subtract $\bar X$ from all the terms:
$$- 2 \frac{\sigma}{\sqrt{n}} - \bar X < \bar X - \mu -\bar X < 2 \frac{\sigma}{\sqrt{n}} -\bar X$$
And this simplifies to
$$- 2 \frac{\sigma}{\sqrt{n}}-\bar X < -\mu < 2 \frac{\sigma}{\sqrt{n}}-\bar X$$
Multiplying all the terms by $-1$, we get
$$2 \frac{\sigma}{\sqrt{n}}+\bar X > \mu > -2 \frac{\sigma}{\sqrt{n}}+\bar X$$
Rewriting this reversing the order of the terms, we get
$$\bar X - 2 \frac{\sigma}{\sqrt{n}} < \mu < \bar X + 2 \frac{\sigma}{\sqrt{n}}$$
</div>
<br>
So, the statement
$$
P\left( \mu - 2 \frac{\sigma}{\sqrt{n}} < \bar X < \mu + 2 \frac{\sigma}{\sqrt{n}} \right) \approx 0.95
$$
is equivalent to the statement
$$
P\left( \bar X - 2 \frac{\sigma}{\sqrt{n}} < \mu < \bar X + 2 \frac{\sigma}{\sqrt{n}} \right) \approx 0.95
$$
<br>
## Putting It All Together: Confidence Intervals for $\mu$ when $\sigma$ is Known
Notice that if we know $\sigma$, then with approximately 95% confidence, $\mu$ will be between the following two values:
$$\left( \bar X - 2 \frac{\sigma}{\sqrt{n}}, ~ \bar X + 2 \frac{\sigma}{\sqrt{n}} \right)$$
This equation gives a confidence interval for $\mu$.
A confidence interval is actually a point estimate $\left( \bar x \right)$ plus or minus the margin of error $\left( 2 \frac{\sigma}{\sqrt{n}} \right)$.
We use the letter $m$ to denote the margin of error:
$$m = 2 \frac{\sigma}{\sqrt{n}}$$
Using this definition for $m$, our confidence interval can be written as
$$( \bar x - m, ~ \bar x + m )$$
or
$$\bar x \pm m$$
This pattern will be repeated throughout the course as we create confidence intervals of the form:
$$
\left(
\begin{align}
point~ ~ ~ & & margin~ & & point~ ~ ~ & & margin~ \\
estimate & ~ ~- & of~error & ~ ~, & estimate & ~ ~+ & of~error
\end{align}
\right)
$$
or
$$(\text{point estimate}) \pm (\text{margin of error})$$
</br>
<a id="z"></a>
### A Little More Precision
#### What $z$ Corresponds to 95% of the Area?
The 68-95-99.7% Rule for bell-shaped distribution is just a quick approximation. This is useful for estimation, but more precision is usually required.
<div class="QuestionsHeading">Answer the following question:</div>
<div class="Questions">
3. For a standard normal distribution, between what two $z$-scores will 95% of the data fall? In other words, find the values $-z$ and $z$ such that the area between them is equal to 0.95. Use the
[Normal Probability Applet](http://statistics.byuimath.com/index.php?title=Normal_Probability_Applet){target="_blank"}.
<a href="javascript:showhide('hint')"><span style="font-size:8pt;">Click here for a hint</span></a>
<div id="hint" style="display:none;">
- To find the value of $z^*$, given a specific confidence level, do the following:
+ *Open the applet at [Normal Probability Applet](http://statistics.byuimath.com/index.php?title=Normal_Probability_Applet){target="_blank"}
+ Shade the area between the two red lines (the middle portion of the graph) only and enter the decimal value for the desired percentage (e.g. 0.95 for 95%) in the *Area* box.
+ The value of $z^*$ will be displayed below. Two numbers will be given along the horizontal axis. The only difference in these numbers is their sign: one is positive and one is negative. The positive value is the desired $z^*$.
</div>
<br>
<a href="javascript:showhide('Q3')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q3" style="display:none;">
* 95% of the area under the standard normal curve is between $z=-1.96$ and $z=1.96$
<img src="./Images/Applet0.95-middle-LoRes.png">
* For normally distributed data, 95% of the observations will fall within 1.96 standard deviations of the mean.
</div>
</div>
<br>
The actual formula for a 95% confidence interval for $\mu$ (when $\sigma$ is known) is:
$$
\left( \bar x - 1.96 \frac{\sigma}{\sqrt{n}}, \bar x + 1.96 \frac{\sigma}{\sqrt{n}} \right)
$$
Please notice that the number 2 was used as an approximation for the actual value of 1.96. When computing a 95% confidence interval for a mean with $\sigma$ known, please use 1.96 in the equation.
#### Worked Example: Rolling a Die 25 Times
<img src="./Images/Step4.png">
We will compute a 95% confidence interval for the mean of $n=25$ rolls of a fair die.
A die was rolled 25 times. The values rolled were:
$$
3 ~ ~ ~ 1 ~ ~ ~ 2 ~ ~ ~ 1 ~ ~ ~ 1 ~ ~ ~ 2 ~ ~ ~ 4 ~ ~ ~ 1 ~ ~ ~ 1 ~ ~ ~ 2 ~ ~ ~ 6 ~ ~ ~ 5 ~ ~ ~ 6 ~ ~ ~ 3 ~ ~ ~ 4 ~ ~ ~ 2 ~ ~ ~ 2 ~ ~ ~ 5 ~ ~ ~ 6 ~ ~ ~ 2 ~ ~ ~ 6 ~ ~ ~ 4 ~ ~ ~ 1 ~ ~ ~ 1 ~ ~ ~ 5
$$
The mean of these values is $\bar x = 3.04$.
It is a fact that for the outcome of a six-sided die, $\sigma=\sqrt{\frac{35}{12}} \approx 1.7078$.
Applying the formula for a 95% confidence interval,
$$
\left( \bar x - z^* \frac{\sigma}{\sqrt{n}}, ~ \bar x + z^* \frac{\sigma}{\sqrt{n}} \right)
$$
we get:
$$
\left( 3.04 - 1.96 \frac{1.7078}{\sqrt{25}}, ~ 3.04 + 1.96 \frac{1.7078}{\sqrt{25}} \right)
$$
or
$$
(2.37, ~ 3.71)
$$
#### <a id = "z*"></a>What about Other Confidence Levels?
Most of the time, researchers report 95% confidence intervals. The number 95% is called the **confidence level**. Sometimes it is desirable to use a level of confidence that is different than 95%. In that case, we need to use a number besides 1.96 in the calculations.
Suppose we want to create a 90% confidence interval. What value would we put in the blanks in the confidence interval formula below?
$$
\left( \bar x - \underline{~ ~ ~ ? ~ ~ ~} \frac{\sigma}{\sqrt{n}}, ~ \bar x + \underline{~ ~ ~ ? ~ ~ ~} \frac{\sigma}{\sqrt{n}} \right)
$$
Another way to state this question is to ask, between what two $z$-scores will 90% of the data in a standard normal distribution fall? We need to find the values of $-z$ and $z$ such that the area between them is equal to 0.90. Again, we use the [Normal Probability Applet](http://statistics.byuimath.com/index.php?title=Normal_Probability_Applet){target="_blank"}.
<img src="./Images/Updated_z-score_curve.JPG">
So, for a 90% confidence interval, the value of $z*$ would be 1.645. Note: We use an asterisk ($*$) to indicate that $z$ was not computed from data, but was determined based on a *chosen* confidence level, 90%.
The formula for a 90% confidence interval for a mean when $\sigma$ is known is:
$$
\left( \bar x - 1.645 \frac{\sigma}{\sqrt{n}}, ~ \bar x + 1.645 \frac{\sigma}{\sqrt{n}} \right)
$$
When $\sigma$ is known, we use $z^* = 1.96$ to create 95% confidence intervals, and we use $z^* = 1.645$ to create 90% confidence intervals.
#### Formula for the Confidence Interval for $\mu$ ($\sigma$ Known)
With this notation, the confidence interval formula generalizes to:
$$
\left( \bar x - z^* \frac{\sigma}{\sqrt{n}}, ~ \bar x + z^* \frac{\sigma}{\sqrt{n}} \right)
$$
where $z^*$ is determined by the level of confidence. If you want a 90% confidence interval, then $z^*$ is the number of standard deviations on either side of the mean that you must go to capture 90% of the data.
<div class="QuestionsHeading">Answer the following questions:</div>
<div class="Questions">
4. If $\bar x$ follows a normal distribution and $\sigma$ is known, what is the equation for a 99% confidence interval for the true population mean?
<a href="javascript:showhide('Q4')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q4" style="display:none;">
- For a 99% confidence level, $z^* = 2.576$. The equation for the 99% confidence interval becomes:
<center>
$$
\left( \bar x - 2.576 \frac{\sigma}{\sqrt{n}}, ~ \bar x + 2.576 \frac{\sigma}{\sqrt{n}} \right)
$$
</center>
</div>
<br>
5. Apply the equation from the previous problem to find a 99% confidence interval for the mean value rolled on the die. Use the data given above. For convenience, they are reproduced here:
$$
3 ~ ~ ~ 1 ~ ~ ~ 2 ~ ~ ~ 1 ~ ~ ~ 1 ~ ~ ~ 2 ~ ~ ~ 4 ~ ~ ~ 1 ~ ~ ~ 1 ~ ~ ~ 2 ~ ~ ~ 6 ~ ~ ~ 5 ~ ~ ~ 6 ~ ~ ~ 3 ~ ~ ~ 4 ~ ~ ~ 2 ~ ~ ~ 2 ~ ~ ~ 5 ~ ~ ~ 6 ~ ~ ~ 2 ~ ~ ~ 6 ~ ~ ~ 4 ~ ~ ~ 1 ~ ~ ~ 1 ~ ~ ~ 5
$$
<a href="javascript:showhide('Q5')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q5" style="display:none;">
- If we were making a 99% confidence interval, we follow the same procedure as for a 95% confidence interval, only we use $z^*=2.576$. The 99% confidence interval for the true mean is:
$$
\left( 3.04 - 2.576 \frac{1.7078}{\sqrt{25}}, ~ 3.04 + 2.576 \frac{1.7078}{\sqrt{25}} \right)
$$
which simplifies to:
$$
(2.16, ~ 3.92)
$$
</div>
</div>
<br>
<!-- <span id="z"></span> -->
The most commonly used values for $z^*$ are:
<center>
<table>
<thead>
<tr class="header">
<th><p>Confidence<br />
Level</p></th>
<th><p>$z^*$</p></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><p>90%</p></td>
<td><p>1.645</p></td>
</tr>
<tr class="even">
<td><p>95%</p></td>
<td><p>1.960</p></td>
</tr>
<tr class="odd">
<td><p>99%</p></td>
<td><p>2.576</p></td>
</tr>
<tr class="even">
<td></td>
<td></td>
</tr>
</tbody>
</table>
</center>
<!-- <center> -->
<!-- {| class="wikitable" style="text-align:center" -->
<!-- ! Confidence <br> Level || $z^*$ -->
<!-- |- -->
<!-- |90% || 1.645 -->
<!-- |- -->
<!-- |95% || 1.960 -->
<!-- |- -->
<!-- |99% || 2.576 -->
<!-- |- -->
<!-- |} -->
<!-- </center> -->
Any other values that you need can be determined using the Normal Probability Applet.
<!--
#### Example: Confidence Interval for a Mean of Die Rolls
<img src="./Images/Step4.png">
Roll a fair, six sided die 25 times. Find the mean of the values you rolled. Record your results and find the mean $\bar x$.
<center>
{| class="wikitable"
|-
|
**Note:** ``die* is the singular form for ``dice.*
|}
</center>
The parameter $\mu$ is the mean of all the values you would roll with your die if you continued rolling it forever. Usually, we do not know the value of $\mu$, so we estimate it by the value $\bar x$.
For the roll of a fair six-sided die, the standard deviation is $\sigma=\sqrt{\frac{35}{12}} \approx 1.7078$. In order to do this computation, we must know $\sigma$, the population standard deviation. This number will be given to you when needed.
SPSS does not have a built-in routine to compute a confidence interval for a mean when $\sigma$ is unknown. This is not a flaw in SPSS, but it reflects the fact that in practice, we never know $\sigma$.
-->
### Factors Affecting the Width of the Confidence Interval
Recall that the formula to compute the confidence interval for a mean, where $\sigma$ is known is:
$$
\left( \bar x - z^* \frac{\sigma}{\sqrt{n}}, ~ \bar x + z^* \frac{\sigma}{\sqrt{n}} \right)
$$
<div class="QuestionsHeading">Answer the following questions:</div>
<div class="Questions">
6. What would happen to the confidence interval if the sample size $n$ was increased, but the other values were still the same?
<a href="javascript:showhide('Q6')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q6" style="display:none;">
- Since the sample size $n$ is in the denominator, if $n$ is increased, $\sqrt{n}$ would increase, and the fraction $\frac{\sigma}{\sqrt{n}}$ would decrease, which would lead to a smaller margin of error. In other words, if the sample size is increased, the width of a confidence interval will decrease--it will become narrower.
</div>
<br>
7. What would happen to the confidence interval if the confidence level were increased, say from 95% to 99%?
<a href="javascript:showhide('Q7')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q7" style="display:none;">
- Increasing the confidence level will lead to larger values of $z^*$. If $z^*$ is increased, then the margin of error $z^* \frac{\sigma}{\sqrt{n}}$ would increase. This would lead to a wider confidence interval.
</div>
</div>
<br>
### Interpretation of Confidence Intervals
<img src="./Images/Step5.png">
How do we interpret confidence intervals? What do they really mean?
<center>
<img src="./Images/CoinToss.png">
(Image source: Flickr/ICMA Photos)
</center>
<br>
Consider a coin with two sides: one called "heads" and the other called "tails". Imagine that you flipped this coin, but you have not looked at it yet. What is the probability that the coin shows heads?
Strangely enough, the answer is, it depends! If the head is facing up, then the probability that the coin shows heads is 1. If the head is facing down, then the probability that it shows heads is 0.
The coin has been tossed. There is no randomness left in the process. So, either the head is facing up or it is facing down. So, the probability that the coin shows heads is either 1 or 0. (We just don't know which.) The fact that we do not know the outcome does not change it or make it random.
Before we toss the coin, the probability that the coin will show heads is $\frac{1}{2}=0.5$. After we toss the coin, the probability that we get heads is 1 or 0.
Transferring this reasoning to confidence intervals, we get a similar result.
Once we have collected data on something, there is no randomness in the system. Any confidence interval that is created using that data will either contain the true parameter ($\mu$) or it will not. After collecting data, the probability that a specific confidence interval will contain $\mu$ is either 1 or 0.
<br>
<div class="message Tip">The correct interpretation of a 95% confidence interval is to say, "We are 95% confident that the true mean lies within the lower and upper bounds of the confidence interval."</div>
<br>
<br>
Consider the 95% confidence interval for the true mean of 25 rolls of a fair die. We found the 95% confidence interval to be: $(2.37, 3.71)$.
When we interpret this confidence interval, we say, "We are 95% *confident* that the true mean is between 2.37 and 3.71."
The word, "confident" implies that if we repeated this process many, many times, 95% of the confidence intervals we would get would contain the true mean $\mu$. It does not imply anything about whether or not one specific confidence interval will contain the true mean.
We do not say that "there is a 95% probability (or chance) that the true mean is between 2.37 and 3.71." The probability that the true mean $\mu$ is between 2.37 and 3.71 is either 1 or 0.
### Requirements
There are three requirements that need to be checked when computing a confidence interval for a mean with $\sigma$ known:
1. A simple random sample was drawn from the population
2. $\bar x$ is normally distributed
3. $\sigma$ is assumed to be known
The requirement of normality is satisfied if (a) the raw data are normally distributed or (b) the sample size is large. This procedure is robust to moderate departures from normality. Even if the requirement that $\bar x$ is normally distributed is not satisfied perfectly, it is usually okay to conduct the test.
In practice, we never really know $\sigma$. This procedure is primarily used to help you understand the idea of confidence intervals. When $\sigma$ is unknown, we use a slightly different computation.
### Example: Costs of CABG Surgery
<center>
<img src="./Images/Surgery-Flickr-1517206758_c7412.png">
(Photo credit: Vanderbilt Photo/Neil Brake)
</center>
<br>
This reading focuses on an important aspect of designing a study: determining the sample size.
<img src="./Images/Step1.png">
It is important for health care administrators to know the mean hospital costs for patients who have <u>c</u>oronary <u>a</u>rtery <u>b</u>ypass <u>g</u>raft (CABG) surgery. A large hospital is planning a study to determine their mean costs for patients who have CABG surgery.
A study will be conducted in which the charts of patients who had CABG surgery will be sampled, and their hospital costs will be recorded. For budgetary reasons, the hospital administrators do not want to collect a sample that is too large. However, if the sample size is not large enough, the confidence interval will be too wide to be useful as a planning tool.
After a discussion among the senior administration, they have determined that they want to estimate the mean hospital costs of CABG surgery within $2000 (i.e., plus or minus \$2000.) In other words, they want the confidence interval for the true mean to have a margin of error of \$2000 dollars.
Recall the equation for the confidence interval is:
$$
\left( \bar x - z^* \frac{\sigma}{\sqrt{n}}, \bar x + z^* \frac{\sigma}{\sqrt{n}} \right)
$$
The part of the equation that is added to and subtracted from $\bar x$ is called the **margin of error**. We will denote the margin of error by the letter $m$.
$$
m=z^* \frac{\sigma}{\sqrt{n}}
$$
To use this formula, the parameter $\sigma$ must be given to us. It is the true standard deviation of the data you are observing. If you do not know $\sigma$, you can estimate it using the standard deviation reported in a previous study or by conducting a pilot study.
A study published by another hospital reported that the standard deviation of the costs for CABG surgery was \$28,705. <!--<cite>Chu09</cite>-->
In the following questions, you will compute the margin of error, $m$, for a future study of the hospital costs for CABG surgery. The hospital administrators want to use a 95% level of confidence. Assume the standard deviation can be estimated to be $\sigma = \$28,705$.
<div class="QuestionsHeading">Answer the following questions:</div>
<div class="Questions">
8. If the hospital administrators want to be 95% confident in the results, what should the value of $z^*$ be?
<a href="javascript:showhide('Q8')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q8" style="display:none;">
<center>
$$
\begin{align}
z^*=1.96
\end{align}
$$
</center>
</div>
<br>
9. If the hospital collects a sample of $n=100$ patients' costs, what would the margin of error be?
<a href="javascript:showhide('Q9')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q9" style="display:none;">
<center>
$$
\begin{array}{1cl}
m &= z^* \frac{\sigma}{\sqrt{n}} \\
&= 1.96 \cdot \frac{28705}{\sqrt{100}} \\
&= \$5626.18
\end{array}
$$
</center>
</div>
<br>
10. If the hospital collects a sample of $n=1000$ patients' costs, what would the margin of error be?
<a href="javascript:showhide('Q10')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q10" style="display:none;">
<center>
$$
\begin{array}{1cl}
m &= z^* \frac{\sigma}{\sqrt{n}} \\
&= 1.96 \cdot \frac{28705}{\sqrt{1000}} \\
&= \$1779.15
\end{array}
$$
</center>
</div>
<br>
11. Choose any other value for the sample size, $n$. Find the margin of error for your sample size.
<a href="javascript:showhide('Q11')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q11" style="display:none;">
* Answers will vary.
</div>
<br>
12. Repeat Question 11 until you find the sample size that will yield a margin of error as close to \$2000 as possible, without going over.
<a href="javascript:showhide('Q12')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q12" style="display:none;">
<center>
$$
n = 792
$$
</center>
</div>
</div>
<br>
## Sample Size Calculations
The process you followed in Questions 8 - 12 is effective, but tedious. There must be an easier way! The trick is to solve the margin of error equation $m = z^* \frac{\sigma}{\sqrt{n}}$ for $n$.
<a href="javascript:showhide('math2')"><span style="font-size:8pt;">Click here if you love math</span></a>
<div id="math2" style="display:none;">
$$m = z^* \frac{\sigma}{\sqrt{n}}$$
Multiply both sides of the equation by $\sqrt{n}$:
$$m \cdot \sqrt{n} = z^* \frac{\sigma}{\sqrt{n}} \cdot \sqrt{n}$$
Which reduces to:
$$m \cdot \sqrt{n} = z^* \cdot \sigma$$
Divide both sides of the equation by $m$.
Cancelling the $m$'s on the left hand side:
$$\sqrt{n} = \frac{z^* \cdot \sigma}{m}$$
Squaring both sides:
$$\left( \sqrt{n} \right)^2 = \left( \frac{z^* \cdot \sigma}{m} \right)^2$$
Which simplifies to the desired result:
$$n = \left( \frac{z^* \cdot \sigma}{m} \right)^2$$
</div>
This gives us the **sample size formula**, which tells the number of observations required in order to obtain a specified margin of error:
$$
n = \left( \frac{z^*\sigma}{m} \right)^2
$$
Once the level of confidence is selected, $z^*$ is automatically determined. In practice, the most common level of confidence is 95%, which means that $z^*$ would equal 1.96.
<br>
<div class="message Note">If you need a reminder on finding the value of $z^*$, click [here](#z).</div>
<br>
<br>
<div class="QuestionsHeading">Answer the following question:</div>
<div class="Questions">
13. In Question 12, you used the process of guess-and-check to find the sample size. For this question, use the sample size formula to compute the sample size required to estimate the mean cost of CABG surgery, $\mu$, within \$2000 with 95% confidence. Recall that in a previous study, the standard deviation was found to be $\sigma = \$28,705$. <!--<cite>Chu09</cite>-->
<a href="javascript:showhide('Q13')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q13" style="display:none;">
<center>
$$
\begin{align}
n &= \left( \frac{z^*\sigma}{m} \right)^2 \\
&= \left( \frac{1.96 \cdot 28705}{2000} \right)^2 \\
&= 791.3
\end{align}
$$
</center>
</div>
</div>
<br>
### Rounding Up
In Question 13, you computed the sample size required to get a margin of error of \$2000. Notice that the result was 791.3, which is not a whole number. What does that mean? Does that suggest that you will survey only a fraction of the last patient's costs? Of course not!
When doing sample size calculations, if the answer is not a whole number, you *always round up to the next highest whole number*. This will allow you to get a sample size that is large enough to attain your desired margin of error. If you want to estimate the mean cost of CABG surgery within \$2000 with 95% confidence, you will need to survey the files of 792 patients.
<br>
<div class="message Tip">When doing sample size calculations, you *always* round *up*.</div>
<br>
<br>
<div class="QuestionsHeading">Answer the following questions:</div>
<div class="Questions">
14. Answer this question without doing any computations. If the hospital administrators wanted to estimate the mean cost of CABG surgery with a margin of error of \$1000 at the 95% confidence level, should the sample size be larger or smaller than $n=792$?
<a href="javascript:showhide('Q14')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q14" style="display:none;">
* In question 5, we determined that the sample size required to have a margin of error of $2000 is 792.
The margin of error is in the denominator of the sample size formula. If we want to reduce the margin of error, we need to increase the sample size. The sample size should be larger than $n=792$.
</div>
<br>
15. Find the sample size required to estimate the mean cost of CABG surgery with a margin of error of \$1000 and with a 95% confidence level.
<a href="javascript:showhide('Q15')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q15" style="display:none;">
<center>
$$
\begin{array}{1cl}
n &= \left( \frac{z^*\sigma}{m} \right)^2 \\
&= \left( \frac{1.96 \cdot 28705}{1000} \right)^2 \\
&= 3165.4
\end{array}
$$
</center>
We always round up in sample size calculations, so the required sample size is $n=3166$.
</div>
<br>
16. What happened to the sample size required when the margin of error is cut in half from \$2000 to \$1000?
<a href="javascript:showhide('Q16')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q16" style="display:none;">
* If we divide the two sample sizes, we get:
<center>
$$
\frac{3166}{792} \approx 4
$$
</center>
* In order to reduce the margin of error by half, we need to quadruple the sample size.
</div>
<br>
17. Remember the sample size required to have a margin of error of \$2000 at the 95% level of confidence is $n=792$. If we wanted to estimate the mean cost with the same margin of error at the 99% level of confidence, would the sample size be larger or smaller?
<a href="javascript:showhide('Q17')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q17" style="display:none;">
* The value of $z^*$ would increase from 1.96 to 2.576, which would increase the required sample size.
</div>
<br>
18. Find the sample size required to estimate the mean cost of CABG surgery with a margin of error of \$2000 and at the 99% confidence level.
<a href="javascript:showhide('Q18')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q18" style="display:none;">
<center>
$$
\begin{array}{1cl}
n &= \left( \frac{z^*\sigma}{m} \right)^2 \\
&= \left( \frac{2.576 \cdot 28705}{2000} \right)^2 \\
&= 1366.9
\end{array}
$$
</center>
We always round up in sample size calculations, so the required sample size is $n=1367$.
</div>
<br>
19. If we increase the confidence level, what happens to the required sample size?
<a href="javascript:showhide('Q19')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q19" style="display:none;">
* If we increase the confidence level the required sample size also increases.
</div>
</div>
<br>
### Example: Grade Inflation
The administration at BYU-Idaho is concerned about the possibility of grade inflation in their courses. Grades in a course are coded on a scale from 0 to 4, as given in the following table:
<center>
```{r echo=FALSE, message=FALSE, warning = FALSE}
library(pander)
grd <- data.frame("A" = 4.0,
"A-" = 3.7,
"B+" = 3.3,
"B" = 3.0,
"B-" = 2.7,
"C+" = 2.3,
"C" = 2.0,
"C-" = 1.7,
"D+" = 1.3,
"D" = 1.0,
"D-" = 0.7,
"F" = 0.0)
colnames(grd)<-c("A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F")
pander(grd)
```
</center>
<!-- <center> -->
<!-- {| class="wikitable" style="text-align:center" -->
<!-- |colspan="12" | Grade Point Scale at BYU-Idaho -->
<!-- |- -->
<!-- |A || A- || B+ || B || B- || C+ || C || C- || D+ || D || D- || F -->
<!-- |- -->
<!-- | 4.0 || 3.7 || 3.3 || 3.0 || 2.7 || 2.3 || 2.0 || 1.7 || 1.3 || 1.0 || 0.7 || 0.0 -->
<!-- |- -->
<!-- |} -->
<!-- </center> -->
Historical records indicate that the standard deviation of the numerical value for the grades students' receive in a course is $\sigma = 0.68$.
Due to the confidential nature of grades, the administration will not release the current value of the mean grade earned on campus. An old reference indicates that the true mean value of the grades earned at BYU-Idaho students is 3.15. It would be interesting to estimate the current mean grade point average at BYU-Idaho.
You have been asked to help study the issue of grade inflation at BYU-Idaho. A certain number of individual grades will need to be sampled to estimate the mean grade earned at BYU-Idaho.
<div class="QuestionsHeading">Answer the following questions:</div>
<div class="Questions">
20. What sample size would be required to estimate the true mean grade earned at BYU-Idaho with 95% confidence and a margin of error of 0.2 grade points?
<a href="javascript:showhide('Q20')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q20" style="display:none;">
<center>
$$
\begin{array}{1cl}
n &= \left( \frac{z^*\sigma}{m} \right)^2 \\
&= \left( \frac{1.96 \cdot 0.68}{0.2} \right)^2 \\
&= 44.4
\end{array}
$$
</center>
* We always round up in sample size calculations, so the required sample size is $n=45$.
</div>
<br>
21. What sample size would be required to estimate the true mean grade earned at BYU-Idaho with 98% confidence and a margin of error of 0.2 grade points?
<a href="javascript:showhide('Q21')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q21" style="display:none;">
* Using the Normal Probability Applet, we find that $z^* = 2.3263$ for a 98% confidence level.
<center>
$$
\begin{array}{1cl}
n &= \left( \frac{z^*\sigma}{m} \right)^2 \\
&= \left( \frac{2.3263 \cdot 0.68}{0.2} \right)^2 \\
&= 62.6
\end{array}
$$
</center>
* We always round up in sample size calculations, so the required sample size is $n=63$.
</div>
<br>
22. What sample size would be required to estimate the true mean grade earned at BYU-Idaho with 95% confidence and a margin of error of 0.1 grade points?
<a href="javascript:showhide('Q22')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q22" style="display:none;">
<center>
$$
\begin{array}{1cl}
n &= \left( \frac{z^*\sigma}{m} \right)^2 \\
&= \left( \frac{1.96 \cdot 0.68}{0.1} \right)^2 \\
&= 177.6
\end{array}
$$
</center>
We always round up in sample size calculations, so the required sample size is $n=178$.
</div>
<br>
23. What sample size would be required to estimate the true mean grade earned at BYU-Idaho with 98% confidence and a margin of error of 0.1 grade points?
<a href="javascript:showhide('Q23')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q23" style="display:none;">
<center>
$$
\begin{array}{1cl}
n &= \left( \frac{z^*\sigma}{m} \right)^2 \\
&= \left( \frac{2.326 \cdot 0.68}{0.1} \right)^2 \\
&= 250.2
\end{array}
$$
</center>
We always round up in sample size calculations, so the required sample size is $n=251$.
</div>
</div>
<br>
<br>
## Comments
What confidence level should be used? The most common choice for the confidence level is 95%. Other values are occasionally used, but this tends to meet the needs of most researchers.
How do we find the value of $\sigma$? In a statistics class, this number will be given to you in the problem. In real life, you will need to estimate this value based on either ($i$) a previous study, ($ii$) the results of a pilot study, or ($iii$) your best professional judgement (an educated guess). Sometimes it is not possible to use data from a previous study or to easily conduct a pilot study. In these cases, the only thing a researcher can do is to use their understanding of the process to make a good guess of the value of $\sigma$.
Ultimately, sample size calculations are nothing more than a good estimate of the actual sample size that will be required.
In many cases, budgets will limit the sample size that can be drawn. A researcher must assess whether a smaller-than-ideal sample will suffice for their needs. This is one of the difficult issues that arises in the actual practice of statistics.
<br>
## Summary
<div class="SummaryHeading">Remember...</div>
<div class="Summary">
- The **margin of error** gives an estimate of the variability of responses. It is calculated as $\displaystyle{m=z^*\frac{\sigma}{\sqrt{n}}}$ where $z^*$ represents a calculated z-score corresponding to a particular confidence level.
- A **confidence interval** is an interval estimator used to give a range of plausible values for a parameter. The width of a confidence interval depends on the chosen confidence level (and its corresponding value of $z^*$) as well as the sample size ($n$). This is the equation for calculating confidence intervals:
$$\displaystyle{\left(\bar x-z^*\frac{\sigma}{\sqrt{n}},~\bar x+z^*\frac{\sigma}{\sqrt{n}}\right)}$$
- The **sample size formula** allows us to estimate the number of observations required to obtain a specific margin of error. $\displaystyle{n=\left(\frac{z^*\sigma}{m}\right)^2}$
</div>
<br>
## Navigation
<center>
| **Previous Reading** | **This Reading** | **Next Reading** |
| :------------------: | :--------------: | :--------------: |
| [Lesson 9: <br> Inference for One Mean: Sigma Known (Hypothesis Test)](Lesson09.html) | Lesson 10: <br> Inference for One Mean: Sigma Known (Confidence Interval) | [Lesson 11: <br> Inference for One Mean: Sigma Unknown](Lesson11.html) |
</center>