-
Notifications
You must be signed in to change notification settings - Fork 3
/
Lesson24.Rmd
116 lines (89 loc) · 5.56 KB
/
Lesson24.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
---
title: "Lesson 24: Review for Exam 4"
output:
html_document:
theme: cerulean
toc: true
toc_float: false
---
<script type="text/javascript">
function showhide(id) {
var e = document.getElementById(id);
e.style.display = (e.style.display == 'block') ? 'none' : 'block';
}
</script>
<br />
## Unit 4 Lesson Outcomes
<a href="javascript:showhide('oc')"><span style="font-size:8pt;">Show/Hide Outcomes</span></a>
<div id="oc" style="display:none;">
The expectation on the exam is the following outcomes. You should be able to do:
- All of the Outcomes from [Lesson 08 (Unit 1)](Lesson08.html), [Lesson 15 (Unit 2)](Lesson15.html), and [Lesson 20 (Unit 3)](Lesson20.html)
- Create a scatterplot of bivariate data.
- Interpret the overall pattern in a scatter plot to assess linearity and direction.
- Calculate the correlation coefficient.
- Interpret the correlation coefficient, $r$, as a measure of strength and direction of a linear relationship between two variables.
- Identify the explanatory and response variable in a study.
- Calculate the slope and intercept of a regression model.
- Interpret the slope of the regression model.
- Make predictions using a regression model
- Confidence Intervals for the slope of the regression line:
+ Calculate and interpret a confidence interval for the slope of the regression line given a confidence level.
+ Identify a point estimate and margin of error for the confidence interval.
+ Show the appropriate connections between the numerical and graphical summaries that support the confidence interval.
+ Check the requirements for the confidence interval.
- Hypothesis Testing for the slope of the regression line:
+ State the null and alternative hypothesis.
+ Calculate the test-statistic, degrees of freedom and p-value of the hypothesis test.
+ Assess the statistical significance by comparing the p-value to the $\alpha$-level.
+ Check the requirements for the hypothesis test.
+ Show the appropriate connections between the numerical and graphical summaries that support the hypothesis test.
+ Draw a correct conclusion for the hypothesis test.
</div>
<br>
## Unit 4 Lesson Summaries
Here are the summaries for each lesson in unit 4. Reviewing these key points from each lesson will help you in your preparation for the exam.
<a href="javascript:showhide('su')"><span style="font-size:8pt;">Show/Hide Summaries</span></a>
<div id="su">
<div class="RecapHeading">Lesson 21 Recap</div>
<div class="Summary">
- Creating **scatterplots** of bivariate data allows us to visualize the data by helping us understand its **shape** (linear or nonlinear), **direction** (positive, negative, or neither), and **strength** (strong, moderate, or weak).
- The **correlation coefficient ($r$)** is a number between $-1$ and $1$ that tells us the direction and strength of the linear association between two variables. A positive $r$ corresponds to a **positive association** while a negative $r$ corresponds to a **negative association**. A value of $r$ closer to $-1$ or $1$ indicates a stronger association than a value of $r$ closer to zero.
<br>
</div>
<br>
<div class="RecapHeading">Lesson 22 Recap</div>
<div class="Summary">
- In statistics, we write the **linear regression equation** as $\hat Y=b_0+b_1X$ where $b_0$ is the **Y-intercept** of the line and $b_1$ is the **slope** of the line. The values of $b_0$ and $b_1$ are calculated using software.
- Linear regression allows us to predict values of $Y$ for a given $X$. This is done by first calculating the coefficients $b_0$ and $b_1$ and then plugging in the desired value of $X$ and solving for $Y$.
- The **independent (or explanatory) variable ($X$)** is the variable which is *not* affected by what happens to the other variable. The **dependent (or response) variable ($Y$)** is the variable which *is* affected by what happens to the other variable. For example, in the correlation between number of powerboats and number of manatee deaths, the number of deaths is affected by the number of powerboats in the water, but not the other way around. So, we would assign $X$ to represent the number of powerboats and $Y$ to represent the number of manatee deaths.
<br>
</div>
<br>
<div class="RecapHeading">Lesson 23 Recap</div>
<div class="Summary">
- The unknown **true linear regression line** is $Y=\beta_0+\beta_1X$ where $\beta_0$ is the true y-intercept of the line and $\beta_1$ is the true slope of the line.
- A **residual** is the difference between the observed value of $Y$ for a given $X$ and the predicted value of $Y$ on the regression line for the same $X$. It can be expressed as:
$$
Residual = Y - \hat Y = Y - (b_0 + b_1 X)
$$
- To check all the requirements for bivariate inference you will need to create a **scatterplot** of $X$ and $Y$, a **residual plot**, and a **histogram of the residuals**.
- We conduct a hypothesis test on bivariate data to know if there is a linear relationship between the two variables. To determine this, we test the slope ($\beta_1$) on whether or not it equals zero. The appropriate hypotheses for this test are:
$$
\begin{array}{1cl}
H_0: & \beta_1=0 \\
H_a: & \beta_1\ne0
\end{array}
$$
- For bivariate inference we use software to calculate the sample coefficients, residuals, test statistic, $P$-value, and confidence intervals of the true linear regression coefficients.
<br>
</div>
<br>
</div>
<br>
<br>
## Navigation
<center>
| **Previous Reading** | **This Reading** |
| :------------------: | :--------------: |
| [Lesson 23: <br> Inference for Bivariate Data](Lesson23.html) | Lesson 24: <br> Review for Exam 4 |
</center>