From 71cca9796152ae85ddcb80156cd52b05b2bf2a7d Mon Sep 17 00:00:00 2001
From: Hjalt
Date: Fri, 31 Jul 2015 15:14:31 +0200
Subject: [PATCH] The coefficients were off, probably because the wrong seed was used, hence so were the conclusions. Both have been adjusted. A note about heteroscedasticity was also added.

---
 ch3/applied.Rmd  |   13 +-
 ch3/applied.html | 1170 +++++++++++++++++++++++----------------------
 2 files changed, 602 insertions(+), 581 deletions(-)

diff --git a/ch3/applied.Rmd b/ch3/applied.Rmd
index e5e0554..f240208 100644
--- a/ch3/applied.Rmd
+++ b/ch3/applied.Rmd
@@ -56,7 +56,8 @@ par(mfrow=c(2,2))
 plot(lm.fit)
 ```
 
-Based on the residuals plots, there is some evidence of non-linearity.
+Based on the residuals plots, there is some evidence of non-linearity as well
+as heteroscedasticity.
 
 9a.
 
@@ -473,11 +474,13 @@ plot(x1, x2)
 lm.fit = lm(y~x1+x2)
 summary(lm.fit)
 ```
-$$\beta_0 = 2.0533, \beta_1 = 1.6336, \beta_3 = 0.5588$$
-The regression coefficients are close to the true coefficients, although with
-high standard error. We can reject the null hypothesis for $\beta_1$ because
+$$\beta_0 = 2.1305, \beta_1 = 1.4396, \beta_2 = 1.0097$$
+The intercept is close to the true coefficient, while $\beta_1$ and $\beta_2$
+are deflated and inflated respectively. The standard error
+is moreover high and the model explains 21% of the overall variance. We can
+reject the null hypothesis for $\beta_1$ because
 its p-value is below 5%. We cannot reject the null hypothesis for $\beta_2$
-because its p-value is much above the 5% typical cutoff, over 60%.
+because its p-value is much above the 5% typical cutoff.
 
 14d.
 ----
diff --git a/ch3/applied.html b/ch3/applied.html
index e43bad0..a9ed250 100644
--- a/ch3/applied.html
+++ b/ch3/applied.html
@@ -12,15 +12,22 @@
- - - - - + + + + + + - - + +
@@ -59,22 +74,22 @@

8a.

Auto = read.csv("../data/Auto.csv", header=T, na.strings="?")
 Auto = na.omit(Auto)
 summary(Auto)
-
##       mpg         cylinders     displacement   horsepower   
-##  Min.   : 9.0   Min.   :3.00   Min.   : 68   Min.   : 46.0  
-##  1st Qu.:17.0   1st Qu.:4.00   1st Qu.:105   1st Qu.: 75.0  
-##  Median :22.8   Median :4.00   Median :151   Median : 93.5  
-##  Mean   :23.4   Mean   :5.47   Mean   :194   Mean   :104.5  
-##  3rd Qu.:29.0   3rd Qu.:8.00   3rd Qu.:276   3rd Qu.:126.0  
-##  Max.   :46.6   Max.   :8.00   Max.   :455   Max.   :230.0  
-##                                                             
-##      weight      acceleration       year        origin    
-##  Min.   :1613   Min.   : 8.0   Min.   :70   Min.   :1.00  
-##  1st Qu.:2225   1st Qu.:13.8   1st Qu.:73   1st Qu.:1.00  
-##  Median :2804   Median :15.5   Median :76   Median :1.00  
-##  Mean   :2978   Mean   :15.5   Mean   :76   Mean   :1.58  
-##  3rd Qu.:3615   3rd Qu.:17.0   3rd Qu.:79   3rd Qu.:2.00  
-##  Max.   :5140   Max.   :24.8   Max.   :82   Max.   :3.00  
-##                                                           
+
##       mpg          cylinders      displacement     horsepower   
+##  Min.   : 9.00   Min.   :3.000   Min.   : 68.0   Min.   : 46.0  
+##  1st Qu.:17.00   1st Qu.:4.000   1st Qu.:105.0   1st Qu.: 75.0  
+##  Median :22.75   Median :4.000   Median :151.0   Median : 93.5  
+##  Mean   :23.45   Mean   :5.472   Mean   :194.4   Mean   :104.5  
+##  3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:275.8   3rd Qu.:126.0  
+##  Max.   :46.60   Max.   :8.000   Max.   :455.0   Max.   :230.0  
+##                                                                 
+##      weight      acceleration        year           origin     
+##  Min.   :1613   Min.   : 8.00   Min.   :70.00   Min.   :1.000  
+##  1st Qu.:2225   1st Qu.:13.78   1st Qu.:73.00   1st Qu.:1.000  
+##  Median :2804   Median :15.50   Median :76.00   Median :1.000  
+##  Mean   :2978   Mean   :15.54   Mean   :75.98   Mean   :1.577  
+##  3rd Qu.:3615   3rd Qu.:17.02   3rd Qu.:79.00   3rd Qu.:2.000  
+##  Max.   :5140   Max.   :24.80   Max.   :82.00   Max.   :3.000  
+##                                                                
 ##                  name    
 ##  amc matador       :  5  
 ##  ford pinto        :  5  
@@ -91,26 +106,26 @@ 

8a.

 ## lm(formula = mpg ~ horsepower)
 ##
 ## Residuals:
-##     Min      1Q  Median      3Q     Max
-## -13.571  -3.259  -0.344   2.763  16.924
+##      Min       1Q   Median       3Q      Max
+## -13.5710  -3.2592  -0.3435   2.7630  16.9240
 ##
 ## Coefficients:
-##             Estimate Std. Error t value Pr(>|t|)
-## (Intercept) 39.93586    0.71750    55.7   <2e-16 ***
-## horsepower  -0.15784    0.00645   -24.5   <2e-16 ***
+##              Estimate Std. Error t value Pr(>|t|)
+## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
+## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ##
-## Residual standard error: 4.91 on 390 degrees of freedom
-## Multiple R-squared:  0.606,  Adjusted R-squared:  0.605
-## F-statistic:  600 on 1 and 390 DF,  p-value: <2e-16
+## Residual standard error: 4.906 on 390 degrees of freedom
+## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049
+## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16

i.

Yes, there is a relationship between horsepower and mpg, as determined by testing the null hypothesis that all regression coefficients are equal to zero. Since the F-statistic is far larger than 1 and its p-value is close to zero, we can reject the null hypothesis and state that there is a statistically significant relationship between horsepower and mpg.
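
The overall p-value reported by `summary()` can be recovered from the F-statistic itself. The following is a minimal sketch on simulated stand-in data (an assumption, since `Auto.csv` is not loaded here); the variable names are illustrative only.

```r
# Simulated stand-in data (assumption): not the Auto data set.
set.seed(1)
x <- rnorm(100)
y <- 3 - 0.5 * x + rnorm(100)

fit <- lm(y ~ x)
f <- summary(fit)$fstatistic  # named vector: value, numdf, dendf

# The upper-tail probability of the F distribution is the overall p-value.
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
print(p_value)
```

With a single predictor this p-value coincides with the one in the `anova(fit)` table, which gives a quick cross-check.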

ii.

-

To calculate the residual error relative to the response we use the mean of the response and the RSE. The mean of mpg is 23.4459. The RSE of the lm.fit was 4.906 which indicates a percentage error of 20.9248%. The \(R^2\) of the lm.fit was about 0.6059, meaning 60.5948% of the variance in mpg is explained by horsepower.

+

To calculate the residual error relative to the response we use the mean of the response and the RSE. The mean of mpg is 23.4459184. The RSE of the lm.fit was 4.906 which indicates a percentage error of 20.9247508%. The \(R^2\) of the lm.fit was about 0.6059483, meaning 60.5948258% of the variance in mpg is explained by horsepower.
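
The percentage error quoted above is just the RSE divided by the mean response. As a small sketch, using the two numbers copied from the output above rather than recomputing them from the data:

```r
# Values copied from the text above (not recomputed from Auto.csv).
rse <- 4.906
mean_mpg <- 23.4459184

# Residual standard error expressed as a percentage of the mean response.
pct_error <- rse / mean_mpg * 100
print(round(pct_error, 4))
```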

iii.

@@ -119,52 +134,52 @@

iii.

iv.

predict(lm.fit, data.frame(horsepower=c(98)), interval="confidence")
-
##     fit   lwr   upr
-## 1 24.47 23.97 24.96
+
##        fit      lwr      upr
+## 1 24.46708 23.97308 24.96108
predict(lm.fit, data.frame(horsepower=c(98)), interval="prediction")
-
##     fit   lwr   upr
-## 1 24.47 14.81 34.12
+
##        fit     lwr      upr
+## 1 24.46708 14.8094 34.12476

8b.

plot(horsepower, mpg)
 abline(lm.fit)
-

plot of chunk unnamed-chunk-3

+

8c.

par(mfrow=c(2,2))
 plot(lm.fit)
-

plot of chunk unnamed-chunk-4

-

Based on the residuals plots, there is some evidence of non-linearity.

+

+

Based on the residuals plots, there is some evidence of non-linearity as well as heteroscedasticity.

9a.

pairs(Auto)
-

plot of chunk unnamed-chunk-5

+

9b.

cor(subset(Auto, select=-name))
-
##                  mpg cylinders displacement horsepower  weight
-## mpg           1.0000   -0.7776      -0.8051    -0.7784 -0.8322
-## cylinders    -0.7776    1.0000       0.9508     0.8430  0.8975
-## displacement -0.8051    0.9508       1.0000     0.8973  0.9330
-## horsepower   -0.7784    0.8430       0.8973     1.0000  0.8645
-## weight       -0.8322    0.8975       0.9330     0.8645  1.0000
-## acceleration  0.4233   -0.5047      -0.5438    -0.6892 -0.4168
-## year          0.5805   -0.3456      -0.3699    -0.4164 -0.3091
-## origin        0.5652   -0.5689      -0.6145    -0.4552 -0.5850
-##              acceleration    year  origin
-## mpg                0.4233  0.5805  0.5652
-## cylinders         -0.5047 -0.3456 -0.5689
-## displacement      -0.5438 -0.3699 -0.6145
-## horsepower        -0.6892 -0.4164 -0.4552
-## weight            -0.4168 -0.3091 -0.5850
-## acceleration       1.0000  0.2903  0.2127
-## year               0.2903  1.0000  0.1815
-## origin             0.2127  0.1815  1.0000
+
##                     mpg  cylinders displacement horsepower     weight
+## mpg           1.0000000 -0.7776175   -0.8051269 -0.7784268 -0.8322442
+## cylinders    -0.7776175  1.0000000    0.9508233  0.8429834  0.8975273
+## displacement -0.8051269  0.9508233    1.0000000  0.8972570  0.9329944
+## horsepower   -0.7784268  0.8429834    0.8972570  1.0000000  0.8645377
+## weight       -0.8322442  0.8975273    0.9329944  0.8645377  1.0000000
+## acceleration  0.4233285 -0.5046834   -0.5438005 -0.6891955 -0.4168392
+## year          0.5805410 -0.3456474   -0.3698552 -0.4163615 -0.3091199
+## origin        0.5652088 -0.5689316   -0.6145351 -0.4551715 -0.5850054
+##              acceleration       year     origin
+## mpg             0.4233285  0.5805410  0.5652088
+## cylinders      -0.5046834 -0.3456474 -0.5689316
+## displacement   -0.5438005 -0.3698552 -0.6145351
+## horsepower     -0.6891955 -0.4163615 -0.4551715
+## weight         -0.4168392 -0.3091199 -0.5850054
+## acceleration    1.0000000  0.2903161  0.2127458
+## year            0.2903161  1.0000000  0.1815277
+## origin          0.2127458  0.1815277  1.0000000

9c.

@@ -175,25 +190,25 @@

9c.

 ## lm(formula = mpg ~ . - name, data = Auto)
 ##
 ## Residuals:
-##    Min     1Q Median     3Q    Max
-## -9.590 -2.157 -0.117  1.869 13.060
+##     Min      1Q  Median      3Q     Max
+## -9.5903 -2.1565 -0.1169  1.8690 13.0604
 ##
 ## Coefficients:
-##               Estimate Std. Error t value Pr(>|t|)
-## (Intercept)  -1.72e+01   4.64e+00   -3.71  0.00024 ***
-## cylinders    -4.93e-01   3.23e-01   -1.53  0.12780
-## displacement  1.99e-02   7.51e-03    2.65  0.00844 **
-## horsepower   -1.70e-02   1.38e-02   -1.23  0.21963
-## weight       -6.47e-03   6.52e-04   -9.93  < 2e-16 ***
-## acceleration  8.06e-02   9.88e-02    0.82  0.41548
-## year          7.51e-01   5.10e-02   14.73  < 2e-16 ***
-## origin        1.43e+00   2.78e-01    5.13  4.7e-07 ***
+##                Estimate Std. Error t value Pr(>|t|)
+## (Intercept)  -17.218435   4.644294  -3.707  0.00024 ***
+## cylinders     -0.493376   0.323282  -1.526  0.12780
+## displacement   0.019896   0.007515   2.647  0.00844 **
+## horsepower    -0.016951   0.013787  -1.230  0.21963
+## weight        -0.006474   0.000652  -9.929  < 2e-16 ***
+## acceleration   0.080576   0.098845   0.815  0.41548
+## year           0.750773   0.050973  14.729  < 2e-16 ***
+## origin         1.426141   0.278136   5.127 4.67e-07 ***
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ##
-## Residual standard error: 3.33 on 384 degrees of freedom
-## Multiple R-squared:  0.821,  Adjusted R-squared:  0.818
-## F-statistic:  252 on 7 and 384 DF,  p-value: <2e-16
+## Residual standard error: 3.328 on 384 degrees of freedom
+## Multiple R-squared:  0.8215, Adjusted R-squared:  0.8182
+## F-statistic: 252.4 on 7 and 384 DF,  p-value: < 2.2e-16

i.

Yes, there is a relationship between the predictors and the response, as shown by testing the null hypothesis that all the regression coefficients are zero. The F-statistic is far from 1 (with a small p-value), indicating evidence against the null hypothesis.

@@ -204,16 +219,16 @@

ii.

iii.

-

The regression coefficient for year, 0.7508, suggests that for every one year, mpg increases by the coefficient. In other words, cars become more fuel efficient every year by almost 1 mpg / year.

+

The regression coefficient for year, 0.7507727, suggests that for every one year, mpg increases by the coefficient. In other words, cars become more fuel efficient every year by almost 1 mpg / year.

9d.

par(mfrow=c(2,2))
 plot(lm.fit1)
-

plot of chunk unnamed-chunk-8 The fit does not appear to be accurate because there is a discernible curve pattern to the residuals plots. From the leverage plot, point 14 appears to have high leverage, although not a high magnitude residual.

+

The fit does not appear to be accurate because there is a discernible curve pattern to the residuals plots. From the leverage plot, point 14 appears to have high leverage, although not a high magnitude residual.

plot(predict(lm.fit1), rstudent(lm.fit1))
-

plot of chunk unnamed-chunk-9 There are possible outliers as seen in the plot of studentized residuals because there are data with a value greater than 3.

+

There are possible outliers as seen in the plot of studentized residuals because there are data with a value greater than 3.

9e.

@@ -225,23 +240,23 @@

9e.

 ##     weight)
 ##
 ## Residuals:
-##     Min      1Q  Median      3Q     Max
-## -13.293  -2.518  -0.348   1.840  17.772
+##      Min       1Q   Median       3Q      Max
+## -13.2934  -2.5184  -0.3476   1.8399  17.7723
 ##
 ## Coefficients:
-##                         Estimate Std. Error t value Pr(>|t|)
-## (Intercept)             5.26e+01   2.24e+00   23.52  < 2e-16 ***
-## cylinders               7.61e-01   7.67e-01    0.99     0.32
-## displacement           -7.35e-02   1.67e-02   -4.40  1.4e-05 ***
-## weight                 -9.89e-03   1.33e-03   -7.44  6.7e-13 ***
-## cylinders:displacement -2.99e-03   3.43e-03   -0.87     0.38
-## displacement:weight     2.13e-05   5.00e-06    4.25  2.6e-05 ***
+##                          Estimate Std. Error t value Pr(>|t|)
+## (Intercept)             5.262e+01  2.237e+00  23.519  < 2e-16 ***
+## cylinders               7.606e-01  7.669e-01   0.992    0.322
+## displacement           -7.351e-02  1.669e-02  -4.403 1.38e-05 ***
+## weight                 -9.888e-03  1.329e-03  -7.438 6.69e-13 ***
+## cylinders:displacement -2.986e-03  3.426e-03  -0.872    0.384
+## displacement:weight     2.128e-05  5.002e-06   4.254 2.64e-05 ***
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ##
-## Residual standard error: 4.1 on 386 degrees of freedom
-## Multiple R-squared:  0.727,  Adjusted R-squared:  0.724
-## F-statistic:  206 on 5 and 386 DF,  p-value: <2e-16
+## Residual standard error: 4.103 on 386 degrees of freedom
+## Multiple R-squared:  0.7272, Adjusted R-squared:  0.7237
+## F-statistic: 205.8 on 5 and 386 DF,  p-value: < 2.2e-16

From the correlation matrix, I obtained the two highest correlated pairs and used them in picking my interaction effects. From the p-values, we can see that the interaction between displacement and weight is statistically significant, while the interaction between cylinders and displacement is not.
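
Picking the most strongly correlated pairs out of a correlation matrix can be done programmatically rather than by eye. Below is a rough sketch on a hypothetical toy data frame (the columns `a`, `b`, `c` are made up for illustration and are not from the Auto data); the same steps should carry over to `cor(subset(Auto, select=-name))`.

```r
# Hypothetical toy data (assumption): 'a' and 'b' are built to be strongly correlated.
set.seed(1)
a <- rnorm(200)
toy <- data.frame(a = a, b = a + rnorm(200, sd = 0.1), c = rnorm(200))

cm <- cor(toy)
cm[lower.tri(cm, diag = TRUE)] <- NA  # keep each pair once and drop the diagonal

# Flatten the matrix into (var1, var2, correlation) rows and sort by strength.
pairs_df <- as.data.frame(as.table(cm), stringsAsFactors = FALSE)
pairs_df <- pairs_df[!is.na(pairs_df$Freq), ]
pairs_df <- pairs_df[order(-abs(pairs_df$Freq)), ]
names(pairs_df) <- c("var1", "var2", "correlation")
print(pairs_df)
```

Sorting on the absolute value keeps strong negative correlations near the top as well.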

@@ -254,27 +269,27 @@

9f.

 ##     I(acceleration^2))
 ##
 ## Residuals:
-##     Min      1Q  Median      3Q     Max
-## -11.293  -2.508  -0.224   2.024  15.765
+##      Min       1Q   Median       3Q      Max
+## -11.2932  -2.5082  -0.2237   2.0237  15.7650
 ##
 ## Coefficients:
-##                   Estimate Std. Error t value Pr(>|t|)
-## (Intercept)       178.3030    10.8045   16.50  < 2e-16 ***
-## log(weight)       -14.7426     1.7399   -8.47  5.1e-16 ***
-## sqrt(horsepower)   -1.8519     0.3600   -5.14  4.3e-07 ***
-## acceleration       -2.1989     0.6390   -3.44  0.00064 ***
-## I(acceleration^2)   0.0614     0.0186    3.31  0.00104 **
+##                    Estimate Std. Error t value Pr(>|t|)
+## (Intercept)       178.30303   10.80451  16.503  < 2e-16 ***
+## log(weight)       -14.74259    1.73994  -8.473 5.06e-16 ***
+## sqrt(horsepower)   -1.85192    0.36005  -5.144 4.29e-07 ***
+## acceleration       -2.19890    0.63903  -3.441 0.000643 ***
+## I(acceleration^2)   0.06139    0.01857   3.305 0.001037 **
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ##
 ## Residual standard error: 3.99 on 387 degrees of freedom
-## Multiple R-squared:  0.741,  Adjusted R-squared:  0.739
-## F-statistic:  277 on 4 and 387 DF,  p-value: <2e-16
+## Multiple R-squared:  0.7414, Adjusted R-squared:  0.7387
+## F-statistic: 277.3 on 4 and 387 DF,  p-value: < 2.2e-16
par(mfrow=c(2,2))
 plot(lm.fit3)
-

plot of chunk unnamed-chunk-11

+

plot(predict(lm.fit3), rstudent(lm.fit3))
-

plot of chunk unnamed-chunk-12 Apparently, from the p-values, the log(weight), sqrt(horsepower), and acceleration^2 all have statistical significance of some sort. The residuals plot has less of a discernible pattern than the plot of all linear regression terms. The studentized residuals displays potential outliers (>3). The leverage plot indicates more than three points with high leverage.

+

Apparently, from the p-values, the log(weight), sqrt(horsepower), and acceleration^2 all have statistical significance of some sort. The residuals plot has less of a discernible pattern than the plot of all linear regression terms. The studentized residuals displays potential outliers (>3). The leverage plot indicates more than three points with high leverage.

However, two problems are visible in the plots above: 1) the residuals vs fitted plot indicates heteroscedasticity (non-constant variance across the fitted values) in the model, and 2) the Q-Q plot indicates some non-normality of the residuals.

So, a better transformation needs to be applied to our model. From the scatterplot matrix in 9a., displacement, horsepower and weight show a similar nonlinear pattern against our response mpg. This nonlinear pattern is very close to a log form. So in the next attempt, we use log(mpg) as our response variable.

The outputs show that the log transform of mpg yields a better model fit (higher R^2, more nearly normal residuals).
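
To illustrate why a log response can help, here is a small sketch on simulated data (an assumption; this is not the Auto data set). The response is generated with multiplicative noise, so the log-scale model is the correctly specified one by construction.

```r
# Simulated data (assumption): the response is exponential in x with multiplicative noise.
set.seed(1)
x <- runif(200, 1, 5)
y <- exp(3 - 0.4 * x + rnorm(200, sd = 0.1))

fit_raw <- lm(y ~ x)       # linear model on the raw response
fit_log <- lm(log(y) ~ x)  # linear model on the log response

r2_raw <- summary(fit_raw)$r.squared
r2_log <- summary(fit_log)$r.squared
print(c(raw = r2_raw, log = r2_log))
print(coef(fit_log))       # slope should land near the simulated value of -0.4
```

Note that the two R^2 values are measured on different response scales, so they are only a rough guide; the residual and Q-Q plots remain the more direct check.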

@@ -286,30 +301,30 @@

9f.

 ##     weight + acceleration + year + origin, data = Auto)
 ##
 ## Residuals:
-##     Min      1Q  Median      3Q     Max
-## -0.4096 -0.0653  0.0008  0.0679  0.3392
+##      Min       1Q   Median       3Q      Max
+## -0.40955 -0.06533  0.00079  0.06785  0.33925
 ##
 ## Coefficients:
-##               Estimate Std. Error t value Pr(>|t|)
-## (Intercept)   1.75e+00   1.66e-01   10.53  < 2e-16 ***
-## cylinders    -2.79e-02   1.16e-02   -2.42    0.016 *
-## displacement  6.36e-04   2.69e-04    2.37    0.019 *
-## horsepower   -1.47e-03   4.94e-04   -2.99    0.003 **
-## weight       -2.55e-04   2.33e-05  -10.93  < 2e-16 ***
-## acceleration -1.35e-03   3.54e-03   -0.38    0.703
-## year          2.96e-02   1.82e-03   16.21  < 2e-16 ***
-## origin        4.07e-02   9.96e-03    4.09  5.3e-05 ***
+##                Estimate Std. Error t value Pr(>|t|)
+## (Intercept)   1.751e+00  1.662e-01  10.533  < 2e-16 ***
+## cylinders    -2.795e-02  1.157e-02  -2.415  0.01619 *
+## displacement  6.362e-04  2.690e-04   2.365  0.01852 *
+## horsepower   -1.475e-03  4.935e-04  -2.989  0.00298 **
+## weight       -2.551e-04  2.334e-05 -10.931  < 2e-16 ***
+## acceleration -1.348e-03  3.538e-03  -0.381  0.70339
+## year          2.958e-02  1.824e-03  16.211  < 2e-16 ***
+## origin        4.071e-02  9.955e-03   4.089 5.28e-05 ***
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ##
-## Residual standard error: 0.119 on 384 degrees of freedom
-## Multiple R-squared:  0.88,  Adjusted R-squared:  0.877
-## F-statistic:  400 on 7 and 384 DF,  p-value: <2e-16
+## Residual standard error: 0.1191 on 384 degrees of freedom
+## Multiple R-squared:  0.8795, Adjusted R-squared:  0.8773
+## F-statistic: 400.4 on 7 and 384 DF,  p-value: < 2.2e-16
par(mfrow=c(2,2)) 
 plot(lm.fit2)
-

plot of chunk unnamed-chunk-14

+

plot(predict(lm.fit2),rstudent(lm.fit2))
-

plot of chunk unnamed-chunk-14

+

10a.

@@ -321,27 +336,27 @@

10a.

## ## Auto
summary(Carseats)
-
##      Sales         CompPrice       Income       Advertising   
-##  Min.   : 0.00   Min.   : 77   Min.   : 21.0   Min.   : 0.00  
-##  1st Qu.: 5.39   1st Qu.:115   1st Qu.: 42.8   1st Qu.: 0.00  
-##  Median : 7.49   Median :125   Median : 69.0   Median : 5.00  
-##  Mean   : 7.50   Mean   :125   Mean   : 68.7   Mean   : 6.63  
-##  3rd Qu.: 9.32   3rd Qu.:135   3rd Qu.: 91.0   3rd Qu.:12.00  
-##  Max.   :16.27   Max.   :175   Max.   :120.0   Max.   :29.00  
-##    Population      Price      ShelveLoc        Age         Education   
-##  Min.   : 10   Min.   : 24   Bad   : 96   Min.   :25.0   Min.   :10.0  
-##  1st Qu.:139   1st Qu.:100   Good  : 85   1st Qu.:39.8   1st Qu.:12.0  
-##  Median :272   Median :117   Medium:219   Median :54.5   Median :14.0  
-##  Mean   :265   Mean   :116                Mean   :53.3   Mean   :13.9  
-##  3rd Qu.:398   3rd Qu.:131                3rd Qu.:66.0   3rd Qu.:16.0  
-##  Max.   :509   Max.   :191                Max.   :80.0   Max.   :18.0  
-##  Urban       US     
-##  No :118   No :142  
-##  Yes:282   Yes:258  
-##                     
-##                     
-##                     
-## 
+
##      Sales          CompPrice       Income        Advertising    
+##  Min.   : 0.000   Min.   : 77   Min.   : 21.00   Min.   : 0.000  
+##  1st Qu.: 5.390   1st Qu.:115   1st Qu.: 42.75   1st Qu.: 0.000  
+##  Median : 7.490   Median :125   Median : 69.00   Median : 5.000  
+##  Mean   : 7.496   Mean   :125   Mean   : 68.66   Mean   : 6.635  
+##  3rd Qu.: 9.320   3rd Qu.:135   3rd Qu.: 91.00   3rd Qu.:12.000  
+##  Max.   :16.270   Max.   :175   Max.   :120.00   Max.   :29.000  
+##    Population        Price        ShelveLoc        Age       
+##  Min.   : 10.0   Min.   : 24.0   Bad   : 96   Min.   :25.00  
+##  1st Qu.:139.0   1st Qu.:100.0   Good  : 85   1st Qu.:39.75  
+##  Median :272.0   Median :117.0   Medium:219   Median :54.50  
+##  Mean   :264.8   Mean   :115.8                Mean   :53.32  
+##  3rd Qu.:398.5   3rd Qu.:131.0                3rd Qu.:66.00  
+##  Max.   :509.0   Max.   :191.0                Max.   :80.00  
+##    Education    Urban       US     
+##  Min.   :10.0   No :118   No :142  
+##  1st Qu.:12.0   Yes:282   Yes:258  
+##  Median :14.0                      
+##  Mean   :13.9                      
+##  3rd Qu.:16.0                      
+##  Max.   :18.0
attach(Carseats)
 lm.fit = lm(Sales~Price+Urban+US)
 summary(lm.fit)
@@ -350,21 +365,21 @@

10a.

 ## lm(formula = Sales ~ Price + Urban + US)
 ##
 ## Residuals:
-##    Min     1Q Median     3Q    Max
-## -6.921 -1.622 -0.056  1.579  7.058
+##     Min      1Q  Median      3Q     Max
+## -6.9206 -1.6220 -0.0564  1.5786  7.0581
 ##
 ## Coefficients:
-##             Estimate Std. Error t value Pr(>|t|)
-## (Intercept) 13.04347    0.65101   20.04  < 2e-16 ***
-## Price       -0.05446    0.00524  -10.39  < 2e-16 ***
-## UrbanYes    -0.02192    0.27165   -0.08     0.94
-## USYes        1.20057    0.25904    4.63  4.9e-06 ***
+##              Estimate Std. Error t value Pr(>|t|)
+## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
+## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
+## UrbanYes    -0.021916   0.271650  -0.081    0.936
+## USYes        1.200573   0.259042   4.635 4.86e-06 ***
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ##
-## Residual standard error: 2.47 on 396 degrees of freedom
-## Multiple R-squared:  0.239,  Adjusted R-squared:  0.234
-## F-statistic: 41.5 on 3 and 396 DF,  p-value: <2e-16
+## Residual standard error: 2.472 on 396 degrees of freedom
+## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335
+## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16

10b.

@@ -398,20 +413,20 @@

10e.

 ## lm(formula = Sales ~ Price + US)
 ##
 ## Residuals:
-##    Min     1Q Median     3Q    Max
-## -6.927 -1.629 -0.057  1.577  7.052
+##     Min      1Q  Median      3Q     Max
+## -6.9269 -1.6286 -0.0574  1.5766  7.0515
 ##
 ## Coefficients:
 ##             Estimate Std. Error t value Pr(>|t|)
-## (Intercept) 13.03079    0.63098   20.65  < 2e-16 ***
-## Price       -0.05448    0.00523  -10.42  < 2e-16 ***
-## USYes        1.19964    0.25846    4.64  4.7e-06 ***
+## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
+## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
+## USYes        1.19964    0.25846   4.641 4.71e-06 ***
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ##
-## Residual standard error: 2.47 on 397 degrees of freedom
-## Multiple R-squared:  0.239,  Adjusted R-squared:  0.235
-## F-statistic: 62.4 on 2 and 397 DF,  p-value: <2e-16
+## Residual standard error: 2.469 on 397 degrees of freedom
+## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354
+## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

10f.

@@ -420,18 +435,18 @@

10f.

10g.

confint(lm.fit2)
-
##                2.5 %  97.5 %
-## (Intercept) 11.79032 14.2713
-## Price       -0.06476 -0.0442
-## USYes        0.69152  1.7078
+
##                   2.5 %      97.5 %
+## (Intercept) 11.79032020 14.27126531
+## Price       -0.06475984 -0.04419543
+## USYes        0.69151957  1.70776632

10h.

plot(predict(lm.fit2), rstudent(lm.fit2))
-

plot of chunk unnamed-chunk-18 All studentized residuals appear to be bounded by -3 to 3, so not potential outliers are suggested from the linear regression.

+

All studentized residuals appear to be bounded by -3 to 3, so no potential outliers are suggested from the linear regression.

par(mfrow=c(2,2))
 plot(lm.fit2)
-

plot of chunk unnamed-chunk-19 There are a few observations that greatly exceed \((p+1)/n\) (0.0076) on the leverage-statistic plot that suggest that the corresponding points have high leverage.

+

There are a few observations that greatly exceed \((p+1)/n\) (0.0075567) on the leverage-statistic plot that suggest that the corresponding points have high leverage.
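
The leverage check described above can also be done numerically with `hatvalues()`. This is a sketch on simulated data (an assumption; the Carseats model is not refit here), and the "twice the average" cutoff is one common rule of thumb rather than anything stated in the text.

```r
# Simulated stand-in data (assumption): two predictors, n = 400 observations.
set.seed(1)
n <- 400
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = d)
p <- length(coef(fit)) - 1   # number of predictors, excluding the intercept
h <- hatvalues(fit)          # leverage statistic for each observation
avg_leverage <- (p + 1) / n  # the hat values always average to this

# Rule of thumb (assumption): flag points above twice the average leverage.
high <- which(h > 2 * avg_leverage)
print(avg_leverage)
print(head(sort(h[high], decreasing = TRUE)))
```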

11.

@@ -448,18 +463,18 @@

11a.

 ## lm(formula = y ~ x + 0)
 ##
 ## Residuals:
-##    Min     1Q Median     3Q    Max
-## -1.915 -0.647 -0.177  0.506  2.311
+##     Min      1Q  Median      3Q     Max
+## -1.9154 -0.6472 -0.1771  0.5056  2.3109
 ##
 ## Coefficients:
 ##   Estimate Std. Error t value Pr(>|t|)
-## x    1.994      0.106    18.7   <2e-16 ***
+## x   1.9939     0.1065   18.73   <2e-16 ***
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ##
-## Residual standard error: 0.959 on 99 degrees of freedom
-## Multiple R-squared:  0.78,  Adjusted R-squared:  0.778
-## F-statistic:  351 on 1 and 99 DF,  p-value: <2e-16
+## Residual standard error: 0.9586 on 99 degrees of freedom
+## Multiple R-squared:  0.7798, Adjusted R-squared:  0.7776
+## F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16

The p-value of the t-statistic is near zero so the null hypothesis is rejected.

@@ -471,27 +486,27 @@

11b.

 ## lm(formula = x ~ y + 0)
 ##
 ## Residuals:
-##    Min     1Q Median     3Q    Max
-## -0.870 -0.237  0.103  0.286  0.894
+##     Min      1Q  Median      3Q     Max
+## -0.8699 -0.2368  0.1030  0.2858  0.8938
 ##
 ## Coefficients:
 ##   Estimate Std. Error t value Pr(>|t|)
-## y   0.3911     0.0209    18.7   <2e-16 ***
+## y  0.39111    0.02089   18.73   <2e-16 ***
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ##
-## Residual standard error: 0.425 on 99 degrees of freedom
-## Multiple R-squared:  0.78,  Adjusted R-squared:  0.778
-## F-statistic:  351 on 1 and 99 DF,  p-value: <2e-16
+## Residual standard error: 0.4246 on 99 degrees of freedom
+## Multiple R-squared:  0.7798, Adjusted R-squared:  0.7776
+## F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16

The p-value of the t-statistic is near zero so the null hypothesis is rejected.

11c.

-

Both results in (a) and (b) reflect the same line created in 11a. In other words, \(y = 2x + \epsilon\) could also be written \(x = 0.5 * (y - \epsilon)\).

+

Both results in (a) and (b) reflect the same line created in 11a. In other words, \(y = 2x + \epsilon\) could also be written \(x = 0.5 * (y - \epsilon)\).

11d.

-

\[ +

\[ \begin{array}{cc} t = \beta / SE(\beta) & \beta = \frac {\sum{x_i y_i}} {\sum{x_i^2}} & @@ -519,7 +534,7 @@

11d.

{\sqrt{\sum{x_i^2} \sum{y_i^2} - (\sum{x_i y_i})^2 }} \]

(sqrt(length(x)-1) * sum(x*y)) / (sqrt(sum(x*x) * sum(y*y) - (sum(x*y))^2))
-
## [1] 18.73
+
## [1] 18.72593

This is the same as the t-statistic shown above.

@@ -536,38 +551,38 @@

11f.

 ## lm(formula = y ~ x)
 ##
 ## Residuals:
-##    Min     1Q Median     3Q    Max
-## -1.877 -0.614 -0.140  0.539  2.346
+##     Min      1Q  Median      3Q     Max
+## -1.8768 -0.6138 -0.1395  0.5394  2.3462
 ##
 ## Coefficients:
 ##             Estimate Std. Error t value Pr(>|t|)
-## (Intercept)  -0.0377     0.0970   -0.39      0.7
-## x             1.9989     0.1077   18.56   <2e-16 ***
+## (Intercept) -0.03769    0.09699  -0.389    0.698
+## x            1.99894    0.10773  18.556   <2e-16 ***
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ##
-## Residual standard error: 0.963 on 98 degrees of freedom
-## Multiple R-squared:  0.778,  Adjusted R-squared:  0.776
-## F-statistic:  344 on 1 and 98 DF,  p-value: <2e-16
+## Residual standard error: 0.9628 on 98 degrees of freedom
+## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762
+## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16
summary(lm.fit2)
## 
 ## Call:
 ## lm(formula = x ~ y)
 ## 
 ## Residuals:
-##     Min      1Q  Median      3Q     Max 
-## -0.9085 -0.2810  0.0627  0.2457  0.8574 
+##      Min       1Q   Median       3Q      Max 
+## -0.90848 -0.28101  0.06274  0.24570  0.85736 
 ## 
 ## Coefficients:
 ##             Estimate Std. Error t value Pr(>|t|)    
-## (Intercept)   0.0388     0.0427    0.91     0.37    
-## y             0.3894     0.0210   18.56   <2e-16 ***
+## (Intercept)  0.03880    0.04266    0.91    0.365    
+## y            0.38942    0.02099   18.56   <2e-16 ***
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ## 
-## Residual standard error: 0.425 on 98 degrees of freedom
-## Multiple R-squared:  0.778,  Adjusted R-squared:  0.776 
-## F-statistic:  344 on 1 and 98 DF,  p-value: <2e-16
+## Residual standard error: 0.4249 on 98 degrees of freedom
+## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762
+## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16

You can see the t-statistic is the same for the two linear regressions.
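
The equality of the two t-statistics is not a coincidence of this sample: with an intercept, the slope t-statistic depends only on the sample correlation r and n, and r is symmetric in x and y. A brief sketch on freshly simulated data (an assumption; these are not the same draws as above):

```r
# Simulated data (assumption): same generating model as 11., different draws.
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

t_yx <- coef(summary(lm(y ~ x)))["x", "t value"]
t_xy <- coef(summary(lm(x ~ y)))["y", "t value"]

# Both equal r * sqrt(n - 2) / sqrt(1 - r^2), which is symmetric in x and y.
r <- cor(x, y)
t_from_r <- r * sqrt(length(x) - 2) / sqrt(1 - r^2)
print(c(t_yx, t_xy, t_from_r))
```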

@@ -582,41 +597,45 @@

12b.

 lm.fit = lm(y~x+0)
 lm.fit2 = lm(x~y+0)
 summary(lm.fit)
+
## Warning in summary.lm(lm.fit): essentially perfect fit: summary may be
+## unreliable
## 
 ## Call:
 ## lm(formula = y ~ x + 0)
 ## 
 ## Residuals:
-##       Min        1Q    Median        3Q       Max 
-## -3.78e-16 -3.38e-17  2.70e-18  6.11e-17  5.11e-16 
+##        Min         1Q     Median         3Q        Max 
+## -3.776e-16 -3.378e-17  2.680e-18  6.113e-17  5.105e-16 
 ## 
 ## Coefficients:
-##   Estimate Std. Error  t value Pr(>|t|)    
-## x  2.0e+00    1.3e-17 1.54e+17   <2e-16 ***
+##    Estimate Std. Error   t value Pr(>|t|)    
+## x 2.000e+00  1.296e-17 1.543e+17   <2e-16 ***
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ## 
-## Residual standard error: 1.17e-16 on 99 degrees of freedom
-## Multiple R-squared:     1,   Adjusted R-squared:     1 
-## F-statistic: 2.38e+34 on 1 and 99 DF,  p-value: <2e-16
+## Residual standard error: 1.167e-16 on 99 degrees of freedom
+## Multiple R-squared:      1, Adjusted R-squared:      1
+## F-statistic: 2.382e+34 on 1 and 99 DF,  p-value: < 2.2e-16
summary(lm.fit2)
+
## Warning in summary.lm(lm.fit2): essentially perfect fit: summary may be
+## unreliable
## 
 ## Call:
 ## lm(formula = x ~ y + 0)
 ## 
 ## Residuals:
-##       Min        1Q    Median        3Q       Max 
-## -1.89e-16 -1.69e-17  1.34e-18  3.06e-17  2.55e-16 
+##        Min         1Q     Median         3Q        Max 
+## -1.888e-16 -1.689e-17  1.339e-18  3.057e-17  2.552e-16 
 ## 
 ## Coefficients:
-##   Estimate Std. Error  t value Pr(>|t|)    
-## y 5.00e-01   3.24e-18 1.54e+17   <2e-16 ***
+##   Estimate Std. Error   t value Pr(>|t|)    
+## y 5.00e-01   3.24e-18 1.543e+17   <2e-16 ***
 ## ---
 ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 ## 
-## Residual standard error: 5.83e-17 on 99 degrees of freedom
-## Multiple R-squared:     1,   Adjusted R-squared:     1 
-## F-statistic: 2.38e+34 on 1 and 99 DF,  p-value: <2e-16
+## Residual standard error: 5.833e-17 on 99 degrees of freedom
+## Multiple R-squared:      1, Adjusted R-squared:      1
+## F-statistic: 2.382e+34 on 1 and 99 DF,  p-value: < 2.2e-16

The regression coefficients are different for each linear regression.

@@ -625,9 +644,9 @@

12c.

 x <- rnorm(100)
 y <- -sample(x, 100)
 sum(x^2)
-
## [1] 81.06
+
## [1] 81.05509
sum(y^2)
-
## [1] 81.06
+
## [1] 81.05509
lm.fit <- lm(y~x+0)
 lm.fit2 <- lm(x~y+0)
 summary(lm.fit)
@@ -636,33 +655,33 @@

12c.

 ## lm(formula = y ~ x + 0)
 ##
 ## Residuals:
-##    Min     1Q Median     3Q    Max
-## -2.393 -0.688 -0.103  0.512  2.232
+##     Min      1Q  Median      3Q     Max
+## -2.3926 -0.6877 -0.1027  0.5124  2.2315
 ##
 ## Coefficients:
 ##   Estimate Std. Error t value Pr(>|t|)
-## x  -0.0215     0.1005   -0.21     0.83
+## x -0.02148    0.10048  -0.214    0.831
 ##
-## Residual standard error: 0.905 on 99 degrees of freedom
-## Multiple R-squared:  0.000461,  Adjusted R-squared:  -0.00963
-## F-statistic: 0.0457 on 1 and 99 DF,  p-value: 0.831
+## Residual standard error: 0.9046 on 99 degrees of freedom
+## Multiple R-squared:  0.0004614, Adjusted R-squared:  -0.009635
+## F-statistic: 0.0457 on 1 and 99 DF,  p-value: 0.8312
summary(lm.fit2)
## 
 ## Call:
 ## lm(formula = x ~ y + 0)
 ## 
 ## Residuals:
-##    Min     1Q Median     3Q    Max 
-## -2.240 -0.515  0.121  0.679  2.396 
+##     Min      1Q  Median      3Q     Max 
+## -2.2400 -0.5154  0.1213  0.6788  2.3959 
 ## 
 ## Coefficients:
 ##   Estimate Std. Error t value Pr(>|t|)
-## y  -0.0215     0.1005   -0.21     0.83
+## y -0.02148    0.10048  -0.214    0.831
 ## 
-## Residual standard error: 0.905 on 99 degrees of freedom
-## Multiple R-squared:  0.000461,   Adjusted R-squared:  -0.00963 
-## F-statistic: 0.0457 on 1 and 99 DF,  p-value: 0.831
-

The regression coefficients are the same for each linear regression. So long as sum sum(x^2) = sum(y^2) the condition in 12a. will be satisfied. Here we have simply taken all the \(x_i\) in a different order and made them negative.

+## Residual standard error: 0.9046 on 99 degrees of freedom +## Multiple R-squared: 0.0004614, Adjusted R-squared: -0.009635 +## F-statistic: 0.0457 on 1 and 99 DF, p-value: 0.8312 +

The regression coefficients are the same for each linear regression. So long as sum(x^2) = sum(y^2), the condition in 12a. will be satisfied. Here we have simply taken all the \(x_i\) in a different order and made them negative.
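The equality can be checked directly from the closed-form slope of a regression without intercept. The following is a minimal sketch; the seed is an arbitrary assumption and the names `b_yx` and `b_xy` are illustrative, not part of the original solution.

```r
set.seed(1)                      # arbitrary seed (assumption)
x <- rnorm(100)
y <- -sample(x, 100)             # same values, shuffled and negated

# Without an intercept, the slope of y on x is sum(x*y) / sum(x^2),
# and the slope of x on y is sum(x*y) / sum(y^2).
b_yx <- sum(x * y) / sum(x^2)
b_xy <- sum(x * y) / sum(y^2)

all.equal(sum(x^2), sum(y^2))                   # denominators agree
all.equal(b_yx, unname(coef(lm(y ~ x + 0))))    # matches lm()
all.equal(b_yx, b_xy)                           # hence equal slopes
```

Because the numerator sum(x*y) is shared, the two slopes can only differ through the denominators, which is why equal sums of squares are sufficient.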

13a.

@@ -676,12 +695,12 @@

13b.

13c.

y = -1 + 0.5*x + eps
-

y is of length 100. \(\beta_0\) is -1, \(\beta_1\) is 0.5.

+

y is of length 100. \(\beta_0\) is -1, \(\beta_1\) is 0.5.

13d.

plot(x, y)
-

plot of chunk unnamed-chunk-30 I observe a linear relationship between x and y with a positive slope, with a variance as is to be expected.

+

I observe a linear relationship between x and y with a positive slope, with scatter around the line as expected from the noise term eps.

13e.

@@ -692,19 +711,19 @@

13e.

## lm(formula = y ~ x) ## ## Residuals: -## Min 1Q Median 3Q Max -## -0.9384 -0.3069 -0.0697 0.2697 1.1731 +## Min 1Q Median 3Q Max +## -0.93842 -0.30688 -0.06975 0.26970 1.17309 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) -1.0188 0.0485 -21.01 < 2e-16 *** -## x 0.4995 0.0539 9.27 4.6e-15 *** +## (Intercept) -1.01885 0.04849 -21.010 < 2e-16 *** +## x 0.49947 0.05386 9.273 4.58e-15 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 0.481 on 98 degrees of freedom -## Multiple R-squared: 0.467, Adjusted R-squared: 0.462 -## F-statistic: 86 on 1 and 98 DF, p-value: 4.58e-15 +## Residual standard error: 0.4814 on 98 degrees of freedom +## Multiple R-squared: 0.4674, Adjusted R-squared: 0.4619 +## F-statistic: 85.99 on 1 and 98 DF, p-value: 4.583e-15

The linear regression fits a model close to the true coefficients used to construct the data: the estimates are -1.019 for the intercept and 0.499 for the slope, against true values of -1 and 0.5. The model has a large F-statistic with a near-zero p-value, so the null hypothesis can be rejected.

@@ -713,7 +732,7 @@

13f.

abline(lm.fit, lwd=3, col=2) abline(-1, 0.5, lwd=3, col=3) legend(-1, legend = c("model fit", "pop. regression"), col=2:3, lwd=3) -

plot of chunk unnamed-chunk-32

+

13g.

@@ -724,21 +743,21 @@

13g.

## lm(formula = y ~ x + I(x^2)) ## ## Residuals: -## Min 1Q Median 3Q Max -## -0.9825 -0.3127 -0.0644 0.2901 1.1350 +## Min 1Q Median 3Q Max +## -0.98252 -0.31270 -0.06441 0.29014 1.13500 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) -0.9716 0.0588 -16.52 < 2e-16 *** -## x 0.5086 0.0540 9.42 2.4e-15 *** -## I(x^2) -0.0595 0.0424 -1.40 0.16 +## (Intercept) -0.97164 0.05883 -16.517 < 2e-16 *** +## x 0.50858 0.05399 9.420 2.4e-15 *** +## I(x^2) -0.05946 0.04238 -1.403 0.164 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.479 on 97 degrees of freedom -## Multiple R-squared: 0.478, Adjusted R-squared: 0.467 -## F-statistic: 44.4 on 2 and 97 DF, p-value: 2.04e-14 -

There is evidence that model fit has increased over the training data given the slight increase in \(R^2\) and \(RSE\). Although, the p-value of the t-statistic suggests that there isn’t a relationship between y and \(x^2\).

+## Multiple R-squared: 0.4779, Adjusted R-squared: 0.4672 +## F-statistic: 44.4 on 2 and 97 DF, p-value: 2.038e-14 +

There is some evidence that the fit to the training data has improved, given the slight increase in \(R^2\) (0.4674 to 0.4779) and the slight decrease in \(RSE\) (0.4814 to 0.479). However, the p-value of the t-statistic for \(x^2\) (0.164) suggests that there isn’t a relationship between y and \(x^2\).
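A nested-model F-test gives the same verdict on the quadratic term as its t-statistic. This is a sketch only: the seed and the noise standard deviation of 0.5 are assumptions, and `fit_lin`, `fit_quad` and `cmp` are illustrative names.

```r
set.seed(1)                              # arbitrary seed (assumption)
x   <- rnorm(100)
eps <- rnorm(100, mean = 0, sd = 0.5)    # assumed noise level
y   <- -1 + 0.5 * x + eps

fit_lin  <- lm(y ~ x)
fit_quad <- lm(y ~ x + I(x^2))

# Training R^2 can never decrease when a term is added
r2_lin  <- summary(fit_lin)$r.squared
r2_quad <- summary(fit_quad)$r.squared

# F-test of the linear model against the quadratic one
cmp <- anova(fit_lin, fit_quad)
print(cmp)
```

With a single added term the F-statistic is the square of the t-statistic for `I(x^2)`, so the two p-values coincide. A higher training \(R^2\) on its own is therefore not evidence that the extra term belongs in the model.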

13h.

@@ -755,22 +774,22 @@

13h.

## ## Residuals: ## Min 1Q Median 3Q Max -## -0.29052 -0.07545 0.00067 0.07288 0.28665 +## -0.29052 -0.07545 0.00067 0.07288 0.28664 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) -0.9864 0.0113 -87.3 <2e-16 *** -## x1 0.4999 0.0118 42.2 <2e-16 *** +## (Intercept) -0.98639 0.01129 -87.34 <2e-16 *** +## x1 0.49988 0.01184 42.22 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 0.113 on 98 degrees of freedom -## Multiple R-squared: 0.948, Adjusted R-squared: 0.947 -## F-statistic: 1.78e+03 on 1 and 98 DF, p-value: <2e-16 +## Residual standard error: 0.1128 on 98 degrees of freedom +## Multiple R-squared: 0.9479, Adjusted R-squared: 0.9474 +## F-statistic: 1782 on 1 and 98 DF, p-value: < 2.2e-16
abline(lm.fit1, lwd=3, col=2)
 abline(-1, 0.5, lwd=3, col=3)
 legend(-1, legend = c("model fit", "pop. regression"), col=2:3, lwd=3)
-

plot of chunk unnamed-chunk-34 As expected, the error observed in \(R^2\) and \(RSE\) decreases considerably.

+

As expected, the error decreases considerably: \(RSE\) falls to 0.1128 and \(R^2\) rises to 0.9479.

13i.

@@ -786,38 +805,38 @@

13i.

## lm(formula = y2 ~ x2) ## ## Residuals: -## Min 1Q Median 3Q Max -## -1.1621 -0.3018 0.0027 0.2915 1.1466 +## Min 1Q Median 3Q Max +## -1.16208 -0.30181 0.00268 0.29152 1.14658 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) -0.9456 0.0452 -20.9 <2e-16 *** -## x2 0.4995 0.0474 10.6 <2e-16 *** +## (Intercept) -0.94557 0.04517 -20.93 <2e-16 *** +## x2 0.49953 0.04736 10.55 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 0.451 on 98 degrees of freedom -## Multiple R-squared: 0.532, Adjusted R-squared: 0.527 -## F-statistic: 111 on 1 and 98 DF, p-value: <2e-16 +## Residual standard error: 0.4514 on 98 degrees of freedom +## Multiple R-squared: 0.5317, Adjusted R-squared: 0.5269 +## F-statistic: 111.2 on 1 and 98 DF, p-value: < 2.2e-16
abline(lm.fit2, lwd=3, col=2)
 abline(-1, 0.5, lwd=3, col=3)
 legend(-1, legend = c("model fit", "pop. regression"), col=2:3, lwd=3)
-

plot of chunk unnamed-chunk-35 As expected, the error observed in \(R^2\) and \(RSE\) increases considerably.

+

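With more noise in the data we would expect \(RSE\) to increase and \(R^2\) to decrease. In the output above, however, \(RSE\) (0.4514) and \(R^2\) (0.5317) are close to those of the original fit (0.4814 and 0.4674), so no increase in error is visible here.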

13j.

confint(lm.fit)
-
##               2.5 %  97.5 %
-## (Intercept) -1.1151 -0.9226
-## x            0.3926  0.6064
+
##                  2.5 %     97.5 %
+## (Intercept) -1.1150804 -0.9226122
+## x            0.3925794  0.6063602
confint(lm.fit1)
-
##               2.5 %  97.5 %
-## (Intercept) -1.0088 -0.9640
-## x1           0.4764  0.5234
+
##                 2.5 %     97.5 %
+## (Intercept) -1.008805 -0.9639819
+## x1           0.476387  0.5233799
confint(lm.fit2)
-
##               2.5 %  97.5 %
-## (Intercept) -1.0352 -0.8559
-## x2           0.4055  0.5935
+
##                  2.5 %     97.5 %
+## (Intercept) -1.0352203 -0.8559276
+## x2           0.4055479  0.5935197

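All slope intervals are centered on approximately 0.5. The less noisy fit (lm.fit1) has a much narrower interval (width about 0.047) than the original fit (about 0.214). The fit intended to be noisier (lm.fit2) has an interval of width about 0.188, which in this output is slightly narrower than the original rather than wider.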
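How the interval width depends on the noise level can be isolated by reusing the same noise draw at different scales. This is a sketch under stated assumptions: the seed, the three noise levels and the helper `slope_ci_width` are all hypothetical and not taken from the original solution.

```r
set.seed(1)                     # arbitrary seed (assumption)
x <- rnorm(100)
z <- rnorm(100)                 # one standard-normal draw, rescaled below

# Width of the 95% confidence interval for the slope at a given noise sd
slope_ci_width <- function(noise_sd) {
  y  <- -1 + 0.5 * x + noise_sd * z
  ci <- confint(lm(y ~ x))
  unname(ci["x", 2] - ci["x", 1])
}

widths <- sapply(c(0.1, 0.5, 1), slope_ci_width)
widths
```

Because the residuals scale linearly with the noise standard deviation when x and z are held fixed, the widths scale in the same proportion. With independent noise draws, as in the exercise, the ordering holds only approximately.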

@@ -826,7 +845,7 @@

14a.

x1 = runif(100) x2 = 0.5 * x1 + rnorm(100)/10 y = 2 + 2*x1 + 0.3*x2 + rnorm(100) -

\[ +

\[ Y = 2 + 2 X_1 + 0.3 X_2 + \epsilon \\ \beta_0 = 2, \beta_1 = 2, \beta_2 = 0.3 \]

@@ -834,9 +853,9 @@

14a.

14b.

cor(x1, x2)
-
## [1] 0.8351
+
## [1] 0.8351212
plot(x1, x2)
-

plot of chunk unnamed-chunk-38

+

14c.

@@ -852,16 +871,16 @@

14c.

## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 2.130 0.232 9.19 7.6e-15 *** -## x1 1.440 0.721 2.00 0.049 * -## x2 1.010 1.134 0.89 0.375 +## (Intercept) 2.1305 0.2319 9.188 7.61e-15 *** +## x1 1.4396 0.7212 1.996 0.0487 * +## x2 1.0097 1.1337 0.891 0.3754 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 1.06 on 97 degrees of freedom -## Multiple R-squared: 0.209, Adjusted R-squared: 0.193 -## F-statistic: 12.8 on 2 and 97 DF, p-value: 1.16e-05 -

\[\beta_0 = 2.0533, \beta_1 = 1.6336, \beta_3 = 0.5588\] The regression coefficients are close to the true coefficients, although with high standard error. We can reject the null hypothesis for \(\beta_1\) because its p-value is below 5%. We cannot reject the null hypothesis for \(\beta_2\) because its p-value is much above the 5% typical cutoff, over 60%.

+## Residual standard error: 1.056 on 97 degrees of freedom +## Multiple R-squared: 0.2088, Adjusted R-squared: 0.1925 +## F-statistic: 12.8 on 2 and 97 DF, p-value: 1.164e-05 +

\[\beta_0 = 2.1305, \beta_1 = 1.4396, \beta_2 = 1.0097\] The intercept is close to the true coefficient, while \(\beta_1\) and \(\beta_2\) are deflated and inflated respectively. The standard errors are moreover high, and the model explains only about 21% of the variance in y (\(R^2\) = 0.2088). We can reject the null hypothesis for \(\beta_1\) because its p-value (0.0487) is below 5%. We cannot reject the null hypothesis for \(\beta_2\) because its p-value (0.3754) is well above the typical 5% cutoff.
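The high standard errors can be related to the collinearity between x1 and x2 through the variance inflation factor (VIF). This is a minimal sketch computed by hand rather than with a package; the seed is an arbitrary assumption, so the simulated correlation will differ from the 0.8351 reported in 14b.

```r
set.seed(1)                         # arbitrary seed (assumption)
x1 <- runif(100)
x2 <- 0.5 * x1 + rnorm(100) / 10

# VIF for x1: 1 / (1 - R^2) from regressing x1 on the other predictor
r2_x1  <- summary(lm(x1 ~ x2))$r.squared
vif_x1 <- 1 / (1 - r2_x1)

# With only two predictors this equals 1 / (1 - cor(x1, x2)^2)
vif_x1
1 / (1 - cor(x1, x2)^2)
```

For the correlation of 0.8351 reported in 14b this gives a VIF of about 3.3, meaning the coefficient standard errors are roughly 1.8 times larger than they would be with uncorrelated predictors.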

14d.

@@ -872,19 +891,19 @@

14d.

## lm(formula = y ~ x1) ## ## Residuals: -## Min 1Q Median 3Q Max -## -2.8950 -0.6687 -0.0779 0.5922 2.4556 +## Min 1Q Median 3Q Max +## -2.89495 -0.66874 -0.07785 0.59221 2.45560 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 2.112 0.231 9.15 8.3e-15 *** -## x1 1.976 0.396 4.99 2.7e-06 *** +## (Intercept) 2.1124 0.2307 9.155 8.27e-15 *** +## x1 1.9759 0.3963 4.986 2.66e-06 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 1.06 on 98 degrees of freedom -## Multiple R-squared: 0.202, Adjusted R-squared: 0.194 -## F-statistic: 24.9 on 1 and 98 DF, p-value: 2.66e-06 +## Residual standard error: 1.055 on 98 degrees of freedom +## Multiple R-squared: 0.2024, Adjusted R-squared: 0.1942 +## F-statistic: 24.86 on 1 and 98 DF, p-value: 2.661e-06

Yes, we can reject the null hypothesis for the regression coefficient given the p-value for its t-statistic is near zero.

@@ -896,19 +915,19 @@

14e.

## lm(formula = y ~ x2) ## ## Residuals: -## Min 1Q Median 3Q Max -## -2.627 -0.752 -0.036 0.724 2.449 +## Min 1Q Median 3Q Max +## -2.62687 -0.75156 -0.03598 0.72383 2.44890 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 2.390 0.195 12.26 < 2e-16 *** -## x2 2.900 0.633 4.58 1.4e-05 *** +## (Intercept) 2.3899 0.1949 12.26 < 2e-16 *** +## x2 2.8996 0.6330 4.58 1.37e-05 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 1.07 on 98 degrees of freedom -## Multiple R-squared: 0.176, Adjusted R-squared: 0.168 -## F-statistic: 21 on 1 and 98 DF, p-value: 1.37e-05 +## Residual standard error: 1.072 on 98 degrees of freedom +## Multiple R-squared: 0.1763, Adjusted R-squared: 0.1679 +## F-statistic: 20.98 on 1 and 98 DF, p-value: 1.366e-05

Yes, we can reject the null hypothesis for the regression coefficient given the p-value for its t-statistic is near zero.

@@ -927,20 +946,20 @@

14g.

## lm(formula = y ~ x1 + x2) ## ## Residuals: -## Min 1Q Median 3Q Max -## -2.7335 -0.6932 -0.0526 0.6638 2.3062 +## Min 1Q Median 3Q Max +## -2.73348 -0.69318 -0.05263 0.66385 2.30619 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 2.227 0.231 9.62 7.9e-16 *** -## x1 0.539 0.592 0.91 0.3646 -## x2 2.515 0.898 2.80 0.0061 ** +## (Intercept) 2.2267 0.2314 9.624 7.91e-16 *** +## x1 0.5394 0.5922 0.911 0.36458 +## x2 2.5146 0.8977 2.801 0.00614 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 1.07 on 98 degrees of freedom -## Multiple R-squared: 0.219, Adjusted R-squared: 0.203 -## F-statistic: 13.7 on 2 and 98 DF, p-value: 5.56e-06 +## Residual standard error: 1.075 on 98 degrees of freedom +## Multiple R-squared: 0.2188, Adjusted R-squared: 0.2029 +## F-statistic: 13.72 on 2 and 98 DF, p-value: 5.564e-06
lm.fit2 = lm(y~x1)
 summary(lm.fit2)
## 
@@ -948,19 +967,19 @@ 

14g.

## lm(formula = y ~ x1) ## ## Residuals: -## Min 1Q Median 3Q Max -## -2.890 -0.656 -0.091 0.568 3.567 +## Min 1Q Median 3Q Max +## -2.8897 -0.6556 -0.0909 0.5682 3.5665 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 2.257 0.239 9.44 1.8e-15 *** -## x1 1.766 0.412 4.28 4.3e-05 *** +## (Intercept) 2.2569 0.2390 9.445 1.78e-15 *** +## x1 1.7657 0.4124 4.282 4.29e-05 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 1.11 on 99 degrees of freedom -## Multiple R-squared: 0.156, Adjusted R-squared: 0.148 -## F-statistic: 18.3 on 1 and 99 DF, p-value: 4.29e-05
+## Residual standard error: 1.111 on 99 degrees of freedom +## Multiple R-squared: 0.1562, Adjusted R-squared: 0.1477 +## F-statistic: 18.33 on 1 and 99 DF, p-value: 4.295e-05
lm.fit3 = lm(y~x2)
 summary(lm.fit3)
## 
@@ -968,99 +987,98 @@ 

14g.

## lm(formula = y ~ x2) ## ## Residuals: -## Min 1Q Median 3Q Max -## -2.647 -0.710 -0.069 0.727 2.381 +## Min 1Q Median 3Q Max +## -2.64729 -0.71021 -0.06899 0.72699 2.38074 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 2.345 0.191 12.26 < 2e-16 *** -## x2 3.119 0.604 5.16 1.3e-06 *** +## (Intercept) 2.3451 0.1912 12.264 < 2e-16 *** +## x2 3.1190 0.6040 5.164 1.25e-06 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 1.07 on 99 degrees of freedom -## Multiple R-squared: 0.212, Adjusted R-squared: 0.204 -## F-statistic: 26.7 on 1 and 99 DF, p-value: 1.25e-06
+## Residual standard error: 1.074 on 99 degrees of freedom +## Multiple R-squared: 0.2122, Adjusted R-squared: 0.2042 +## F-statistic: 26.66 on 1 and 99 DF, p-value: 1.253e-06

In the first model, the added point shifts x1 to statistical insignificance and x2 to statistical significance, as seen from the change in p-values between the two linear regressions.

par(mfrow=c(2,2))
 plot(lm.fit1)
-

plot of chunk unnamed-chunk-43

+

par(mfrow=c(2,2))
 plot(lm.fit2)
-

plot of chunk unnamed-chunk-44

+

par(mfrow=c(2,2))
 plot(lm.fit3)
-

plot of chunk unnamed-chunk-45 In the first and third models, the point becomes a high leverage point.

+

In the first and third models, the point becomes a high leverage point.

plot(predict(lm.fit1), rstudent(lm.fit1))
-

plot of chunk unnamed-chunk-46

+

plot(predict(lm.fit2), rstudent(lm.fit2))
-

plot of chunk unnamed-chunk-46

+

plot(predict(lm.fit3), rstudent(lm.fit3))
-

plot of chunk unnamed-chunk-46 Looking at the studentized residuals, we don’t observe points too far from the |3| value cutoff, except for the second linear regression: y ~ x1.

+

Looking at the studentized residuals, no points fall far outside the |3| cutoff, except in the second linear regression: y ~ x1.
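Leverage and studentized residuals can also be inspected numerically rather than from the plots. This sketch is hedged: the seed is arbitrary, and the added observation (x1 = 0.1, x2 = 0.8, y = 6) is an assumed value, since the part of the solution that defines it is not shown here.

```r
set.seed(1)                                   # arbitrary seed (assumption)
x1 <- runif(100)
x2 <- 0.5 * x1 + rnorm(100) / 10
y  <- 2 + 2 * x1 + 0.3 * x2 + rnorm(100)

# Assumed extra observation, appended as observation 101
x1 <- c(x1, 0.1); x2 <- c(x2, 0.8); y <- c(y, 6)

fit <- lm(y ~ x1 + x2)
h   <- hatvalues(fit)                         # leverage of each observation

# Average leverage is (p + 1) / n; values well above it are high leverage
avg_h <- length(coef(fit)) / length(y)
h[101] / avg_h

# Studentized residual of the added point; |value| > 3 suggests an outlier
rstudent(fit)[101]
```

A point can have high leverage without being an outlier and vice versa, which is why the two diagnostics are reported separately.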

15a.

-
library(MASS)
-
## Warning: package 'MASS' was built under R version 3.0.2
-
summary(Boston)
-
##       crim             zn            indus            chas       
-##  Min.   : 0.01   Min.   :  0.0   Min.   : 0.46   Min.   :0.0000  
-##  1st Qu.: 0.08   1st Qu.:  0.0   1st Qu.: 5.19   1st Qu.:0.0000  
-##  Median : 0.26   Median :  0.0   Median : 9.69   Median :0.0000  
-##  Mean   : 3.61   Mean   : 11.4   Mean   :11.14   Mean   :0.0692  
-##  3rd Qu.: 3.68   3rd Qu.: 12.5   3rd Qu.:18.10   3rd Qu.:0.0000  
-##  Max.   :88.98   Max.   :100.0   Max.   :27.74   Max.   :1.0000  
-##       nox              rm            age             dis       
-##  Min.   :0.385   Min.   :3.56   Min.   :  2.9   Min.   : 1.13  
-##  1st Qu.:0.449   1st Qu.:5.89   1st Qu.: 45.0   1st Qu.: 2.10  
-##  Median :0.538   Median :6.21   Median : 77.5   Median : 3.21  
-##  Mean   :0.555   Mean   :6.29   Mean   : 68.6   Mean   : 3.79  
-##  3rd Qu.:0.624   3rd Qu.:6.62   3rd Qu.: 94.1   3rd Qu.: 5.19  
-##  Max.   :0.871   Max.   :8.78   Max.   :100.0   Max.   :12.13  
-##       rad             tax         ptratio         black      
-##  Min.   : 1.00   Min.   :187   Min.   :12.6   Min.   :  0.3  
-##  1st Qu.: 4.00   1st Qu.:279   1st Qu.:17.4   1st Qu.:375.4  
-##  Median : 5.00   Median :330   Median :19.1   Median :391.4  
-##  Mean   : 9.55   Mean   :408   Mean   :18.5   Mean   :356.7  
-##  3rd Qu.:24.00   3rd Qu.:666   3rd Qu.:20.2   3rd Qu.:396.2  
-##  Max.   :24.00   Max.   :711   Max.   :22.0   Max.   :396.9  
-##      lstat            medv     
-##  Min.   : 1.73   Min.   : 5.0  
-##  1st Qu.: 6.95   1st Qu.:17.0  
-##  Median :11.36   Median :21.2  
-##  Mean   :12.65   Mean   :22.5  
-##  3rd Qu.:16.95   3rd Qu.:25.0  
-##  Max.   :37.97   Max.   :50.0
+
library(MASS)
+summary(Boston)
+
##       crim                zn             indus            chas        
+##  Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
+##  1st Qu.: 0.08204   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
+##  Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
+##  Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
+##  3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
+##  Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
+##       nox               rm             age              dis        
+##  Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
+##  1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
+##  Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
+##  Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
+##  3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
+##  Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
+##       rad              tax           ptratio          black       
+##  Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   :  0.32  
+##  1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38  
+##  Median : 5.000   Median :330.0   Median :19.05   Median :391.44  
+##  Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :356.67  
+##  3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23  
+##  Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :396.90  
+##      lstat            medv      
+##  Min.   : 1.73   Min.   : 5.00  
+##  1st Qu.: 6.95   1st Qu.:17.02  
+##  Median :11.36   Median :21.20  
+##  Mean   :12.65   Mean   :22.53  
+##  3rd Qu.:16.95   3rd Qu.:25.00  
+##  Max.   :37.97   Max.   :50.00
Boston$chas <- factor(Boston$chas, labels = c("N","Y"))
 summary(Boston)
-
##       crim             zn            indus       chas         nox       
-##  Min.   : 0.01   Min.   :  0.0   Min.   : 0.46   N:471   Min.   :0.385  
-##  1st Qu.: 0.08   1st Qu.:  0.0   1st Qu.: 5.19   Y: 35   1st Qu.:0.449  
-##  Median : 0.26   Median :  0.0   Median : 9.69           Median :0.538  
-##  Mean   : 3.61   Mean   : 11.4   Mean   :11.14           Mean   :0.555  
-##  3rd Qu.: 3.68   3rd Qu.: 12.5   3rd Qu.:18.10           3rd Qu.:0.624  
-##  Max.   :88.98   Max.   :100.0   Max.   :27.74           Max.   :0.871  
-##        rm            age             dis             rad       
-##  Min.   :3.56   Min.   :  2.9   Min.   : 1.13   Min.   : 1.00  
-##  1st Qu.:5.89   1st Qu.: 45.0   1st Qu.: 2.10   1st Qu.: 4.00  
-##  Median :6.21   Median : 77.5   Median : 3.21   Median : 5.00  
-##  Mean   :6.29   Mean   : 68.6   Mean   : 3.79   Mean   : 9.55  
-##  3rd Qu.:6.62   3rd Qu.: 94.1   3rd Qu.: 5.19   3rd Qu.:24.00  
-##  Max.   :8.78   Max.   :100.0   Max.   :12.13   Max.   :24.00  
-##       tax         ptratio         black           lstat      
-##  Min.   :187   Min.   :12.6   Min.   :  0.3   Min.   : 1.73  
-##  1st Qu.:279   1st Qu.:17.4   1st Qu.:375.4   1st Qu.: 6.95  
-##  Median :330   Median :19.1   Median :391.4   Median :11.36  
-##  Mean   :408   Mean   :18.5   Mean   :356.7   Mean   :12.65  
-##  3rd Qu.:666   3rd Qu.:20.2   3rd Qu.:396.2   3rd Qu.:16.95  
-##  Max.   :711   Max.   :22.0   Max.   :396.9   Max.   :37.97  
-##       medv     
-##  Min.   : 5.0  
-##  1st Qu.:17.0  
-##  Median :21.2  
-##  Mean   :22.5  
-##  3rd Qu.:25.0  
-##  Max.   :50.0
+
##       crim                zn             indus       chas   
+##  Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   N:471  
+##  1st Qu.: 0.08204   1st Qu.:  0.00   1st Qu.: 5.19   Y: 35  
+##  Median : 0.25651   Median :  0.00   Median : 9.69          
+##  Mean   : 3.61352   Mean   : 11.36   Mean   :11.14          
+##  3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10          
+##  Max.   :88.97620   Max.   :100.00   Max.   :27.74          
+##       nox               rm             age              dis        
+##  Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
+##  1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
+##  Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
+##  Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
+##  3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
+##  Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
+##       rad              tax           ptratio          black       
+##  Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   :  0.32  
+##  1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38  
+##  Median : 5.000   Median :330.0   Median :19.05   Median :391.44  
+##  Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :356.67  
+##  3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23  
+##  Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :396.90  
+##      lstat            medv      
+##  Min.   : 1.73   Min.   : 5.00  
+##  1st Qu.: 6.95   1st Qu.:17.02  
+##  Median :11.36   Median :21.20  
+##  Mean   :12.65   Mean   :22.53  
+##  3rd Qu.:16.95   3rd Qu.:25.00  
+##  Max.   :37.97   Max.   :50.00
attach(Boston)
 lm.zn = lm(crim~zn)
 summary(lm.zn) # yes
@@ -1070,18 +1088,18 @@

15a.

## ## Residuals: ## Min 1Q Median 3Q Max -## -4.43 -4.22 -2.62 1.25 84.52 +## -4.429 -4.222 -2.620 1.250 84.523 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 4.4537 0.4172 10.67 < 2e-16 *** -## zn -0.0739 0.0161 -4.59 5.5e-06 *** +## (Intercept) 4.45369 0.41722 10.675 < 2e-16 *** +## zn -0.07393 0.01609 -4.594 5.51e-06 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 8.44 on 504 degrees of freedom -## Multiple R-squared: 0.0402, Adjusted R-squared: 0.0383 -## F-statistic: 21.1 on 1 and 504 DF, p-value: 5.51e-06 +## Residual standard error: 8.435 on 504 degrees of freedom +## Multiple R-squared: 0.04019, Adjusted R-squared: 0.03828 +## F-statistic: 21.1 on 1 and 504 DF, p-value: 5.506e-06
lm.indus = lm(crim~indus)
 summary(lm.indus) # yes
## 
@@ -1089,19 +1107,19 @@ 

15a.

## lm(formula = crim ~ indus) ## ## Residuals: -## Min 1Q Median 3Q Max -## -11.97 -2.70 -0.74 0.71 81.81 +## Min 1Q Median 3Q Max +## -11.972 -2.698 -0.736 0.712 81.813 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) -2.064 0.667 -3.09 0.0021 ** -## indus 0.510 0.051 9.99 <2e-16 *** +## (Intercept) -2.06374 0.66723 -3.093 0.00209 ** +## indus 0.50978 0.05102 9.991 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 7.87 on 504 degrees of freedom -## Multiple R-squared: 0.165, Adjusted R-squared: 0.164 -## F-statistic: 99.8 on 1 and 504 DF, p-value: <2e-16
+## Residual standard error: 7.866 on 504 degrees of freedom +## Multiple R-squared: 0.1653, Adjusted R-squared: 0.1637 +## F-statistic: 99.82 on 1 and 504 DF, p-value: < 2.2e-16
lm.chas = lm(crim~chas) 
 summary(lm.chas) # no
## 
@@ -1110,18 +1128,18 @@ 

15a.

## ## Residuals: ## Min 1Q Median 3Q Max -## -3.74 -3.66 -3.44 0.02 85.23 +## -3.738 -3.661 -3.435 0.018 85.232 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 3.744 0.396 9.45 <2e-16 *** -## chasY -1.893 1.506 -1.26 0.21 +## (Intercept) 3.7444 0.3961 9.453 <2e-16 *** +## chasY -1.8928 1.5061 -1.257 0.209 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 8.6 on 504 degrees of freedom -## Multiple R-squared: 0.00312, Adjusted R-squared: 0.00115 -## F-statistic: 1.58 on 1 and 504 DF, p-value: 0.209
+## Residual standard error: 8.597 on 504 degrees of freedom +## Multiple R-squared: 0.003124, Adjusted R-squared: 0.001146 +## F-statistic: 1.579 on 1 and 504 DF, p-value: 0.2094
lm.nox = lm(crim~nox)
 summary(lm.nox) # yes
## 
@@ -1129,19 +1147,19 @@ 

15a.

## lm(formula = crim ~ nox) ## ## Residuals: -## Min 1Q Median 3Q Max -## -12.37 -2.74 -0.97 0.56 81.73 +## Min 1Q Median 3Q Max +## -12.371 -2.738 -0.974 0.559 81.728 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) -13.7 1.7 -8.07 5.1e-15 *** -## nox 31.2 3.0 10.42 < 2e-16 *** +## (Intercept) -13.720 1.699 -8.073 5.08e-15 *** +## nox 31.249 2.999 10.419 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 7.81 on 504 degrees of freedom -## Multiple R-squared: 0.177, Adjusted R-squared: 0.176 -## F-statistic: 109 on 1 and 504 DF, p-value: <2e-16
+## Multiple R-squared: 0.1772, Adjusted R-squared: 0.1756 +## F-statistic: 108.6 on 1 and 504 DF, p-value: < 2.2e-16
lm.rm = lm(crim~rm)
 summary(lm.rm) # yes
## 
@@ -1150,18 +1168,18 @@ 

15a.

## ## Residuals: ## Min 1Q Median 3Q Max -## -6.60 -3.95 -2.65 0.99 87.20 +## -6.604 -3.952 -2.654 0.989 87.197 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 20.482 3.364 6.09 2.3e-09 *** -## rm -2.684 0.532 -5.04 6.3e-07 *** +## (Intercept) 20.482 3.365 6.088 2.27e-09 *** +## rm -2.684 0.532 -5.045 6.35e-07 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 8.4 on 504 degrees of freedom -## Multiple R-squared: 0.0481, Adjusted R-squared: 0.0462 -## F-statistic: 25.5 on 1 and 504 DF, p-value: 6.35e-07
+## Residual standard error: 8.401 on 504 degrees of freedom +## Multiple R-squared: 0.04807, Adjusted R-squared: 0.04618 +## F-statistic: 25.45 on 1 and 504 DF, p-value: 6.347e-07
lm.age = lm(crim~age)
 summary(lm.age) # yes
## 
@@ -1170,18 +1188,18 @@ 

15a.

## ## Residuals: ## Min 1Q Median 3Q Max -## -6.79 -4.26 -1.23 1.53 82.85 +## -6.789 -4.257 -1.230 1.527 82.849 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) -3.7779 0.9440 -4.00 7.2e-05 *** -## age 0.1078 0.0127 8.46 2.9e-16 *** +## (Intercept) -3.77791 0.94398 -4.002 7.22e-05 *** +## age 0.10779 0.01274 8.463 2.85e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 8.06 on 504 degrees of freedom -## Multiple R-squared: 0.124, Adjusted R-squared: 0.123 -## F-statistic: 71.6 on 1 and 504 DF, p-value: 2.85e-16
+## Residual standard error: 8.057 on 504 degrees of freedom +## Multiple R-squared: 0.1244, Adjusted R-squared: 0.1227 +## F-statistic: 71.62 on 1 and 504 DF, p-value: 2.855e-16
lm.dis = lm(crim~dis)
 summary(lm.dis) # yes
## 
@@ -1190,18 +1208,18 @@ 

15a.

## ## Residuals: ## Min 1Q Median 3Q Max -## -6.71 -4.13 -1.53 1.52 81.67 +## -6.708 -4.134 -1.527 1.516 81.674 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 9.499 0.730 13.01 <2e-16 *** -## dis -1.551 0.168 -9.21 <2e-16 *** +## (Intercept) 9.4993 0.7304 13.006 <2e-16 *** +## dis -1.5509 0.1683 -9.213 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 7.97 on 504 degrees of freedom -## Multiple R-squared: 0.144, Adjusted R-squared: 0.142 -## F-statistic: 84.9 on 1 and 504 DF, p-value: <2e-16
+## Residual standard error: 7.965 on 504 degrees of freedom +## Multiple R-squared: 0.1441, Adjusted R-squared: 0.1425 +## F-statistic: 84.89 on 1 and 504 DF, p-value: < 2.2e-16
lm.rad = lm(crim~rad)
 summary(lm.rad) # yes
## 
@@ -1209,19 +1227,19 @@ 

15a.

## lm(formula = crim ~ rad) ## ## Residuals: -## Min 1Q Median 3Q Max -## -10.16 -1.38 -0.14 0.66 76.43 +## Min 1Q Median 3Q Max +## -10.164 -1.381 -0.141 0.660 76.433 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) -2.2872 0.4435 -5.16 3.6e-07 *** -## rad 0.6179 0.0343 18.00 < 2e-16 *** +## (Intercept) -2.28716 0.44348 -5.157 3.61e-07 *** +## rad 0.61791 0.03433 17.998 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 6.72 on 504 degrees of freedom -## Multiple R-squared: 0.391, Adjusted R-squared: 0.39 -## F-statistic: 324 on 1 and 504 DF, p-value: <2e-16
+## Residual standard error: 6.718 on 504 degrees of freedom +## Multiple R-squared: 0.3913, Adjusted R-squared: 0.39 +## F-statistic: 323.9 on 1 and 504 DF, p-value: < 2.2e-16
lm.tax = lm(crim~tax)
 summary(lm.tax) # yes
## 
@@ -1229,19 +1247,19 @@ 

15a.

## lm(formula = crim ~ tax) ## ## Residuals: -## Min 1Q Median 3Q Max -## -12.51 -2.74 -0.19 1.07 77.70 +## Min 1Q Median 3Q Max +## -12.513 -2.738 -0.194 1.065 77.696 ## ## Coefficients: -## Estimate Std. Error t value Pr(>|t|) -## (Intercept) -8.52837 0.81581 -10.4 <2e-16 *** -## tax 0.02974 0.00185 16.1 <2e-16 *** +## Estimate Std. Error t value Pr(>|t|) +## (Intercept) -8.528369 0.815809 -10.45 <2e-16 *** +## tax 0.029742 0.001847 16.10 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 7 on 504 degrees of freedom -## Multiple R-squared: 0.34, Adjusted R-squared: 0.338 -## F-statistic: 259 on 1 and 504 DF, p-value: <2e-16
+## Residual standard error: 6.997 on 504 degrees of freedom +## Multiple R-squared: 0.3396, Adjusted R-squared: 0.3383 +## F-statistic: 259.2 on 1 and 504 DF, p-value: < 2.2e-16
lm.ptratio = lm(crim~ptratio)
 summary(lm.ptratio) # yes
## 
@@ -1250,18 +1268,18 @@ 

15a.

## ## Residuals: ## Min 1Q Median 3Q Max -## -7.65 -3.99 -1.91 1.82 83.35 +## -7.654 -3.985 -1.912 1.825 83.353 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) -17.647 3.147 -5.61 3.4e-08 *** -## ptratio 1.152 0.169 6.80 2.9e-11 *** +## (Intercept) -17.6469 3.1473 -5.607 3.40e-08 *** +## ptratio 1.1520 0.1694 6.801 2.94e-11 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 8.24 on 504 degrees of freedom -## Multiple R-squared: 0.0841, Adjusted R-squared: 0.0823 -## F-statistic: 46.3 on 1 and 504 DF, p-value: 2.94e-11
+## Multiple R-squared: 0.08407, Adjusted R-squared: 0.08225 +## F-statistic: 46.26 on 1 and 504 DF, p-value: 2.943e-11
lm.black = lm(crim~black)
 summary(lm.black) # yes
## 
@@ -1269,19 +1287,19 @@ 

15a.

## lm(formula = crim ~ black) ## ## Residuals: -## Min 1Q Median 3Q Max -## -13.76 -2.30 -2.09 -1.30 86.82 +## Min 1Q Median 3Q Max +## -13.756 -2.299 -2.095 -1.296 86.822 ## ## Coefficients: -## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 16.55353 1.42590 11.61 <2e-16 *** -## black -0.03628 0.00387 -9.37 <2e-16 *** +## Estimate Std. Error t value Pr(>|t|) +## (Intercept) 16.553529 1.425903 11.609 <2e-16 *** +## black -0.036280 0.003873 -9.367 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 7.95 on 504 degrees of freedom -## Multiple R-squared: 0.148, Adjusted R-squared: 0.147 -## F-statistic: 87.7 on 1 and 504 DF, p-value: <2e-16
+## Residual standard error: 7.946 on 504 degrees of freedom +## Multiple R-squared: 0.1483, Adjusted R-squared: 0.1466 +## F-statistic: 87.74 on 1 and 504 DF, p-value: < 2.2e-16
lm.lstat = lm(crim~lstat)
 summary(lm.lstat) # yes
## 
@@ -1289,19 +1307,19 @@ 

15a.

## lm(formula = crim ~ lstat) ## ## Residuals: -## Min 1Q Median 3Q Max -## -13.93 -2.82 -0.66 1.08 82.86 +## Min 1Q Median 3Q Max +## -13.925 -2.822 -0.664 1.079 82.862 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) -3.3305 0.6938 -4.8 2.1e-06 *** -## lstat 0.5488 0.0478 11.5 < 2e-16 *** +## (Intercept) -3.33054 0.69376 -4.801 2.09e-06 *** +## lstat 0.54880 0.04776 11.491 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 7.66 on 504 degrees of freedom -## Multiple R-squared: 0.208, Adjusted R-squared: 0.206 -## F-statistic: 132 on 1 and 504 DF, p-value: <2e-16
+## Residual standard error: 7.664 on 504 degrees of freedom +## Multiple R-squared: 0.2076, Adjusted R-squared: 0.206 +## F-statistic: 132 on 1 and 504 DF, p-value: < 2.2e-16
lm.medv = lm(crim~medv)
 summary(lm.medv) # yes
## 
@@ -1310,18 +1328,18 @@ 

15a.

## ## Residuals: ## Min 1Q Median 3Q Max -## -9.07 -4.02 -2.34 1.30 80.96 +## -9.071 -4.022 -2.343 1.298 80.957 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 11.7965 0.9342 12.63 <2e-16 *** -## medv -0.3632 0.0384 -9.46 <2e-16 *** +## (Intercept) 11.79654 0.93419 12.63 <2e-16 *** +## medv -0.36316 0.03839 -9.46 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 7.93 on 504 degrees of freedom -## Multiple R-squared: 0.151, Adjusted R-squared: 0.149 -## F-statistic: 89.5 on 1 and 504 DF, p-value: <2e-16
+## Residual standard error: 7.934 on 504 degrees of freedom +## Multiple R-squared: 0.1508, Adjusted R-squared: 0.1491 +## F-statistic: 89.49 on 1 and 504 DF, p-value: < 2.2e-16

All predictors except chas show a statistically significant association with crim. Plot each fit with “plot(lm)” to inspect the residuals.
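As a minimal sketch of the residual check suggested above, the following refits one of the simple regressions and draws the four standard diagnostic plots. It assumes the Boston data comes from the MASS package, which is not shown in this part of the patch.

```r
library(MASS)  # assumption: Boston is taken from MASS

# Refit one of the simple regressions from 15a.
lm.medv = lm(crim ~ medv, data = Boston)

# Four diagnostic plots on one page: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage.
par(mfrow = c(2, 2))
plot(lm.medv)
```

The same two plotting lines apply to any of the other fits, for example lm.black or lm.lstat.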

@@ -1334,30 +1352,30 @@

15b.

## ## Residuals: ## Min 1Q Median 3Q Max -## -9.92 -2.12 -0.35 1.02 75.05 +## -9.924 -2.120 -0.353 1.019 75.051 ## ## Coefficients: -## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 17.03323 7.23490 2.35 0.0189 * -## zn 0.04486 0.01873 2.39 0.0170 * -## indus -0.06385 0.08341 -0.77 0.4443 -## chasY -0.74913 1.18015 -0.63 0.5259 -## nox -10.31353 5.27554 -1.95 0.0512 . -## rm 0.43013 0.61283 0.70 0.4831 -## age 0.00145 0.01793 0.08 0.9355 -## dis -0.98718 0.28182 -3.50 0.0005 *** -## rad 0.58821 0.08805 6.68 6.5e-11 *** -## tax -0.00378 0.00516 -0.73 0.4638 -## ptratio -0.27108 0.18645 -1.45 0.1466 -## black -0.00754 0.00367 -2.05 0.0407 * -## lstat 0.12621 0.07572 1.67 0.0962 . -## medv -0.19889 0.06052 -3.29 0.0011 ** +## Estimate Std. Error t value Pr(>|t|) +## (Intercept) 17.033228 7.234903 2.354 0.018949 * +## zn 0.044855 0.018734 2.394 0.017025 * +## indus -0.063855 0.083407 -0.766 0.444294 +## chasY -0.749134 1.180147 -0.635 0.525867 +## nox -10.313535 5.275536 -1.955 0.051152 . +## rm 0.430131 0.612830 0.702 0.483089 +## age 0.001452 0.017925 0.081 0.935488 +## dis -0.987176 0.281817 -3.503 0.000502 *** +## rad 0.588209 0.088049 6.680 6.46e-11 *** +## tax -0.003780 0.005156 -0.733 0.463793 +## ptratio -0.271081 0.186450 -1.454 0.146611 +## black -0.007538 0.003673 -2.052 0.040702 * +## lstat 0.126211 0.075725 1.667 0.096208 . +## medv -0.198887 0.060516 -3.287 0.001087 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 6.44 on 492 degrees of freedom -## Multiple R-squared: 0.454, Adjusted R-squared: 0.44 -## F-statistic: 31.5 on 13 and 492 DF, p-value: <2e-16 +## Residual standard error: 6.439 on 492 degrees of freedom +## Multiple R-squared: 0.454, Adjusted R-squared: 0.4396 +## F-statistic: 31.47 on 13 and 492 DF, p-value: < 2.2e-16

We can reject the null hypothesis at the 5% level for zn, dis, rad, black and medv.
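The list of significant predictors can also be read off programmatically rather than by eye. This is a sketch only, assuming Boston is loaded from MASS and using the conventional 5% cutoff; the name `p` is introduced here for illustration.

```r
library(MASS)  # assumption: Boston is taken from MASS

# Full model with every other column as a predictor.
lm.all = lm(crim ~ ., data = Boston)

# p-values of all coefficients except the intercept.
p = coef(summary(lm.all))[-1, "Pr(>|t|)"]

# Predictors whose p-value falls below the 5% cutoff.
names(p)[p < 0.05]
```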

@@ -1377,7 +1395,7 @@

15c.

coefficients(lm.medv)[2]) y = coefficients(lm.all)[2:14] plot(x, y) -

plot of chunk unnamed-chunk-49 Coefficient for nox is approximately 31 in univariate model and -10 in multiple regression model.

+

Coefficient for nox is approximately 31 in univariate model and -10 in multiple regression model.
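The two nox coefficients can be compared directly instead of being read off the scatter plot. A minimal sketch, assuming Boston comes from MASS; `uni.nox` and `all.nox` are names introduced here for illustration.

```r
library(MASS)  # assumption: Boston is taken from MASS

# Slope for nox when it is the only predictor.
uni.nox = coefficients(lm(crim ~ nox, data = Boston))["nox"]

# Coefficient for nox when all other predictors are included.
all.nox = coefficients(lm(crim ~ ., data = Boston))["nox"]

c(univariate = unname(uni.nox), multiple = unname(all.nox))
```

A sign change like this one usually indicates that nox is correlated with other predictors in the full model.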

15d.

@@ -1389,20 +1407,20 @@

15d.

## ## Residuals: ## Min 1Q Median 3Q Max -## -4.82 -4.61 -1.29 0.47 84.13 +## -4.821 -4.614 -1.294 0.473 84.130 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 3.614 0.372 9.71 < 2e-16 *** -## poly(zn, 3)1 -38.750 8.372 -4.63 4.7e-06 *** -## poly(zn, 3)2 23.940 8.372 2.86 0.0044 ** -## poly(zn, 3)3 -10.072 8.372 -1.20 0.2295 +## (Intercept) 3.6135 0.3722 9.709 < 2e-16 *** +## poly(zn, 3)1 -38.7498 8.3722 -4.628 4.7e-06 *** +## poly(zn, 3)2 23.9398 8.3722 2.859 0.00442 ** +## poly(zn, 3)3 -10.0719 8.3722 -1.203 0.22954 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 8.37 on 502 degrees of freedom -## Multiple R-squared: 0.0582, Adjusted R-squared: 0.0526 -## F-statistic: 10.3 on 3 and 502 DF, p-value: 1.28e-06 +## Residual standard error: 8.372 on 502 degrees of freedom +## Multiple R-squared: 0.05824, Adjusted R-squared: 0.05261 +## F-statistic: 10.35 on 3 and 502 DF, p-value: 1.281e-06
lm.indus = lm(crim~poly(indus,3))
 summary(lm.indus) # 1, 2, 3
## 
@@ -1411,20 +1429,20 @@ 

15d.

## ## Residuals: ## Min 1Q Median 3Q Max -## -8.28 -2.51 0.05 0.76 79.71 +## -8.278 -2.514 0.054 0.764 79.713 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 3.61 0.33 10.95 < 2e-16 *** -## poly(indus, 3)1 78.59 7.42 10.59 < 2e-16 *** -## poly(indus, 3)2 -24.39 7.42 -3.29 0.0011 ** -## poly(indus, 3)3 -54.13 7.42 -7.29 1.2e-12 *** +## (Intercept) 3.614 0.330 10.950 < 2e-16 *** +## poly(indus, 3)1 78.591 7.423 10.587 < 2e-16 *** +## poly(indus, 3)2 -24.395 7.423 -3.286 0.00109 ** +## poly(indus, 3)3 -54.130 7.423 -7.292 1.2e-12 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 7.42 on 502 degrees of freedom -## Multiple R-squared: 0.26, Adjusted R-squared: 0.255 -## F-statistic: 58.7 on 3 and 502 DF, p-value: <2e-16
+## Residual standard error: 7.423 on 502 degrees of freedom +## Multiple R-squared: 0.2597, Adjusted R-squared: 0.2552 +## F-statistic: 58.69 on 3 and 502 DF, p-value: < 2.2e-16
# lm.chas = lm(crim~poly(chas,3)) : qualitative predictor
 lm.nox = lm(crim~poly(nox,3))
 summary(lm.nox) # 1, 2, 3
@@ -1434,20 +1452,20 @@

15d.

## ## Residuals: ## Min 1Q Median 3Q Max -## -9.11 -2.07 -0.25 0.74 78.30 +## -9.110 -2.068 -0.255 0.739 78.302 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 3.614 0.322 11.24 < 2e-16 *** -## poly(nox, 3)1 81.372 7.234 11.25 < 2e-16 *** -## poly(nox, 3)2 -28.829 7.234 -3.99 7.7e-05 *** -## poly(nox, 3)3 -60.362 7.234 -8.34 7.0e-16 *** +## (Intercept) 3.6135 0.3216 11.237 < 2e-16 *** +## poly(nox, 3)1 81.3720 7.2336 11.249 < 2e-16 *** +## poly(nox, 3)2 -28.8286 7.2336 -3.985 7.74e-05 *** +## poly(nox, 3)3 -60.3619 7.2336 -8.345 6.96e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 7.23 on 502 degrees of freedom -## Multiple R-squared: 0.297, Adjusted R-squared: 0.293 -## F-statistic: 70.7 on 3 and 502 DF, p-value: <2e-16 +## Residual standard error: 7.234 on 502 degrees of freedom +## Multiple R-squared: 0.297, Adjusted R-squared: 0.2928 +## F-statistic: 70.69 on 3 and 502 DF, p-value: < 2.2e-16
lm.rm = lm(crim~poly(rm,3))
 summary(lm.rm) # 1, 2
## 
@@ -1455,21 +1473,21 @@ 

15d.

## lm(formula = crim ~ poly(rm, 3)) ## ## Residuals: -## Min 1Q Median 3Q Max -## -18.49 -3.47 -2.22 -0.01 87.22 +## Min 1Q Median 3Q Max +## -18.485 -3.468 -2.221 -0.015 87.219 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 3.61 0.37 9.76 < 2e-16 *** -## poly(rm, 3)1 -42.38 8.33 -5.09 5.1e-07 *** -## poly(rm, 3)2 26.58 8.33 3.19 0.0015 ** -## poly(rm, 3)3 -5.51 8.33 -0.66 0.5086 +## (Intercept) 3.6135 0.3703 9.758 < 2e-16 *** +## poly(rm, 3)1 -42.3794 8.3297 -5.088 5.13e-07 *** +## poly(rm, 3)2 26.5768 8.3297 3.191 0.00151 ** +## poly(rm, 3)3 -5.5103 8.3297 -0.662 0.50858 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 8.33 on 502 degrees of freedom -## Multiple R-squared: 0.0678, Adjusted R-squared: 0.0622 -## F-statistic: 12.2 on 3 and 502 DF, p-value: 1.07e-07
+## Multiple R-squared: 0.06779, Adjusted R-squared: 0.06222 +## F-statistic: 12.17 on 3 and 502 DF, p-value: 1.067e-07
lm.age = lm(crim~poly(age,3))
 summary(lm.age) # 1, 2, 3
## 
@@ -1478,20 +1496,20 @@ 

15d.

## ## Residuals: ## Min 1Q Median 3Q Max -## -9.76 -2.67 -0.52 0.02 82.84 +## -9.762 -2.673 -0.516 0.019 82.842 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 3.614 0.349 10.37 < 2e-16 *** -## poly(age, 3)1 68.182 7.840 8.70 < 2e-16 *** -## poly(age, 3)2 37.484 7.840 4.78 2.3e-06 *** -## poly(age, 3)3 21.353 7.840 2.72 0.0067 ** +## (Intercept) 3.6135 0.3485 10.368 < 2e-16 *** +## poly(age, 3)1 68.1820 7.8397 8.697 < 2e-16 *** +## poly(age, 3)2 37.4845 7.8397 4.781 2.29e-06 *** +## poly(age, 3)3 21.3532 7.8397 2.724 0.00668 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 7.84 on 502 degrees of freedom -## Multiple R-squared: 0.174, Adjusted R-squared: 0.169 -## F-statistic: 35.3 on 3 and 502 DF, p-value: <2e-16
+## Multiple R-squared: 0.1742, Adjusted R-squared: 0.1693 +## F-statistic: 35.31 on 3 and 502 DF, p-value: < 2.2e-16
lm.dis = lm(crim~poly(dis,3))
 summary(lm.dis) # 1, 2, 3
## 
@@ -1499,21 +1517,21 @@ 

15d.

## lm(formula = crim ~ poly(dis, 3)) ## ## Residuals: -## Min 1Q Median 3Q Max -## -10.76 -2.59 0.03 1.27 76.38 +## Min 1Q Median 3Q Max +## -10.757 -2.588 0.031 1.267 76.378 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 3.614 0.326 11.09 < 2e-16 *** -## poly(dis, 3)1 -73.389 7.331 -10.01 < 2e-16 *** -## poly(dis, 3)2 56.373 7.331 7.69 7.9e-14 *** -## poly(dis, 3)3 -42.622 7.331 -5.81 1.1e-08 *** +## (Intercept) 3.6135 0.3259 11.087 < 2e-16 *** +## poly(dis, 3)1 -73.3886 7.3315 -10.010 < 2e-16 *** +## poly(dis, 3)2 56.3730 7.3315 7.689 7.87e-14 *** +## poly(dis, 3)3 -42.6219 7.3315 -5.814 1.09e-08 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 7.33 on 502 degrees of freedom -## Multiple R-squared: 0.278, Adjusted R-squared: 0.274 -## F-statistic: 64.4 on 3 and 502 DF, p-value: <2e-16
+## Residual standard error: 7.331 on 502 degrees of freedom +## Multiple R-squared: 0.2778, Adjusted R-squared: 0.2735 +## F-statistic: 64.37 on 3 and 502 DF, p-value: < 2.2e-16
lm.rad = lm(crim~poly(rad,3))
 summary(lm.rad) # 1, 2
## 
@@ -1521,21 +1539,21 @@ 

15d.

## lm(formula = crim ~ poly(rad, 3)) ## ## Residuals: -## Min 1Q Median 3Q Max -## -10.38 -0.41 -0.27 0.18 76.22 +## Min 1Q Median 3Q Max +## -10.381 -0.412 -0.269 0.179 76.217 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 3.614 0.297 12.16 <2e-16 *** -## poly(rad, 3)1 120.907 6.682 18.09 <2e-16 *** -## poly(rad, 3)2 17.492 6.682 2.62 0.0091 ** -## poly(rad, 3)3 4.698 6.682 0.70 0.4823 +## (Intercept) 3.6135 0.2971 12.164 < 2e-16 *** +## poly(rad, 3)1 120.9074 6.6824 18.093 < 2e-16 *** +## poly(rad, 3)2 17.4923 6.6824 2.618 0.00912 ** +## poly(rad, 3)3 4.6985 6.6824 0.703 0.48231 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 6.68 on 502 degrees of freedom -## Multiple R-squared: 0.4, Adjusted R-squared: 0.396 -## F-statistic: 112 on 3 and 502 DF, p-value: <2e-16
+## Residual standard error: 6.682 on 502 degrees of freedom +## Multiple R-squared: 0.4, Adjusted R-squared: 0.3965 +## F-statistic: 111.6 on 3 and 502 DF, p-value: < 2.2e-16
lm.tax = lm(crim~poly(tax,3))
 summary(lm.tax) # 1, 2
## 
@@ -1543,21 +1561,21 @@ 

15d.

## lm(formula = crim ~ poly(tax, 3)) ## ## Residuals: -## Min 1Q Median 3Q Max -## -13.27 -1.39 0.05 0.54 76.95 +## Min 1Q Median 3Q Max +## -13.273 -1.389 0.046 0.536 76.950 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 3.614 0.305 11.86 < 2e-16 *** -## poly(tax, 3)1 112.646 6.854 16.44 < 2e-16 *** -## poly(tax, 3)2 32.087 6.854 4.68 3.7e-06 *** -## poly(tax, 3)3 -7.997 6.854 -1.17 0.24 +## (Intercept) 3.6135 0.3047 11.860 < 2e-16 *** +## poly(tax, 3)1 112.6458 6.8537 16.436 < 2e-16 *** +## poly(tax, 3)2 32.0873 6.8537 4.682 3.67e-06 *** +## poly(tax, 3)3 -7.9968 6.8537 -1.167 0.244 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 6.85 on 502 degrees of freedom -## Multiple R-squared: 0.369, Adjusted R-squared: 0.365 -## F-statistic: 97.8 on 3 and 502 DF, p-value: <2e-16
+## Residual standard error: 6.854 on 502 degrees of freedom +## Multiple R-squared: 0.3689, Adjusted R-squared: 0.3651 +## F-statistic: 97.8 on 3 and 502 DF, p-value: < 2.2e-16
lm.ptratio = lm(crim~poly(ptratio,3))
 summary(lm.ptratio) # 1, 2, 3
## 
@@ -1566,20 +1584,20 @@ 

15d.

## ## Residuals: ## Min 1Q Median 3Q Max -## -6.83 -4.15 -1.65 1.41 82.70 +## -6.833 -4.146 -1.655 1.408 82.697 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 3.614 0.361 10.01 < 2e-16 *** -## poly(ptratio, 3)1 56.045 8.122 6.90 1.6e-11 *** -## poly(ptratio, 3)2 24.775 8.122 3.05 0.0024 ** -## poly(ptratio, 3)3 -22.280 8.122 -2.74 0.0063 ** +## (Intercept) 3.614 0.361 10.008 < 2e-16 *** +## poly(ptratio, 3)1 56.045 8.122 6.901 1.57e-11 *** +## poly(ptratio, 3)2 24.775 8.122 3.050 0.00241 ** +## poly(ptratio, 3)3 -22.280 8.122 -2.743 0.00630 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 8.12 on 502 degrees of freedom -## Multiple R-squared: 0.114, Adjusted R-squared: 0.108 -## F-statistic: 21.5 on 3 and 502 DF, p-value: 4.17e-13
+## Residual standard error: 8.122 on 502 degrees of freedom +## Multiple R-squared: 0.1138, Adjusted R-squared: 0.1085 +## F-statistic: 21.48 on 3 and 502 DF, p-value: 4.171e-13
lm.black = lm(crim~poly(black,3))
 summary(lm.black) # 1
## 
@@ -1587,21 +1605,21 @@ 

15d.

## lm(formula = crim ~ poly(black, 3)) ## ## Residuals: -## Min 1Q Median 3Q Max -## -13.10 -2.34 -2.13 -1.44 86.79 +## Min 1Q Median 3Q Max +## -13.096 -2.343 -2.128 -1.439 86.790 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 3.614 0.354 10.22 <2e-16 *** -## poly(black, 3)1 -74.431 7.955 -9.36 <2e-16 *** -## poly(black, 3)2 5.926 7.955 0.75 0.46 -## poly(black, 3)3 -4.835 7.955 -0.61 0.54 +## (Intercept) 3.6135 0.3536 10.218 <2e-16 *** +## poly(black, 3)1 -74.4312 7.9546 -9.357 <2e-16 *** +## poly(black, 3)2 5.9264 7.9546 0.745 0.457 +## poly(black, 3)3 -4.8346 7.9546 -0.608 0.544 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 7.95 on 502 degrees of freedom -## Multiple R-squared: 0.15, Adjusted R-squared: 0.145 -## F-statistic: 29.5 on 3 and 502 DF, p-value: <2e-16
+## Residual standard error: 7.955 on 502 degrees of freedom +## Multiple R-squared: 0.1498, Adjusted R-squared: 0.1448 +## F-statistic: 29.49 on 3 and 502 DF, p-value: < 2.2e-16
lm.lstat = lm(crim~poly(lstat,3))
 summary(lm.lstat) # 1, 2
## 
@@ -1609,21 +1627,21 @@ 

15d.

## lm(formula = crim ~ poly(lstat, 3)) ## ## Residuals: -## Min 1Q Median 3Q Max -## -15.23 -2.15 -0.49 0.07 83.35 +## Min 1Q Median 3Q Max +## -15.234 -2.151 -0.486 0.066 83.353 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 3.614 0.339 10.65 <2e-16 *** -## poly(lstat, 3)1 88.070 7.629 11.54 <2e-16 *** -## poly(lstat, 3)2 15.888 7.629 2.08 0.038 * -## poly(lstat, 3)3 -11.574 7.629 -1.52 0.130 +## (Intercept) 3.6135 0.3392 10.654 <2e-16 *** +## poly(lstat, 3)1 88.0697 7.6294 11.543 <2e-16 *** +## poly(lstat, 3)2 15.8882 7.6294 2.082 0.0378 * +## poly(lstat, 3)3 -11.5740 7.6294 -1.517 0.1299 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 7.63 on 502 degrees of freedom -## Multiple R-squared: 0.218, Adjusted R-squared: 0.213 -## F-statistic: 46.6 on 3 and 502 DF, p-value: <2e-16
+## Residual standard error: 7.629 on 502 degrees of freedom +## Multiple R-squared: 0.2179, Adjusted R-squared: 0.2133 +## F-statistic: 46.63 on 3 and 502 DF, p-value: < 2.2e-16
lm.medv = lm(crim~poly(medv,3))
 summary(lm.medv) # 1, 2, 3
## 
@@ -1631,21 +1649,21 @@ 

15d.

## lm(formula = crim ~ poly(medv, 3)) ## ## Residuals: -## Min 1Q Median 3Q Max -## -24.43 -1.98 -0.44 0.44 73.65 +## Min 1Q Median 3Q Max +## -24.427 -1.976 -0.437 0.439 73.655 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) -## (Intercept) 3.614 0.292 12.37 <2e-16 *** -## poly(medv, 3)1 -75.058 6.569 -11.43 <2e-16 *** -## poly(medv, 3)2 88.086 6.569 13.41 <2e-16 *** -## poly(medv, 3)3 -48.033 6.569 -7.31 1e-12 *** +## (Intercept) 3.614 0.292 12.374 < 2e-16 *** +## poly(medv, 3)1 -75.058 6.569 -11.426 < 2e-16 *** +## poly(medv, 3)2 88.086 6.569 13.409 < 2e-16 *** +## poly(medv, 3)3 -48.033 6.569 -7.312 1.05e-12 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## -## Residual standard error: 6.57 on 502 degrees of freedom -## Multiple R-squared: 0.42, Adjusted R-squared: 0.417 -## F-statistic: 121 on 3 and 502 DF, p-value: <2e-16
+## Residual standard error: 6.569 on 502 degrees of freedom +## Multiple R-squared: 0.4202, Adjusted R-squared: 0.4167 +## F-statistic: 121.3 on 3 and 502 DF, p-value: < 2.2e-16

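See the inline comments above. There is evidence of a non-linear association for most predictors. The exceptions are black, where only the linear term is significant, and chas, which is qualitative and so has no polynomial fit.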
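The per-predictor cubic fits above can be condensed into one loop that collects the p-value of each polynomial term. This is a sketch under the assumption that Boston comes from MASS; `predictors` and `pvals` are names introduced here, and chas is skipped because it is qualitative.

```r
library(MASS)  # assumption: Boston is taken from MASS

# Every quantitative predictor: drop the response and the qualitative chas.
predictors = setdiff(names(Boston), c("crim", "chas"))

# For each predictor, fit crim ~ poly(x, 3) and keep the p-values of the
# three polynomial terms (the intercept row is dropped).
pvals = sapply(predictors, function(p) {
  fit = lm(as.formula(paste0("crim ~ poly(", p, ", 3)")), data = Boston)
  unname(coef(summary(fit))[-1, "Pr(>|t|)"])
})
rownames(pvals) = c("linear", "quadratic", "cubic")

# One row per predictor, one column per polynomial degree.
round(t(pvals), 4)
```

A small p-value in the quadratic or cubic column corresponds to a "2" or "3" in the inline comments.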