|
486 | 486 | "id": "a3a920ae",
|
487 | 487 | "metadata": {},
|
488 | 488 | "source": [
|
489 |
| - "As in Figure~\\ref{Ch5:cvplot}, we see a sharp drop in the estimated test MSE between the linear and\n", |
| 489 | + "As in Figure 5.4, we see a sharp drop in the estimated test MSE between the linear and\n", |
490 | 490 | "quadratic fits, but then no clear improvement from using higher-degree polynomials.\n",
|
491 | 491 | "\n",
|
492 | 492 | "Above we introduced the `outer()` method of the `np.power()`\n",
|
|
589 | 589 | "Notice that the computation time is much shorter than that of LOOCV.\n",
|
590 | 590 | "(In principle, the computation time for LOOCV for a least squares\n",
|
591 | 591 | "linear model should be faster than for $k$-fold CV, due to the\n",
|
592 |
| - "availability of the formula~(\\ref{Ch5:eq:LOOCVform}) for LOOCV;\n", |
| 592 | + "availability of the formula (5.2) for LOOCV;\n",
593 | 593 | "however, the generic `cross_validate()` function does not make\n",
|
594 | 594 | "use of this formula.) We still see little evidence that using cubic\n",
|
595 | 595 | "or higher-degree polynomial terms leads to a lower test error than simply\n",
|
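The LOOCV shortcut referenced above, formula (5.2), states that for a least squares fit $\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^n \left(\frac{y_i - \hat{y}_i}{1 - h_i}\right)^2$, where $h_i$ is the $i$th leverage value, so no refitting is needed. A minimal numpy sketch verifying this identity against brute-force refitting (the helper name `loocv_shortcut` and the simulated data are illustrative, not part of the lab):

```python
import numpy as np

def loocv_shortcut(X, y):
    """LOOCV error for least squares via the leverage-based shortcut
    (formula (5.2)); X is assumed to include an intercept column."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    # leverage values h_i = diagonal of the hat matrix X (X^T X)^{-1} X^T
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    return np.mean((resid / (1 - h)) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=50)
X = np.column_stack([np.ones(50), x])
y = 1 + 2 * x + rng.normal(size=50)

# brute-force LOOCV for comparison: refit n times, leaving one point out
errs = []
for i in range(50):
    mask = np.arange(50) != i
    b = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    errs.append((y[i] - X[i] @ b) ** 2)

print(np.isclose(loocv_shortcut(X, y), np.mean(errs)))  # True
```

The shortcut is exact for least squares, which is why LOOCV could in principle be cheaper than $k$-fold CV here; `cross_validate()` simply does not exploit it.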
|
699 | 699 | "\n",
|
700 | 700 | "## The Bootstrap\n",
|
701 | 701 | "We illustrate the use of the bootstrap in the simple example\n",
|
702 |
| - " {of Section~\\ref{Ch5:sec:bootstrap},} as well as on an example involving\n", |
| 702 | + " of Section 5.2, as well as on an example involving\n",
703 | 703 | "estimating the accuracy of the linear regression model on the `Auto`\n",
|
704 | 704 | "data set.\n",
|
705 | 705 | "### Estimating the Accuracy of a Statistic of Interest\n",
|
|
714 | 714 | "To illustrate the bootstrap, we\n",
|
715 | 715 | "start with a simple example.\n",
|
716 | 716 | "The `Portfolio` data set in the `ISLP` package is described\n",
|
717 |
| - "in Section~\\ref{Ch5:sec:bootstrap}. The goal is to estimate the\n", |
718 |
| - "sampling variance of the parameter $\\alpha$ given in formula~(\\ref{Ch5:min.var}). We will\n", |
| 717 | + "in Section 5.2. The goal is to estimate the\n", |
| 718 | + "sampling variance of the parameter $\\alpha$ given in formula (5.7). We will\n",
719 | 719 | "create a function\n",
|
720 | 720 | "`alpha_func()`, which takes as input a dataframe `D` assumed\n",
|
721 | 721 | "to have columns `X` and `Y`, as well as a\n",
|
|
754 | 754 | "source": [
|
755 | 755 | "This function returns an estimate for $\\alpha$\n",
|
756 | 756 | "based on applying the minimum\n",
|
757 |
| - " variance formula (\\ref{Ch5:min.var}) to the observations indexed by\n", |
| 757 | + " variance formula (5.7) to the observations indexed by\n", |
758 | 758 | "the argument `idx`. For instance, the following command\n",
|
759 | 759 | "estimates $\\alpha$ using all 100 observations."
|
760 | 760 | ]
|
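One possible implementation of `alpha_func()` applies the minimum-variance formula (5.7), $\alpha = (\sigma_Y^2 - \sigma_{XY})/(\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY})$, to the sample covariance matrix of the selected rows. A sketch using simulated data as a stand-in for `Portfolio` (not necessarily the lab's exact code):

```python
import numpy as np
import pandas as pd

def alpha_func(D, idx):
    """Apply the minimum-variance formula (5.7) to the rows of D
    (assumed to have columns X and Y) indexed by idx."""
    cov = np.cov(D[['X', 'Y']].loc[idx], rowvar=False)
    return ((cov[1, 1] - cov[0, 1]) /
            (cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]))

# simulated stand-in for the Portfolio data set
rng = np.random.default_rng(0)
D = pd.DataFrame({'X': rng.normal(size=100), 'Y': rng.normal(size=100)})
print(alpha_func(D, range(100)))  # estimate from all 100 observations
```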
|
934 | 934 | "`horsepower` to predict `mpg` in the `Auto` data set. We\n",
|
935 | 935 | "will compare the estimates obtained using the bootstrap to those\n",
|
936 | 936 | "obtained using the formulas for ${\\rm SE}(\\hat{\\beta}_0)$ and\n",
|
937 |
| - "${\\rm SE}(\\hat{\\beta}_1)$ described in Section~\\ref{Ch3:secoefsec}.\n", |
| 937 | + "${\\rm SE}(\\hat{\\beta}_1)$ described in Section 3.1.2.\n", |
938 | 938 | "\n",
|
939 | 939 | "To use our `boot_SE()` function, we must write a function (its\n",
|
940 | 940 | "first argument)\n",
|
|
1115 | 1115 | "0.85, and that the bootstrap\n",
|
1116 | 1116 | "estimate for ${\\rm SE}(\\hat{\\beta}_1)$ is\n",
|
1117 | 1117 | "0.0074. As discussed in\n",
|
1118 |
| - "Section~\\ref{Ch3:secoefsec}, standard formulas can be used to compute\n", |
| 1118 | + "Section 3.1.2, standard formulas can be used to compute\n", |
1119 | 1119 | "the standard errors for the regression coefficients in a linear\n",
|
1120 | 1120 | "model. These can be obtained using the `summarize()` function\n",
|
1121 | 1121 | "from `ISLP.sm`."
|
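The formula-based standard errors that `summarize()` tabulates come from $\widehat{\mathrm{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X^TX)^{-1}$ with $\hat{\sigma}^2 = \mathrm{RSS}/(n-p)$. A self-contained numpy sketch of that computation on simulated data (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])  # design matrix with intercept
y = 1 + 2 * x + rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])  # sigma^2 estimated by RSS / (n - p)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
print(se)  # formula-based SE(beta_0), SE(beta_1)
```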
|
1160 | 1160 | "metadata": {},
|
1161 | 1161 | "source": [
|
1162 | 1162 | "The standard error estimates for $\\hat{\\beta}_0$ and $\\hat{\\beta}_1$\n",
|
1163 |
| - "obtained using the formulas from Section~\\ref{Ch3:secoefsec} are\n", |
| 1163 | + "obtained using the formulas from Section 3.1.2 are\n", |
1164 | 1164 | "0.717 for the\n",
|
1165 | 1165 | "intercept and\n",
|
1166 | 1166 | "0.006 for the\n",
|
1167 | 1167 | "slope. Interestingly, these are somewhat different from the estimates\n",
|
1168 | 1168 | "obtained using the bootstrap. Does this indicate a problem with the\n",
|
1169 | 1169 | "bootstrap? In fact, it suggests the opposite. Recall that the\n",
|
1170 | 1170 | "standard formulas given in\n",
|
1171 |
| - " {Equation~\\ref{Ch3:se.eqn} on page~\\pageref{Ch3:se.eqn}}\n", |
| 1171 | + " Equation 3.8\n",
1172 | 1172 | "rely on certain assumptions. For example,\n",
|
1173 | 1173 | "they depend on the unknown parameter $\\sigma^2$, the noise\n",
|
1174 | 1174 | "variance. We then estimate $\\sigma^2$ using the RSS. Now although the\n",
|
1175 | 1175 | "formulas for the standard errors do not rely on the linear model being\n",
|
1176 | 1176 | "correct, the estimate for $\\sigma^2$ does. We see\n",
|
1177 |
| - " {in Figure~\\ref{Ch3:polyplot} on page~\\pageref{Ch3:polyplot}} that there is\n", |
| 1177 | + " in Figure 3.8 that there is\n",
1178 | 1178 | "a non-linear relationship in the data, and so the residuals from a\n",
|
1179 | 1179 | "linear fit will be inflated, and so will $\\hat{\\sigma}^2$. Secondly,\n",
|
1180 | 1180 | "the standard formulas assume (somewhat unrealistically) that the $x_i$\n",
|
|
1187 | 1187 | "Below we compute the bootstrap standard error estimates and the\n",
|
1188 | 1188 | "standard linear regression estimates that result from fitting the\n",
|
1189 | 1189 | "quadratic model to the data. Since this model provides a good fit to\n",
|
1190 |
| - "the data (Figure~\\ref{Ch3:polyplot}), there is now a better\n", |
| 1190 | + "the data (Figure 3.8), there is now a better\n", |
1191 | 1191 | "correspondence between the bootstrap estimates and the standard\n",
|
1192 | 1192 | "estimates of ${\\rm SE}(\\hat{\\beta}_0)$, ${\\rm SE}(\\hat{\\beta}_1)$ and\n",
|
1193 | 1193 | "${\\rm SE}(\\hat{\\beta}_2)$."
|
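The comparison above, bootstrap versus formula-based standard errors for the quadratic fit, can be sketched with a generic pair-resampling bootstrap (a stand-in for the lab's `boot_SE()` helper, not its actual code):

```python
import numpy as np

def boot_ols_se(X, y, B=1000, seed=0):
    """Bootstrap SEs for least-squares coefficients: resample
    (row, response) pairs with replacement B times and take the
    standard deviation of the refitted coefficient vectors."""
    rng = np.random.default_rng(seed)
    n = len(y)
    betas = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=n)  # sample n rows with replacement
        betas[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    return betas.std(axis=0, ddof=1)

# simulated quadratic relationship as a stand-in for mpg ~ horsepower
rng = np.random.default_rng(1)
x = rng.normal(size=100)
X = np.column_stack([np.ones(100), x, x**2])  # intercept, x, x^2
y = 1 + 2 * x - 0.5 * x**2 + rng.normal(size=100)
se = boot_ols_se(X, y)  # SE(beta_0), SE(beta_1), SE(beta_2)
print(se)
```

Because the quadratic model fits the data well, the pair bootstrap and the formula-based estimates tend to agree closely, as the text notes.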
|