Commit

Merge pull request #493 from gungorMetehan/patch-6
Update 07-sampling.Rmd
ismayc authored Apr 22, 2024
2 parents a8a5b7a + cb2648c commit 975a60a
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions 07-sampling.Rmd
@@ -725,7 +725,7 @@ virtual_shovel
virtual_shovel
```

-Using the sample of 50 balls contained in `virtual_shovel`, we generated an estimate of the proportion of the bowl's balls that are red `prop_red`
+Using the sample of 50 balls contained in `virtual_shovel`, we generated an estimate of the proportion of the bowl's balls that are red `prop_red`.


```{r}
@@ -746,7 +746,7 @@ We say a sample of $n$ balls extracted using our shovel is **representative** of

The fourth and final set of terms and notation relate to the goal of sampling:

-1. One way to ensure that a sample is unbiased and representative of the population is by using **random sampling**
+1. One way to ensure that a sample is unbiased and representative of the population is by using **random sampling**.
1. **Inference** is the act of "making a guess" about some unknown. **Statistical inference** is the act of making a guess about a population using a sample.

In our case, since the `rep_sample_n()` function uses your computer's [random number generator](https://en.wikipedia.org/wiki/Random_number_generation), we were in fact performing **random sampling**.
@@ -1140,10 +1140,10 @@ knitr::include_graphics("images/copyright/CLT_video_preview.png")

Here's what is so surprising about the Central Limit Theorem: regardless of the shape of the underlying population distribution, the sampling distribution of means (such as the sample mean of bunny weights or the sample mean of the length of dragon wings) and proportions (such as the sample proportion red in our shovels) will be **normal.** Normal distributions are defined by where they are centered and how wide they are, and the Central Limit Theorem gives us both:

-1. The sampling distribution of the point estimate is centered at the true population parameter
-2. We have an estimate for how wide the sampling distribution of the point estimate is, given by the standard error (which we will discuss further in Chapter \@ref(confidence-intervals))
+1. The sampling distribution of the point estimate is centered at the true population parameter.
+2. We have an estimate for how wide the sampling distribution of the point estimate is, given by the standard error (which we will discuss further in Chapter \@ref(confidence-intervals)).

-What the Central Limit Theorem creates for us is a ladder between a *single* sample and the population. By the Central Limit Theorem, we can say that (1) our sample's point estimate is drawn from a normal distribution centered at the true population parameter and (2)that the width of that normal distribution is governed by the standard error of our point estimate. Relating this to our bowl, if we pull one sample and get the sample proportion of red balls $\widehat{p}$, this value of $\widehat{p}$ is drawn from the normal curve centered at the true population proportion of red balls $p$ with the computed standard error.
+What the Central Limit Theorem creates for us is a ladder between a *single* sample and the population. By the Central Limit Theorem, we can say that (1) our sample's point estimate is drawn from a normal distribution centered at the true population parameter and (2) that the width of that normal distribution is governed by the standard error of our point estimate. Relating this to our bowl, if we pull one sample and get the sample proportion of red balls $\widehat{p}$, this value of $\widehat{p}$ is drawn from the normal curve centered at the true population proportion of red balls $p$ with the computed standard error.

<!--
TODO: Maybe add synthetic datasets to the moderndive package so that students
@@ -1246,4 +1246,4 @@ Recall in our Obama poll case study in Section \@ref(sampling-case-study) that b

> The online survey of 2,089 adults was conducted from Oct. 30 to Nov. 11, just weeks after the federal government shutdown ended and the problems surrounding the implementation of the Affordable Care Act began to take center stage. The poll's margin of error was plus or minus 2.1 percentage points.
-Note the term *margin of error*, which here is "plus or minus 2.1 percentage points." Most polls won't produce an estimate that's perfectly right; there will always be a certain amount of error caused by *sampling variation*. The margin of error of plus or minus 2.1 percentage points is saying that a typical range of errors for polls of this type is about $\pm$ 2.1%, in words from about 2.1% too small to about 2.1% too big. We can restate this as the interval of $[41\% - 2.1\%, 41\% + 2.1\%] = [37.9\%, 43.1\%]$ (this notation indicates the interval contains all values between 37.9% and 43.1%, including the end points of 37.9% and 43.1%). We'll see in the next chapter that such intervals are known as *confidence intervals*.
+Note the term *margin of error*, which here is "plus or minus 2.1 percentage points." Most polls won't produce an estimate that's perfectly right; there will always be a certain amount of error caused by *sampling variation*. The margin of error of plus or minus 2.1 percentage points is saying that a typical range of errors for polls of this type is about $\pm$ 2.1%, in words from about 2.1% too small to about 2.1% too big. We can restate this as the interval of $[41\% - 2.1\%, 41\% + 2.1\%] = [38.9\%, 43.1\%]$ (this notation indicates the interval contains all values between 38.9% and 43.1%, including the end points of 38.9% and 43.1%). We'll see in the next chapter that such intervals are known as *confidence intervals*.
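
The interval endpoints can be checked by hand from the 41% point estimate and the 2.1-point margin of error quoted in the passage:

$$
41\% - 2.1\% = 38.9\%, \qquad 41\% + 2.1\% = 43.1\%
$$

so the interval is $[38.9\%, 43.1\%]$. As a rough sanity check on the margin itself, and assuming the conventional 95% normal-approximation formula (an assumption here; the quoted poll does not state its method), $1.96 \times \sqrt{0.41 \times 0.59 / 2089} \approx 0.021$, which is consistent with the reported 2.1 percentage points.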
