diff --git a/2024/lecture_03.html b/2024/lecture_03.html deleted file mode 100644 index da4ddee..0000000 --- a/2024/lecture_03.html +++ /dev/null @@ -1,3131 +0,0 @@ - -
-Uncertainty, standard errors and confidence intervals
Recap of sampling from populations
Uncertainty in research and estimation
Sampling distribution revisited
Standard error of the mean
Confidence intervals: what they are, and what they are not
LAST WEEK: Jennifer undertook a determined journey to find out just how clever Jessie (left) is compared to all the other dogs in the population.
-THIS WEEK: What if we don’t know anything about the population?
-\[ -SE = \frac{\sigma}{\sqrt N} -\]
-\[ -\text{CI limits} = mean \pm (1.96 \times{SE}) \\ -\]
-The average person (apparently)…
-drinks 730 cups of coffee per year (twice as much for academics, incl. students) ☕
spends 192 minutes a day watching TV 📺
eats 250 cloves of garlic per year 🧄
takes 3500 steps each day 🚶
falls asleep in 7 minutes 😴
--Doomscrolling
-“… refers to a unique media habit where social media users persistently attend to negative information in their newsfeeds about crises, disasters, and tragedies.”
-- Sharma, Lee, and Johnson (2022)
-
The problem: Each time we take a sample, we get a different estimate.
- -Sampling distributions don’t exist “in the wild”. They are a hypothetical statistical concept.
Remember: the standard error is the standard deviation of the sampling distribution (which we would get by re-sampling and computing the mean an infinite number of times), but we only have access to one sample with one mean.
Therefore, if we want to use the standard error to construct an interval, we need to estimate it from our sample.
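To make the idea of a hypothetical sampling distribution more concrete, here is a minimal Python sketch. It assumes a made-up normal population (mean 100, SD 15, neither value comes from this lecture) and uses 5000 repeated samples as a stand-in for "an infinite number of times". The standard deviation of the simulated sample means should land close to the population SD divided by the square root of the sample size.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD = 100, 15   # hypothetical population, not from the lecture
N = 25                       # size of each sample
REPS = 5000                  # stands in for "an infinite number of times"

# Draw many samples and keep the mean of each one.
sample_means = [
    statistics.mean([random.gauss(POP_MEAN, POP_SD) for _ in range(N)])
    for _ in range(REPS)
]

# The SD of those means is the (simulated) standard error.
simulated_se = statistics.stdev(sample_means)
theoretical_se = POP_SD / N ** 0.5   # population SD / sqrt(N)

print(round(simulated_se, 2), theoretical_se)
```

In real research we never get to run this loop, which is why the standard error has to be estimated from the single sample we have.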
Equation:
-\[ -SE = \frac{\sigma}{\sqrt N} -\]
-Translation:
-\[ -\text{standard error} = \frac{\text{sample standard deviation}}{\text{(the square root of) the sample size}} -\]
-In R
:
We collect a sample of 4 individuals.
Each person reports their daily doomscrolling time (in minutes): 86, 114, 97, 107
The mean for the sample is 101 minutes
The standard deviation is:
\[ -\sigma = \sqrt\frac{\sum(x_i - \bar{x})^2}{N - 1} = \sqrt\frac{(86-101)^2 + (114-101)^2 + (97 - 101)^2+(107-101)^2}{4 - 1} = 12.19 -\]
-\[ -SE = \frac{\sigma}{\sqrt{N}} = \frac{12.19}{\sqrt{4}} = 6.095 -\]
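The hand calculation can be checked with a short Python sketch that uses only the standard library. Note that `statistics.stdev` divides by N - 1 (the sample standard deviation), which is the version that reproduces the 12.19 on the slide. The variable names are illustrative.

```python
import math
import statistics

minutes = [86, 114, 97, 107]   # daily doomscrolling times from the slide

mean = statistics.mean(minutes)    # sample mean
sd = statistics.stdev(minutes)     # sample SD, divides by N - 1
se = sd / math.sqrt(len(minutes))  # standard error of the mean

print(mean, round(sd, 2), round(se, 3))
```

At full precision the standard error comes out at about 6.096. The slide's 6.095 comes from dividing the already-rounded 12.19 by 2, so the two differ only in the third decimal place.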
-Average doomscrolling time for the sample: 101 minutes
-Standard deviation: 12.19
-Standard error: 6.095
-\[ -\text{Lower CI limit} = \text{sample mean} - 1.96 \times\text{SE} \\ -\text{Upper CI limit} = \text{sample mean} + 1.96 \times\text{SE} -\]
-\[ -\text{Lower CI limit} = 101 - 1.96 \times6.095 = 89.054\\ -\text{Upper CI limit} = 101 + 1.96 \times6.095 = 112.946 -\]
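As a rough check of where 1.96 comes from and of the arithmetic above, the following Python sketch asks a standard normal distribution for the value that cuts off the top 2.5%, then builds the interval from the slide's mean and standard error. The multiplier is rounded to two decimals only to match the slides.

```python
from statistics import NormalDist

mean, se = 101, 6.095   # values from the slides

# The middle 95% of a standard normal lies within about +/-1.96.
z = NormalDist().inv_cdf(0.975)

# The slides round the multiplier to 1.96, so we do the same here.
z_rounded = round(z, 2)
lower = mean - z_rounded * se
upper = mean + z_rounded * se

print(round(z, 3), round(lower, 3), round(upper, 3))
```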
-Remember: the sampling distribution of the mean will have a normal shape as long as the sample size is large enough.
Smaller samples don’t approximate the normal sampling distribution very well. Because of this, we can’t rely on the value 1.96 to give us accurate intervals.
Instead, we can use the t-distribution
The “critical t value” (i.e. the value of t that gives the interval its intended 95% coverage) depends on the degrees of freedom (df), which in our case are related to the sample size.
When working with the mean, the degrees of freedom are calculated as N - 1.
Instead of multiplying the standard error by 1.96, we multiply by the critical t value.
For example, in our sample of 4, the df is 4 - 1 = 3. Move the slider to df = 3 to see that the critical t value for df = 3 is 3.182.
Average doomscrolling time for the sample: 101 minutes
-Standard error: 6.095
-Critical t value: 3.182
-\[ -\text{CI Limits} = \text{mean} \pm3.182 \times\text{SE} \\ -\text{CI Limits} = 101 \pm3.182 \times\text{6.095} \\ -\text{CI Limits} = [81.606, 120.394] -\]
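The same calculation as a small Python sketch, with the normal-based interval alongside for comparison. The critical t value is typed in from the lecture rather than computed, and the helper `ci` is a hypothetical name used only for this illustration.

```python
mean, se = 101, 6.095   # values from the slides
t_crit = 3.182          # critical t for df = 3, as given in the lecture
z_crit = 1.96           # normal-based multiplier, for comparison

def ci(mean, se, multiplier):
    """Return (lower, upper) limits: mean -/+ multiplier * SE."""
    margin = multiplier * se
    return mean - margin, mean + margin

t_lower, t_upper = ci(mean, se, t_crit)
z_lower, z_upper = ci(mean, se, z_crit)

print("t-based:", round(t_lower, 3), round(t_upper, 3))
print("z-based:", round(z_lower, 3), round(z_upper, 3))
```

The t-based interval is noticeably wider than the one built with 1.96, reflecting the extra uncertainty that comes with a sample of only 4.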
-This is to be expected - we have a tiny sample (N = 4), so there is a lot of uncertainty around whether the estimate of 101 minutes is actually representative of the population.
The larger the sample, the tighter the confidence interval, because the standard error shrinks and the critical t gets smaller and smaller (note how t approaches 1.96 as the df, and hence the sample size, increases).
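A small Python sketch of this effect, under a simplifying assumption: the standard deviation is held fixed at the 12.19 from our sample while the sample size grows. The critical values are two-tailed 95% entries copied from a standard t table, not computed here.

```python
import math

sd = 12.19   # held fixed at the sample SD from the slides (an assumption)

# Two-tailed 95% critical t values from a standard t table, keyed by df.
T_CRIT = {3: 3.182, 10: 2.228, 30: 2.042, 120: 1.980}

margins = []
for df, t_crit in T_CRIT.items():
    n = df + 1                       # df = N - 1 for the mean
    se = sd / math.sqrt(n)           # SE shrinks as N grows
    margin = t_crit * se             # half-width of the 95% CI
    margins.append(margin)
    print(f"N = {n:3d}  t = {t_crit:.3f}  SE = {se:.3f}  margin = {margin:.2f}")
```

Both ingredients of the margin get smaller as N grows: the critical t drifts towards 1.96, and the standard error is divided by a larger square root.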
We take samples over and over again, compute the mean, and construct confidence intervals around that mean - 95% of them will contain the population value, the remaining 5% will not.
This is known as an interval with 95% coverage. 95% is the most common value that we choose, but it can take on other values as well (e.g. 50%, 90%, 99%).
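Coverage can be illustrated with a simulation, sketched below in Python. Everything about the population is an assumption made for the demonstration (normal, mean 100, SD 15), and each sample has 50 observations so that 1.96 is a reasonable multiplier. In real research the population mean is unknown, so we can never check an individual interval like this.

```python
import math
import random
import statistics

random.seed(42)

POP_MEAN, POP_SD = 100, 15   # hypothetical population, not from the lecture
N, REPS = 50, 2000           # a largish sample, so 1.96 is a fair multiplier

hits = 0
for _ in range(REPS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    lower, upper = mean - 1.96 * se, mean + 1.96 * se
    if lower <= POP_MEAN <= upper:   # did this interval capture the truth?
        hits += 1

coverage = hits / REPS
print(f"{coverage:.1%} of intervals contain the population mean")
```

The proportion should come out close to 95%. Any single interval either contains the population mean or it does not; the 95% describes the long-run behaviour of the procedure.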
\[ -\text{"The average doomscrolling time in our sample was} \\ -\text{101 minutes (SD = 12.19), 95% CI [81.61, 120.39]."} -\]
-\[ -SE = \frac{\sigma}{\sqrt N} -\]
-\[ -\text{CI limits} = mean \pm (1.96 \times{SE}) \\ -\]
-When interpreting estimates and confidence intervals for your sample - always consider them as just one of many different possible estimates
This is why replication is important in science - our sample could easily be the one that misses the population value
Always be wary of studies placing too much confidence in a single finding
Putting it all into practice:
-Research questions
Good and bad hypotheses
Testing hypotheses with Null Hypothesis Significance Testing
A disappointing answer to why we’re so obsessed with the value 95%.