Commit 49f4423

Add files via upload
1 parent 15f3603 commit 49f4423

1 file changed

+129
-0
lines changed

labs/Lab_4.Rmd

+129
@@ -0,0 +1,129 @@
1+
---
2+
title: "Lab 4"
3+
date: '10/21/2022'
4+
output:
5+
pdf_document: default
6+
html_document: default
7+
---
8+
9+
10+
```{r setup, include=FALSE}
11+
knitr::opts_chunk$set(echo = TRUE)
12+
```
13+
14+
# The dangers of hard coding
15+
16+
One of the most common problems on the last problem set was in question 1e, which asked you to write a function called mymean(). Many people had the correct solution, or something that worked similarly:
17+
18+
```{r}
19+
mymean <- function(vec){
20+
sum(vec)/length(vec)
21+
}
22+
```
To test it, we need some data. Let $X$ be the number of heads in three flips of a fair coin. Its PMF is:
23+
24+
| $x$ | $P(X = x)$ |
25+
|------|------------|
26+
| 0 | 1/8 |
27+
| 1 | 3/8 |
28+
| 2 | 3/8 |
29+
| 3 | 1/8 |
30+
31+
32+
We can code this distribution like this:
33+
34+
```{r}
35+
X <- c(0, 1, 2, 3)
36+
probs <- c(1/8, 3/8, 3/8, 1/8)
37+
```
38+
39+
Then we can sample from it like this:
40+
41+
```{r}
42+
set.seed(2418)
43+
44+
sample_1 <- sample(x = X,
45+
size = 1,
46+
prob = probs)
47+
sample_1
48+
```
49+
50+
```{r}
51+
sample_10000 <- sample(x = X,
52+
size = 10000,
53+
prob = probs,
54+
replace = TRUE)
55+
56+
sample_10000
57+
```
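A quick sanity check on the draws is to compare the empirical frequencies with `probs`. The chunk below is only an illustrative sketch, not part of the original lab: the names `draws_demo` and `freq_demo` are assumptions, and the exact frequencies depend on the seed.

```{r}
# Illustrative sketch; object names are assumptions, not from the lab
X <- c(0, 1, 2, 3)
probs <- c(1/8, 3/8, 3/8, 1/8)

set.seed(2418)
draws_demo <- sample(x = X, size = 10000, prob = probs, replace = TRUE)

# Share of draws equal to each value of X, in the same order as X
freq_demo <- as.numeric(prop.table(table(factor(draws_demo, levels = X))))

# Side-by-side comparison of theoretical and empirical probabilities
data.frame(x = X, theoretical = probs, empirical = freq_demo)
```

With 10000 draws the two columns should be close but not identical. Note that `replace = TRUE` is required here, because `sample()` cannot draw more values than `X` contains when sampling without replacement.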
58+
59+
Now, back to our function. Let’s check to make sure it works:
60+
61+
```{r}
62+
mean(sample_10000)
63+
```
64+
```{r}
65+
mymean(sample_10000)
66+
```
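Comparing two printed numbers by eye works for one vector, but the same check can be repeated on inputs of different lengths. The chunk below is a minimal sketch of that idea; the helper `check_mymean_demo()` and the test vectors are hypothetical and were not part of the problem set.

```{r}
# Illustrative sketch; check_mymean_demo() and the test vectors are assumptions
mymean <- function(vec){
  sum(vec)/length(vec)
}

check_mymean_demo <- function(vec){
  # all.equal() compares with a small numerical tolerance
  isTRUE(all.equal(mymean(vec), mean(vec)))
}

set.seed(1)  # arbitrary seed
checks_demo <- c(
  short  = check_mymean_demo(c(2, 4, 6, 8)),
  medium = check_mymean_demo(rnorm(500)),
  long   = check_mymean_demo(runif(5000))
)
checks_demo
all(checks_demo)
```

Testing several lengths is what exposes a hard-coded denominator, since a function that only works for one vector length will fail at least one of these checks.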
67+
68+
Many assignments received a comment that something was “hard-coded,” meaning that even though the function may have produced the correct answer in this case, it would not work as intended in general.
69+
70+
First, consider this version of mymean:
71+
72+
```{r}
73+
n <- 10000
74+
75+
mymean_2 <- function(vec){
76+
sum(vec)/n
77+
}
78+
79+
mymean_2(sample_10000)
80+
```
81+
82+
It looks like it works, but notice that the denominator is n instead of length(vec). In this case, the denominator has been “hard-coded” into the function. It does not change when we pass in a different vector. The downside of the hard-coding is that the function only works correctly if you pass in a vector with 10000 entries.
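The same problem can be seen on a vector small enough to check by hand. This is a minimal sketch, assuming a function that divides by a value defined outside of it; the names `n_demo`, `mymean_hard_demo`, and `v_demo` are hypothetical.

```{r}
# Illustrative sketch; all names ending in _demo are assumptions
n_demo <- 4

mymean_hard_demo <- function(vec){
  sum(vec)/n_demo   # n_demo is looked up outside the function
}

v_demo <- c(2, 4, 6, 8)
mymean_hard_demo(v_demo)    # 20 / 4 = 5, correct because length(v_demo) == n_demo
mymean_hard_demo(c(2, 4))   # 6 / 4 = 1.5, but the true mean is 3
```

The function gives the right answer only when the length of the input happens to match the value stored outside the function.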
83+
84+
```{r}
85+
sample_5000 <- sample(x = X,
86+
size = 5000,
87+
prob = probs,
88+
replace = TRUE)
89+
90+
mymean_2(sample_5000)
91+
92+
```
93+
This value is roughly half of what we would expect. Why?
94+
95+
To be clear, there isn’t anything wrong with mymean_2. It correctly calculates the mean under certain conditions (specifically, when you have exactly 10000 data points). But its usefulness is fairly limited. This
96+
is almost always true when you hard-code values, parameters, etc. in your functions, so it is a practice to avoid if possible (and it usually is).
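One way to see where the "roughly half" comes from is to work through the arithmetic. The sketch below assumes the PMF from the table above; the names `expected_value_demo` and `approx_hard_coded_demo` are hypothetical.

```{r}
# Illustrative sketch; names ending in _demo are assumptions
X <- c(0, 1, 2, 3)
probs <- c(1/8, 3/8, 3/8, 1/8)

# Theoretical mean of X: sum of each value times its probability
expected_value_demo <- sum(X * probs)   # 1.5

# The sum of 5000 draws is about 5000 * 1.5, so dividing by a
# hard-coded 10000 instead of 5000 cuts the result in half
approx_hard_coded_demo <- 5000 * expected_value_demo / 10000   # 0.75

c(expected = expected_value_demo, hard_coded = approx_hard_coded_demo)
```

The numerator shrinks with the smaller sample, but the hard-coded denominator does not, so the result is scaled by 5000/10000.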
97+
98+
99+
# Conditional Probability of two Random Variables
100+
101+
Next, we write the PMF of X conditional on Y, where X is the random variable we defined above (the number of heads in three coin flips) and Y is a random variable that takes the value 1 if all three flips are heads and 0 otherwise.
102+
103+
Conditional probability is a concept that will come up again and again when we talk about inference and regression.
104+
105+
| $x$  | $y$ | $P(X = x \mid Y = y)$ |
106+
|------|----|---------------------|
107+
| 0 | 0 | 1/7 |
108+
| 1 | 0 | 3/7 |
109+
| 2 | 0 | 3/7 |
110+
| 3 | 1 | 1 |
111+
112+
One thing that always helps me conceptualize conditional probabilities is to read | as “assuming that...” So, if we want to know P(X = x | Y = y), we want the “probability that X = x assuming that Y = y.”
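In symbols, this reading corresponds to the standard definition of conditional probability, which applies whenever $P(Y = y) > 0$. The worked example on the right uses the coin-flip variables from this lab and assumes a fair coin, so that each of the eight outcomes has probability 1/8:

$$
P(X = x \mid Y = y) = \frac{P(X = x,\, Y = y)}{P(Y = y)}, \qquad P(X = 1 \mid Y = 0) = \frac{3/8}{7/8} = \frac{3}{7}.
$$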
113+
114+
First, assume that Y = 1. There is only one possible event that maps to this value of Y: {HHH}. Therefore, the probability that X = 3 given that Y = 1 is one, and the probability that X takes any other value is 0. Now, assume that Y = 0. How many events map to this value of Y? Seven: {TTT, TTH, THH, HTT, HTH, THT, HHT}.
115+
116+
Now, calculating the conditional probabilities in the table above is as simple as counting how many of the seven map to each value of X. There is one for which X = 0, three for which X = 1, and three for which X = 2. Because the seven are equally likely, the conditional probabilities are 1/7, 3/7, and 3/7.
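The counting argument can also be carried out in R by listing all eight outcomes. This is a minimal sketch that assumes a fair coin, so every outcome is equally likely; the object names (`flips_demo`, `x_demo`, `y_demo`, `cond_pmf_demo`) are hypothetical.

```{r}
# Illustrative sketch; assumes a fair coin so all 8 outcomes are equally likely.
# Object names ending in _demo are assumptions, not from the lab.
flips_demo <- expand.grid(flip1 = c("H", "T"),
                          flip2 = c("H", "T"),
                          flip3 = c("H", "T"),
                          stringsAsFactors = FALSE)

x_demo <- rowSums(as.matrix(flips_demo) == "H")  # X: number of heads
y_demo <- as.integer(x_demo == 3)                # Y: 1 if all heads, else 0

# Column proportions: each column is the PMF of X given that value of Y
cond_pmf_demo <- prop.table(table(X = x_demo, Y = y_demo), margin = 2)
cond_pmf_demo
```

Each column of the result sums to one. The `Y = 0` column corresponds to the first three rows of the table above, and the `Y = 1` column puts all of its probability on `X = 3`.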
117+
118+
# Final Projects!
119+
120+
Hopefully everyone has had a chance to locate an interesting data set you want to work with. We are nearly halfway through the quarter and I want to make sure that everyone is making progress. Hopefully this week’s material has prompted deeper thinking about what your data can tell you.
121+
122+
Pull up your data and briefly describe it to the person sitting next to you. Where does it come from? What is the unit of observation?
123+
124+
If you haven’t chosen a dataset, what are you considering? What topics interest you? See if anyone around you has suggestions.
125+
126+
Have you run into any problems wrangling your data? Discuss ongoing challenges with the people around you.
127+
128+
What are some questions your data can help you answer? Are they causal or descriptive questions?
129+
