---
title: "Lab 4"
date: '10/21/2022'
output:
  pdf_document: default
  html_document: default
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# The dangers of hard coding

One of the most common problems from the last problem set was in question 1e, which asked you to write a function called mymean(). Many people had the correct solution, or something which worked similarly:

```{r}
mymean <- function(vec){
  sum(vec)/length(vec)
}
```

Recall the random variable $X$: the number of heads in three flips of a fair coin. Its PMF is:

| $x$  | $P(X = x)$ |
|------|------------|
| 0    | 1/8        |
| 1    | 3/8        |
| 2    | 3/8        |
| 3    | 1/8        |


We can code this distribution like this:

```{r}
X <- c(0, 1, 2, 3)
probs <- c(1/8, 3/8, 3/8, 1/8)
```
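
A quick sanity check I like to add (not part of the original lab): a valid PMF must sum to one, which catches typos in the probabilities.

```{r}
# sanity check: the probabilities of a PMF must sum to 1
# (probs repeated here so the chunk stands alone)
probs <- c(1/8, 3/8, 3/8, 1/8)
sum(probs)   # should be 1
```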
Then we can sample from it like this:

```{r}
set.seed(2418)

sample_1 <- sample(x = X,
                   size = 1,
                   prob = probs)
sample_1
```

```{r}
sample_10000 <- sample(x = X,
                       size = 10000,
                       prob = probs,
                       replace = TRUE)

sample_10000
```

Now, back to our function. Let's check to make sure it works:

```{r}
mean(sample_10000)
```

```{r}
mymean(sample_10000)
```

On many assignments, I noted that something was "hard-coded," and that even though the function may have produced the correct answer in this case, it would not work as intended in general.

First, consider this version of mymean:

```{r}
n <- 10000

mymean_2 <- function(vec){
  sum(vec)/n
}

mymean_2(sample_10000)
```

It looks like it works, but notice that the denominator is n instead of length(vec). In this case, the denominator has been "hard-coded" into the function: it does not change when we pass in a different vector. The downside of hard-coding is that the function only works correctly if you pass in a vector with 10000 entries.

```{r}
sample_5000 <- sample(x = X,
                      size = 5000,
                      prob = probs,
                      replace = TRUE)

mymean_2(sample_5000)
```
This value is roughly half of what we would expect. Why?

To be clear, there isn't anything wrong with mymean_2. It correctly calculates the mean under certain conditions (specifically, when you have exactly 10000 data points). But its usefulness is fairly limited. This is almost always true when you hard-code values, parameters, etc. in your functions, so it is a practice to avoid whenever possible (and it usually is).
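
One way to avoid the trap is to make anything that could vary an argument instead of a fixed value. As a sketch (`mymean_3` is my own name, not from the problem set), the denominator can be an argument that defaults to the length of whatever vector is passed in:

```{r}
# the denominator is an argument, defaulting to the input's length,
# so nothing is hard-coded
mymean_3 <- function(vec, n = length(vec)){
  sum(vec)/n
}

mymean_3(c(1, 2, 3))   # n defaults to 3, so this returns 2
```

Because the default is computed from the input, the function adapts to vectors of any length, while still letting you override `n` explicitly if you ever need to.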


# Conditional Probability of Two Random Variables

When writing the PMF of X conditional on Y, X is the random variable we defined above and Y is a random variable that takes the value 1 if all three flips are heads and 0 otherwise.

Conditional probability is a concept that will come up again and again when we talk about inference and regression.

| $x$  | $y$ | $P(X = x \mid Y = y)$ |
|------|-----|-----------------------|
| 0    | 0   | 1/7                   |
| 1    | 0   | 3/7                   |
| 2    | 0   | 3/7                   |
| 3    | 1   | 1                     |

One thing that always helps me conceptualize conditional probabilities is to read "|" as "assuming that..." So, if we want to know $P(X = x \mid Y = y)$, we want the "probability that X = x, assuming that Y = y."
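
That reading lines up with the definition of conditional probability; as a worked check of the first table entry (using the joint probabilities implied by the PMF above):

$$P(X = x \mid Y = y) = \frac{P(X = x,\, Y = y)}{P(Y = y)}$$

For example, $P(X = 0 \mid Y = 0) = \frac{1/8}{7/8} = \frac{1}{7}$, matching the table.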

First, assume that Y = 1. There is only one possible event that maps to this value of Y: {HHH}. Therefore, the probability that X = 3 given that Y = 1 is one, and the probability that X takes any other value is 0. Now, assume that Y = 0. How many events map to this value of Y? Seven: {TTT, TTH, THH, HTT, HTH, THT, HHT}.

Now, calculating the conditional probabilities in the table above is as simple as counting how many of the seven map into particular values of X. There is one for which X = 0, three for which X = 1, and three for which X = 2.
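
This counting argument can be verified by enumerating all eight equally likely outcomes in R (a sketch I added for illustration; `flips`, `x`, and `y` are my own names, not from the lab):

```{r}
# enumerate the 8 equally likely outcomes of three fair coin flips
flips <- expand.grid(f1 = c("H", "T"),
                     f2 = c("H", "T"),
                     f3 = c("H", "T"))
x <- rowSums(flips == "H")   # X = number of heads
y <- as.numeric(x == 3)      # Y = 1 if all three flips are heads, else 0

# conditional PMF of X given Y = 0: count outcomes, divide by the 7 with Y = 0
table(x[y == 0]) / sum(y == 0)
```

The printed proportions are 1/7, 3/7, and 3/7 for x = 0, 1, and 2, agreeing with the table.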

# Final Projects!

Hopefully everyone has had a chance to locate an interesting data set you want to work with. We are nearly halfway through the quarter and I want to make sure that everyone is making progress. I hope this week's material has prompted deeper thinking about what your data can tell you.

Pull up your data and briefly describe it to the person sitting next to you. Where does it come from? What is the unit of observation?

If you haven't chosen a dataset, what are you considering? What topics interest you? See if anyone around you has suggestions.

Have you run into any problems wrangling your data? Discuss ongoing challenges with the people around you.

What are some questions your data can help you answer? Are they causal or descriptive questions?
