---
title: "Was Sammy Sosa a better hitter than Pete Rose?"
author: Jane Analyst
output: html_document
---
```{r results = 'hide', echo = FALSE, message = FALSE}
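library(Lahman)
library(dplyr)
library(ggplot2)

# Load the season-level batting table and the player name lookup.
# Note: recent Lahman releases name the lookup table `People`, not `Master`.
data(Batting)
data(Master)

# Keep only the columns needed to attach names to player IDs.
Master <- Master %>%
  select(playerID, nameFirst, nameLast)

# Career totals for Sammy Sosa and Pete Rose.
Batting <- Batting %>%
  filter(playerID %in% c('sosasa01', 'rosepe01')) %>%
  merge(Master, by = "playerID", all.x = TRUE) %>%
  group_by(nameFirst, nameLast) %>%
  summarise(Hits = sum(H)
            , AtBats = sum(AB)) %>%
  ungroup() %>%
  mutate(Name = paste(nameFirst, nameLast)
         , Misses = AtBats - Hits  # at-bats that did not end in a hit
         , BattingAverage = Hits / AtBats) %>%
  select(-nameFirst, -nameLast)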
```

Did Sammy Sosa hit better than Pete Rose? Let's use Bayesian analysis to try to find out!

The graph below shows the posterior density of each player's career batting average, starting from an uninformative Beta(1, 1) (uniform) prior.

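
As a brief sketch of where these posteriors come from, assume that each at-bat is an independent trial with the same hit probability $p$. This is a modeling assumption for illustration, not something the data guarantee. Writing $H$ for career hits and $N$ for career at-bats (symbols introduced here), a uniform prior gives the conjugate update:

$$
p \sim \mathrm{Beta}(1, 1), \qquad H \mid p \sim \mathrm{Binomial}(N, p) \quad\Longrightarrow\quad p \mid H \sim \mathrm{Beta}(H + 1,\; N - H + 1)
$$

Here $N - H$ is the number of misses, which matches the `Hits + 1` and `Misses + 1` shape parameters passed to `dbeta` in the plotting code.
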
```{r echo = FALSE}
# Evaluate each player's Beta(Hits + 1, Misses + 1) posterior on a grid over [0, 1].
dfBeta <- data.frame(x = seq(0, 1, length.out = 1000))
for (i in seq_len(nrow(Batting))) {
  dfBeta[, Batting$Name[i]] <- dbeta(dfBeta$x, Batting$Hits[i] + 1, Batting$Misses[i] + 1)
}

# Reshape to long format: one row per (x, Batter) pair.
dfBeta <- dfBeta %>%
  tidyr::gather("Batter", "Density", -x)

plt <- ggplot(dfBeta, aes(x = x, y = Density, color = Batter)) + geom_line()
plt
```
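
One way to put a number on the comparison, offered as a minimal sketch and not as part of the original analysis, is to draw samples from both posteriors and count how often one player's batting average exceeds the other's. The function name `prob_first_higher`, its arguments, and the default number of draws are all hypothetical choices made for illustration. It assumes the same Beta(Hits + 1, Misses + 1) posteriors used for the plot.

```{r}
# Hypothetical helper: Monte Carlo estimate of P(p_a > p_b), where each
# player's hit probability has a Beta(hits + 1, misses + 1) posterior
# (that is, a uniform Beta(1, 1) prior updated with the observed counts).
prob_first_higher <- function(hits_a, misses_a, hits_b, misses_b,
                              n_draws = 100000, seed = 1) {
  set.seed(seed)  # fixed seed so the estimate is reproducible
  draws_a <- rbeta(n_draws, hits_a + 1, misses_a + 1)
  draws_b <- rbeta(n_draws, hits_b + 1, misses_b + 1)
  # Share of paired draws in which player A's average is the higher one.
  mean(draws_a > draws_b)
}
```

With the `Batting` table built earlier, the call would look like `prob_first_higher(sosa$Hits, sosa$Misses, rose$Hits, rose$Misses)`, where `sosa` and `rose` are the corresponding rows of `Batting`. A value near 0 or 1 would suggest that the two posterior densities barely overlap.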