Incorrect Data #15
Thank you for raising this! I'm fairly certain this is a problem with the underlying data from the NFL, where team name changes messed things up. We actually discovered this issue yesterday and fixed it for the next version of the package (not released yet), but only noticed it for JAX. So I need to check on STL and will leave this open.
Looked into this with the new data source and it seems to be fixed. For future reference, my code for checking the ratio of home plays to away plays:
library(tidyverse)
seasons <- 1999:2019
# Download and combine the play-by-play data for every season
pbp <- purrr::map_df(seasons, function(x) {
  readRDS(
    url(
      glue::glue("https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_{x}.rds")
    )
  )
})
# Count plays per possession team in each game
# (named plays_per_team rather than sum to avoid masking base::sum)
plays_per_team <- pbp %>%
  filter(!is.na(posteam), !is.na(defteam), posteam != "") %>%
  group_by(game_id, posteam, home_team) %>%
  summarize(n = n(), .groups = "drop")
# One row per game with home and away play counts and their ratio
gs <- plays_per_team %>%
  mutate(home = if_else(posteam == home_team, 1, 0)) %>%
  select(game_id, home, n) %>%
  pivot_wider(
    names_from = home,
    values_from = n
  ) %>%
  dplyr::rename(
    away = `0`,
    home = `1`
  ) %>%
  mutate(ratio = away / home)
Doing this led me to find some games with way too many plays. One example where duplicates need to be removed is '2007_08_IND_CAR'.
This was a problem with duplicate |
One more comment- sadly, we can't fix anything in |
Using the zipped files from the legacy data and Python, I have found 1241 instances where the "posteam" is also recorded as the "defteam". It only happens for Jacksonville and LA (Rams), so it may be a renaming issue. The relevant "game_id"s are:
2009110804
2009112204
2009120605
2009121305
2009121700
2010092610
2010100308
2010122608
2015102500
2015111504
2015111900
2015112906
2015120609
2015121309
2015121700
2015122003
In game 2009110804, the only plays where JAX is reported as "posteam" with KC as "defteam" are kickoffs. All kickoffs in that game list JAX as "posteam" and KC as "defteam", regardless of which team is kicking. Every other play where JAX is the "posteam" also lists JAX as the "defteam".
The same pattern holds for 2009112204 (another JAX game).
The same pattern holds for 2015111504 (with LA in place of JAX).
I am not sure if this problem is specific to the .gz files.
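For anyone who wants to reproduce the count, here is a minimal sketch of the kind of check described above, using only the Python standard library. It is not the exact script used for this report: the sample rows are made up for illustration, and it assumes only that each row has "game_id", "posteam", and "defteam" columns.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration only; the real legacy files
# contain many more columns and plays.
SAMPLE_CSV = """game_id,posteam,defteam
2009110804,JAX,KC
2009110804,JAX,JAX
2009110804,JAX,JAX
2009110804,KC,JAX
2015111504,LA,LA
2015111504,CHI,LA
"""


def same_team_counts(rows):
    """Count, per game_id, the rows where posteam equals defteam."""
    counts = Counter()
    for row in rows:
        posteam = row["posteam"].strip()
        # Skip rows with no possession team (e.g. timeouts, end of quarter)
        if posteam and posteam == row["defteam"].strip():
            counts[row["game_id"]] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
counts = same_team_counts(rows)
for game_id, n in sorted(counts.items()):
    print(game_id, n)
```

To run this against one of the .gz files, the `io.StringIO(SAMPLE_CSV)` handle could be swapped for `gzip.open(path, "rt", newline="")`, where `path` is a placeholder for wherever the file was saved.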