Many wp variables with different values when running different number of games #183
For comparison, here is the same thing for the first 30 games instead of the first 10:
library(tidyverse)
library(arsenal)
library(nflfastR)
ids <- fast_scraper_schedules(2020) %>% head(30) %>% pull(game_id)
pbp <- fast_scraper(ids) %>% decode_player_ids()
#> ✓ Download finished. Adding variables...
#> ✓ added game variables
#> ✓ added nflscrapR variables
#> ✓ added ep variables
#> ✓ added air_yac_ep variables
#> ✓ added wp variables
#> ✓ added air_yac_wp variables
#> ✓ added cp and cpoe
#> ✓ added fixed drive variables
#> ✓ added series variables
#> ✓ Procedure completed.
#> ✓ Decoding of player ids completed
data_repo <- load_pbp(2020) %>%
  filter(game_id %in% ids) %>%
  select_at(vars(names(pbp)))
s <- summary(comparedf(data_repo, pbp))
s$diffs.byvar.table %>% filter(n>0)
#> var.x var.y n NAs
#> 1 desc desc 1 0
#> 2 vegas_wpa vegas_wpa 2 1
#> 3 vegas_home_wpa vegas_home_wpa 2 1
#> 4 vegas_wp vegas_wp 2 1
d <- diffs(s)
d %>% head(30)
#> var.x var.y ..row.names.. values.x values.y row.x
#> 1 desc desc 2563 (15:00) .... (15:00) .... 2563
#> 2 vegas_wpa vegas_wpa 4955 0.362151.... 0.350065.... 4955
#> 3 vegas_wpa vegas_wpa 5488 0.267549.... NA 5488
#> 4 vegas_home_wpa vegas_home_wpa 4955 -0.36215.... -0.35006.... 4955
#> 5 vegas_home_wpa vegas_home_wpa 5488 -0.26754.... NA 5488
#> 6 vegas_wp vegas_wp 4955 0.637848.... 0.649934.... 4955
#> 7 vegas_wp vegas_wp 5488 0.267549.... NA 5488
#> row.y
#> 1 2563
#> 2 4955
#> 3 5488
#> 4 4955
#> 5 5488
#> 6 4955
#> 7 5488
Created on 2021-02-13 by the reprex package (v1.0.0)
I have to leave the computer soon, but to fix the END GAME line I think we just need something like this for
The win probability issue on PATs is a different one. I have no idea what's causing it, but the difference is also very small.
Since the number of differing values gets smaller as the number of games increases, I think the PATs somehow interfere with each other.
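One way to check the "fewer differences with more games" observation is to repeat the comparison for several sample sizes and count the differing `vegas_wp` values each time. The sketch below is only an illustration in base R: `diffs_by_sample_size()` and its `build_fn` argument are hypothetical names, and the toy data and the stub are made up so the block runs without downloading anything.

```r
# Hypothetical helper: for each sample size n, rebuild the first n games and
# count how many values of `var` differ from the reference data.
# `build_fn` is an assumption: any function that maps game ids to a data frame
# with the columns game_id, play_id and `var`.
diffs_by_sample_size <- function(reference, ids, sizes, build_fn,
                                 var = "vegas_wp", tol = 1e-8) {
  keys <- c("game_id", "play_id")
  vapply(sizes, function(n) {
    sample_ids <- head(ids, n)
    rebuilt <- build_fn(sample_ids)
    ref <- reference[reference$game_id %in% sample_ids, c(keys, var)]
    # align rows on game_id + play_id so values are compared play by play
    merged <- merge(ref, rebuilt[, c(keys, var)], by = keys,
                    suffixes = c(".ref", ".new"))
    a <- merged[[paste0(var, ".ref")]]
    b <- merged[[paste0(var, ".new")]]
    # a value differs if exactly one side is NA or the gap exceeds tol
    differs <- xor(is.na(a), is.na(b)) |
      (!is.na(a) & !is.na(b) & abs(a - b) > tol)
    sum(differs)
  }, integer(1))
}

# Made-up reference data: three games with two plays each.
reference <- data.frame(
  game_id  = rep(c("g1", "g2", "g3"), each = 2),
  play_id  = rep(1:2, times = 3),
  vegas_wp = c(0.50, 0.60, 0.40, 0.70, 0.55, 0.65)
)

# Stub builder (purely illustrative): perturbs one value for small samples.
toy_build <- function(game_ids) {
  out <- reference[reference$game_id %in% game_ids, ]
  if (length(game_ids) < 3) out$vegas_wp[2] <- out$vegas_wp[2] + 0.01
  out
}

res <- diffs_by_sample_size(reference, c("g1", "g2", "g3"),
                            sizes = 1:3, build_fn = toy_build)
print(res)
```

With the real data, `build_fn` could presumably wrap the `fast_scraper(ids) %>% decode_player_ids()` call used in this thread, with `load_pbp(2020)` as the reference, at the cost of one scrape per sample size.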
I added a |
Dropping some code here in case I want to investigate this at some point. For now we don't care, as the wp values differ by less than 0.3 percentage points.
library(tidyverse)
library(arsenal)
library(nflfastR)
progressr::handlers(global = TRUE)
dat <- load_pbp(2020)
ids <- fast_scraper_schedules(2020) %>% slice(1:10) %>% pull(game_id)
pbp <- fast_scraper(ids) %>% decode_player_ids() %>% filter(desc != "END GAME")
data_repo <- dat %>%
  filter(game_id %in% ids) %>%
  select_at(vars(names(pbp))) %>%
  filter(desc != "END GAME")
s <- summary(comparedf(data_repo, pbp))
# s$diffs.byvar.table %>% filter(n>0)
d <- diffs(s)
# d %>% head(30)
big <- d %>%
  mutate(
    data_repo = furrr::future_map_dbl(values.x, ~ ifelse(is.numeric(.x), .x, 0)),
    small_sample = furrr::future_map_dbl(values.y, ~ ifelse(is.numeric(.x), .x, 0)),
    diff_abs = abs(data_repo - small_sample),
    diff = diff_abs %>% scales::percent(accuracy = 0.01)
  ) %>%
  # filter(diff_abs > 0.1/100) %>%
  select(var = var.x, row = ..row.names.., data_repo:diff) %>%
  filter(var != "desc") %>%
  arrange(desc(diff_abs))
big
pbp %>% slice(big$row) %>% select(game_id, play_id, desc)
# pbp %>% slice(499) %>% select(game_id, play_id, desc, penalty) %>% view()
unique(pbp$game_id[big$row])
To do
Created on 2021-02-13 by the reprex package (v1.0.0)
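The `mutate()` step in the code above maps over `values.x` and `values.y` element by element, which suggests these are list columns mixing character entries (the `desc` difference) with numeric ones. As a minimal sketch of that coercion in base R, assuming this structure and using made-up values, with `to_num()` as a hypothetical helper name:

```r
# Toy stand-ins for the list columns returned by diffs(): one character
# entry (a `desc` difference) and numeric entries (wp differences).
# All values here are invented for illustration.
values_x <- list("(15:00) END GAME", 0.64, 0.27)
values_y <- list("(15:00) END GAME.", 0.65, NA_real_)

# Hypothetical helper: keep numeric entries, map everything else to 0.
to_num <- function(v) {
  vapply(v, function(x) if (is.numeric(x)) as.numeric(x) else 0, numeric(1))
}

x <- to_num(values_x)
y <- to_num(values_y)

# Absolute difference per row; an NA on either side stays NA.
diff_abs <- abs(x - y)
print(diff_abs)
```

Note that a row where one side is `NA` yields an `NA` difference here, so such rows need to be inspected separately rather than ranked by size.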