fix wpa on end game line, fix pat wp and fantasy id decoding #230
This doesn't pass Seb's check here but does pass
Note that
(because of era and roof adjustments)
Finally fixed the underlying issue of #183. The actual problem lives here: https://github.com/mrcaseb/nflfastR/blob/66d0de88e9093e367cc44a2268191bfb5df334eb/R/helper_add_ep_wp.R#L649

We estimate the probability that the PAT will be made, and instead of using the output vector of the bam we only use the first value. Since the fg_model uses model_roof and era as features, the pat_make_probs differ slightly (which is the reason why the overall differences are so small). Line 649 uses the first pat_make_prob of the complete dataset for all games. So if we use two datasets where the first pat_make_prob differs, we will get differing wp all across the complete dataset, which is bad imo. Therefore I deleted the part that uses the first prob and instead keep the complete pat_make_prob vector.

@guga31bb please let me know if there was a good reason to do

The following code shows that this fixes all our wp difference problems, as all the wrong wpa values were only a result of wrong wp values. It also shows the differing pat_make_probs depending on which game is the first game in the dataset. For 2020 games only this results in a maximum difference of

library(dplyr, warn.conflicts = FALSE)
library(arsenal)
library(nflfastR)
ids <- c("2020_01_CHI_DET", "2020_15_TB_ATL", "2020_17_TEN_HOU")
# v4.1.0
all_games <- build_nflfastR_pbp(ids)
#> -- Build nflfastR Play-by-Play Data ------------------ nflfastR version 4.1.0 --
#> * 11:12:48 | Start download of 3 games...
sample <- build_nflfastR_pbp(ids[-1])
#> -- Build nflfastR Play-by-Play Data ------------------ nflfastR version 4.1.0 --
#> * 11:13:02 | Start download of 2 games...
c <- comparedf(all_games %>% filter(game_id %in% sample$game_id), sample)
diffs(c) %>% head(20)
#> var.x var.y ..row.names.. values.x values.y row.x row.y
#> 1 wp wp 11 0.697947.... 0.697489.... 11 11
#> 2 wp wp 93 0.134893.... 0.134536.... 93 93
#> 3 wp wp 102 0.943005.... 0.942860.... 102 102
#> 4 wp wp 112 0.132840.... 0.132257.... 112 112
#> 5 wp wp 127 0.285401.... 0.284856.... 127 127
#> 6 wp wp 156 0.666679.... 0.665633.... 156 156
#> 7 wp wp 223 0.679398.... 0.679094.... 223 223
#> 8 wp wp 244 0.801211.... 0.800766.... 244 244
#> 9 wp wp 277 0.905423.... 0.905035.... 277 277
#> 10 wp wp 284 0.168142.... 0.167975.... 284 284
#> 11 wp wp 294 0.930354.... 0.930306.... 294 294
#> 12 wp wp 305 0.354060.... 0.353571.... 305 305
#> 13 wp wp 323 0.695493.... 0.694733.... 323 323
#> 14 wp wp 347 0.682081.... 0.680360.... 347 347
#> 15 def_wp def_wp 11 0.302052.... 0.302510.... 11 11
#> 16 def_wp def_wp 93 0.865106.... 0.865463.... 93 93
#> 17 def_wp def_wp 102 0.056994.... 0.057139.... 102 102
#> 18 def_wp def_wp 112 0.867159.... 0.867742.... 112 112
#> 19 def_wp def_wp 127 0.714598.... 0.715143.... 127 127
#> 20 def_wp def_wp 156 0.333320.... 0.334366.... 156 156
# load fixed code
devtools::load_all(nflfastR_path)
#> i Loading nflfastR
all_games <- build_nflfastR_pbp(ids)
#> -- Build nflfastR Play-by-Play Data ------------- nflfastR version 4.1.0.9001 --
#> * 11:13:07 | Start download of 3 games...
sample <- build_nflfastR_pbp(ids[-1])
#> -- Build nflfastR Play-by-Play Data ------------- nflfastR version 4.1.0.9001 --
#> * 11:13:12 | Start download of 2 games...
c <- comparedf(all_games %>% filter(game_id %in% sample$game_id), sample)
diffs(c) %>% head(20)
#> [1] var.x var.y ..row.names.. values.x values.y
#> [6] row.x row.y
#> <0 rows> (or 0-length row.names)
# what is going on?
fg_probs <- function(pbp_data) {
  make_pat_prob <- as.numeric(
    mgcv::predict.bam(
      fastrmodels::fg_model,
      newdata = pbp_data %>%
        make_model_mutations() %>%
        prepare_wp_data() %>%
        dplyr::mutate(yardline_100 = if_else(.data$season >= 2015, 15, 3)),
      type = "response"
    )
  )
  make_pat_prob[1]
}
fg_probs(all_games)
#> [1] 0.9488284
fg_probs(sample)
#> [1] 0.9314286

Created on 2021-03-24 by the reprex package (v1.0.0)
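
To make the mechanism easier to see without downloading any games, here is a minimal base R sketch. The data frame, the probabilities, and the helper names `first_only` and `full_vector` are invented for illustration and are not part of nflfastR; the sketch only assumes that the old code collapsed the prediction vector to its first element while the fixed code keeps one value per row.

```r
# Toy stand-in for per-play PAT make probabilities (values are invented).
plays <- data.frame(
  game_id  = c("A", "A", "B", "B"),
  pat_prob = c(0.95, 0.95, 0.93, 0.93)
)

# Old pattern (assumed): use only the first probability for every row.
first_only <- function(df) rep(df$pat_prob[1], nrow(df))

# Fixed pattern (assumed): keep the complete per-row vector.
full_vector <- function(df) df$pat_prob

# Build game B once as part of the full data and once on its own.
in_b     <- plays$game_id == "B"
subset_b <- plays[in_b, ]

old_full     <- first_only(plays)[in_b]
old_subset   <- first_only(subset_b)
fixed_full   <- full_vector(plays)[in_b]
fixed_subset <- full_vector(subset_b)

# The old pattern depends on which game happens to come first.
old_matches   <- identical(old_full, old_subset)
# The fixed pattern gives the same values for game B either way.
fixed_matches <- identical(fixed_full, fixed_subset)
```

With these toy values `old_matches` is `FALSE` and `fixed_matches` is `TRUE`, which mirrors the before/after `comparedf()` output above.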
I found another small bug but didn't want to modify too much before we talked about it: Example from 2020
Can't we just use
I think the 2pt search actually finds the correct plays but it gets overwritten here
Or we modify the matches with something like
pat_i <- pat_i[!pat_i %in% two_pt_i]
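
A quick sketch of what that filter does, using made-up row indices (the real `pat_i` and `two_pt_i` come from the play searches in the package, which are not reproduced here). `empty_two_pt` and `kept_all` are hypothetical names used only to show the edge case.

```r
# Hypothetical row indices, for illustration only.
pat_i    <- c(3L, 8L, 12L, 20L)  # rows matched by the PAT search
two_pt_i <- c(8L, 20L)           # rows matched by the 2pt search

# Drop every PAT match that is also a 2pt match.
# `%in%` binds tighter than `!`, so this reads as !(pat_i %in% two_pt_i).
pat_i <- pat_i[!pat_i %in% two_pt_i]

# Edge case: with no 2pt matches, nothing is dropped.
empty_two_pt <- integer(0)
kept_all <- c(3L, 8L)[!c(3L, 8L) %in% empty_two_pt]
```

The logical filter is safe when `two_pt_i` is empty, whereas negative indexing with an empty index vector would return an empty result instead of the full vector.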
No good reason, great catch! I must have been doing something dumb like testing on just one play rather than the normal df.
I think these columns get messed up by plays with penalty which is why we have this additional search.
Yes I think this is a good solution!
Added this now