fix wpa on end game line, fix pat wp and fantasy id decoding #230

Merged: 11 commits, Mar 28, 2021

Conversation

guga31bb
Member

No description provided.

@guga31bb
Member Author

This doesn't pass Seb's check here, but it does pass compare_dfs.R when I tried it.

@guga31bb
Member Author

Note that wp and home_wp differ from the data repo because I filled them up the way we do for the vegas ones, so some values that are NA in the data repo are filled in by this PR.

@mrcaseb
Member

mrcaseb commented Mar 24, 2021

Finally fixed the underlying issue of #183.

The actual problem lives here https://github.com/mrcaseb/nflfastR/blob/66d0de88e9093e367cc44a2268191bfb5df334eb/R/helper_add_ep_wp.R#L649

We estimate the probability that the PAT will be made, but instead of using the full output vector of the bam prediction we only use its first value. Since the fg_model uses model_roof and era as features, the pat_make_probs differ only slightly between plays (which is why the overall differences are so small).

Line 649 uses the first pat_make_prob of the complete dataset for all games. So if we use two datasets whose first pat_make_prob differs, we get differing wp all across the complete dataset, which is bad imo.

Therefore I deleted the part that uses the first prob and instead keep the complete pat_make_prob vector.
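
To illustrate the effect described above, here is a minimal base R sketch. It does not call the real model: the `plays` data frame, its probabilities, and the helper names `first_only()` and `full_vector()` are all made up for illustration, with `pat_prob` standing in for the prediction vector.

```r
# Hypothetical per-play PAT make probabilities (illustrative values only,
# not model output). Game A has different conditions than games B and C.
plays <- data.frame(
  game = c("A", "A", "B", "B", "C"),
  pat_prob = c(0.95, 0.95, 0.93, 0.93, 0.94)
)

# Old behaviour: collapse the prediction vector to its first element,
# so every play inherits the value of whichever play comes first.
first_only <- function(df) rep(df$pat_prob[1], nrow(df))

# Fixed behaviour: keep one probability per play.
full_vector <- function(df) df$pat_prob

subset_plays <- plays[plays$game != "A", ]

# With the first-element shortcut, games B and C get different values
# depending on whether game A is part of the input.
old_full <- first_only(plays)[plays$game != "A"]
old_subset <- first_only(subset_plays)
print(identical(old_full, old_subset))
#> [1] FALSE

# Keeping the whole vector makes the result independent of the input set.
new_full <- full_vector(plays)[plays$game != "A"]
new_subset <- full_vector(subset_plays)
print(identical(new_full, new_subset))
#> [1] TRUE
```

The same reasoning applies to any per-row prediction: once it is reduced to a single scalar, the result depends on row order and on which games are included.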

@guga31bb please let me know if there was a good reason to do make_pat_prob <- make_pat_prob[1] so I don't overlook anything.

The following code shows that this fixes all our wp difference problems as all the wrong wpa values were only a result of wrong wp values.

It also shows how the first pat_make_prob differs depending on which game comes first in the dataset. For 2020 games only, this results in a maximum difference of 0.01739978.

library(dplyr, warn.conflicts = FALSE)
library(arsenal)
library(nflfastR)

ids <- c("2020_01_CHI_DET", "2020_15_TB_ATL", "2020_17_TEN_HOU")

# v4.1.0
all_games <- build_nflfastR_pbp(ids)
#> -- Build nflfastR Play-by-Play Data ------------------ nflfastR version 4.1.0 --
#> * 11:12:48 | Start download of 3 games...

sample <- build_nflfastR_pbp(ids[-1])
#> -- Build nflfastR Play-by-Play Data ------------------ nflfastR version 4.1.0 --
#> * 11:13:02 | Start download of 2 games...

c <- comparedf(all_games %>% filter(game_id %in% sample$game_id), sample)
diffs(c) %>% head(20)
#>     var.x  var.y ..row.names..     values.x     values.y row.x row.y
#> 1      wp     wp            11 0.697947.... 0.697489....    11    11
#> 2      wp     wp            93 0.134893.... 0.134536....    93    93
#> 3      wp     wp           102 0.943005.... 0.942860....   102   102
#> 4      wp     wp           112 0.132840.... 0.132257....   112   112
#> 5      wp     wp           127 0.285401.... 0.284856....   127   127
#> 6      wp     wp           156 0.666679.... 0.665633....   156   156
#> 7      wp     wp           223 0.679398.... 0.679094....   223   223
#> 8      wp     wp           244 0.801211.... 0.800766....   244   244
#> 9      wp     wp           277 0.905423.... 0.905035....   277   277
#> 10     wp     wp           284 0.168142.... 0.167975....   284   284
#> 11     wp     wp           294 0.930354.... 0.930306....   294   294
#> 12     wp     wp           305 0.354060.... 0.353571....   305   305
#> 13     wp     wp           323 0.695493.... 0.694733....   323   323
#> 14     wp     wp           347 0.682081.... 0.680360....   347   347
#> 15 def_wp def_wp            11 0.302052.... 0.302510....    11    11
#> 16 def_wp def_wp            93 0.865106.... 0.865463....    93    93
#> 17 def_wp def_wp           102 0.056994.... 0.057139....   102   102
#> 18 def_wp def_wp           112 0.867159.... 0.867742....   112   112
#> 19 def_wp def_wp           127 0.714598.... 0.715143....   127   127
#> 20 def_wp def_wp           156 0.333320.... 0.334366....   156   156

# load fixed code
devtools::load_all(nflfastR_path)
#> i Loading nflfastR
all_games <- build_nflfastR_pbp(ids)
#> -- Build nflfastR Play-by-Play Data ------------- nflfastR version 4.1.0.9001 --
#> * 11:13:07 | Start download of 3 games...

sample <- build_nflfastR_pbp(ids[-1])
#> -- Build nflfastR Play-by-Play Data ------------- nflfastR version 4.1.0.9001 --
#> * 11:13:12 | Start download of 2 games...

c <- comparedf(all_games %>% filter(game_id %in% sample$game_id), sample)
diffs(c) %>% head(20)
#> [1] var.x         var.y         ..row.names.. values.x      values.y     
#> [6] row.x         row.y        
#> <0 rows> (or 0-length row.names)

# what is going on?
fg_probs <- function(pbp_data) {
  make_pat_prob <- as.numeric(
    mgcv::predict.bam(
      fastrmodels::fg_model,
      newdata = pbp_data %>%
        make_model_mutations() %>%
        prepare_wp_data() %>%
        dplyr::mutate(yardline_100 = if_else(.data$season >= 2015, 15, 3)),
      type = "response"
    )
  )
  make_pat_prob[1]
}

fg_probs(all_games)
#> [1] 0.9488284
fg_probs(sample)
#> [1] 0.9314286

Created on 2021-03-24 by the reprex package (v1.0.0)

@mrcaseb
Member

mrcaseb commented Mar 24, 2021

I found another small bug but didn't want to modify too much before we talked about it:
The index search below for PATs and two-point attempts can produce duplicate matches, e.g. for two-point attempts whose description starts with (Kick formation).

https://github.com/mrcaseb/nflfastR/blob/66d0de88e9093e367cc44a2268191bfb5df334eb/R/helper_add_ep_wp.R#L653-L673

Example from 2020

# A tibble: 6 x 3
  game_id        play_id desc                                                                                         
  <chr>            <dbl> <chr>                                                                                        
1 2020_03_SF_NYG    3011 (Kick formation) TWO-POINT CONVERSION ATTEMPT. 6-M.Wishnowsky pass is incomplete. ATTEMPT FA~
2 2020_04_CLE_D~    4451 (Kick formation) TWO-POINT CONVERSION ATTEMPT. 89-S.Carlson rushes left end. ATTEMPT SUCCEED~
3 2020_08_TEN_C~    3784 (Kick formation) TWO-POINT CONVERSION ATTEMPT. 6-B.Kern pass to 94-J.Crawford is incomplete.~
4 2020_10_MIN_C~    3186 (Kick formation) TWO-POINT CONVERSION ATTEMPT. 2-B.Colquitt pass to 82-K.Rudolph is incomple~
5 2020_15_PHI_A~    3766 (Kick formation) TWO-POINT CONVERSION ATTEMPT. 86-Z.Ertz rushes up the middle. ATTEMPT FAILS.
6 2020_19_LA_GB     1504 (Kick formation) TWO-POINT CONVERSION ATTEMPT. 6-J.Scott pass to 2-M.Crosby is complete. ATT~

Can't we just use extra_point_attempt == 1 and two_point_attempt == 1 to find those plays?

I think the 2pt search actually finds the correct plays but it gets overwritten here
https://github.com/mrcaseb/nflfastR/blob/66d0de88e9093e367cc44a2268191bfb5df334eb/R/helper_add_ep_wp.R#L718-L719

Or we modify the matches with something like

pat_i <- pat_i[!pat_i %in% two_pt_i] 
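
A small sketch of how the overlap arises and how the filter removes it. The play descriptions (apart from the first, taken from the example above) and the two `grepl()` patterns are hypothetical and are not the ones used in helper_add_ep_wp.R; they only assume that the PAT search is loose enough to match kick-formation plays.

```r
# Hypothetical play descriptions for illustration.
desc <- c(
  "(Kick formation) TWO-POINT CONVERSION ATTEMPT. 86-Z.Ertz rushes up the middle. ATTEMPT FAILS.",
  "9-K.Kicker extra point is GOOD.",
  "TWO-POINT CONVERSION ATTEMPT. 1-Q.Back pass is incomplete. ATTEMPT FAILS.",
  "(Shotgun) 1-Q.Back pass short right."
)

# Assumed loose PAT pattern that also catches kick-formation two-point tries.
pat_i <- which(grepl("extra point|Kick formation", desc))
two_pt_i <- which(grepl("TWO-POINT CONVERSION ATTEMPT", desc))

# Play 1 is matched by both searches.
overlap <- intersect(pat_i, two_pt_i)
print(overlap)
#> [1] 1

# Drop every PAT index that is also a two-point index.
pat_i <- pat_i[!pat_i %in% two_pt_i]
print(pat_i)
#> [1] 2
```

`setdiff(pat_i, two_pt_i)` would give the same result here; it additionally drops repeated values, which makes no difference for indices returned by `which()`.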

@guga31bb
Member Author

> @guga31bb please let me know if there was a good reason to do make_pat_prob <- make_pat_prob[1] so I don't overlook anything.

No good reason, great catch! I must have been doing something dumb like testing on just one play rather than the normal df.

@guga31bb
Member Author

> Can't we just use extra_point_attempt == 1 and two_point_attempt == 1 to find those plays?

I think these columns get messed up on plays with a penalty, which is why we have this additional search.

> Or we modify the matches with something like
>
> pat_i <- pat_i[!pat_i %in% two_pt_i]

Yes I think this is a good solution!

@mrcaseb
Member

mrcaseb commented Mar 24, 2021

> Or we modify the matches with something like
>
> pat_i <- pat_i[!pat_i %in% two_pt_i]

Added this now

@mrcaseb changed the title from "fix wpa on end game line" to "fix wpa on end game line, fix pat wp and fantasy id decoding" on Mar 28, 2021
@mrcaseb merged commit 5af7e30 into master on Mar 28, 2021
@mrcaseb deleted the wp branch on March 28, 2021 18:29