fix wpa on end game line, fix pat wp and fantasy id decoding #230

guga31bb · 2021-03-23T18:07:26Z

No description provided.

guga31bb · 2021-03-23T18:36:17Z

This doesn't pass Seb's check here but does pass compare_dfs.R when I've tried it

guga31bb · 2021-03-23T18:49:05Z

Note that wp and home_wp differ from the data repo because I filled them up like we do for the vegas ones, so some values that are NA in the data repo are filled in by this PR

(because of era and roof adjustments)

mrcaseb · 2021-03-24T10:28:16Z

Finally fixed the underlying issue of #183.

The actual problem lives here https://github.com/mrcaseb/nflfastR/blob/66d0de88e9093e367cc44a2268191bfb5df334eb/R/helper_add_ep_wp.R#L649

We estimate the probability that the PAT will be made, but instead of using the full output vector of the bam prediction we only use its first value. Since the fg_model uses model_roof and era as features, the pat_make_probs differ slightly from play to play (which is why the overall differences are so small).

Line 649 applies the first pat_make_prob of the complete dataset to all games. So if two datasets start with plays whose pat_make_prob differs, the wp values differ across the whole dataset, which is bad imo.

Therefore I deleted the part that uses the first prob and instead keep the complete pat_make_prob vector.
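
To illustrate the effect with a minimal sketch: the numbers and the `blend()` helper below are made up for illustration (they are not model output and not the package's actual wp formula). The point is only that a scalar taken from the first row makes results depend on which play happens to come first, while a per-play vector does not.

```r
# Hypothetical per-play values (NOT real model output)
make_pat_prob <- c(0.949, 0.931, 0.940)  # P(PAT is good) for plays 1..3
wp_if_made    <- c(0.70, 0.14, 0.94)     # toy wp if the PAT is made
wp_if_missed  <- c(0.66, 0.12, 0.93)     # toy wp if the PAT is missed

# Assumed toy combination: expected wp over the PAT outcome
blend <- function(p, made, missed) p * made + (1 - p) * missed

keep <- 2:3  # the same plays, loaded without play 1

# Old behaviour: the first probability of whatever dataset is loaded
old_full   <- blend(make_pat_prob[1], wp_if_made, wp_if_missed)[keep]
old_subset <- blend(make_pat_prob[keep][1], wp_if_made[keep], wp_if_missed[keep])

# New behaviour: keep the complete vector, one probability per play
new_full   <- blend(make_pat_prob, wp_if_made, wp_if_missed)[keep]
new_subset <- blend(make_pat_prob[keep], wp_if_made[keep], wp_if_missed[keep])

max(abs(old_full - old_subset))  # non-zero: depends on the first play
max(abs(new_full - new_subset))  # exactly 0: independent of the first play
```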

@guga31bb please let me know if there was a good reason to do make_pat_prob <- make_pat_prob[1] so I don't overlook anything.

The following code shows that this fixes all our wp difference problems as all the wrong wpa values were only a result of wrong wp values.

It also shows the differing pat_make_probs depending on which game is the first game in the dataset. For 2020 games only this results in a maximum difference of 0.01739978

library(dplyr, warn.conflicts = FALSE)
library(arsenal)
library(nflfastR)

ids <- c("2020_01_CHI_DET", "2020_15_TB_ATL", "2020_17_TEN_HOU")

# v4.1.0
all_games <- build_nflfastR_pbp(ids)
#> -- Build nflfastR Play-by-Play Data ------------------ nflfastR version 4.1.0 --
#> * 11:12:48 | Start download of 3 games...

sample <- build_nflfastR_pbp(ids[-1])
#> -- Build nflfastR Play-by-Play Data ------------------ nflfastR version 4.1.0 --
#> * 11:13:02 | Start download of 2 games...

c <- comparedf(all_games %>% filter(game_id %in% sample$game_id), sample)
diffs(c) %>% head(20)
#>     var.x  var.y ..row.names..     values.x     values.y row.x row.y
#> 1      wp     wp            11 0.697947.... 0.697489....    11    11
#> 2      wp     wp            93 0.134893.... 0.134536....    93    93
#> 3      wp     wp           102 0.943005.... 0.942860....   102   102
#> 4      wp     wp           112 0.132840.... 0.132257....   112   112
#> 5      wp     wp           127 0.285401.... 0.284856....   127   127
#> 6      wp     wp           156 0.666679.... 0.665633....   156   156
#> 7      wp     wp           223 0.679398.... 0.679094....   223   223
#> 8      wp     wp           244 0.801211.... 0.800766....   244   244
#> 9      wp     wp           277 0.905423.... 0.905035....   277   277
#> 10     wp     wp           284 0.168142.... 0.167975....   284   284
#> 11     wp     wp           294 0.930354.... 0.930306....   294   294
#> 12     wp     wp           305 0.354060.... 0.353571....   305   305
#> 13     wp     wp           323 0.695493.... 0.694733....   323   323
#> 14     wp     wp           347 0.682081.... 0.680360....   347   347
#> 15 def_wp def_wp            11 0.302052.... 0.302510....    11    11
#> 16 def_wp def_wp            93 0.865106.... 0.865463....    93    93
#> 17 def_wp def_wp           102 0.056994.... 0.057139....   102   102
#> 18 def_wp def_wp           112 0.867159.... 0.867742....   112   112
#> 19 def_wp def_wp           127 0.714598.... 0.715143....   127   127
#> 20 def_wp def_wp           156 0.333320.... 0.334366....   156   156

# load fixed code
devtools::load_all(nflfastR_path)
#> i Loading nflfastR
all_games <- build_nflfastR_pbp(ids)
#> -- Build nflfastR Play-by-Play Data ------------- nflfastR version 4.1.0.9001 --
#> * 11:13:07 | Start download of 3 games...

sample <- build_nflfastR_pbp(ids[-1])
#> -- Build nflfastR Play-by-Play Data ------------- nflfastR version 4.1.0.9001 --
#> * 11:13:12 | Start download of 2 games...

c <- comparedf(all_games %>% filter(game_id %in% sample$game_id), sample)
diffs(c) %>% head(20)
#> [1] var.x         var.y         ..row.names.. values.x      values.y     
#> [6] row.x         row.y        
#> <0 rows> (or 0-length row.names)

# what is going on?
fg_probs <- function(pbp_data) {
  make_pat_prob <- as.numeric(
    mgcv::predict.bam(
      fastrmodels::fg_model,
      newdata = pbp_data %>%
        make_model_mutations() %>%
        prepare_wp_data() %>%
        dplyr::mutate(yardline_100 = if_else(.data$season >= 2015, 15, 3)),
      type = "response"
    )
  )
  make_pat_prob[1]
}

fg_probs(all_games)
#> [1] 0.9488284
fg_probs(sample)
#> [1] 0.9314286

Created on 2021-03-24 by the reprex package (v1.0.0)

mrcaseb · 2021-03-24T10:43:53Z

I found another small bug but didn't want to modify too much before we talked about it:
The index search below for PATs and two-point attempts can match the same play twice, e.g. two-point attempts whose description starts with (Kick formation).

https://github.com/mrcaseb/nflfastR/blob/66d0de88e9093e367cc44a2268191bfb5df334eb/R/helper_add_ep_wp.R#L653-L673

Example from 2020

# A tibble: 6 x 3
  game_id        play_id desc                                                                                         
  <chr>            <dbl> <chr>                                                                                        
1 2020_03_SF_NYG    3011 (Kick formation) TWO-POINT CONVERSION ATTEMPT. 6-M.Wishnowsky pass is incomplete. ATTEMPT FA~
2 2020_04_CLE_D~    4451 (Kick formation) TWO-POINT CONVERSION ATTEMPT. 89-S.Carlson rushes left end. ATTEMPT SUCCEED~
3 2020_08_TEN_C~    3784 (Kick formation) TWO-POINT CONVERSION ATTEMPT. 6-B.Kern pass to 94-J.Crawford is incomplete.~
4 2020_10_MIN_C~    3186 (Kick formation) TWO-POINT CONVERSION ATTEMPT. 2-B.Colquitt pass to 82-K.Rudolph is incomple~
5 2020_15_PHI_A~    3766 (Kick formation) TWO-POINT CONVERSION ATTEMPT. 86-Z.Ertz rushes up the middle. ATTEMPT FAILS.
6 2020_19_LA_GB     1504 (Kick formation) TWO-POINT CONVERSION ATTEMPT. 6-J.Scott pass to 2-M.Crosby is complete. ATT~

Can't we just use extra_point_attempt == 1 and two_point_attempt == 1 to find those plays?

I think the 2pt search actually finds the correct plays but it gets overwritten here
https://github.com/mrcaseb/nflfastR/blob/66d0de88e9093e367cc44a2268191bfb5df334eb/R/helper_add_ep_wp.R#L718-L719

Or we modify the matches with something like

pat_i <- pat_i[!pat_i %in% two_pt_i]
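
A small sketch of that idea on toy data. The first description is taken from the example above; the other two descriptions and both search patterns are hypothetical stand-ins, not the patterns actually used in helper_add_ep_wp.R.

```r
desc <- c(
  # from the 2020 example above
  "(Kick formation) TWO-POINT CONVERSION ATTEMPT. 86-Z.Ertz rushes up the middle. ATTEMPT FAILS.",
  # made-up descriptions for illustration
  "(Kick formation) extra point is GOOD.",
  "(Shotgun) TWO-POINT CONVERSION ATTEMPT. pass is complete. ATTEMPT SUCCEEDS."
)

# Hypothetical searches: the PAT search keys on the formation text,
# so it also catches the two-point attempt from kick formation
pat_i    <- which(grepl("Kick formation", desc, fixed = TRUE))
two_pt_i <- which(grepl("TWO-POINT CONVERSION ATTEMPT", desc, fixed = TRUE))

overlap <- intersect(pat_i, two_pt_i)  # plays matched by both searches

# Drop every PAT match that is really a two-point attempt
pat_i_fixed <- pat_i[!pat_i %in% two_pt_i]
```

After the filter the two index vectors are disjoint, so a later assignment for PATs can no longer overwrite the values set for two-point attempts. `setdiff(pat_i, two_pt_i)` would give the same result here.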

guga31bb · 2021-03-24T12:13:20Z

> @guga31bb please let me know if there was a good reason to do make_pat_prob <- make_pat_prob[1] so I don't overlook anything.

No good reason, great catch! I must have been doing something dumb like testing on just one play rather than the normal df.

guga31bb · 2021-03-24T12:14:19Z

> Can't we just use extra_point_attempt == 1 and two_point_attempt == 1 to find those plays?

I think these columns get messed up by plays with penalties, which is why we have this additional search.

> Or we modify the matches with something like
> pat_i <- pat_i[!pat_i %in% two_pt_i]

Yes I think this is a good solution!

mrcaseb · 2021-03-24T12:53:45Z

> Or we modify the matches with something like
> pat_i <- pat_i[!pat_i %in% two_pt_i]

Added this now

guga31bb added 2 commits March 23, 2021 14:07

fix wpa on end game line

18f3d2f

update comparison thing

1201b3c

guga31bb added 2 commits March 23, 2021 14:38

group

f0ecbd8

update comparison

3f6790b

mrcaseb added 5 commits March 23, 2021 20:56

fix fantasy id decoding

de7d5f5

create def_wp after wp is fixed for END GAME

1b2f5ca

unnecessary to do this

123007c

keep vector of make pat probs

fe7c762

(because of era and roof adjustments)

just remove the commented stuff

f48798c

This was linked to issues Mar 24, 2021

Many wp variables with different values when running different number of games #183

Closed

Player decoder doesn't decode fantasy_id #229

Closed

update NEWS

1a2dd29

remove 2pt matches from 1pt matches

c4eef03

mrcaseb changed the title ~~fix wpa on end game line~~ fix wpa on end game line, fix pat wp and fantasy id decoding Mar 28, 2021

mrcaseb merged commit 5af7e30 into master Mar 28, 2021

mrcaseb deleted the wp branch March 28, 2021 18:29
