Plays with Laterals Will Not Have the Correct Yardage Gained #216

Closed
JoeMarino2021 opened this issue Mar 11, 2021 · 11 comments · Fixed by #221

Comments

@JoeMarino2021

For plays with laterals, the yardage gained is often incorrect. Notice the following 2nd quarter play from the CLE at CIN game in 2020 (2020_07_CLE_CIN).

(10:26) (Shotgun) 6-B.Mayfield pass short middle to 80-J.Landry to CLE 47 for -6 yards. Lateral to 27-K.Hunt to 50 for 3 yards (21-M.Alexander; 90-K.Kareem)

This play is listed as a 0 yard gain, when it should have been listed as a 3 yard loss.

Also, here is a desperation play by Houston at the end of their game with Cincinnati this year (2020_16_CIN_HOU).

(:09) (Shotgun) 4-D.Watson pass short left to 17-C.Hansen to HOU 38 for 8 yards. Lateral to 16-K.Coutee to HOU 29 for -9 yards. Lateral to 4-D.Watson to HOU 25 for -4 yards. Lateral to 13-B.Cooks to HOU 45 for 20 yards. Lateral to 74-M.Scharping to CIN 48 for 7 yards. Lateral to 67-C.Heck to CIN 40 for 8 yards (23-D.Phillips). FUMBLES (23-D.Phillips), ball out of bounds at CIN 40.

This play is listed as an 8 yard gain. It gained 30 yards (43 yards of forward progress less 13 yards of backwards movement).

The yardline_100 info appears correct (not thoroughly verified yet), so "internal" plays like the CLE example above might be detectable/correctable by sanity checks with that data. The final plays of games won't have a yardline_100 available on the next play, so that may have to be dealt with differently. That play description text may have to be parsed to find all of the gains and losses.
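As a rough illustration of the text-parsing idea, here is a minimal base R sketch that sums every "for N yards" segment in a play description. The helper name `sum_desc_yards` is hypothetical (it is not part of nflfastR), and the pattern is an assumption based only on the two descriptions quoted above.

```r
# Hypothetical helper (not part of nflfastR): sum every "for N yards" /
# "for no gain" segment found in a play description.
sum_desc_yards <- function(desc) {
  # Matches "for -6 yards", "for 1 yard", or "for no gain"
  pattern <- "for (-?[0-9]+) yards?|for no gain"
  hits <- regmatches(desc, gregexpr(pattern, desc))[[1]]
  if (length(hits) == 0) return(NA_integer_)
  # Keep only the signed number from each hit; "for no gain" counts as 0
  nums <- sub("^for (-?[0-9]+) yards?$", "\\1", hits)
  nums[nums == "for no gain"] <- "0"
  sum(as.integer(nums))
}

cle <- "(10:26) (Shotgun) 6-B.Mayfield pass short middle to 80-J.Landry to CLE 47 for -6 yards. Lateral to 27-K.Hunt to 50 for 3 yards (21-M.Alexander; 90-K.Kareem)"
hou <- "(:09) (Shotgun) 4-D.Watson pass short left to 17-C.Hansen to HOU 38 for 8 yards. Lateral to 16-K.Coutee to HOU 29 for -9 yards. Lateral to 4-D.Watson to HOU 25 for -4 yards. Lateral to 13-B.Cooks to HOU 45 for 20 yards. Lateral to 74-M.Scharping to CIN 48 for 7 yards. Lateral to 67-C.Heck to CIN 40 for 8 yards (23-D.Phillips). FUMBLES (23-D.Phillips), ball out of bounds at CIN 40."

sum_desc_yards(cle)  # -3
sum_desc_yards(hou)  # 30
```

This would likely over-count on descriptions that also mention penalty or fumble-return yardage in the same "for N yards" form, so it is probably only safe as a cross-check on plays already flagged as laterals.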

My main concern is calculating yards per play for each team. These discrepancies in yards gained make my yards per play inaccurate. If you could suggest a workaround for calculating team yards per play until the yards gained on plays with laterals is corrected, I would be very grateful.

Thanks again for reading my report, and for creating the most magnificent R tool. Have a tremendous day!

OFF TOPIC QUESTION: I am new to the tool. I haven't used it in season yet. How soon after completion of a game is data usually available?

@guga31bb
Member

Play in question

> nflfastR::load_pbp(2020) %>% 
   filter(game_id == "2020_07_CLE_CIN", play_id == 1334) %>% 
   select(play_id, yards_gained, receiving_yards, lateral_receiving_yards, passing_yards)
# A tibble: 1 x 5
  play_id yards_gained receiving_yards lateral_receiving_yards passing_yards
    <dbl>        <dbl>           <dbl>                   <dbl>         <dbl>
1    1334            0              -3                       0            -3

Maybe we could fix yards_gained by using the lateral yards on plays with laterals.

How soon after completion of a game is data usually available?

Usually 20 minutes or so when using the R package; the data repo is rebuilt overnight.

@mrcaseb
Member

mrcaseb commented Mar 11, 2021

Maybe we could fix yards_gained by using the lateral yards on plays with laterals.

That would be wrong, especially on plays with multiple laterals.

In my opinion we have to split up by pass and rush plays.

  • For pass plays the overall yards gained on a play will always be passing_yards (I am very sure that this is correct)
  • For rush plays that's not true. Instead we have to add rushing_yards and all instances (if there is more than one) of lateral_rushing_yards. Currently we cannot catch multiple instances of lateral_rushing_yards because it gets overwritten for every stat ID 12/13. However, we can see in this file that there wasn't a single play in the complete nflfastR era with more than one instance of lateral_rushing_yards. That's good news.
library(tidyverse)

pbp_db %>% 
  filter(pass == 1, passing_yards != yards_gained) %>% 
  select(game_id, play_id, yards_gained, passing_yards, receiving_yards, lateral_receiving_yards, desc) %>% 
  collect() %>% 
  mutate(lateral = str_detect(tolower(desc), fixed("lateral")))
#> # A tibble: 188 x 8
#>    game_id      play_id yards_gained passing_yards receiving_yards lateral_receiving_ya~ desc                                                                                                    lateral
#>    <chr>          <dbl>        <dbl>         <dbl>           <dbl>                 <dbl> <chr>                                                                                                   <lgl>  
#>  1 1999_04_CAR~    4569           12            14               5                    12 (:03) S.Beuerlein pass to M.Muhammad to CAR 31 for 5 yards. Lateral to A.Johnson to CAR 28 for -3 yard~ TRUE   
#>  2 1999_04_TEN~    3968           22            25               3                    22 (2:25) (Shotgun) N.O'Donnell pass to M.Roan to TEN 46 for 3 yards. Lateral to E.George ran ob at SF 32~ TRUE   
#>  3 1999_05_CHI~    3887            2            12              10                     2 (4:50) R.Cunningham pass to R.Moss to CHI 40 for 10 yards. Lateral to C.Carter ran ob at CHI 38 for 2 ~ TRUE   
#>  4 1999_16_DEN~    3870            5            17              12                     5 (:14) C.Batch pass to H.Moore to DET 40 for 12 yards. Lateral to G.Crowell to DET 45 for 5 yards (C.Wa~ TRUE   
#>  5 2000_03_MIN~     756            0            27              27                     0 (1:30) D.Culpepper pass to C.Carter to NE 36 for 27 yards. Lateral to R.Moss ran ob at NE 36 for no ga~ TRUE   
#>  6 2000_06_TB_~    3525           26            30               4                    26 (:49) (Shotgun) S.King pass to D.Moore to TB 27 for 4 yards. Lateral to W.Dunn pushed ob at MIN 47 for~ TRUE   
#>  7 2000_08_DET~    3774            8            12               4                     8 (1:27) S.King pass to D.Moore to TB 48 for 4 yards. Lateral to W.Dunn ran ob at DET 44 for 8 yards (R.~ TRUE   
#>  8 2000_09_STL~    3211           11            18               7                    11 (15:00) T.Green pass to I.Bruce to STL 49 for 7 yards. Lateral to R.Holcombe to SF 40 for 11 yards (A.~ TRUE   
#>  9 2000_10_CAR~    3536            3            19              16                     3 (:07) T.Green pass to I.Bruce to STL 44 for 16 yards. Lateral to A.Hakim to STL 47 for 3 yards (E.Robi~ TRUE   
#> 10 2000_10_KC_~    4270            6            17              11                     6 (:28) E.Grbac pass to S.Morris to OAK 35 for 11 yards. Lateral to T.Richardson to OAK 29 for 6 yards (~ TRUE   
#> # ... with 178 more rows

pbp_db %>% 
  filter(rush == 1, rushing_yards != yards_gained) %>% 
  select(game_id, play_id, yards_gained, rushing_yards, lateral_rushing_yards, desc) %>% 
  collect() %>% 
  mutate(lateral = str_detect(tolower(desc), fixed("lateral")))
#> # A tibble: 31 x 7
#>    game_id      play_id yards_gained rushing_yards lateral_rushing_ya~ desc                                                                                                                      lateral
#>    <chr>          <dbl>        <dbl>         <dbl>               <dbl> <chr>                                                                                                                     <lgl>  
#>  1 2000_17_TB_~    1654           11             8                  11 (1:50) (Shotgun) S.King left end to TB 44 for 8 yards. Lateral to W.Dunn to GB 45 for 11 yards (B.Harris, C.Hunt).        TRUE   
#>  2 2001_06_STL~    1523           44            12                  44 (3:22) 81-A.Hakim right end to NYJ 44 for 12 yards. Lateral to 24-T.Canidate for 44 yards, TOUCHDOWN. Warner hands to #8~ TRUE   
#>  3 2001_15_NYJ~    3630           11            12                  11 (1:37) 16-V.Testaverde up the middle to IND 32 for 12 yards. Lateral to 28-C.Martin to IND 21 for 11 yards.               TRUE   
#>  4 2002_10_HOU~     854            6             0                   6 (15:00) (Punt formation) 7-C.Stanley Aborted. 80-S.McDermott FUMBLES at HOU 37, recovered by HOU-7-C.Stanley at HOU 23. ~ FALSE  
#>  5 2005_02_JAX~     605            0             4                   0 (3:42) 28-F.Taylor up the middle to IND 39 for 4 yards. Lateral to 18-M.Jones to IND 36 for 3 yards. FUMBLES, and recove~ TRUE   
#>  6 2006_03_GB_~    1501            5             1                   5 (6:10) 83-A.Hakim right end to DET 45 for 1 yard. Lateral to 29-B.Calhoun to 50 for 5 yards (22-M.Manuel).                TRUE   
#>  7 2006_10_NO_~    3658            4             0                   4 (8:09) 26-D.McAllister Aborted. 52-J.Faine FUMBLES at PIT 4, recovered by NO-26-D.McAllister at PIT 4. 26-D.McAllister f~ FALSE  
#>  8 2007_04_STL~    1991            4             0                   4 (:56) (Shotgun) 9-T.Romo Aborted. 65-A.Gurode FUMBLES at STL 50, recovered by DAL-9-T.Romo at DAL 17. 9-T.Romo ran ob at~ FALSE  
#>  9 2007_15_IND~    1923            0             5                   0 (:01) (Shotgun) 12-J.McCown up the middle to OAK 48 for 5 yards. Lateral to 33-D.Rhodes to OAK 48 for no gain. FUMBLES, ~ TRUE   
#> 10 2008_08_OAK~    1678           19             2                  19 (2:00) (Shotgun) 10-T.Smith right end to BAL 46 for 2 yards. Lateral to 27-R.Rice to OAK 35 for 19 yards (31-H.Eugene).   TRUE   
#> # ... with 21 more rows

Created on 2021-03-11 by the reprex package (v1.0.0)

Solution

So my suggestion is

yards_gained = dplyr::case_when(
  !is.na(.data$passing_yards) & 
    .data$yards_gained != .data$passing_yards & 
    .data$penalty == 0 ~ .data$passing_yards,
  !is.na(.data$rushing_yards) & 
    !is.na(.data$lateral_rushing_yards) & 
    .data$yards_gained != .data$rushing_yards & 
    .data$penalty == 0 ~ .data$rushing_yards + .data$lateral_rushing_yards,
  TRUE ~ yards_gained
)

If a play with multiple instances of lateral_rushing_yards occurs in the future, we would have to hard-code it or add a second lateral_rushing_yards variable to tidy_play_stats.
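To show what the suggested case_when() does, here is a small sketch on toy data. The `plays` tibble and its `play` label column are made up for illustration; the numbers in the first two rows mirror plays shown earlier in this thread (2020_07_CLE_CIN play 1334 and the 2000_17_TB row), and the third row is an invented ordinary run. This is not the actual nflfastR pipeline, just the same expression applied inside mutate().

```r
library(dplyr, warn.conflicts = FALSE)

# Toy rows for illustration only (not loaded from nflfastR)
plays <- tibble(
  play                  = c("pass + lateral", "rush + lateral", "plain rush"),
  yards_gained          = c(0, 11, 4),
  passing_yards         = c(-3, NA, NA),
  rushing_yards         = c(NA, 8, 4),
  lateral_rushing_yards = c(NA, 11, NA),
  penalty               = c(0, 0, 0)
)

fixed <- plays %>%
  mutate(
    yards_gained = case_when(
      # Pass plays: passing_yards already covers the whole play
      !is.na(.data$passing_yards) &
        .data$yards_gained != .data$passing_yards &
        .data$penalty == 0 ~ .data$passing_yards,
      # Rush plays with a lateral: add the lateral yardage to the rush
      !is.na(.data$rushing_yards) &
        !is.na(.data$lateral_rushing_yards) &
        .data$yards_gained != .data$rushing_yards &
        .data$penalty == 0 ~ .data$rushing_yards + .data$lateral_rushing_yards,
      # Everything else stays as it was
      TRUE ~ .data$yards_gained
    )
  )

fixed$yards_gained  # -3 19  4
```

The pass row moves from 0 to -3, the rush row with a lateral from 11 to 19 (8 + 11), and the plain rush is left untouched.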

@mrcaseb
Member

mrcaseb commented Mar 13, 2021

Here is code to scrape offensive yards from nfl.com for further checks

library(dplyr, warn.conflicts = FALSE)
options(dplyr.summarise.inform = FALSE)
options(tibble.print_min = 32)

passing <- rvest::read_html("https://www.nfl.com/stats/team-stats/offense/passing/2020/reg/all") %>% 
  rvest::html_table() %>% 
  purrr::pluck(1) %>% 
  dplyr::mutate(Team = stringr::str_extract(Team, ".+(?=\\n)")) %>% 
  janitor::clean_names() %>% 
  dplyr::select(team:cmp, pass_yds)

rushing <- rvest::read_html("https://www.nfl.com/stats/team-stats/offense/rushing/2020/reg/all") %>% 
  rvest::html_table() %>% 
  purrr::pluck(1) %>% 
  dplyr::mutate(Team = stringr::str_extract(Team, ".+(?=\\n)")) %>% 
  janitor::clean_names() %>% 
  dplyr::select(team:rush_yds)

passing %>% 
  dplyr::left_join(rushing, by = "team") %>% 
  dplyr::mutate(overall = pass_yds + rush_yds) %>% 
  dplyr::arrange(dplyr::desc(overall)) %>% 
  dplyr::select(team, overall)
#> # A tibble: 32 x 2
#>    team          overall
#>    <chr>           <int>
#>  1 Chiefs           6804
#>  2 Vikings          6548
#>  3 Titans           6516
#>  4 Bills            6509
#>  5 Packers          6417
#>  6 Cardinals        6339
#>  7 Chargers         6332
#>  8 Texans           6309
#>  9 Raiders          6299
#> 10 Cowboys          6299
#> 11 Buccaneers       6295
#> 12 Seahawks         6216
#> 13 Saints           6210
#> 14 49ers            6209
#> 15 Rams             6201
#> 16 Colts            6182
#> 17 Falcons          6152
#> 18 Browns           6075
#> 19 Ravens           5990
#> 20 Lions            5896
#> 21 Panthers         5833
#> 22 Eagles           5755
#> 23 Dolphins         5625
#> 24 Broncos          5591
#> 25 Bears            5572
#> 26 Steelers         5480
#> 27 Jaguars          5474
#> 28 Patriots         5470
#> 29 Bengals          5461
#> 30 Football Team    5407
#> 31 Giants           5104
#> 32 Jets             4798

Created on 2021-03-13 by the reprex package (v1.0.0)

@mrcaseb
Member

mrcaseb commented Mar 13, 2021

For comparison, here are the numbers from ESPN. I have no idea why they differ so heavily.

image

@JoeMarino2021
Author

For comparison, here are the numbers from ESPN. I have no idea why they differ so heavily.

image

I believe that the NFL passing page is showing gross passing yards (sack yards not deducted). When I subtract KC's sack yardage (151) from the 6804 that you reported with your scrape, I arrive at 6653. This is also the total offense for KC on pfref.com.

I strongly suspect if you transform the gross passing yards to net passing yards, the stats should line up. I feel like the only necessary work is to deduct sack yardage. (Sack yardage is on the far right of the NFL page you scraped). They really should better label what they are showing in that table.

The rushing yardage seems to agree from all three sources. In your screenshot I can see KC's rushing is 1799. It is also 1799 on NFL.com and pfref.com. Turns out rushing doesn't have a "gross rushing yardage", since they are deducting as they go along for every negative run.
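The adjustment I am describing is just a subtraction. A tiny sketch with the KC figures quoted above (the variable names are my own):

```r
# 2020 Chiefs figures quoted in this thread
scraped_total <- 6804  # gross passing + rushing yards from the NFL.com scrape
sack_yards    <- 151   # yards lost on sacks, not deducted in the gross figure

# Net total offense = gross total minus sack yardage
net_total <- scraped_total - sack_yards
net_total  # 6653
```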

@JoeMarino2021
Author

Here is code to scrape offensive yards from nfl.com for further checks

Seeing all of this excellent code almost makes it worth asking the question, even if the bugs don't get squished! Thanks for showing me such useful functions like clean_names() and pluck(). I can't wait to try to put these new functions into my arsenal! I have been a pretty rookie scraper myself, but maybe now I can be a little better at it.

Thanks also for kindly working on this issue.

@JoeMarino2021
Author

Solution

So my suggestion is

yards_gained = dplyr::case_when(
  !is.na(.data$passing_yards) & 
    .data$yards_gained != .data$passing_yards & 
    .data$penalty == 0 ~ .data$passing_yards,
  !is.na(.data$rushing_yards) & 
    !is.na(.data$lateral_rushing_yards) & 
    .data$yards_gained != .data$rushing_yards & 
    .data$penalty == 0 ~ .data$rushing_yards + .data$lateral_rushing_yards,
  TRUE ~ yards_gained
)

I am sorry if I was supposed to take this code and try it out. I can do that if you would like and see if it clears up my issue.

I wasn't sure if this was an open internal discussion between the creators of the tool, and I should wait for this to be implemented on your side, or I was supposed to incorporate this code into mine and expect the tool to remain unchanged.

QUICK CHECK LATER: Success! I cannot detect any problems with 2020 season data. All of my YPP calculations are now correct! All I need to know is whether I need to leave that code in my code, or will it eventually make it into nflfastR.

Thanks for the incredibly fast reply and sharp work! I wish people who charged for products/support were as responsive as you have been. What an incredible service you are providing to the community.

@mrcaseb
Member

mrcaseb commented Mar 14, 2021

I am sorry if I was supposed to take this code and try it out.

Oh no worries, you are not supposed to use any of that code. I just dumped it in the issue for me and Ben.
I ran quick total yardage summaries with my suggested solution and wasn't sure how to check the output, so I wrote the scraper.

@mrcaseb
Member

mrcaseb commented Mar 14, 2021

As soon as I am sure that yards_gained works well now I will update nflfastR with it.

@JoeMarino2021
Author

Oh no worries, you are not supposed to use any of that code

Phew! I was nervous for a sec! Thanks for posting the correction code anyway. That allows your users to use the corrections before the final release. It felt good to see the errors disappear when I inserted the code into mine. I will now leave that snippet in until the new nflfastR is released.

Thanks for the swift attention!

@mrcaseb
Member

mrcaseb commented Mar 15, 2021

The data in the data repo has been updated with this fix.


3 participants