Verify Schedule.json accuracy #125

Open
derek-adair opened this issue Apr 29, 2020 · 10 comments
Comments

@derek-adair
Owner

derek-adair commented Apr 29, 2020

Literally just comb over schedule.json and make sure it reflects the games that have happened so far.

@andrew-shackelford

Hey, I'm happy to help take a look at this. Is there a specific ground truth source you want to compare it to?

@derek-adair
Owner Author

I will likely fold anything you contribute into the verify-data script I intend to package w/ this project. It will scan for this type of thing. If you have any ideas on how that might work I'd be happy to entertain any suggestions.

@andrew-shackelford

Okay, I've run through all the data and found the following:

  1. The 'meridiem' field on some games says 'AM' when it should be 'PM'
  2. One game (Jets vs. Bills, November 24, 2014) has the wrong day (23 instead of 24) -- this I believe is due to an incorrect game id on the NFL's side
  3. All pro bowl games are missing (if that matters)
  4. 93 preseason games are missing
  5. 8 regular season games are missing: all occur within the week leading up to Christmas, I'm unsure if that has anything to do with it but it seemed like an odd pattern

As for folding it into a script, I'm happy to help out with that and share the (somewhat messy) code I've used to check the games and edit it as needed; as you know I've just started contributing to this project so am willing to follow your guidance and do whatever you think is best.

@ochawkeye
Collaborator

  1. The 'meridiem' field on some games says 'AM' when it should be 'PM'

My role was always more on the support side, but this was one of the only bits of code I actually contributed...ooof - gut punch. Any idea where the logic is wrong?

https://github.com/derek-adair/nflgame/blob/master/nflgame/update_sched.py#L80-L109

@andrew-shackelford

Okay, I've found two main things, and lost a bit of my sanity in the process:

1a) Quite often the games' eids are out of order in the preseason. For example:

{'away': 'SEA', 'day': 27, 'eid': '2011082758', 'gamekey': '55469', 'home': 'DEN', 'meridiem': 'AM', 'month': 8, 'season_type': 'PRE', 'time': '9:00', 'wday': 'Sat', 'week': 3, 'year': 2011}
{'away': 'SD', 'day': 27, 'eid': '2011082759', 'gamekey': '55470', 'home': 'ARI', 'meridiem': 'AM', 'month': 8, 'season_type': 'PRE', 'time': '10:00', 'wday': 'Sat', 'week': 3, 'year': 2011}
{'away': 'MIA', 'day': 27, 'eid': '2011082760', 'gamekey': '55459', 'home': 'TB', 'meridiem': 'PM', 'month': 8, 'season_type': 'PRE', 'time': '7:30', 'wday': 'Sat', 'week': 3, 'year': 2011}

In this example, game 55459 actually kicks off two and a half hours before 55470 (and an hour and a half before 55469, for that matter), yet is numbered after both of them. This also has the secondary effect of incorrectly marking all games before it on that day as AM. Not sure what we can do about this if the NFL insists on numbering them incorrectly.
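A verify script could surface this kind of misnumbering by comparing eid order against known-good kickoff times. The sketch below is only an illustration: it assumes a trusted 24-hour Eastern kickoff is available for each game (the `kickoff_24h` key and the `out_of_order_eids` helper are hypothetical, not schedule.json fields or nflgame functions), and it uses the three games above with their times read as PM.

```python
from collections import defaultdict


def out_of_order_eids(games):
    """Return eids that kick off earlier than a lower-numbered eid on the same date.

    Each game is a dict with 'eid' (YYYYMMDD plus a 2-digit sequence) and a
    hypothetical 'kickoff_24h' key holding a trusted 'HH:MM' kickoff time.
    """
    by_date = defaultdict(list)
    for game in games:
        by_date[game['eid'][:8]].append(game)

    flagged = []
    for same_day in by_date.values():
        latest = -1  # latest kickoff (in minutes) seen so far in eid order
        for game in sorted(same_day, key=lambda g: g['eid']):
            hours, minutes = map(int, game['kickoff_24h'].split(':'))
            kickoff = hours * 60 + minutes
            if kickoff < latest:
                flagged.append(game['eid'])
            latest = max(latest, kickoff)
    return flagged


# The three preseason games above, with the AM labels assumed to be PM.
games = [
    {'eid': '2011082758', 'kickoff_24h': '21:00'},
    {'eid': '2011082759', 'kickoff_24h': '22:00'},
    {'eid': '2011082760', 'kickoff_24h': '19:30'},
]
print(out_of_order_eids(games))  # ['2011082760']
```

This only reports where the numbering disagrees with the clock; it does not decide which meridiem is right.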

1b) Occasionally (7 times), the same occurs in the postseason. For example:

{'away': 'HOU', 'day': 14, 'eid': '2017011401', 'gamekey': '57162', 'home': 'NE', 'meridiem': 'PM', 'month': 1, 'season_type': 'POST', 'time': '8:15', 'wday': 'Sat', 'week': 2, 'year': 2016}
{'away': 'PIT', 'day': 15, 'eid': '2017011500', 'gamekey': '57163', 'home': 'KC', 'meridiem': 'AM', 'month': 1, 'season_type': 'POST', 'time': '8:20', 'wday': 'Sun', 'week': 2, 'year': 2016}
{'away': 'GB', 'day': 15, 'eid': '2017011501', 'gamekey': '57164', 'home': 'DAL', 'meridiem': 'PM', 'month': 1, 'season_type': 'POST', 'time': '4:40', 'wday': 'Sun', 'week': 2, 'year': 2016}

This could probably be fixed by moving the logic on lines 108 and 109 up to before the rest of the logic, as to my knowledge, games in the postseason are always in the afternoon, at least in Eastern Time.
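As a rough illustration of that idea, the sketch below forces the meridiem to PM for postseason entries before any other inference runs. It rests on the assumption stated above that every postseason kickoff is PM Eastern, and `force_postseason_pm` is a hypothetical helper, not the existing logic in update_sched.py.

```python
def force_postseason_pm(game):
    """Return a copy of a schedule entry with meridiem set to 'PM' for postseason games.

    Assumes every postseason game kicks off in the afternoon or evening (Eastern).
    """
    fixed = dict(game)  # leave the caller's dict untouched
    if fixed.get('season_type') == 'POST':
        fixed['meridiem'] = 'PM'
    return fixed


# The PIT @ KC entry from above, trimmed to the relevant fields.
game = {'away': 'PIT', 'home': 'KC', 'eid': '2017011500',
        'season_type': 'POST', 'time': '8:20', 'meridiem': 'AM'}
print(force_postseason_pm(game)['meridiem'])  # PM
```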

1c) This also occurred in the regular season for games 57302 and 55023: the former, I think, when a game was flexed to Sunday night, and the latter due to some weirdness when two Monday night games kicked off an hour apart, for reasons I don't know. These are probably small enough edge cases not to worry about.

  2. There's a small bug when parsing games that start at 12:30 (or any time in the 12 o'clock hour), since 12:30 PM actually occurs before 1:00 PM. This could be fixed easily by updating lines 99 and 102 to correct the "earlier" and "later" logic. I'm happy to submit a PR to do this, or let you take a crack at it if you'd like.
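One way to make the "earlier"/"later" comparison safe for the 12 o'clock hour is to convert each 12-hour time to minutes since midnight before comparing. This is a minimal sketch of that conversion, not the code at the lines mentioned above; `minutes_since_midnight` is a hypothetical helper name.

```python
def minutes_since_midnight(time_str, meridiem):
    """Convert a 12-hour 'H:MM' time plus 'AM'/'PM' into minutes since midnight."""
    hour, minute = map(int, time_str.split(':'))
    hour %= 12  # 12:xx is hour 0 of its half of the day
    if meridiem == 'PM':
        hour += 12
    return hour * 60 + minute


print(minutes_since_midnight('12:30', 'PM'))  # 750
print(minutes_since_midnight('1:00', 'PM'))   # 780
```

Comparing the raw hour (12 vs. 1) gets this case backwards; comparing the converted values does not.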

So overall, one of the bugs is easily fixable, but the other is inherent to using eids to infer the order of games when the NFL doesn't always number them in order. Thankfully, most of the problems occur in the preseason, and the postseason can easily be hardcoded, so we can mitigate it pretty well that way. However, we may want to see if there's a different way to do it (I have no clue, having just started contributing to the project).

@derek-adair
Owner Author

Thanks for your hard work! Any of the scripts you used to verify this would be great! Definitely start a PR draft.

  • I'm not concerned about the NFL supplying incorrect data; there's nothing we can do. Are these games w/ improper EIDs returning game data? It's very possible they are not as vigilant about keeping the data up-to-date in the preseason.

  • The new schedule api has a 24 hour clock. This would allow us to clean up the meridiem logic. I am considering adding the ISO time instead of "time".

  • Pro bowl games are now a feature request - ProBowl not included #129
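With a 24-hour clock, the existing "time" and "meridiem" fields (and an ISO timestamp) could be derived directly instead of inferred from eid order. The sketch below assumes the new API yields a date as 'YYYY-MM-DD' and a kickoff as 'HH:MM'; that input format and the `split_kickoff` helper are assumptions for illustration, not the API's documented shape.

```python
from datetime import datetime


def split_kickoff(date_str, time_24h):
    """Derive 12-hour 'time', 'meridiem', and an ISO string from a 24-hour kickoff."""
    kickoff = datetime.strptime(f'{date_str} {time_24h}', '%Y-%m-%d %H:%M')
    hour_12 = kickoff.hour % 12 or 12  # 0 and 12 both display as 12
    return {
        'time': f'{hour_12}:{kickoff.minute:02d}',
        'meridiem': 'AM' if kickoff.hour < 12 else 'PM',
        'iso': kickoff.isoformat(),  # naive timestamp, no timezone attached
    }


print(split_kickoff('2017-01-15', '20:20'))
# {'time': '8:20', 'meridiem': 'PM', 'iso': '2017-01-15T20:20:00'}
```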


Again thanks for your detective work. If @andrew-shackelford or @ochawkeye wanna toss a PR against the dev branch pulling from the new api w/ the 24 hour clock I'd be happy to get it in. Otherwise I can probably do this some time in the next week.

Also @andrew-shackelford definitely submit a PR w/ your scripts and we'll figure out how to get you some contribution credit.

@derek-adair
Owner Author

derek-adair commented May 13, 2020

Also a gist w/ the different eids w/ improper data for me to investigate may help. Something like...

regular_season_pre_christmas = []  # eids missing the week before Christmas
missing_pre_season = []  # eids missing from the preseason
improper_times = []  # eids w/ wrong time and/or eid
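Lists like these could be built by diffing the schedule against a trusted reference. This is a sketch under assumptions: both inputs are plain dicts mapping eid to a dict with 'time' and 'meridiem', `bucket_eids` is a hypothetical helper, and the toy inputs are made up for illustration rather than taken from the real schedule.json.

```python
def bucket_eids(schedule, reference):
    """Sort eids into 'missing' and 'improper_times' by comparing against a reference.

    Both arguments map eid -> dict with 'time' and 'meridiem' keys.
    """
    missing = sorted(eid for eid in reference if eid not in schedule)
    improper_times = sorted(
        eid for eid, truth in reference.items()
        if eid in schedule
        and (schedule[eid].get('time'), schedule[eid].get('meridiem'))
        != (truth['time'], truth['meridiem'])
    )
    return {'missing': missing, 'improper_times': improper_times}


# Toy inputs: one game absent from the schedule, one with the wrong meridiem.
schedule = {
    '2017011500': {'time': '8:20', 'meridiem': 'AM'},
    '2017011501': {'time': '4:40', 'meridiem': 'PM'},
}
reference = {
    '2017011401': {'time': '8:15', 'meridiem': 'PM'},
    '2017011500': {'time': '8:20', 'meridiem': 'PM'},
    '2017011501': {'time': '4:40', 'meridiem': 'PM'},
}
print(bucket_eids(schedule, reference))
# {'missing': ['2017011401'], 'improper_times': ['2017011500']}
```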

@andrew-shackelford

Okay, I just sent in PR #131. I wasn't entirely sure how you wanted it added to the library, so please let me know if there are any changes you want made and I'm happy to make them. I had to include the feeds-rs files in the commit as it looks like the NFL has revoked access to them.

Also, I caught one error I had made: by using a lazy try/except, I had marked some games as missing when actually only their meridiem was missing. It turns out all games are present except for the pro bowl games, so that's good.
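For anyone writing a similar check, here is a guess at how that kind of mix-up can happen, with an explicit alternative. It is not the script from the PR; the function names and the eid-to-dict layout are assumptions. A single `except KeyError` cannot tell a missing game from a missing field, since both lookups raise the same exception.

```python
def lazy_is_missing(eid, schedule):
    """Broad check: treats ANY KeyError as 'game missing'."""
    try:
        schedule[eid]['meridiem']
        return False
    except KeyError:
        return True


def classify(eid, schedule):
    """Explicit check: separates a missing game from a missing meridiem."""
    game = schedule.get(eid)
    if game is None:
        return 'missing game'
    if not game.get('meridiem'):
        return 'missing meridiem'
    return 'ok'


schedule = {'2017011500': {'time': '8:20'}}  # game present, meridiem absent
print(lazy_is_missing('2017011500', schedule))  # True (misleading)
print(classify('2017011500', schedule))         # missing meridiem
```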

@derek-adair
Owner Author

This is correct, the NFL has yanked all of the feeds. This will be on hold until I can make some adjustments to how this whole thing works without that feed.
