AFLTables Extract Has Fewer Unique IDs Than Debutants On AFLTables #72
Comments
The differences are due to mis-codings. The following accounts for the 7-player discrepancy that @TonyCorke highlighted:
The following are inconsistencies between the names used in the fitzRoy package and AFL Tables. They don't affect the statistics, but may cause problems if the data sitting behind the player IDs is ever rescraped.
|
This is fabulous @afableco. Thank you. Am I right that we're still one short of the seven we need though, as we get from the changes:
So that's +7 and -1 for a net gain of 6. Or, have I misinterpreted your explanation? |
Sadly, you are correct. I forgot to net off Archie Richardson. I will try and get back to this on the weekend to see if I can work out who else is missing. |
No rush at all - and thank you for looking at the issue I raised so quickly! |
The answer is Tom Darcy. In my original note, I had that Jim Darcy (ID 4318) should have been Tom Darcy, but it seems they are two separate people. Tom played for South Melbourne and had his first game on 1904-09-03, while Jim played for Essendon and had his first game on 1897-05-08. There are other issues with the data (e.g. Cam Rayner is recorded as Heber Quinton in 2018). |
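One rough way to look for other IDs that may cover two different people is to list the IDs that appear for more than one club. The sketch below uses a toy data frame: ID 4318 and its two clubs and seasons come from the comment above, while ID 9999 and its row are made up for illustration. It is only a heuristic, since players legitimately change clubs, so any hits would still need checking by hand.

```r
# Toy rows. ID 4318 with Essendon (1897) and South Melbourne (1904) is taken
# from the comment above; ID 9999 is a hypothetical player added for contrast.
dat <- data.frame(
  ID = c(4318, 4318, 9999),
  Playing.for = c("Essendon", "South Melbourne", "Fitzroy"),
  Season = c(1897, 1904, 1900),
  stringsAsFactors = FALSE
)

# First season recorded for each (ID, club) pair
first_season <- aggregate(Season ~ ID + Playing.for, data = dat, FUN = min)

# IDs that appear for more than one club: candidates for a manual check
clubs_per_id <- table(first_season$ID)
multi_club_ids <- names(clubs_per_id)[clubs_per_id > 1]
print(multi_club_ids)
```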
Perfect! Thanks again. Below is some code that can be used to patch the data:
library(fitzRoy)
dat <- get_afltables_stats(start_date = "1897-05-01", end_date = "2019-05-21")
# Fix Arthur Davidson (recorded as Alex Davidson)
dat$ID[dat$ID == 4350 & dat$Playing.for == "Fitzroy" & dat$Season == 1898 & dat$Round %in% c(7,10)] = 15000
# Fix George McLeod (there were two)
dat$ID[dat$First.name == "George" & dat$Surname == "McLeod" & dat$Playing.for == "St Kilda" & dat$Season == 1903] = 15001
# Fix Archie Richardson (three different guys)
dat$ID[dat$First.name == "Archie" & dat$Surname == "Richardson" &
dat$ID[dat$First.name == "Archie" & dat$Surname == "Richardson" &
dat$ID[dat$First.name == "Archie" & dat$Surname == "Richardson" &
# Fix Jack Dorgan (recorded as Jim Dorgan)
dat$ID[dat$First.name == "Jim" & dat$Surname == "Dorgan" & dat$Season == 1949] = 15005
# Fix Walter Johnston (recorded as Alex Johnston)
dat$ID[dat$First.name == "Alex" & dat$Surname == "Johnston" &
# Fix Tom Darcy (recorded as Jim)
dat$ID[dat$First.name == "Jim" & dat$Surname == "Darcy" &
|
Thanks heaps for all this, guys. I'm going to try to block out some time to focus on some of these in the coming weeks. I will need to work out which issues are to do with fitzRoy and which are to do with the underlying data on afltables.com. My general philosophy is to leave things as they appear on afltables.com and try to get Paul, who runs the website, to fix it there. But some helper functions to clean the data may also be useful - will have to think about it! Thanks for all the work so far identifying them! |
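As a rough idea of what such a helper might look like, here is a minimal sketch. `reassign_id` is a hypothetical function, not part of fitzRoy, and the data frame is a toy stand-in. The Arthur Davidson condition and the placeholder ID 15000 are taken from the patch code earlier in the thread.

```r
# Hypothetical helper (not a fitzRoy function): give the rows selected by the
# logical vector `rows` a new player ID and return the patched data frame.
reassign_id <- function(dat, rows, new_id) {
  dat$ID[rows] <- new_id
  dat
}

# Toy stand-in for the extract; only the first row matches the fix below.
dat <- data.frame(
  ID = c(4350, 4350, 4350),
  Playing.for = c("Fitzroy", "Fitzroy", "Fitzroy"),
  Season = c(1898, 1898, 1899),
  Round = c("7", "8", "7"),
  stringsAsFactors = FALSE
)

# Arthur Davidson fix from the thread, expressed through the helper
patched <- reassign_id(
  dat,
  dat$ID == 4350 & dat$Playing.for == "Fitzroy" &
    dat$Season == 1898 & dat$Round %in% c(7, 10),
  15000
)
print(patched$ID)
```

Returning a patched copy rather than modifying the data in place keeps the original extract available for comparison with afltables.com.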
fixed by #235 |
Please briefly describe your problem and what output you expect.
Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.
According to AFLTables as at the end of R3 2019, there have been 12,710 debutants. There are only 12,703 unique IDs in the AFLTables extract.
Created on 2019-04-14 by the reprex package (v0.2.1)
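For illustration only, the kind of check described above can be sketched on a toy data frame. The column names `ID`, `First.name` and `Surname` are the ones used in the patch code in the comments; the rows and ID values here are made up, and the real data would come from `get_afltables_stats()` as in that code.

```r
# Toy stand-in for the AFLTables extract (made-up IDs, illustrative rows)
dat <- data.frame(
  ID = c(1, 1, 2, 3, 3),
  First.name = c("Jim", "Tom", "George", "Archie", "Archie"),
  Surname = c("Darcy", "Darcy", "McLeod", "Richardson", "Richardson"),
  stringsAsFactors = FALSE
)

# Number of distinct player IDs in the extract
n_ids <- length(unique(dat$ID))

# Count the distinct names attached to each ID; more than one name on an ID
# is a candidate mis-coding worth checking against AFL Tables
names_per_id <- tapply(
  paste(dat$First.name, dat$Surname), dat$ID,
  function(x) length(unique(x))
)
suspect_ids <- names(names_per_id)[names_per_id > 1]
print(n_ids)
print(suspect_ids)
```

This only catches one ID shared by different names. Two people with the same name sharing one ID (as with George McLeod in the comments) would need a different check.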