
Discrepancies between COVID19_deaths.csv and tomwhite/covid-19-uk-data #8

Closed
tomwhite opened this issue Mar 27, 2020 · 18 comments


@tomwhite

I'm comparing COVID19_deaths.csv with the data I've been collating on https://github.com/tomwhite/covid-19-uk-data, and came across a few discrepancies.

Otherwise all the figures match. One thing this shows is that inferring the figures for England is not straightforward, since we don't know if the reporting times match up.
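
A minimal sketch of this kind of dataset comparison, in Python. It assumes, hypothetically, that each dataset can be exported as a long-format CSV with `Date`, `Country` and `Deaths` columns; the real COVID19_deaths.csv and covid-19-uk-data layouts may differ, and `load_series` and `find_discrepancies` are illustrative names, not functions from either repository. The figures are made up.

```python
import csv
import io

def load_series(csv_text):
    """Parse a long-format CSV (hypothetical columns: Date, Country, Deaths)
    into a dict keyed by (date, country)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["Date"], row["Country"]): int(row["Deaths"]) for row in reader}

def find_discrepancies(ours, theirs):
    """List (date, country, ours, theirs) for every key whose values differ.
    A key missing from one side shows up with None for that side."""
    diffs = []
    for key in sorted(set(ours) | set(theirs)):
        a, b = ours.get(key), theirs.get(key)
        if a != b:
            diffs.append((key[0], key[1], a, b))
    return diffs

# Made-up figures, for illustration only.
mine = load_series("Date,Country,Deaths\n2020-03-20,Scotland,5\n2020-03-21,Scotland,7\n")
other = load_series("Date,Country,Deaths\n2020-03-20,Scotland,5\n2020-03-21,Scotland,6\n")
print(find_discrepancies(mine, other))  # [('2020-03-21', 'Scotland', 7, 6)]
```

Keying on (date, country) rather than row position means a day present in one file but missing from the other is reported as a discrepancy instead of silently shifting the comparison.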

I'd be interested to see what you think, and if you have any more information about what the reported figures were for these days. Thanks again for all your work on collecting the data.

@andrewjmdata

andrewjmdata commented Mar 27, 2020 via email

@tomwhite
Author

Thanks Andrew. I haven't compared the numbers with the Johns Hopkins data, but as you say, there are likely to be discrepancies there too.

@emmadoughty
Owner

Hi Tom and Andrew,

I really appreciate you getting in contact about this. As I'm sure you know yourself, reporting methods from the public health agencies have been somewhat inconsistent, making collation tricky. I've had a look at the issues you raised and checked my own sources:

  • Re NI deaths on 24/03/2020: I took my data from the surveillance reports here (https://www.publichealth.hscni.net/publications/covid-19-surveillance-reports). Presumably, a fifth death that occurred on the 24th was reported on Twitter but didn't make it into a surveillance report until the next day. I'm going to stick with the sources listed on my README page for this collation, but I think how to collate the data is a matter of preference.
  • Re deaths in Wales on 17/03/2020: I see I have included the report of the second person's death on the 17th, whilst you have it on the 18th. I took my info from here (https://gov.wales/second-person-wales-dies-after-contracting-covid-19), which reports this person's death on the 17th; maybe you had another source.
  • Re deaths in Wales on 21/03/2020: I completely see your point. I added the two further deaths reported on Twitter to 21/03 but, for consistency, I should stick to the numbers of deaths reported on the Welsh government site. I have amended the number of deaths for that day to 3 to reflect that.

If you do notice any other discrepancies, please do get in contact. Collating this data has been a very manual process owing to the number of sources, their reporting formats and their frequently changing web locations, so I can easily make mistakes or fail to be consistent, and I am glad to be kept accountable!

@tomwhite
Author

Hi Emma,

Thanks for the details. I agree with you on all of these points. I've amended the NI 24/03/2020 number since, looking back at the report, it shows 2 new deaths that day (from 2 to 4). I've also amended Wales on 17/03/2020 to 2.

It looks like you need to add 2 to England for 21/03/2020 to make the numbers add up correctly (since you subtracted 2 from Wales for that day).
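
This adjustment follows if England's figure is treated as a residual. A minimal Python sketch, assuming (as the thread implies but does not state outright) that England is derived as the UK total minus Scotland, Wales and Northern Ireland; `infer_england` is a hypothetical helper and the numbers are made up:

```python
def infer_england(uk_total, scotland, wales, northern_ireland):
    """Derive England's figure as the UK total minus the devolved nations.
    Only meaningful if all four inputs share the same reporting cutoff."""
    england = uk_total - scotland - wales - northern_ireland
    if england < 0:
        # A negative residual usually means the inputs come from different reporting times.
        raise ValueError(f"negative residual for England: {england}")
    return england

# Made-up figures: lowering Wales by 2 for a fixed UK total raises England by 2.
before = infer_england(100, 10, 5, 3)
after = infer_england(100, 10, 5 - 2, 3)
print(before, after)  # 82 84
```

Under this assumption, any mismatch between the UK total and a devolved nation's figure (for example, from different reporting times) lands entirely on England.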

After all these changes I think our numbers are consistent!

@emmadoughty
Owner

Yes, you're right! Thanks!

@tomwhite
Author

@emmadoughty I just noticed another discrepancy: England on 21/03/2020 in https://github.com/emmadoughty/Daily_COVID-19/blob/master/Data/COVID19_deaths.csv should be 222 not 218 to get correct totals.

The good news is that PHE is now publishing the number of deaths for England (and the other UK countries).

@emmadoughty
Owner

Amended! Thanks, Tom. Yeah, I'm really glad to see their own reporting improving!

@tomwhite
Author

Going through Scottish case numbers I noticed more discrepancies (hope you don't mind me reporting them here): 2020-03-07 should be 16 cases (not 11), 2020-03-08 should be 18 (not 16). See the raw scraped files https://github.com/tomwhite/covid-19-uk-data/blob/master/data/raw/coronavirus-covid-19-number-of-cases-in-scotland-2020-03-07.html and https://github.com/tomwhite/covid-19-uk-data/blob/master/data/raw/coronavirus-covid-19-number-of-cases-in-scotland-2020-03-08.html

These are also consistent with the Grand Total column in https://github.com/watty62/Scot_covid19/blob/master/data/processed/regional_cases.csv.

@emmadoughty
Owner

Looks like I pulled the data together before the updates for those days had finished. I really appreciate you going through the data like this; it helps everyone out.

@tomwhite
Author

NI cases: in my repo I have 3 on 2020-03-04, you have 1; and I have 3 on 2020-03-06, you have 4. This tweet (https://twitter.com/publichealthni/status/1235928458431205377) says 3 on 2020-03-06. Not sure about source for 2020-03-04 though.

@tomwhite
Author

Actually, NI cases for 2020-03-04 were reported as 3 here: https://www.health-ni.gov.uk/news/latest-update-coronavirus-covid-19

@emmadoughty
Owner

Great. I'll update 04/03/2020 to 3. FYI, here (https://www.health-ni.gov.uk/news/latest-update-covid-19-coronavirus) cases on 06/03/2020 are shown as 4. Maybe it's a later update, but it isn't time-stamped.

@tomwhite
Author

Sounds good. I viewed the source of that page, and it looks like the timestamp is 20:24:36+00 (created time 16:49:16+00); the tweet is timestamped 2:01pm, so perhaps we should go with the tweet, since that matches the standard 2pm reporting time? I think (hope) this is the last outstanding discrepancy for cases.
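
One way to make the "standard 2pm" rule explicit is to pick, for a given day, the report published closest to 14:00. A minimal Python sketch, assuming both times are in UTC (so the tweet's 2:01pm is treated as 14:01) and that the page's 20:24:36 timestamp is when its figure was published; `pick_report` is a hypothetical helper, and closest-to-2pm is only one possible rule:

```python
from datetime import datetime, time, timezone

STANDARD_CUTOFF = time(14, 0)  # assumed daily 2pm reporting time

def minutes_from_cutoff(ts):
    """Distance in minutes between a timestamp and 2pm on the same day."""
    cutoff = datetime.combine(ts.date(), STANDARD_CUTOFF, tzinfo=ts.tzinfo)
    return abs((ts - cutoff).total_seconds()) / 60

def pick_report(reports):
    """reports: list of (timestamp, cases); return the cases figure published closest to 2pm."""
    _, cases = min(reports, key=lambda r: minutes_from_cutoff(r[0]))
    return cases

# The two NI figures for 2020-03-06 discussed above.
reports = [
    (datetime(2020, 3, 6, 14, 1, tzinfo=timezone.utc), 3),       # tweet, 2:01pm
    (datetime(2020, 3, 6, 20, 24, 36, tzinfo=timezone.utc), 4),  # web page timestamp
]
print(pick_report(reports))  # 3
```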

emmadoughty added a commit that referenced this issue Mar 29, 2020
@andrewjmdata

andrewjmdata commented Apr 1, 2020 via email

@emmadoughty
Owner

Hi Andrew,

I have seen this. FYI, my cases_by_area.csv and COVID19_by_area.csv files include this data as well as regional breakdowns for Scotland, Wales and NI. I haven't done any mapping myself, but you might want to look at how others have done it. I know a few people have given it a shot:

Hope this helps!

@tomwhite
Author

tomwhite commented Apr 1, 2020

Hi Andrew - in addition to the ones Emma suggested, there are some links here that may be useful: tomwhite/covid-19-uk-data#18

@emmadoughty
Owner

Sorry Tom, should have mentioned yours too!

@andrewjmdata

andrewjmdata commented Apr 18, 2020 via email
