Discrepancies between COVID19_deaths.csv and tomwhite/covid-19-uk-data #8
Comments
Hi Tom, I've also found similar issues with the data. Agreed on the typos,
but I suspect the major differences are where we are out of sync on timing.
I'm also using world data from JHU:
https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
There is a link to the GitHub data source at the bottom of that dashboard.
That source has had major problems, with data missing on some days and no
explanation as to why.
I put it into a Tableau report which I update daily:
https://public.tableau.com/profile/andrewjmdata#!/vizhome/CoronaVirusMarch2020v4/Introduction?publish=yes
All the best
Andrew
On Fri, 27 Mar 2020 at 10:47, Tom White wrote:
I'm comparing COVID19_deaths.csv with the data I've been collating on
https://github.com/tomwhite/covid-19-uk-data, and came across a few
discrepancies.
- NI on 24/03/2020, the file in this repo has 4, mine has 5. According
to this tweet (https://twitter.com/healthdpt/status/1242528268844138497)
from the NI Dept of Health the correct figure is 5.
- Wales on 17/03/2020, the file in this repo has 2, mine has 1. I'm
not sure which is correct!
- Wales on 21/03/2020, the file in this repo has 5, mine has 3. The
archived page from PHW from that day says "Three people in Wales who tested
positive for Novel Coronavirus (COVID-19) have now died" (see
https://github.com/tomwhite/covid-19-uk-data/blob/master/data/raw/coronavirus-covid-19-number-of-cases-in-wales-2020-03-21.html).
However, this tweet (
https://twitter.com/PublicHealthW/status/1241402961189888001) says a
further 2 people died (so the total would be 5). I think which figure is
correct depends on reporting cut off times.
Otherwise all the figures match. One thing this shows is that inferring
the figures for England is not straightforward, since we don't know if the
reporting times match up.
I'd be interested to see what you think, and if you have any more
information about what the reported figures were for these days. Thanks
again for all your work on collecting the data.
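A comparison like the one described above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout, the variable names and the `find_discrepancies` helper are assumptions, since the actual column layout of either CSV file is not shown in this thread. The values are the three disputed figures listed above.

```python
from datetime import date

# Hypothetical in-memory extracts keyed by (date, nation).
# The values are the disputed cumulative death counts quoted in this thread.
this_repo = {
    (date(2020, 3, 17), "Wales"): 2,
    (date(2020, 3, 21), "Wales"): 5,
    (date(2020, 3, 24), "NI"): 4,
}
tomwhite_repo = {
    (date(2020, 3, 17), "Wales"): 1,
    (date(2020, 3, 21), "Wales"): 3,
    (date(2020, 3, 24), "NI"): 5,
}


def find_discrepancies(a, b):
    """Return (key, a_value, b_value) for every key whose values differ.

    A key present on only one side is reported with None for the other side.
    """
    rows = []
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            rows.append((key, a.get(key), b.get(key)))
    return rows


diffs = find_discrepancies(this_repo, tomwhite_repo)
for (day, nation), ours, theirs in diffs:
    print(f"{nation} {day.isoformat()}: this repo={ours}, tomwhite={theirs}")
```

Keying on the date and nation together means a row missing from one file shows up as a discrepancy rather than being silently skipped.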
--
_________________
Andrew Whittam
Spurfold House
Church Lane
Grayshott
Hampshire
GU26 6LY
m: 07766-521794
e: andrew@jmdata.co.uk
www.jmdata.co.uk
|
Thanks Andrew. I haven't compared the numbers with Johns Hopkins data, but as you say there are likely to be discrepancies there too. |
Hi Tom and Andrew, I really appreciate you getting in contact about this. As I'm sure you know yourself, reporting methods from the public health agencies have been somewhat inconsistent making collation tricky. I've had a look at the issues you raised and checked my own sources:
If you do notice any other discrepancies, please do get in contact. Collating this data has been a very manual process owing to the number of sources, their reporting formats and frequently changing web locations so I can easily make mistakes or fail to be consistent and I am glad to be kept accountable! |
Hi Emma, Thanks for the details. I agree with you on all of these points. I've amended the NI 24/03/2020 number, since looking back at the report it says there were 2 new deaths that day (from 2 to 4). I've also amended Wales on 17/03/2020 to 2. It looks like you need to add 2 to England for 21/03/2020 to make the numbers add up correctly (since you subtracted 2 from Wales for that day). After all these changes I think our numbers are consistent! |
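The adjustment to England follows from how the England figure is inferred: it is whatever remains of the UK total once Scotland, Wales and NI are subtracted. A minimal sketch, assuming that is the method in use; the `infer_england` helper is hypothetical and the numbers are placeholders, not values from either dataset.

```python
def infer_england(uk_total, scotland, wales, ni):
    """Infer the England figure as the UK total minus the other three nations."""
    return uk_total - (scotland + wales + ni)


# Placeholder figures for illustration only; not taken from either CSV file.
before = infer_england(uk_total=100, scotland=10, wales=5, ni=1)
after = infer_england(uk_total=100, scotland=10, wales=3, ni=1)

# Lowering Wales by 2 with an unchanged UK total raises inferred England by 2.
print(after - before)
```

This is also why mismatched reporting cut-off times matter: an error in any one nation's figure passes straight through to the inferred England figure.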
Yes, you're right! Thanks! |
@emmadoughty I just noticed another discrepancy: England on 21/03/2020 in https://github.com/emmadoughty/Daily_COVID-19/blob/master/Data/COVID19_deaths.csv should be 222 not 218 to get correct totals. The good news is that PHE is publishing number of deaths for England now (and other UK countries). |
Amended! Thanks, Tom. Yeah, I'm really glad to see their own reporting improving! |
Going through Scottish case numbers I noticed more discrepancies (hope you don't mind me reporting them here): 2020-03-07 should be 16 cases (not 11), 2020-03-08 should be 18 (not 16). See the raw scraped files https://github.com/tomwhite/covid-19-uk-data/blob/master/data/raw/coronavirus-covid-19-number-of-cases-in-scotland-2020-03-07.html and https://github.com/tomwhite/covid-19-uk-data/blob/master/data/raw/coronavirus-covid-19-number-of-cases-in-scotland-2020-03-08.html These are also consistent with the Grand Total column in https://github.com/watty62/Scot_covid19/blob/master/data/processed/regional_cases.csv. |
Looks like I pulled the data together before the updates had finished for those days. I really appreciate you going through the data like this; it helps everyone out. |
NI cases: in my repo I have 3 on 2020-03-04, you have 1; and I have 3 on 2020-03-06, you have 4. This tweet (https://twitter.com/publichealthni/status/1235928458431205377) says 3 on 2020-03-06. Not sure about source for 2020-03-04 though. |
Actually NI cases for 2020-03-04 was reported as 3 here: https://www.health-ni.gov.uk/news/latest-update-coronavirus-covid-19 |
Great. I'll update 04/03/2020 to 3. FYI, here (https://www.health-ni.gov.uk/news/latest-update-covid-19-coronavirus) the case count for 06/03/2020 is shown as 4. It may be a later update, but the page is not time-stamped.
Sounds good. I did a view source on that page and it looks like the timestamp is 20:24:36+00 (created time 16:49:16+00); the tweet is timestamped 2:01pm, so perhaps we should go with the tweet as it's the standard 2pm time? I think (hope) this is the last outstanding discrepancy for cases |
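The rule suggested here, preferring whichever report is closest to the standard 2pm cut-off, can be written down as a small sketch. Several things are assumed: that both timestamps fall on 2020-03-06, that the tweet time is UTC (the UK was on GMT on that date), and that the page showed 4 cases while the tweet showed 3, as quoted earlier in the thread. The names `CUTOFF` and `sources` are hypothetical.

```python
from datetime import datetime, timezone

# Assumed cut-off: 14:00 UTC on the reporting day.
CUTOFF = datetime(2020, 3, 6, 14, 0, tzinfo=timezone.utc)

# (timestamp, reported NI cases); the dates are assumptions, the times and
# counts are the ones quoted in this thread.
sources = {
    "tweet": (datetime(2020, 3, 6, 14, 1, tzinfo=timezone.utc), 3),
    "web page (last updated)": (datetime(2020, 3, 6, 20, 24, 36, tzinfo=timezone.utc), 4),
}

# Pick the source whose timestamp is nearest to the cut-off.
best_name = min(sources, key=lambda name: abs(sources[name][0] - CUTOFF))
best_count = sources[best_name][1]
print(best_name, best_count)
```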
Hi both, good work on reconciling the data so far.
I was looking at a spreadsheet that PHE have put up:
https://fingertips.phe.org.uk/documents/Historic%20COVID-19%20Dashboard%20Data.xlsx
There is a sheet in the workbook called ULTAs which shows cases by date and by
county or local authority type entities. I was hoping to create a map in my
Tableau report to provide something of interest. Unless I am missing
something, it looks like the geographic areas have been mixed up. Some
are counties, some are London boroughs and some look like local authority
districts!
If I am correct, it's going to be very difficult to create maps, because it
would mean combining a mixture of mapping data sets (shapefiles).
Just wondered if either of you had seen the data and had any suggestions?
Cheers
Andrew
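One way to see how mixed the areas are is to group them by the prefix of their ONS area code. This is a sketch that assumes the sheet carries standard ONS (GSS) codes alongside the area names, which is not confirmed in this thread. The `group_by_area_type` helper is hypothetical and the sample codes are for illustration only.

```python
from collections import defaultdict

# Commonly used ONS (GSS) code prefixes for English administrative areas.
AREA_TYPES = {
    "E06": "unitary authority",
    "E07": "non-metropolitan district",
    "E08": "metropolitan district",
    "E09": "London borough",
    "E10": "county",
}


def group_by_area_type(area_codes):
    """Group area codes by the area type implied by their three-character prefix."""
    groups = defaultdict(list)
    for code in area_codes:
        groups[AREA_TYPES.get(code[:3], "unknown")].append(code)
    return dict(groups)


# Sample codes for illustration; in practice these would be read from the sheet.
sample_codes = ["E09000033", "E10000014", "E06000045", "E08000025"]
grouped = group_by_area_type(sample_codes)
for area_type, codes in grouped.items():
    print(f"{area_type}: {codes}")
```

Knowing which area types are present tells you which boundary files would need to be combined to draw a single map.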
|
Hi Andrew, I have seen this. FYI, my cases_by_area.csv and COVID19_by_area.csv files include this data but also include regional breakdowns for Scotland, Wales and NI. I haven't ever done any mapping but you might want to look at how others have done it. I know a few people have given it a shot:
Hope this helps! |
Hi Andrew - in addition to the ones Emma suggested, there are some links here that may be useful: tomwhite/covid-19-uk-data#18 |
Sorry Tom, should have mentioned yours too! |
Hi Emma, Hope you are well. I noticed that your data had stopped at
April 15th and also that the source data has now changed. Grrrr..
Did you find an alternative way of getting at the data?
I have tried all sorts of quick things and am about to try web scraping with
Python. I'm not that hopeful, but will let you know if I get it to work.
Regards
Andrew
|