The accumulated deaths for some countries are wrong #3

Open
carloserwin opened this issue Apr 13, 2020 · 6 comments

Comments

@carloserwin

Do this, for example

data <- covid19.data("ts-deaths")

x <- data[data$Country.Region == "Germany", ]
y <- as.numeric(x[-(1:4)])
plot(y, type = "l")

Do you see the problem?

If not, look at the cumulative number of deaths on these dates:

2020-04-10   2020-04-11
2767         2736

A cumulative count cannot decrease, so going from 2767 to 2736 can't be right.

This also happens for India, and I do not know whether it affects other countries.

Regards,
CE

obviously this is not correct.

@mponce0
Owner

mponce0 commented Apr 13, 2020

Thanks for reporting this.
I took a look at the data and the issue is in the actual data source from JHU, see

https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv

You can see the issue with the case of Germany as you noticed.
I couldn't find any problems with the numbers reported for India though.

There is not much I can do other than let the people at JHU know about this, since they are the source of the data in this case.

@carloserwin
Author

Thanks!!!

For India, you can look at the mistake easily

x <- data[data$Country.Region == "India", -(1:4)]
diff(as.numeric(x))

If there is a negative value, there must be something wrong, because a cumulative count can never go down.
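To see which dates are affected rather than only that a negative difference exists, the check can be sketched as follows. This is a minimal illustration on a toy vector: only the 2767 to 2736 pair comes from the report above, and the other values are made up.

```r
# Toy cumulative series named by date. Only the 2767 -> 2736 pair is taken
# from the report in this thread; the remaining values are invented.
y <- c("2020-04-08" = 2500, "2020-04-09" = 2600,
       "2020-04-10" = 2767, "2020-04-11" = 2736,
       "2020-04-12" = 2900)

# diff() returns y[i + 1] - y[i]; a negative entry marks a decrease.
drops <- which(diff(unname(y)) < 0)

# Shift by one to get the date on which the lower value was reported.
bad_dates <- names(y)[drops + 1]
print(bad_dates)
```

With real data, the names would come from the date columns of the row selected above.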

Cheers!
CE

@mponce0
Owner

mponce0 commented Apr 13, 2020

Yes, I can see that, thanks again.

I have opened an issue with JHU/CSSEGIS, see
CSSEGISandData/COVID-19#2165

This is the list of locations where I found these anomalies:

44 Prince Edward Island Canada
45 Quebec Canada
91 Cyprus
107 Finland
121 Germany
131 Iceland
132 India
142 Kazakhstan
183 Philippines
195 Serbia
198 Slovakia

In the meantime, I will implement some checks to warn the user about this.
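A check of this kind could look roughly like the sketch below. It is not the package's implementation: `find_decreasing_rows` is a hypothetical helper, the toy data frame is made up, and the layout (four metadata columns followed by daily cumulative totals) is assumed from the `-(1:4)` indexing used earlier in this thread. Column names other than `Country.Region` are placeholders.

```r
# Hypothetical helper (not part of any package): return the indices of rows
# whose cumulative counts decrease at least once.
# Assumes the first `n_meta` columns are metadata and the remaining columns
# hold daily cumulative totals in chronological order.
find_decreasing_rows <- function(df, n_meta = 4) {
  counts <- as.matrix(df[, seq_len(ncol(df)) > n_meta, drop = FALSE])
  has_drop <- apply(counts, 1, function(v) {
    any(diff(as.numeric(v)) < 0, na.rm = TRUE)
  })
  which(unname(has_drop))
}

# Made-up data: only row 2 ("B") has a decrease (5 -> 4).
toy <- data.frame(
  Province.State = c("", "", ""),
  Country.Region = c("A", "B", "C"),
  Lat  = c(0, 0, 0),
  Long = c(0, 0, 0),
  d1 = c(1, 5, 0),
  d2 = c(2, 4, 0),
  d3 = c(3, 6, 1)
)

bad <- find_decreasing_rows(toy)
if (length(bad) > 0) {
  message("Cumulative counts decrease for: ",
          paste(toy$Country.Region[bad], collapse = ", "))
}
```

Returning row indices keeps the helper usable both for printing a warning and for excluding or inspecting the affected rows.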

@carloserwin
Author

carloserwin commented Apr 13, 2020 via email

@mponce0
Owner

mponce0 commented Apr 16, 2020

Three new functions have been added to the package to test for data integrity and consistency:

  • integrity.check:
    determines whether there are integrity issues within the datasets, or changes to the structure of the data, as reported by JHU/CSSEGIS
  • consistency.check:
    determines whether there are consistency issues within the data, such as anomalies in the cumulative quantities reported by JHU/CSSEGIS
  • data.checks:
    checks for both data integrity and data consistency by invoking the two previous functions

These functions are already part of the development version of the package available in the GitHub repository and will be included in the next version of the package submitted to CRAN.

@mponce0
Owner

mponce0 commented May 13, 2020

These functions are also part of version v1.1 available to be installed from CRAN.
