Thanks for publishing the data, and some identified data quality issues (admin field, duplication) #43

Open
steven4320555 opened this issue Mar 25, 2020 · 1 comment


@steven4320555
steven4320555 commented Mar 25, 2020

Thanks for publishing the dataset. It offers a possible structure for looking at anonymised individual-level data. The methodology reads as sound, but the quality of the data can definitely improve over time.

For example, some of the admin columns contain links instead of place names.

image
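
A check along these lines could flag such rows automatically. This is only a minimal sketch: the column names `admin1`, `admin2`, `admin3` and `ID` are assumptions about the file layout, and the sample rows are made up for illustration.

```python
import re

# Hypothetical column names; the actual dataset's headers may differ.
ADMIN_COLUMNS = ("admin1", "admin2", "admin3")

# Anything starting like a web address is unlikely to be a place name.
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)


def find_urls_in_admin(rows, admin_columns=ADMIN_COLUMNS, id_column="ID"):
    """Return (id, column, value) for every admin cell that looks like a URL."""
    hits = []
    for row in rows:
        for col in admin_columns:
            # Missing or empty cells are treated as empty strings.
            value = (row.get(col) or "").strip()
            if URL_PATTERN.search(value):
                hits.append((row.get(id_column), col, value))
    return hits


# Made-up rows for illustration only, not taken from the dataset.
sample_rows = [
    {"ID": "1", "admin1": "Hubei", "admin2": "Wuhan City"},
    {"ID": "2", "admin1": "https://example.org/report", "admin2": ""},
]
flagged = find_urls_in_admin(sample_rows)
```

With a real file, the rows could come from `csv.DictReader`, which yields one dictionary per row keyed by the header names.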

There are also cases where the same reference is attached to different IDs.
(As a bonus, I found that the original link used in the data has been updated to https://www.gov.uk/government/news/cmo-for-england-announces-4-new-cases-of-novel-coronavirus-2-march-2020, instead of the 2021 referenced in the data.)

image
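
One way to list these candidates for review is to group IDs by reference, as in the sketch below. The column names `ID` and `source` are assumptions, and the sample rows are invented. A single announcement can legitimately cover several cases, so the output is a list to inspect by hand rather than a list of confirmed duplicates.

```python
from collections import defaultdict


def sources_with_multiple_ids(rows, id_column="ID", source_column="source"):
    """Map each reference to its IDs, keeping only references shared by 2+ IDs."""
    ids_by_source = defaultdict(set)
    for row in rows:
        source = (row.get(source_column) or "").strip()
        if source:  # skip rows with no reference recorded
            ids_by_source[source].add(row.get(id_column))
    return {
        source: sorted(ids)
        for source, ids in ids_by_source.items()
        if len(ids) > 1
    }


# Invented rows for illustration only.
sample_rows = [
    {"ID": "10", "source": "https://example.org/a"},
    {"ID": "11", "source": "https://example.org/a"},
    {"ID": "12", "source": "https://example.org/b"},
]
shared = sources_with_multiple_ids(sample_rows)
```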

Looking at the symptoms field, it seems that some data standardisation was attempted (for example, 22 and 23). I hope the descriptions can be made more consistent.

image

I hope the quality of the data can be improved over time. Good work!

@smazrouee

I wonder how they extracted these symptoms and assigned each case an ID with age, location, etc. I checked some of the publications, and there is no specific information about individuals. I don't see anything under the wiki tab. Is there a place where they explain how they collect and refine this data? Thanks.
