Identify health-related data sets for inclusion #150

Open
cliftonmcintosh opened this issue Feb 20, 2017 · 8 comments

@cliftonmcintosh
Member

Open Nepal has many health-related data sets. Identify ones that might be good to include. Create an issue for each one and link the issue to the actual data set. Data sets that work well:

  • have data for each district. Even better if they have data for each VDC.
  • have counts for each district.

Examples of potentially good data sets include:

These are potential samples. Once data sets have been identified and issues have been created, then the team can prioritize which issues would or would not be valuable to include.

@amitness
Member

I'm participating in "Open Data Day Hackathon 2017" and we've been provided a few datasets. One of them is an immunization dataset by district. Would this be helpful? @cliftonmcintosh

Immunization dataset by district last 2 year.csv.zip

@cliftonmcintosh
Member Author

cliftonmcintosh commented Feb 24, 2017 via email

@cliftonmcintosh
Member Author

@ravinepal

As per our discussion via email, I have been looking at health data more closely. OpenNepal has quite a few data sets, but one of my concerns is that some of them are several years old. I have been looking at the most recent Annual Report from the Department of Health Services (available here as a PDF). It contains more recent versions of the data sets that OpenNepal has digested. I believe we can extract the data from the tables in that report using Tabula, which would give us more recent data. I have extracted a few data sets this way. They need some manipulation to get them into a usable format, but I believe it will be worth the effort. So far I have done the preliminary extraction for several of the tables in the "Safe Motherhood" section. These include data on:

  • antenatal maternity care
  • delivery methods and locations (home versus a health facility)
  • postnatal infant and mother care
  • newborn and maternal deaths
  • abortion care
  • nutrition in the first two years of a child's life

The data sets need more processing, and it may be that not all of them are valuable, but I think there is a lot we can mine from the document.
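
To illustrate the kind of processing meant here, below is a minimal sketch of cleaning a table after it has been exported from the PDF to CSV. It assumes the first column holds the district name and the remaining columns hold counts, possibly with thousands separators or dashes for missing values. The column names and the numbers in the usage comment are invented placeholders, not values from the report.

```python
import csv
import io


def clean_count(cell):
    """Turn a cell such as '1,234' or ' 56 ' into an int.

    Blank cells and lone dashes are treated as missing and returned as None.
    Anything else that is not a whole number raises ValueError, so odd cells
    are noticed instead of silently dropped.
    """
    text = cell.strip().replace(",", "")
    if text in ("", "-"):
        return None
    return int(text)


def clean_rows(csv_text):
    """Yield (district, {column: count}) pairs from CSV text.

    Assumes a header row, then one row per district with the district name
    in the first column (an assumption about the extracted layout).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines left over from the extraction
        district = row[0].strip().title()
        counts = [clean_count(cell) for cell in row[1:]]
        yield district, dict(zip(header[1:], counts))


# Usage with placeholder data (hypothetical columns and values):
# sample = 'District,anc_first_visit\n KATHMANDU ,"1,234"\n'
# list(clean_rows(sample))
```

Each extracted table will probably need its own small adjustments (merged header rows, footnote markers), so this is a starting point and not a general converter.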

It would be nice if team members could look through those tables and see whether any of the data points might be important to show.

@ravinepal
Member

thanks, @cliftonmcintosh! should i reach out to open nepal team to see if they can extract these datasets? (responded to your email as well.)

@cliftonmcintosh
Member Author

@ravinepal

Thanks for offering to reach out to Open Nepal about extracting the data, but I would like to try my hand at it for a couple of datasets first. This will allow me to convert the data in a way that is useful for NepalMap. Moving from the format delivered by the Tabula PDF converter to a format that is useful for us is likely to be no more difficult than moving from the way OpenNepal presents the data.
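
As a rough sketch of what that conversion might look like: tables in the report have one row per district and one column per indicator, while a one-record-per-value ("long") layout is often easier to load. The output keys used here (`geo_level`, `geo_name`, `indicator`, `total`) are assumptions for illustration, not a confirmed NepalMap schema.

```python
def wide_to_long(rows, geo_level="district"):
    """Reshape wide rows into one record per (district, indicator) pair.

    rows: iterable of dicts, each with a 'district' key plus one key per
    indicator. The output field names are hypothetical.
    """
    records = []
    for row in rows:
        for indicator, total in row.items():
            if indicator == "district":
                continue  # the district name is carried on every record
            records.append(
                {
                    "geo_level": geo_level,
                    "geo_name": row["district"],
                    "indicator": indicator,
                    "total": total,
                }
            )
    return records


# Usage with placeholder values (not real figures):
# wide_to_long([{"district": "Kathmandu", "home": 10, "facility": 90}])
```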

@ravinepal
Member

sounds good, @cliftonmcintosh! @amitness has extracted some of the census data in the past, so looping him in to see if he can advise/help as well

@amitness
Member

@ravinepal @cliftonmcintosh Tabula is the best way to go. There is a useful Python wrapper for Tabula called tabula-py. Here is also an example of how to use it.
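
For reference, a minimal sketch of how tabula-py could be used for this. It assumes tabula-py and a Java runtime are installed; the file name and page range in the usage comment are placeholders, and `lattice=True` is a guess that the report's tables have ruled cell borders.

```python
def extract_tables(pdf_path, pages="all", lattice=True):
    """Return a list of pandas DataFrames, one per table Tabula detects.

    The import is kept inside the function so the module loads even when
    tabula-py (third-party, needs Java) is not installed.
    """
    import tabula

    return tabula.read_pdf(
        pdf_path, pages=pages, lattice=lattice, multiple_tables=True
    )


def save_tables(tables, prefix):
    """Write each extracted table to '<prefix>_<n>.csv' and return the paths."""
    paths = []
    for index, table in enumerate(tables, start=1):
        path = f"{prefix}_{index}.csv"
        table.to_csv(path, index=False)
        paths.append(path)
    return paths


# Usage (hypothetical file name and page range):
# tables = extract_tables("annual_report.pdf", pages="100-105")
# save_tables(tables, "safe_motherhood")
```

If a table comes out with columns merged or split, switching to `lattice=False` so Tabula falls back to its default detection may be worth trying.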

@cliftonmcintosh
Member Author

@amitness Thanks for the tips
