Skip to content

Data Analysis at Health Department of Federal District Brazil

License

Notifications You must be signed in to change notification settings

leoitcode/sesdfproject

Repository files navigation

Hospitalization Analysis SESDF

Analysis on patient's hospitalizations throughout state health institutions of Federal District of Brazil.

Section 1


1- The Data

The data are the hospitalizations of Health Department of Federal District (SESDF) from year 2010 to 2018.

The data are composed by 108 files with 37 columns, ~20 million of rows and 4.74GB size.

Only 9 columns were selected to analysis and the overall explanation for excluded columns is on columns_excluded.md file.

Getting Data

The data were obtained from DataSUS (IT Health Department of Brazil), this sector is related to Ministry of Health of Brazil.

Link: DataSUS

The files are in (.dbc) format which is the compressed form (.dbf) dBase III format. I used the dbf2dbc.exe (Decompresser Program DBC to DBF), but due there were 108 files to process a batch script was created to deal with all files in a row.

The Batch script (process_all_files.bat)

for %%f in (*.dbc) do call dbf2dbc.exe %%f

Link: Program dbf2dbc explanation

After the processing, I've got each (.dbf) file and sent it to pandas Dataframe. This whole process is explained on notebooks.


2- Objectives

  • Analyze the relations between hospitalizations and public health institutions.
  • Check and balance the origin of patients
  • Correlate the costs of medical procedures and hospitalizations
  • Predict the number of hospitalizations in 2019

3- Data Wrangling and Machine Learning

The whole process of Data preparation and Machine Learning was explained on:


4- Power BI and Visualization


Relationship

The DataSUS data about health institutions and medical procedures are only codes (numbers) which doesn't represent the true names of organizations. I had to connect this numbers to real names of places and the procedures that's why I've made a deep search on SUS (Unique Health System) for tabular data of Hospital names and procedures names from codes. As a result, this relationship model was created into Power BI:


Relation_PowerBI


The (INT_PROCEDS.csv) is from original Dataset (INT_2010_to_2018.csv) created on Notebook, and so it was connected with Group of Procedures(PROCED_GP) and Subgroup of Procedures(PROCED_SG). What's more, the exact same was done with (LISTA_CNES & CNES - T..) whose names connected to the health institutions codes (CNES) on main dataset (INT_2010_to_2018.csv) in order to show correct names on charts.


Columns

Some columns (Variables) were selected to connect the datasets:

  • PROCED_GP: Informs the name of medical procedure (PROCED) to code (PROCED_GP) in (INT_PROCEDS.csv)
  • PROCED_SG: Informs the name of medical procedure (PROCED) to code (PROCED_SG) in (INT_PROCEDS.csv)
  • NAIH: Informs the procedures names (PROCED_GP & SG) and each hospitalization value (VALOR) to Main Dataset (INT_2010_to_2018.csv)
  • CNES: Informs the name of health institution (Nome Fatasia, Establ. Saude) to Main Dataset (INT_2010_to_2018.csv)
  • DATE: Informs the Monthly value of each Date to Main Dataset (INT_2010_to_2018.csv)

Power BI Files


Visualizations

As a result, visualizations were made on Power BI from data in order to compare and get insights.

My reports on Power BI web link: Hospitalization Report

*CLICK ON IMAGES to access the View on Web Power BI App


  • Report Section 1 - Hospitalizations

Section 1


  • Report Section 2 - Medical Procedures

Section 2


  • Report Section 3 - Cost and Values

Section 3


  • Report Section 4 - Prediction

Section 4

InovaSES

InovaSES

InovaSES supports this project http://www.saude.df.gov.br/inovases-quem-somos/

Support to new ideas, innovative projects and entrepreneurship of Health Department Services.

About

Data Analysis at Health Department of Federal District Brazil

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published