Skip to content

Latest commit

 

History

History
147 lines (83 loc) · 5.96 KB

README.md

File metadata and controls

147 lines (83 loc) · 5.96 KB

Hospitalization Analysis SESDF

Analysis on patient's hospitalizations throughout state health institutions of Federal District of Brazil.

Section 1


1- The Data

The data are the hospitalizations of Health Department of Federal District (SESDF) from year 2010 to 2018.

The data are composed by 108 files with 37 columns, ~20 million of rows and 4.74GB size.

Only 9 columns were selected to analysis and the overall explanation for excluded columns is on columns_excluded.md file.

Getting Data

The data were obtained from DataSUS (IT Health Department of Brazil), this sector is related to Ministry of Health of Brazil.

Link: DataSUS

The files are in (.dbc) format which is the compressed form (.dbf) dBase III format. I used the dbf2dbc.exe (Decompresser Program DBC to DBF), but due there were 108 files to process a batch script was created to deal with all files in a row.

The Batch script (process_all_files.bat)

for %%f in (*.dbc) do call dbf2dbc.exe %%f

Link: Program dbf2dbc explanation

After the processing, I've got each (.dbf) file and sent it to pandas Dataframe. This whole process is explained on notebooks.


2- Objectives

  • Analyze the relations between hospitalizations and public health institutions.
  • Check and balance the origin of patients
  • Correlate the costs of medical procedures and hospitalizations
  • Predict the number of hospitalizations in 2019

3- Data Wrangling and Machine Learning

The whole process of Data preparation and Machine Learning was explained on:


4- Power BI and Visualization


Relationship

The DataSUS data about health institutions and medical procedures are only codes (numbers) which doesn't represent the true names of organizations. I had to connect this numbers to real names of places and the procedures that's why I've made a deep search on SUS (Unique Health System) for tabular data of Hospital names and procedures names from codes. As a result, this relationship model was created into Power BI:


Relation_PowerBI


The (INT_PROCEDS.csv) is from original Dataset (INT_2010_to_2018.csv) created on Notebook, and so it was connected with Group of Procedures(PROCED_GP) and Subgroup of Procedures(PROCED_SG). What's more, the exact same was done with (LISTA_CNES & CNES - T..) whose names connected to the health institutions codes (CNES) on main dataset (INT_2010_to_2018.csv) in order to show correct names on charts.


Columns

Some columns (Variables) were selected to connect the datasets:

  • PROCED_GP: Informs the name of medical procedure (PROCED) to code (PROCED_GP) in (INT_PROCEDS.csv)
  • PROCED_SG: Informs the name of medical procedure (PROCED) to code (PROCED_SG) in (INT_PROCEDS.csv)
  • NAIH: Informs the procedures names (PROCED_GP & SG) and each hospitalization value (VALOR) to Main Dataset (INT_2010_to_2018.csv)
  • CNES: Informs the name of health institution (Nome Fatasia, Establ. Saude) to Main Dataset (INT_2010_to_2018.csv)
  • DATE: Informs the Monthly value of each Date to Main Dataset (INT_2010_to_2018.csv)

Power BI Files


Visualizations

As a result, visualizations were made on Power BI from data in order to compare and get insights.

My reports on Power BI web link: Hospitalization Report

*CLICK ON IMAGES to access the View on Web Power BI App


  • Report Section 1 - Hospitalizations

Section 1


  • Report Section 2 - Medical Procedures

Section 2


  • Report Section 3 - Cost and Values

Section 3


  • Report Section 4 - Prediction

Section 4

InovaSES

InovaSES

InovaSES supports this project http://www.saude.df.gov.br/inovases-quem-somos/

Support to new ideas, innovative projects and entrepreneurship of Health Department Services.