Hospitalization Analysis SESDF

Analysis on patient's hospitalizations throughout state health institutions of Federal District of Brazil.

1- The Data

The data are the hospitalizations of Health Department of Federal District (SESDF) from year 2010 to 2018.

The data are composed by 108 files with 37 columns, ~20 million of rows and 4.74GB size.

Only 9 columns were selected to analysis and the overall explanation for excluded columns is on columns_excluded.md file.

Getting Data

The data were obtained from DataSUS (IT Health Department of Brazil), this sector is related to Ministry of Health of Brazil.

Link: DataSUS

The files are in (.dbc) format which is the compressed form (.dbf) dBase III format. I used the dbf2dbc.exe (Decompresser Program DBC to DBF), but due there were 108 files to process a batch script was created to deal with all files in a row.

The Batch script (process_all_files.bat)

for %%f in (*.dbc) do call dbf2dbc.exe %%f

Link: Program dbf2dbc explanation

After the processing, I've got each (.dbf) file and sent it to pandas Dataframe. This whole process is explained on notebooks.

2- Objectives

Analyze the relations between hospitalizations and public health institutions.
Check and balance the origin of patients
Correlate the costs of medical procedures and hospitalizations
Predict the number of hospitalizations in 2019

3- Data Wrangling and Machine Learning

The whole process of Data preparation and Machine Learning was explained on:

Português (BR): Notebook
English (US): Notebook

4- Power BI and Visualization

Relationship

The DataSUS data about health institutions and medical procedures are only codes (numbers) which doesn't represent the true names of organizations. I had to connect this numbers to real names of places and the procedures that's why I've made a deep search on SUS (Unique Health System) for tabular data of Hospital names and procedures names from codes. As a result, this relationship model was created into Power BI:

The (INT_PROCEDS.csv) is from original Dataset (INT_2010_to_2018.csv) created on Notebook, and so it was connected with Group of Procedures(PROCED_GP) and Subgroup of Procedures(PROCED_SG). What's more, the exact same was done with (LISTA_CNES & CNES - T..) whose names connected to the health institutions codes (CNES) on main dataset (INT_2010_to_2018.csv) in order to show correct names on charts.

Columns

Some columns (Variables) were selected to connect the datasets:

PROCED_GP: Informs the name of medical procedure (PROCED) to code (PROCED_GP) in (INT_PROCEDS.csv)
PROCED_SG: Informs the name of medical procedure (PROCED) to code (PROCED_SG) in (INT_PROCEDS.csv)
NAIH: Informs the procedures names (PROCED_GP & SG) and each hospitalization value (VALOR) to Main Dataset (INT_2010_to_2018.csv)
CNES: Informs the name of health institution (Nome Fatasia, Establ. Saude) to Main Dataset (INT_2010_to_2018.csv)
DATE: Informs the Monthly value of each Date to Main Dataset (INT_2010_to_2018.csv)

Power BI Files

File 1: Main Dataset (.csv compressed format)
File 2: Main Procedures (.csv compressed format)
File 3: Group Procedures (.xlsx excel)
File 4: Subgroup Procedures (.xlsx excel)
File 5: Main Inst. Name (.xlsx excel)
File 6: Support Inst. Name (.csv)
File 7: Monthly Value (.csv)

Visualizations

As a result, visualizations were made on Power BI from data in order to compare and get insights.

My reports on Power BI web link: Hospitalization Report

*CLICK ON IMAGES to access the View on Web Power BI App

Report Section 1 - Hospitalizations

Report Section 2 - Medical Procedures

Report Section 3 - Cost and Values

Report Section 4 - Prediction

InovaSES

InovaSES supports this project http://www.saude.df.gov.br/inovases-quem-somos/

Support to new ideas, innovative projects and entrepreneurship of Health Department Services.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Hospitalization Analysis SESDF

1- The Data

2- Objectives

3- Data Wrangling and Machine Learning

4- Power BI and Visualization

InovaSES

Files

README.md

Latest commit

History

README.md

File metadata and controls

Hospitalization Analysis SESDF

1- The Data

2- Objectives

3- Data Wrangling and Machine Learning

4- Power BI and Visualization

InovaSES