HeartDetect 💘

Explainability: LIME, SHAP
Tech stack: Python, NumPy, scikit-learn, Git

An analytical model for early intervention in heart disease, implemented in two stages

Docs

Jupyter notebooks


Executive Summary

This report applies data analytics to a business problem faced by the National Heart Centre Singapore (NHCS). With the incidence of cardiovascular disease (CVD) rising in Singapore, NHCS handles more than 120,000 outpatient consultations each year. Because the sudden onset of heart disease is severe and expensive to treat, NHCS can shift its focus from post-diagnosis treatment to early prevention.

To increase the involvement of individuals and the primary care sector in heart-disease prevention, our team proposes a two-stage solution, HeartDetect.

  • The first stage raises individuals' awareness and helps them manage their heart health regularly.
  • The second stage enables heart-disease risk prediction in the primary care sector, allowing timely prevention.

Getting Started

1. Clone a copy of this repository

Open your terminal and run

git clone https://github.com/xJQx/bc2406-project.git

2. Understanding the Jupyter notebook flow

Data Cleaning and Pre-processing
a) data-cleaning-preprocessing.ipynb

Stage 1:
b) exploratory-data-analysis_1.ipynb
c) stage1-modelling.ipynb

Stage 2:
d) exploratory-data-analysis_2.ipynb
e) stage2-modelling.ipynb


3. Understanding the various csv files (datasets)

View the Data Dictionary here.
Dataset created from the data-cleaning-preprocessing.ipynb notebook:

.
├── heart_pki_2020_original.csv       # original dataset
│   ├── heart_pki_2020_cleaned.csv        # for EDA and visualization
│   ├── heart_pki_2020_correlation.csv    # for EDA correlation (IntegerEncoding done)
│   └── heart_pki_2020_encoded.csv        # for analytical models (OneHotEncoding done)
│
├── o2Saturation_original.csv         # original dataset
└── heart_attack_original.csv         # original dataset
    ├── heart_attack_cleaned.csv          # for EDA and analytical models (default integer encoding)
    └── heart_attack_cleaned_text.csv     # for EDA and visualization (meaningful values)
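As a hedged sketch, the cleaned CSV files can be inspected with pandas. The file name in the comment refers to a dataset from this repository, but the inline CSV and its column names below are illustrative stand-ins so the snippet runs on its own:

```python
import io
import pandas as pd

# In the repository you would load the real file, e.g.:
#   df = pd.read_csv("heart_pki_2020_cleaned.csv")
# Here a tiny inline CSV (with made-up columns) keeps the example self-contained.
csv_text = "HeartDisease,BMI,Smoking\nYes,26.6,No\nNo,21.5,No\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)     # rows x columns of the loaded dataset
print(df.columns.tolist())
```

The same `pd.read_csv` call works for any of the datasets above; only the path changes.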

4. Understanding the models directory

The models directory contains all the trained models from stages 1 and 2. They can be imported and used on any dataset that matches their expected feature dimensions.
An example of importing and using an analytical model is as shown:

```python
# Libraries
import joblib
from sklearn.model_selection import cross_val_score

# Load the model from disk
loaded_random_forest_m3 = joblib.load('models/stage2_random_forest_m3.sav')

# Use the analytical model (X_test and y_test must match the model's training features)
result = cross_val_score(loaded_random_forest_m3, X_test, y_test, cv=5, scoring="roc_auc").mean()
print(result)
```
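The snippet above assumes `X_test` and `y_test` are already defined. A minimal, self-contained sketch of the full joblib round trip (train, save, reload, score) is shown below; the synthetic data and the `example_model.sav` file name are illustrative only, standing in for the real datasets and the models under `models/`:

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data: 100 samples, 5 features, binary target
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)

# Train and persist a model, mirroring how the stage 1/2 models were saved
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)
joblib.dump(model, "example_model.sav")

# Reload it later and evaluate, as in the example above
loaded = joblib.load("example_model.sav")
auc = cross_val_score(loaded, X, y, cv=5, scoring="roc_auc").mean()
print(f"mean ROC AUC: {auc:.3f}")
```

`joblib.dump`/`joblib.load` is the standard scikit-learn persistence route; a loaded model can be scored or used for `predict` exactly like a freshly trained one.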

Contributors