The goal of this lab is to train a model for the diagnosis of coronary artery disease.
The dataset is provided by the Cleveland Clinic Foundation for Heart Disease (more information). The dataset file to use is available here. Each row describes a patient. Below is a description of each column.
Column | Description | Feature Type | Data Type |
---|---|---|---|
Age | Age in years | Numerical | integer |
Sex | (1 = male; 0 = female) | Categorical | integer |
CP | Chest pain type (0, 1, 2, 3, 4) | Categorical | integer |
Trestbpd | Resting blood pressure (in mm Hg on admission to the hospital) | Numerical | integer |
Chol | Serum cholesterol in mg/dl | Numerical | integer |
FBS | (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false) | Categorical | integer |
RestECG | Resting electrocardiographic results (0, 1, 2) | Categorical | integer |
Thalach | Maximum heart rate achieved | Numerical | integer |
Exang | Exercise induced angina (1 = yes; 0 = no) | Categorical | integer |
Oldpeak | ST depression induced by exercise relative to rest | Numerical | float |
Slope | The slope of the peak exercise ST segment | Numerical | integer |
CA | Number of major vessels (0-3) colored by fluoroscopy | Numerical | integer |
Thal | 3 = normal; 6 = fixed defect; 7 = reversible defect | Categorical | string |
Target | Diagnosis of heart disease (1 = true; 0 = false) | Classification | integer |
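As a starting point, the column description above can be turned into a small pandas preparation step. The sketch below is only illustrative: the rows are made up (they are not taken from the real dataset file), only a subset of the columns is shown, and the string labels used for `Thal` are an assumption about how that column is encoded. It separates the `Target` label from the inputs and one-hot encodes two of the categorical columns.

```python
import pandas as pd

# Hypothetical sample rows following the column description above.
# The values are illustrative only and do NOT come from the real dataset file.
sample = pd.DataFrame(
    {
        "Age": [63, 67, 41, 56],
        "Sex": [1, 1, 0, 1],
        "CP": [1, 4, 2, 3],
        "Chol": [233, 286, 204, 236],
        "Oldpeak": [2.3, 1.5, 1.4, 0.8],
        # Assumption: Thal is stored as text labels rather than numeric codes.
        "Thal": ["fixed", "normal", "normal", "reversible"],
        "Target": [0, 1, 0, 0],
    }
)

# Separate the input features from the label to predict.
features = sample.drop(columns="Target")
target = sample["Target"]

# One-hot encode categorical columns so that models only receive numbers.
categorical_columns = ["CP", "Thal"]
encoded = pd.get_dummies(features, columns=categorical_columns)

print(encoded.dtypes)
print(encoded.shape)
```

With the real file, the same steps would apply after loading it with `pd.read_csv`, using the full list of categorical columns from the table.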
You may use either a local or remote Python environment for this lab.
The easiest way to obtain a working Python setup is by using a cloud-based Jupyter notebook execution platform like Google Colaboratory, Paperspace or Kaggle Notebooks.
To tackle this challenge, you should leverage three essential libraries of the Python ecosystem for Machine Learning: NumPy, pandas and scikit-learn.
If any of these tools is new to you, work through the corresponding tutorial(s) first: NumPy, pandas, scikit-learn.
You may train any binary classification model on this task, for example a basic SGDClassifier implementing the logistic regression algorithm.
To implement the training process, you should take inspiration from the project workflow and classification performance code examples.
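For the performance assessment step, scikit-learn's `sklearn.metrics` module provides the usual classification metrics. The following minimal sketch uses hand-written labels and predictions (hypothetical values, not model output) so that the numbers are easy to check by hand.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical ground truth and predictions for eight patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
matrix = confusion_matrix(y_true, y_pred)
print(matrix)  # [[3 1]
               #  [1 3]]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 6 / 8
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
print(accuracy, precision, recall)  # 0.75 0.75 0.75
```

For a diagnostic task, recall on the positive class is often worth watching closely, since a false negative corresponds to a missed disease case.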
Then try another model, for example a decision tree, and compare its performance with that of the first model.