This notebook looks into using various Python-based machine learning and data science libraries in an attempt to build a machine learning model capable of predicting whether or not someone has heart disease based on their medical attributes.
We're going to take the following approach:
1. Problem definition
2. Data
3. Evaluation
4. Features
4. Modelling
5. Experimentation
In a statement,
Given clinical parameters about a patient, can we predict whether or not they have heart disease
The original data came from the Cleavland data from the UCI Machine Learning Repository.
https://archive.ics.uci.edu/ml/datasets/heart+Disease
There is also a version of it available on Kaggle.
https://www.kaggle.com/datasets/sumaiyatasmeem/heart-disease-classification-dataset
If we can reach 95% accuracy at predicting whether or not a patient has heart disease during the proof of concept, we'll pursue the project.
This is where you'll get different information about each of the features in your data. You can do this via doing your own research (such as looking at the links above) or by talking to a subject matter expert (someone who knows about the database)
Create data dictionary
-
age: Displays the age of the individual.
-
sex: Displays the gender of the individual using the following format : 1 = male 0 = female
-
cp- Chest-pain type: displays the type of chest-pain experienced by the individual using the following format : 0 = typical angina 1 = atypical angina 2 = non — anginal pain 3 = asymptotic
-
trestbps- Resting Blood Pressure: displays the resting blood pressure value of an individual in mmHg (unit). anything above 130-140 is typically cause for concern.
-
chol- Serum Cholestrol: displays the serum cholesterol in mg/dl (unit)
-
fbs- Fasting Blood Sugar: compares the fasting blood sugar value of an individual with 120mg/dl. If fasting blood sugar > 120mg/dl then : 1 (true) else : 0 (false) '>126' mg/dL signals diabetes
-
restecg- Resting ECG : displays resting electrocardiographic results 0 = normal 1 = having ST-T wave abnormality 2 = left ventricular hyperthrophy
-
thalach- Max heart rate achieved : displays the max heart rate achieved by an individual.
-
exang- Exercise induced angina : 1 = yes 0 = no
-
oldpeak- ST depression induced by exercise relative to rest: displays the value which is an integer or float.
-
slope- Slope of the peak exercise ST segment : 0 = upsloping: better heart rate with excercise (uncommon) 1 = flat: minimal change (typical healthy heart) 2 = downsloping: signs of unhealthy heart
-
ca- Number of major vessels (0–3) colored by flourosopy : displays the value as integer or float.
-
thal : Displays the thalassemia : 1,3 = normal 6 = fixed defect 7 = reversible defect: no proper blood movement when excercising
-
target : Displays whether the individual is suffering from heart disease or not : 1 = yes 0 = no
We're going to use Pandas, Matplotlib and NumPy for data analysis and manipulation.
Note: Project done in course: Complete Machine Learning & Data Science Bootcamp 2023 by Zero to Mastery