The rising incidence of heart disease is a major public health concern, underscoring the importance of early detection and intervention. Accurately and efficiently diagnosing heart disease is a challenging task, as it involves numerous risk factors, such as cholesterol levels, smoking habits, obesity, family history, blood pressure, and working conditions.
Thankfully, advances in machine learning have made it possible to build predictive models that help healthcare professionals identify individuals at risk of heart disease. Models such as logistic regression or k-nearest neighbours (KNN) can learn from large datasets of patient information and predict whether a given patient is likely to have heart disease.
Developing these models can have a significant impact on public health, potentially saving tens of thousands of lives in the future. Being a part of such a critical endeavor is a humbling and inspiring experience, as it highlights the immense potential of machine learning to improve human health and well-being.
This project requires Python and the following Python libraries installed:
You will also need to have software installed to run and execute a Jupyter Notebook.
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included.
Template code is provided in the HeartDiseacePrediction.ipynb Jupyter Notebook file. You will also need the included heart.csv dataset file to complete your work.
In a terminal or command window, navigate to the top-level project directory HeartDiseacePrediction/ (the one that contains this README) and run one of the following commands:
ipython notebook HeartDiseacePrediction.ipynb
jupyter notebook HeartDiseacePrediction.ipynb
or open it with Jupyter Lab:
jupyter lab
This will open the Jupyter Notebook software and project file in your browser.
This dataset dates from 1988 and consists of four databases: Cleveland, Hungary, Switzerland, and Long Beach V. It contains 76 attributes, including the predicted attribute, but all published experiments use a subset of 14 of them. The "target" field indicates the presence of heart disease in the patient. It is integer valued: 0 = no disease and 1 = disease.
Attribute Information (description of the data):
age - Age of the patient
sex - Sex of the patient
cp - Chest pain type ~ 0 = Typical Angina, 1 = Atypical Angina, 2 = Non-anginal Pain, 3 = Asymptomatic
trtbps - Resting blood pressure (in mm Hg)
chol - Cholesterol in mg/dl fetched via BMI sensor
fbs - Fasting blood sugar > 120 mg/dl ~ 1 = True, 0 = False
restecg - Resting electrocardiographic results ~ 0 = Normal, 1 = ST-T wave abnormality, 2 = Left ventricular hypertrophy
thalach - Maximum heart rate achieved
oldpeak - ST depression induced by exercise relative to rest
slope - Slope of the peak exercise ST segment
caa - Number of major vessels
thal - Thallium Stress Test result ~ (0,3)
exng - Exercise induced angina ~ 1 = Yes, 0 = No
output - Target variable ~ 0 = no disease, 1 = disease
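The encoded columns described above can be sanity-checked before any modelling. The following is a minimal sketch, not part of the notebook: it assumes heart.csv has a header row using the column names listed above, and the helper names find_invalid_rows and check_file are hypothetical. Only columns whose codes are spelled out in the list are checked.

```python
import csv

# Allowed codes copied from the attribute descriptions above.
ALLOWED_CODES = {
    "cp": {0, 1, 2, 3},
    "fbs": {0, 1},
    "restecg": {0, 1, 2},
    "exng": {0, 1},
    "output": {0, 1},
}


def find_invalid_rows(rows):
    """Return (row_index, column, raw_value) for each value outside its allowed codes.

    Columns missing from a row are skipped, because the exact header of
    heart.csv is an assumption in this sketch.
    """
    problems = []
    for index, row in enumerate(rows):
        for column, allowed in ALLOWED_CODES.items():
            if column not in row:
                continue
            raw = row[column]
            try:
                value = float(raw)
            except (TypeError, ValueError):
                # Non-numeric or empty cell.
                problems.append((index, column, raw))
                continue
            if value not in allowed:
                problems.append((index, column, raw))
    return problems


def check_file(path="heart.csv"):
    """Read the CSV at `path` and report values outside the documented codes."""
    with open(path, newline="") as handle:
        return find_invalid_rows(csv.DictReader(handle))
```

Calling check_file() from the project directory returns an empty list when every checked value matches the documented encoding.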
The output of each step can be seen in the HeartDiseasePrediction.ipynb Jupyter notebook file.
Logistic Regression (Scikit-learn)
Naive Bayes (Scikit-learn)
K-Nearest Neighbours (Scikit-learn)
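As a rough illustration of how these three scikit-learn models could be compared on the same split, here is a minimal sketch. It is not the notebook's code: the function name compare_models, the choice of GaussianNB as the Naive Bayes variant, the 80/20 split, the feature scaling, and n_neighbors=5 are all assumptions made for the example.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, test_size=0.2, random_state=0):
    """Fit three classifiers on one stratified split and return their test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    models = {
        # Scaling helps logistic regression converge and matters for
        # distance-based KNN; Gaussian Naive Bayes is used unscaled.
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Naive Bayes": GaussianNB(),
        "K-Nearest Neighbours": make_pipeline(
            StandardScaler(), KNeighborsClassifier(n_neighbors=5)
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # mean accuracy on the test split
    return scores
```

With pandas, and assuming the target column is named output as in the list above, this could be called as compare_models(df.drop(columns="output"), df["output"]) after df = pd.read_csv("heart.csv"). A single split gives only a rough comparison; cross-validation would give a more stable estimate.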
📧 Feel free to contact me @