lazarzivanovicc/pyspark-hf-predictions

Heart Failure Prediction using PySpark

This project tackles heart failure prediction with PySpark, using the Heart Failure Prediction - Clinical Records dataset from Kaggle.

Dataset Description

The dataset contains the medical records of 5000 patients who experienced heart failure, collected during their follow-up period. Each patient record has 13 columns: 12 clinical features and the target, DEATH_EVENT:

  • age: Age of the patient (years)
  • anaemia: Decrease of red blood cells or hemoglobin (boolean)
  • creatinine phosphokinase (CPK): Level of the CPK enzyme in the blood (mcg/L)
  • diabetes: If the patient has diabetes (boolean)
  • ejection fraction: Percentage of blood leaving the heart at each contraction (percentage)
  • high blood pressure: If the patient has hypertension (boolean)
  • platelets: Platelets in the blood (kiloplatelets/mL)
  • sex: Woman or man (binary)
  • serum creatinine: Level of serum creatinine in the blood (mg/dL)
  • serum sodium: Level of serum sodium in the blood (mEq/L)
  • smoking: If the patient smokes or not (boolean)
  • time: Follow-up period (days)
  • [TARGET] DEATH_EVENT: If the patient died during the follow-up period (boolean)
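
The columns above could be loaded with an explicit schema rather than relying on type inference. The following is a minimal sketch, not the project's actual loading code: the snake_case column names and the chosen types are assumptions based on the feature list and should be checked against the CSV header, and `load_records` is a hypothetical helper.

```python
# Column names and types are ASSUMED from the feature list above;
# verify them against the header of your copy of the CSV.
COLUMNS = [
    ("age", "DOUBLE"),
    ("anaemia", "INT"),                      # boolean flag stored as 0/1
    ("creatinine_phosphokinase", "DOUBLE"),
    ("diabetes", "INT"),                     # boolean flag stored as 0/1
    ("ejection_fraction", "DOUBLE"),
    ("high_blood_pressure", "INT"),          # boolean flag stored as 0/1
    ("platelets", "DOUBLE"),
    ("sex", "INT"),                          # binary
    ("serum_creatinine", "DOUBLE"),
    ("serum_sodium", "DOUBLE"),
    ("smoking", "INT"),                      # boolean flag stored as 0/1
    ("time", "DOUBLE"),
    ("DEATH_EVENT", "INT"),                  # target
]

TARGET = "DEATH_EVENT"
FEATURES = [name for name, _ in COLUMNS if name != TARGET]

# Spark accepts a DDL-formatted string wherever a schema is expected.
SCHEMA_DDL = ", ".join(f"{name} {dtype}" for name, dtype in COLUMNS)


def load_records(spark, path):
    """Read the CSV at `path` using the explicit schema (hypothetical helper).

    `spark` is an existing SparkSession; `path` is wherever the file lives.
    """
    return spark.read.csv(path, header=True, schema=SCHEMA_DDL)
```

With a running session, `load_records(spark, path).printSchema()` is a quick way to confirm that the assumed names and types match the file.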

Objective

The primary goal of this project is to develop a model that predicts whether a patient with heart failure survives the follow-up period. Because this is a binary classification problem, we use an XGBoost classifier.
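
One way such a classifier might be wired up in PySpark is sketched below. It is an illustration under assumptions, not the repository's code: it assumes the `xgboost` package (version 1.7 or later, which provides `xgboost.spark.SparkXGBClassifier`), snake_case column names matching the feature list, an 80/20 split, and ROC AUC as the metric. `build_pipeline` and `train_and_score` are hypothetical helper names.

```python
# Feature names are ASSUMED from the dataset description; adjust to the CSV header.
FEATURES = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes",
    "ejection_fraction", "high_blood_pressure", "platelets", "sex",
    "serum_creatinine", "serum_sodium", "smoking", "time",
]
TARGET = "DEATH_EVENT"


def build_pipeline(feature_cols=FEATURES, label_col=TARGET):
    """Assemble the features into one vector column and attach the classifier."""
    # Imports are local so this module can be imported without Spark or XGBoost.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from xgboost.spark import SparkXGBClassifier

    assembler = VectorAssembler(inputCols=list(feature_cols), outputCol="features")
    classifier = SparkXGBClassifier(features_col="features", label_col=label_col)
    return Pipeline(stages=[assembler, classifier])


def train_and_score(df, seed=42):
    """Fit on 80% of `df` and return (model, ROC AUC on the held-out 20%)."""
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    train, test = df.randomSplit([0.8, 0.2], seed=seed)
    model = build_pipeline().fit(train)
    evaluator = BinaryClassificationEvaluator(
        labelCol=TARGET,
        rawPredictionCol="rawPrediction",
        metricName="areaUnderROC",
    )
    return model, evaluator.evaluate(model.transform(test))
```

Both functions need an active SparkSession. The split ratio, seed, and metric are illustrative choices and can be swapped without changing the structure.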
