An accurate assessment of a patient's mortality risk at the time of admission is needed in order to determine, and make available, the medical resources the patient requires.
Rather than relying on the conventional approach of checking the patient's vital signs, the model uses results from quick, repeatable, and efficient laboratory tests as its features.
Because these tests can be repeated quickly, the features can be used to generate a rapid assessment of the patient's mortality risk. Using these features, the app determines whether a patient admitted to a hospital has a high risk of mortality.
PatientSurvivalPrediction
|
|—— .streamlit
|—— saved_files
| |—— data
| | |—— Data Dictionary.csv
| | |—— Dataset.csv
| |
| |—— images
| | |—— app_recording.gif
| | |—— sidebar_image.jpg
| | |—— title_image.jpg
| |
| |—— notebooks
| | |—— Patient_survival_prediction.ipynb
| |
| |—— plots
| |—— feature_contribution.png
| |—— force_plot.png
|
|—— src
| |—— models
| | |—— features.sav
| |
| |—— utils
| |—— final_model.sav
| |—— scaler_selected.sav
| |—— shap_values.sav
| |—— winsorizer.sav
|
|—— .gitignore
|—— Procfile
|—— README.md
|—— app.py
|—— requirements.txt
|—— runtime.txt
|—— setup.sh
The app uses a machine learning model known as "Random Forest Regressor" to make the prediction. Random Forest is a tree-based ensemble model, so its final output is hard to interpret directly.
For this reason, I have used an explainable AI tool known as "SHAP" to understand the model and interpret its output.
Check the app for a more detailed explanation of the model and the predicted output.
- The dataset had 186 features, most of which were correlated.
- There were many missing values in the dataset, and the majority of the features had more than 50% missing values.
- The target feature was highly imbalanced: hospital deaths made up only about 8.63% of the data. This required the use of sampling techniques.
- The dataset also had many outliers.
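One possible way to handle the missing values, outliers, and class imbalance is sketched below with pandas. This is a hedged illustration, not the notebook's actual pipeline: the column names, the 1st/99th percentile caps, and the choice of random oversampling are all assumptions, since the sampling technique and thresholds are not stated here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-in for the dataset; all column names are hypothetical.
df = pd.DataFrame({
    "lab_a": rng.normal(50, 10, n),
    "lab_b": rng.normal(5, 1, n),
    "lab_sparse": rng.normal(0, 1, n),
    "hospital_death": (rng.random(n) < 0.09).astype(int),
})
df.loc[rng.random(n) < 0.7, "lab_sparse"] = np.nan  # roughly 70% missing
df.loc[0, "lab_a"] = 10_000.0                       # one extreme outlier

# 1. Drop features with more than 50% missing values.
missing_share = df.isna().mean()
df = df.loc[:, missing_share <= 0.5].copy()

# 2. Winsorize: cap each feature at its 1st and 99th percentiles (assumed limits).
features = [c for c in df.columns if c != "hospital_death"]
lower = df[features].quantile(0.01)
upper = df[features].quantile(0.99)
df[features] = df[features].clip(lower=lower, upper=upper, axis=1)

# 3. Randomly oversample the minority class until both classes are the same size.
majority = df[df["hospital_death"] == 0]
minority = df[df["hospital_death"] == 1]
minority_upsampled = minority.sample(n=len(majority), replace=True, random_state=0)
balanced = pd.concat([majority, minority_upsampled]).sample(frac=1, random_state=0)
```

In practice, resampling should be applied to the training split only, so that the evaluation data keeps the original class ratio.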
The app was built using Python 3.8 with the help of the Streamlit library.
Other major libraries used include:
- NumPy
- Pandas
- Matplotlib
- Scikit-learn
- SHAP
- Inside a new directory, clone the files from the repo using
  https://github.com/Retinpkumar/PatientSurvivalPrediction
- Create a new virtual environment and activate it.
- Open the terminal and install the libraries and dependencies using
  pip install -r requirements.txt
- Start the app by entering the command below in the terminal
  streamlit run app.py
For further discussions and queries, contact me at: retinpkumar@gmail.com