- Overview
- Project Set Up and Installation
- Dataset
- Automated ML
- Hyperparameter Tuning
- Model Deployment
- Screen Recording
- Comments and future improvements
- Dataset Citation
- References
The current project uses machine learning to predict patients’ survival based on their medical data.
I create two models in Azure Machine Learning Studio: one using Automated Machine Learning (AutoML) and one custom model whose hyperparameters are tuned using HyperDrive. I then compare the performance of both models and deploy the best-performing one as a web service using Azure Container Instances (ACI).
The diagram below gives a rough overview of the operations that take place in this project:
In order to run the project in Azure Machine Learning Studio, we will need the two Jupyter Notebooks:

- `automl.ipynb`: for the AutoML experiment;
- `hyperparameter_tuning.ipynb`: for the HyperDrive experiment.
The following files are also necessary:

- `heart_failure_clinical_records_dataset.csv`: the dataset file. It can also be taken directly from Kaggle;
- `train.py`: a basic script for manipulating the data used in the HyperDrive experiment;
- `scoring_file_v_1_0_0.py`: the script used to deploy the model, which is downloaded from within Azure Machine Learning Studio;
- `env.yml`: the environment file, which is also downloaded from within Azure Machine Learning Studio.
Cardiovascular diseases (CVDs) kill approximately 18 million people every year and are the number 1 cause of death globally. Heart failure is one of the two common ways in which CVDs manifest (the other being myocardial infarction) and occurs when the heart cannot pump enough blood to meet the needs of the body. People with cardiovascular disease, or who are at high cardiovascular risk, need early detection and management, and this is where Machine Learning can be of great help. That is what this project attempts to do: create an ML model that can help predict patients’ survival based on their medical data.
The dataset used is taken from Kaggle and, as we can read in the original research article, the data comes from 299 patients with heart failure collected at the Faisalabad Institute of Cardiology and at the Allied Hospital in Faisalabad (Punjab, Pakistan), during April–December 2015. The patients consisted of 105 women and 194 men, and their ages ranged from 40 to 95 years.
The dataset contains 13 features:
Feature | Explanation | Measurement |
---|---|---|
age | Age of patient | Years (40-95) |
anaemia | Decrease of red blood cells or hemoglobin | Boolean (0=No, 1=Yes) |
creatinine_phosphokinase | Level of the CPK enzyme in the blood | mcg/L |
diabetes | Whether the patient has diabetes or not | Boolean (0=No, 1=Yes) |
ejection_fraction | Percentage of blood leaving the heart at each contraction | Percentage |
high_blood_pressure | Whether the patient has hypertension or not | Boolean (0=No, 1=Yes) |
platelets | Platelets in the blood | kiloplatelets/mL |
serum_creatinine | Level of creatinine in the blood | mg/dL |
serum_sodium | Level of sodium in the blood | mEq/L |
sex | Female (F) or Male (M) | Binary (0=F, 1=M) |
smoking | Whether the patient smokes or not | Boolean (0=No, 1=Yes) |
time | Follow-up period | Days |
DEATH_EVENT | Whether the patient died during the follow-up period | Boolean (0=No, 1=Yes) |
The main task that I seek to solve with this project and dataset is to classify patients based on their odds of survival. The prediction is based on the first 12 features included in the above table, while the classification result is reflected in the last column, named DEATH_EVENT (the target), which is either `0` (no) or `1` (yes).
First, I made the data publicly accessible in the current GitHub repository via this link: https://raw.githubusercontent.com/dimikara/heart-failure-prediction/master/heart_failure_clinical_records_dataset.csv and then created the dataset:
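A minimal sketch of how the dataset can be created and registered from that URL using the Azure ML SDK (the variable names here are illustrative, not necessarily those used in the notebook):

```python
from azureml.core import Workspace, Dataset

# Load the CSV from the public URL as a TabularDataset and register it
# in the workspace under the name shown in the screenshot below.
ws = Workspace.from_config()
url = 'https://raw.githubusercontent.com/dimikara/heart-failure-prediction/master/heart_failure_clinical_records_dataset.csv'
dataset = Dataset.Tabular.from_delimited_files(path=url)
dataset = dataset.register(workspace=ws, name='heart-failure-prediction')
```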
As it is depicted below, the dataset is registered in Azure Machine Learning Studio:
Registered datasets: Dataset heart-failure-prediction registered
I am also accessing the data directly via:
```python
data = pd.read_csv('./heart_failure_clinical_records_dataset.csv')
```
AutoML settings and configuration:
Below you can see an overview of the `automl` settings and configuration I used for the AutoML run:
```python
automl_settings = {"n_cross_validations": 2,
                   "primary_metric": 'accuracy',
                   "enable_early_stopping": True,
                   "max_concurrent_iterations": 4,
                   "experiment_timeout_minutes": 20,
                   "verbosity": logging.INFO
                   }

automl_config = AutoMLConfig(compute_target=compute_target,
                             task='classification',
                             training_data=dataset,
                             label_column_name='DEATH_EVENT',
                             path=project_folder,
                             featurization='auto',
                             debug_log='automl_errors.log',
                             enable_onnx_compatible_models=False,
                             **automl_settings)
```
"n_cross_validations": 2
This parameter sets how many cross-validations to perform, based on the same number of folds (number of subsets). Since a single validation split could result in overfitting, I chose 2 folds for cross-validation; the metrics are then calculated as the average of the 2 validation metrics.
"primary_metric": 'accuracy'
I chose accuracy as the primary metric as it is the default metric used for classification tasks.
"enable_early_stopping": True
This enables early termination if the score is not improving in the short term. In this experiment it could also be omitted, because experiment_timeout_minutes is already defined below.
"max_concurrent_iterations": 4
It represents the maximum number of iterations that would be executed in parallel.
"experiment_timeout_minutes": 20
This is an exit criterion and is used to define how long, in minutes, the experiment should continue to run. To help avoid experiment timeout failures, I used the value of 20 minutes.
"verbosity": logging.INFO
The verbosity level for writing to the log file.
compute_target = compute_target
The Azure Machine Learning compute target to run the Automated Machine Learning experiment on.
task = 'classification'
This defines the experiment type which in this case is classification. Other options are regression and forecasting.
training_data = dataset
The training data to be used within the experiment. It should contain both training features and a label column - see next parameter.
label_column_name = 'DEATH_EVENT'
The name of the label column i.e. the target column based on which the prediction is done.
path = project_folder
The full path to the Azure Machine Learning project folder.
featurization = 'auto'
This parameter defines whether the featurization step should be done automatically, as in this case ('auto'), or not ('off').
debug_log = 'automl_errors.log'
The log file to write debug information to.
enable_onnx_compatible_models = False
I chose not to enforce ONNX-compatible models at this stage; however, I will try it in the future. For more info on the Open Neural Network Exchange (ONNX), please see here.
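With the settings and configuration in place, the run can be submitted. A brief sketch (the experiment name and variable names here are assumptions):

```python
from azureml.core.experiment import Experiment

# Submit the AutoML configuration to the workspace and stream progress.
experiment = Experiment(ws, 'automl-heart-failure')
remote_run = experiment.submit(automl_config, show_output=True)
```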
During the AutoML run, the Data Guardrails are run when automatic featurization is enabled. As we can see in the screenshot below, the dataset passed all three checks:
Data Guardrails Checks in the Notebook
Data Guardrails Checks in Azure Machine Learning Studio
After completion, we can retrieve the metrics and details of the best run:
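This can be done along the lines of the following sketch (assuming the AutoML run object is named `remote_run`):

```python
# Retrieve the best run and its fitted model, then print the run's metrics.
best_run, fitted_model = remote_run.get_output()
print(best_run)
print(best_run.get_metrics())
```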
Best model results:
AutoML Model | |
---|---|
id | AutoML_213153bb-f0e4-4be9-b265-6bbad4f0f9e4_40 |
Accuracy | 0.8595525727069351 |
AUC_weighted | 0.9087491748331944 |
Algorithm | VotingEnsemble |
Screenshots from Azure ML Studio
AutoML models
Best model data
Best model metrics
Charts
Aggregate feature importance
As we can see, time is by far the most important factor, followed by serum creatinine and ejection fraction.
For this experiment I am using a custom Scikit-learn Logistic Regression model, whose hyperparameters I am optimising using HyperDrive. Logistic regression is well suited to binary classification problems like this one, which is the main reason I chose it.
The hyperparameters I tune are C and max_iter.
Parameter sampler
I specified the parameter sampler as such:
```python
ps = RandomParameterSampling(
    {
        '--C': choice(0.001, 0.01, 0.1, 1, 10, 20, 50, 100, 200, 500, 1000),
        '--max_iter': choice(50, 100, 200, 300)
    }
)
```
I chose discrete values with choice for both parameters. C is the inverse of the regularization strength (smaller values specify stronger regularization), while max_iter is the maximum number of iterations for the solver to converge.
RandomParameterSampling is one of the available choices for the sampler, and I chose it because it is faster and supports early termination of low-performance runs. If budget were not an issue, we could use GridParameterSampling to exhaustively search over the search space, or BayesianParameterSampling to explore the hyperparameter space.
Early stopping policy
An early stopping policy is used to automatically terminate poorly performing runs thus improving computational efficiency. I chose the BanditPolicy which I specified as follows:
```python
policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)
```
evaluation_interval: This is optional and represents the frequency for applying the policy. Each time the training script logs the primary metric counts as one interval.
slack_factor: The amount of slack allowed with respect to the best performing training run. This factor specifies the slack as a ratio.
Any run that doesn't fall within the slack factor (or slack amount) of the evaluation metric with respect to the best performing run will be terminated. For example, with slack_factor=0.1, if the best run so far reports an accuracy of 0.9, any run reporting less than 0.9/(1+0.1) ≈ 0.818 at the same interval is terminated. This means that with this policy the best performing runs will execute until they finish, which is the reason I chose it.
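For reference, here is a sketch of how the sampler and the policy can be tied together in a HyperDriveConfig; the estimator wrapping `train.py` and the run limits shown are assumptions, not necessarily the exact values used:

```python
from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal

# Combine the random sampler and the Bandit policy into one configuration.
hyperdrive_config = HyperDriveConfig(estimator=estimator,
                                     hyperparameter_sampling=ps,
                                     policy=policy,
                                     primary_metric_name='Accuracy',
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     max_total_runs=20,
                                     max_concurrent_runs=4)
```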
Please also see this video, in which the RunDetails widget is enabled and the experiment logs its progress during the run until it shows 'Completed'.
After completion, we can retrieve the metrics and details of the best run:
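A sketch of how the best run can be retrieved (assuming the submitted run object is named `hyperdrive_run`):

```python
# Pick the run with the best primary metric and inspect its results.
best_run = hyperdrive_run.get_best_run_by_primary_metric()
print(best_run.get_metrics())
print(best_run.get_details()['runDefinition']['arguments'])
```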
Best model overview:
HyperDrive Model | |
---|---|
id | HD_debd4c29-658d-4280-b761-2308b5eff7e4_1 |
Accuracy | 0.8333333333333334 |
--C | 0.01 |
--max_iter | 300 |
Screenshots from Azure ML Studio
HyperDrive model
Best model data and details
Best model metrics
The deployment is done following the steps below:
- Selection of an already registered model
- Preparation of an inference configuration
- Preparation of an entry script
- Choosing a compute target
- Deployment of the model
- Testing the resulting web service
Using the accuracy metric as the basis, we can state that the best AutoML model is superior to the best model that resulted from the HyperDrive run. For this reason, I chose to deploy the best model from the AutoML run (`best_run_automl.pkl`, Version 2).
Registered models in Azure Machine Learning Studio
Runs of the experiment
The inference configuration defines the environment used to run the deployed model. The inference configuration includes two entities, which are used to run the model when it's deployed:

- An entry script, named `scoring_file_v_1_0_0.py`.
- An Azure Machine Learning environment, named `env.yml` in this case. The environment defines the software dependencies needed to run the model and entry script.
The entry script is the `scoring_file_v_1_0_0.py` file. The entry script loads the model when the deployed service starts, and it is also responsible for receiving data, passing it to the model, and then returning a response.
As compute target, I chose the Azure Container Instances (ACI) service, which is used for low-scale CPU-based workloads that require less than 48 GB of RAM.
The AciWebservice Class represents a machine learning model deployed as a web service endpoint on Azure Container Instances. The deployed service is created from the model, script, and associated files, as I explain above. The resulting web service is a load-balanced, HTTP endpoint with a REST API. We can send data to this API and receive the prediction returned by the model.
The main parameters I set in the ACI deployment configuration are:

- `cpu_cores`: the number of CPU cores to allocate for this Webservice; it can also be a decimal.
- `memory_gb`: the amount of memory (in GB) to allocate for this Webservice; it can be a decimal as well.
- `auth_enabled`: I set it to True in order to enable authentication for the Webservice.
- `enable_app_insights`: I set it to True in order to enable Application Insights for the Webservice.
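Putting these parameters together, here is a sketch of what the deployment call can look like; the service name `aciservice` follows the screenshots below, while the model variable and resource sizes are assumptions:

```python
from azureml.core.model import Model
from azureml.core.webservice import AciWebservice

# Configure the ACI container and deploy the registered model behind it.
aci_config = AciWebservice.deploy_configuration(cpu_cores=1,
                                                memory_gb=1,
                                                auth_enabled=True,
                                                enable_app_insights=True)
service = Model.deploy(workspace=ws,
                       name='aciservice',
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=aci_config)
service.wait_for_deployment(show_output=True)
```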
Bringing all of the above together, here is the actual deployment in action:
Best AutoML model deployed (Azure Machine Learning Studio)
Deployment takes some time to conclude, but when it finishes successfully the ACI web service has a status of Healthy and the model is deployed correctly. We can now move to the next step of actually testing the endpoint.
Endpoint (Azure Machine Learning Studio)
After the successful deployment of the model and with a Healthy service, I can print the scoring URI, the Swagger URI, and the primary authentication key:
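For instance (a sketch, assuming the deployed service object is named `service`):

```python
# Print the endpoints and the primary key of the deployed web service.
print(service.scoring_uri)
print(service.swagger_uri)
primary_key, secondary_key = service.get_keys()
print(primary_key)
```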
The same info can be retrieved from Azure Machine Learning Studio as well:
The scoring URI can be used by clients to submit requests to the service.
In order to test the deployed model, I use a Python file named `endpoint.py`:

In the beginning, I fill in the `scoring_uri` and `key` with the data of the aciservice printed above. We can then test our deployed service using test data in JSON format, to make sure the web service returns a result.
In order to request data, the REST API expects the body of the request to be a JSON document with the following structure:
```
{
    "data":
        [
            <model-specific-data-structure>
        ]
}
```
In our case, we fill in one sample patient record, convert the data to a JSON string, set the content type, and finally make the request and print the response on screen. These steps are sketched below:
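A sketch of `endpoint.py` along these lines; the URI, the key, and the sample feature values are illustrative placeholders, not the actual deployment values:

```python
import json
import requests

# Placeholders: fill these in with the values printed after deployment.
scoring_uri = '<your-scoring-uri>'
key = '<your-primary-key>'

# One sample record with the 12 input features.
data = {"data":
        [
            {
                "age": 75,
                "anaemia": 0,
                "creatinine_phosphokinase": 582,
                "diabetes": 0,
                "ejection_fraction": 20,
                "high_blood_pressure": 1,
                "platelets": 265000,
                "serum_creatinine": 1.9,
                "serum_sodium": 130,
                "sex": 1,
                "smoking": 0,
                "time": 4
            }
        ]
        }

# Convert the data to a JSON string.
input_data = json.dumps(data)

# Set the content type and the authorization header.
headers = {'Content-Type': 'application/json',
           'Authorization': f'Bearer {key}'}

# Make the request and print the response on screen.
resp = requests.post(scoring_uri, input_data, headers=headers)
print(resp.json())
```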
I execute Cell 21 and, based on the above, I expect to get a response in the format of `true` or `false`:
In order to test the deployed service, one could use the above file by inserting data in the `endpoint.py` file, saving it, and then running the relevant cell in the `automl.ipynb` Jupyter Notebook.
Another way would be using the Swagger URI of the deployed service and the Swagger UI.
A third way would also be to use Azure Machine Learning Studio. Go to the Endpoints section, choose aciservice and click on the tab Test:
Fill in the empty fields with the medical data you want to get a prediction for and click Test:
The screen recording can be found here and it shows the project in action.
More specifically, the screencast demonstrates:
- A working model
- Demo of the deployed model
- Demo of a sample request sent to the endpoint and its response
- The first factor that could improve the model is increasing the training time. This suggestion might seem like a no-brainer, but it would also increase costs, and this is a limitation that can be very difficult to overcome: there must always be a balance between the minimum required accuracy and the assigned budget.
- Continuing the above point, it would be great to experiment more with the hyperparameters chosen for the HyperDrive model, or even try running it with more of the available hyperparameters, with fewer time constraints.
- Another thing I would try is deploying the best models to the Edge using Azure IoT Edge and enabling logging in the deployed web apps.
- I would certainly try to deploy the HyperDrive model as well, since the deployment procedure is a bit different from the one used for the AutoML model.
- In the original research article where this dataset was used, it is mentioned that:

  > Random Forests [...] turned out to be the top performing classifier on the complete dataset

  I would love to explore this further in order to create a model with higher accuracy that would give better and more reliable results, with potential practical benefits in the field of medicine.
- The question of how much training data is required for machine learning is always valid and, by all means, the dataset used here is rather small and geographically limited: it contains the medical records of only 299 patients, all from one specific geographical area. Increasing the sample size could mean a higher level of accuracy and more reliable results. In addition, a dataset including patients from around the world would be more reliable, as it would compensate for factors specific to particular geographical regions.
- Finally, although cheerful and taking gender equality into account, it would be great not to stumble upon issues like this:
Davide Chicco, Giuseppe Jurman: Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making 20, 16 (2020).
- Udacity Nanodegree material
- Research article: Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone
- Heart Failure Prediction Dataset
- Consume an Azure Machine Learning model deployed as a web service
- Deploy machine learning models to Azure
- A Review of Azure Automated Machine Learning (AutoML)
- The Holy Bible of Azure Machine Learning Service. A walk-through for the believer (Part 3)
- What is Azure Container Instances (ACI)?
- AutoMLConfig Class
- Using Azure Machine Learning for Hyperparameter Optimization
- hyperdrive Package
- Tune hyperparameters for your model with Azure Machine Learning
- Configure automated ML experiments in Python
- How Azure Machine Learning works: Architecture and concepts
- Configure data splits and cross-validation in automated machine learning
- How Much Training Data is Required for Machine Learning?
- How Can Machine Learning be Reliable When the Sample is Adequate for Only One Feature?
- Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints
- Predicting sample size required for classification performance