Welcome to this Deep Learning project: chest X-ray images are classified as either normal or showing pneumonia using Convolutional Neural Networks. That is, given a compressed .jpeg image converted from a chest X-ray DICOM image, the algorithm estimates whether the image shows pneumonia or not.
The international Digital Imaging and Communications in Medicine standard (DICOM for short) defines the processes and interfaces to transmit, store, retrieve, print, process, and display medical imaging information between the relevant modality components of a hospital information system.
The Kaggle dataset used here already provides labelled images split into training, validation and testing samples. As mentioned, these images have been converted to the .jpeg image format; in other words, no private, individually identifiable data is included.
Viewing the images shows that both posterior-anterior and anterior-posterior X-ray orientations are present and that mostly images of children were selected. No lateral X-ray images and no images covering all human age groups were found. This could only be analysed properly with the original .dcm DICOM files, where the associated DICOM tags are available. Doing so would raise regulatory data protection issues (e.g. under the Health Insurance Portability and Accountability Act, HIPAA), so it has not been done; it would constitute a HIPAA compliance breach.
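For illustration only: if the original .dcm files were available, the relevant header fields could be checked with a few lines of Python. The use of pydicom and the file path below are assumptions, not part of this project.

```python
import pydicom

# Hypothetical check of the DICOM tags discussed above; the Kaggle
# dataset only ships .jpeg files, so this cannot be run on it.
ds = pydicom.dcmread("path/to/original_image.dcm")

# (0010,1010) Patient's Age, e.g. '004Y' for a four-year-old child
print("Patient age:", ds.get("PatientAge", "tag not present"))

# (0018,5101) View Position, e.g. 'PA' (posterior-anterior) or 'AP'
print("View position:", ds.get("ViewPosition", "tag not present"))
```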
As an introduction to the project's way of working and its implementation, read this report documentation.
More generally, this project serves as the example accompanying the Medium blog post 'AI in Healthcare Not Only Changing Doctors Diagnostic Workflow'.
- Download GraphViz and, on Windows, install it to 'C:/Program Files (x86)/Graphviz2.38/'. Afterwards, add the 'C:/Program Files (x86)/Graphviz2.38/bin/' directory to the PATH environment variable. This path is also used in the 'step 0' chapter of the Python project file, so don't change it.
Pydot and GraphViz are used together to plot the neural network architectures, as sketched below. GraphViz is now open-source software, licensed under the Common Public License.
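The architecture .png files mentioned further below are presumably created with Keras' plot_model utility, which relies on exactly this Pydot/GraphViz pair. A minimal sketch under that assumption; the example model and output file name are illustrative:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import plot_model

# Toy model just to demonstrate the plotting call; the real
# architectures are defined in the project notebook.
model = Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(2, activation='softmax'),
])

# Fails with an error if the GraphViz 'bin' directory is not on the PATH.
plot_model(model, to_file='architecture.png', show_shapes=True)
```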
- Download the chest image dataset. Unzip the folder and place the delivered 'chest_xray' directory in your repository at 'path/to/chest-classifier-project/data'.
Have a look at the new directories and delete all '.DS_Store' files; they are not needed for this algorithm and would cause errors when running this code.
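A minimal cleanup sketch in Python, assuming the dataset sits under 'data/chest_xray' as described above:

```python
import os

# Recursively remove macOS '.DS_Store' artifacts so the data loaders
# do not stumble over them as invalid image files.
for root, _, files in os.walk("data/chest_xray"):
    for name in files:
        if name == ".DS_Store":
            path = os.path.join(root, name)
            os.remove(path)
            print("removed", path)
```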
Using the original chest X-ray image split into the directories train, test and val (and their associated subdirectories) led the neural network to unreliable results: the original distribution does not fit the 80/20 or 70/30 rule of thumb for training and testing data. It was therefore changed to other ratios, e.g. an 80/20 set (a 60/20/20 distribution for training/validation/testing); further information can be found in the chest-class_app.ipynb file. Experimenting with different distributions - especially the amount of validation samples - showed big changes in the prediction metrics. With some model architectures the prediction performance degrades badly, e.g. to a ROC AUC even worse than random prediction, and bias and overfitting appeared in some cases. For the final project, several Python notebook files are therefore stored, each using a different data distribution; the associated ratio is part of the file name.
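The exact re-splitting code lives in the notebooks; the following is only a sketch of how such a 60/20/20 split could be produced, with directory names assumed to follow the Kaggle layout ('NORMAL' and 'PNEUMONIA' class folders):

```python
import os
import random
import shutil

def resplit(src_dir, dst_root, ratios=(0.6, 0.2, 0.2)):
    """Copy class folders from src_dir into train/val/test under dst_root."""
    random.seed(42)  # reproducible split
    for label in ("NORMAL", "PNEUMONIA"):
        files = sorted(os.listdir(os.path.join(src_dir, label)))
        random.shuffle(files)
        n_train = int(ratios[0] * len(files))
        n_val = int(ratios[1] * len(files))
        splits = {
            "train": files[:n_train],
            "val": files[n_train:n_train + n_val],
            "test": files[n_train + n_val:],
        }
        for split, names in splits.items():
            out = os.path.join(dst_root, split, label)
            os.makedirs(out, exist_ok=True)
            for name in names:
                shutil.copy(os.path.join(src_dir, label, name), out)

# Example: pool all original images into one directory per class first,
# then re-split them: resplit("data/pooled", "data/chest_xray_602020")
```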
Some of the best-weights training results are stored in the saved_models directory, where the CNN architecture .png files are stored as well.
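Best-weights files of this kind are typically produced with a Keras ModelCheckpoint callback that keeps only the best epoch; a sketch under that assumption, with an illustrative file name:

```python
from keras.callbacks import ModelCheckpoint

# Save the weights only when the validation loss improves; the
# resulting .hdf5 file can later be restored with model.load_weights().
checkpointer = ModelCheckpoint(
    filepath='saved_models/weights.best.hdf5',
    monitor='val_loss',
    save_best_only=True,
    verbose=1,
)
# model.fit(..., validation_data=..., callbacks=[checkpointer])
```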
Regarding the bottleneck features for the transfer learning models, only the .npz files of the ResNet50 models for the different data distributions could be stored in this repository, as a zip file. Download and unpack them into a subdirectory called bottleneck_features. The ones created for the InceptionV3 models are too big for the GitHub repository, and the same issue applies to the best model weights files of the fine-tuned ResNet models for each data distribution.
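For orientation, ResNet50 bottleneck features of this kind can be produced with Keras roughly as follows; the directory and file names are assumptions, not the exact notebook code:

```python
import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input
from keras.preprocessing.image import ImageDataGenerator

# Headless ResNet50: the convolutional base acts as a fixed feature extractor.
base = ResNet50(weights='imagenet', include_top=False)

datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
generator = datagen.flow_from_directory(
    'data/chest_xray/train', target_size=(224, 224),
    batch_size=16, class_mode=None, shuffle=False)

# One forward pass over the whole split yields the bottleneck features.
features = base.predict_generator(generator, steps=len(generator), verbose=1)

# Compressed .npz keeps the stored artifact small.
np.savez_compressed('bottleneck_features/resnet50_train.npz', features=features)
```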
As a future to-do: better hyperparameter values (batch and epoch sizes together with the initialisation) should be found, which requires a better environment for machine learning algorithms. Hyperparameter tuning with Scikit-Learn's GridSearchCV, or RandomizedSearchCV as an alternative, has not been done because it is computationally expensive given the large number of parameters of the neural network architectures; it was not feasible with the existing environment (own hardware or the AWS EC2 service). A sketch of how such a search could look follows below.
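Given sufficient compute, the search could be wired up roughly like this; build_model, the bottleneck input shape and the data X_train/y_train are placeholders:

```python
from keras.models import Sequential
from keras.layers import Flatten, Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def build_model():
    # Small classifier head on top of precomputed bottleneck features;
    # (7, 7, 2048) is the ResNet50 output shape for 224x224 inputs.
    model = Sequential([
        Flatten(input_shape=(7, 7, 2048)),
        Dense(2, activation='softmax'),
    ])
    model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

clf = KerasClassifier(build_fn=build_model, verbose=0)
param_grid = {'batch_size': [16, 32, 64], 'epochs': [5, 10, 20]}
search = GridSearchCV(clf, param_grid, cv=3)
# search.fit(X_train, y_train)  # 3-fold CV over 9 combinations = 27 trainings,
#                               # which is exactly the cost that ruled this out
```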
- If you are running the project on your local machine (and not using AWS), create and activate a new environment. First, move to the directory 'path/to/chest-classifier-project'.
- Windows
conda create --name chest-class-project python=3.6
activate chest-class-project
pip install -r requirements/requirements.txt
- If you are running the project on your local machine (and not using AWS), create an IPython kernel for the 'chest-class-project' environment.
python -m ipykernel install --user --name chest-class-project --display-name "chest-class-project"
- Open the notebook.
jupyter notebook chest-class_app.ipynb
- If you are running the project on your local machine (and not using AWS), before running code, change the kernel to match the chest-class-project environment by using the drop-down menu (Kernel > Change kernel > chest-class-project).
This project's code is released under the MIT license.