Predict Bristol consistency of pediatric stool from a picture taken by a parent
To create the virtual environment directory for StoolTool, run python -m venv <name of virtual environment>.
- Running python -m venv venv creates a virtual environment named venv, which the .gitignore excludes from version control.
pip install -r requirements.txt
To install the environment using conda, follow the instructions from Anaconda for your operating system.
After installing Anaconda, you may need to install some packages using conda install -c conda-forge <package>.
python3 transform_images.py [Images Directory] [-o <Output Directory>] [-c] [-g] [-u]
-o <Output Directory> Writes output to <Output Directory>. If not specified, defaults to output/.
-c Crops the images in [Images Directory].
-g Turns images in [Images Directory] into greyscale.
-u Applies unique UUIDs to images in [Images Directory].
If no options are specified, the program will run all options using images from [Images Directory].
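The option handling described above can be sketched with argparse. This is only an illustration of the documented behaviour, not the code in transform_images.py; the function names build_parser and resolve_steps and the dest names are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command line (a sketch)."""
    parser = argparse.ArgumentParser(prog="transform_images.py")
    parser.add_argument("images_dir", help="directory containing the input images")
    parser.add_argument("-o", dest="output_dir", default="output/",
                        help="output directory (defaults to output/)")
    parser.add_argument("-c", dest="crop", action="store_true",
                        help="crop the images")
    parser.add_argument("-g", dest="greyscale", action="store_true",
                        help="convert the images to greyscale")
    parser.add_argument("-u", dest="uuid", action="store_true",
                        help="apply UUIDs to the images")
    return parser


def resolve_steps(args):
    """Return which transformations to run; with no flags, run all of them."""
    steps = {"crop": args.crop, "greyscale": args.greyscale, "uuid": args.uuid}
    if not any(steps.values()):
        steps = {name: True for name in steps}
    return steps
```

For example, resolve_steps(build_parser().parse_args(["images"])) enables every step, while passing only -g enables greyscale alone.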
Transforms Susan's validation directory to our data format.
python susan_to_validation.py [Susan Directory] Optional: [Output Directory]
python3 train.py [Config File]
- To run using our settings, use config.yaml.
training_data_dir: Path to directory containing images to train on. Must have subdirectories for each class of image (1-7) and "weird" images
validation_data_dir: Path to directory containing validation data (if separate). Must have subdirectories for each class of image (1-7) and "weird" images.
use_validation_dir: Boolean to use separate validation data from the validation directory
downsample_train_data: Boolean for downsampling on the training data due to unbalanced data set
results_dir: Directory containing output performance metrics
train_fraction: Percentage of images from images_dir used as validation (if use_validation_dir is False)
seed: Random seed for shuffling images
learning_rate: Learning rate for classifier
num_epochs: Number of epochs to run
batches_per_epoch: Batches per epoch
batch_size: Batch size
See config.yaml for sample config file formatting.
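As a rough sketch of how the class subdirectories and the downsample_train_data and seed settings could fit together, the example below lists images per class and then downsamples every class to the size of the smallest one. The subdirectory names ("1" to "7" and "weird"), the function names, and the choice of downsampling to the smallest class are assumptions for illustration; train.py may do this differently.

```python
import random
from pathlib import Path

# Assumed subdirectory names: one per Bristol class plus "weird".
CLASS_NAMES = [str(i) for i in range(1, 8)] + ["weird"]


def list_images_by_class(data_dir):
    """Map each class name to the sorted list of files in its subdirectory."""
    root = Path(data_dir)
    images = {}
    for name in CLASS_NAMES:
        class_dir = root / name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing class subdirectory: {class_dir}")
        images[name] = sorted(p for p in class_dir.iterdir() if p.is_file())
    return images


def downsample(images_by_class, seed):
    """Randomly keep as many images per class as the smallest class has."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    target = min(len(paths) for paths in images_by_class.values())
    return {
        name: sorted(rng.sample(paths, target))
        for name, paths in images_by_class.items()
    }
```

Calling downsample(list_images_by_class("TrainingData"), seed=0) twice returns the same selection, which is the point of fixing the seed.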
Metrics files are written to the directory specified by output_dir in config.yaml. Directories in the output are:
- confusion_matrices
- metrics
- roc_curves
- signed_diff_hists
To see more details on output, watch metrics_info.mkv in the documentation_media directory.
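For readers unfamiliar with these outputs, the sketch below shows one plausible way a confusion matrix and signed differences could be computed for the numeric classes 1-7. It assumes that "signed difference" means predicted class minus true class; that interpretation, the function names, and the exclusion of the "weird" class are assumptions, so metrics_info.mkv remains the reference for what the files actually contain.

```python
from collections import Counter

CLASSES = list(range(1, 8))  # Bristol classes 1-7; "weird" is left out here


def confusion_matrix(true_labels, predicted_labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in CLASSES] for t in CLASSES]


def signed_differences(true_labels, predicted_labels):
    """Count how often each (predicted - true) offset occurs."""
    return Counter(p - t for t, p in zip(true_labels, predicted_labels))
```

Under this assumption, a positive offset means the model predicted a higher Bristol class than the label, and a negative offset means a lower one.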
Training data is stored as "TrainingData" on the StoolTool shared Google Drive. Validation Data is stored as "validation" on the StoolTool shared Google Drive.
Images of stool were gathered using Google reverse image search. This technique yielded many duplicate, near-duplicate, and noisy images (e.g. orange curries). Using imgdupes, we reduced the number of images from 9242 (with duplicates and noise) to 280.
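To illustrate the deduplication idea in its simplest form, the sketch below groups byte-identical files by their SHA-256 hash. This is not what imgdupes does: it only catches exact copies, not near duplicates or noise, and the function name find_exact_duplicates is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(image_dir):
    """Return groups of files under image_dir whose contents are identical."""
    groups = defaultdict(list)
    for path in sorted(Path(image_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Only hashes shared by more than one file indicate duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Each returned group lists files with the same content, so all but one file per group can be removed.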
- Set directory paths in app.py for saving images and desired model and make sure they exist.
- In app.py, edit "HOST" variable to be either localhost (127.0.0.1) or any network IP.
- Run the Flask application with: python app.py.
- Navigate to "http://127.0.0.1:5000/get_rating_web" in any web browser.
- Choose a .jpg or .jpeg file to upload from your computer.
- Select a predicted label.
- Press the "Upload" Button.
Demo Video: documentation_media/StoolToolFlaskWebDemo
To obtain the probability values for each label, given the file name of an image in the specified upload directory and the index of the model to use, navigate to "http://127.0.0.1:5000/get_rating_json/<file_name>&<model_index>".
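The JSON endpoint can also be called from a script. The sketch below builds the URL in the format given above and fetches it with the standard library; it assumes the Flask app is running locally on port 5000 and that the response body is JSON. The helper names rating_url and get_rating are hypothetical, and the shape of the returned data is not specified here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:5000"  # assumes the default local host and port


def rating_url(file_name, model_index, base_url=BASE_URL):
    """Build the get_rating_json URL for an uploaded image and a model index."""
    return f"{base_url}/get_rating_json/{quote(file_name, safe='')}&{model_index}"


def get_rating(file_name, model_index, base_url=BASE_URL):
    """Fetch and decode the JSON response (requires the app to be running)."""
    with urlopen(rating_url(file_name, model_index, base_url)) as response:
        return json.load(response)
```

For instance, get_rating("sample.jpg", 0) requests the probabilities for sample.jpg from the model at index 0, where sample.jpg is a placeholder for a file already in the upload directory.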
- Go to hasty.ai and click on the StoolTool project.
- Select the button with three lines at the top left corner of the screen. Then select project and export data.
- Select options according to the above image.
- Click the bell icon at the top right to see the status of the exported data and a link to download it once it's completed.