This repository is provided as supplementary material for the paper "XAI Tools in the Public Sector: A Case Study on Predicting Combined Sewer Overflows" by Nicholas Maltbie, Nan Niu, Reese Johnson, and Matthew VanDoren.
These notes for the CSO case study describe how the data is prepared, how the ML models are tuned and trained, and how the final interpretability analysis is performed.
This repository contains instructions for using the code to create models for the dataset, apply those models to a sample dataset, and gather explainability results for our research.
The data in this repository is randomized, as the data used for the research is proprietary to our stakeholder.
This project is designed to run on a Linux system with an available NVIDIA GPU, a minimum of 4 GB of RAM, and disk space for libraries, datasets, and results (fairly small; about 8 GB should cover all libraries and installations).
Here is a description of each item in this project:
- README.md - this description file
- REQUIREMENTS.md - system, hardware, and software requirements to operate this project
- STATUS.md - the badges being applied for as part of the project
- LICENSE.txt - License associated with this project
- INSTALL.md - Installation instructions
- FSE21_XAI_Tools.pdf - Copy of the accepted paper in PDF format.
- run_lstm_hparam.py - a Python script to generate an LSTM model for a given set of hyperparameters
- hparams_search.sh - a script to automate searching through hyperparameters
The following Jupyter notebook files document how to run the project and include example visualizations and information:
- Data_Preparation.ipynb - prepares the data from the original sensors into a synchronized and interpolated form
- Interpretability.ipynb - applies interpretability tools to the various models
- Paper Charts.ipynb - notebook with code to generate various charts used in the paper
- Datasets - a representation of the data used in the project. We are not able to release the proprietary data we used from our stakeholder as part of the case study, but this randomized data helps show how this code operates and how to use it in future projects.
- Datasets-Synchronized - the synchronized and interpolated dataset generated from the sensor output
- Dataset-Analysis - this folder holds the results of tuning models (or a sample of model tuning) from the Datasets-Synchronized data. It is generated by the run_lstm_hparam.py script.
This project requires Python 3.8 (installation guide) and Anaconda (installation guide).
Steps for installing and setting up the project can be found in the INSTALL file.
The project contains copies of all the files generated using the randomized data as part of the project. The files are all derived from the CSV files in the Datasets folder.
The project uses its assets in the following order:
- Data_Preparation.ipynb to prepare and clean the data
- hparams_search.sh to find a set of tuned hyperparameters
- run_lstm_hparam.py to create the final LSTM-based models
- Interpretability.ipynb to complete an analysis using XAI tools
- Paper Charts.ipynb to create the charts based on the results and analysis
A more detailed description of how to use these tools is written next.
To run this project, first follow the setup instructions to set up an environment. Then the Data_Preparation.ipynb notebook can be used to read the raw sensor data from the Datasets folder and write synchronized and interpolated datasets into the Datasets-Synchronized folder.
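The synchronization and interpolation performed by this notebook can be sketched with pandas. Note that the column name, timestamps, and 15-minute grid below are illustrative stand-ins, not the study's actual sensor schema:

```python
import numpy as np
import pandas as pd

# Hypothetical raw sensor readings at irregular timestamps; the real
# column names and sampling rates come from the files in Datasets.
raw = pd.DataFrame(
    {"rain_gauge": [0.0, 0.2, np.nan, 0.5]},
    index=pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 00:07",
        "2020-01-01 00:16", "2020-01-01 00:30",
    ]),
)

# Synchronize onto a common 15-minute grid, then fill the remaining
# gaps by time-weighted linear interpolation.
synced = raw.resample("15min").mean().interpolate(method="time")
print(synced)
```

Every sensor series resampled onto the same grid this way can then be joined into a single synchronized table.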
Next, hparams_search.sh can be used to search over various hyperparameters. It uses run_lstm_hparam.py to generate an LSTM-based model for each configuration. The final set of hyperparameters we ended up using (2 layers with 24 nodes) can be reproduced with this command:
python run_lstm_hparam.py \
--end_offset 1 --start_offset 0 --seq_len 12 \
--num_units 24 --dropout 0 --num_layers 2 \
--class_weight 2 --learning_rate 0.001 \
--batch_size=1024
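The sweep that hparams_search.sh automates amounts to enumerating a grid of settings and launching one training run per combination. Here is a minimal sketch of that idea; the grid values are illustrative and not the exact ranges searched in the study:

```python
import itertools

def build_commands(grid):
    """Build one run_lstm_hparam.py invocation per grid combination."""
    cmds = []
    for values in itertools.product(*grid.values()):
        args = " ".join(f"--{key} {value}" for key, value in zip(grid, values))
        cmds.append(
            "python run_lstm_hparam.py --end_offset 1 --start_offset 0 "
            "--seq_len 12 --dropout 0 --learning_rate 0.001 "
            f"--batch_size 1024 {args}"
        )
    return cmds

# Illustrative grid; the ranges actually searched live in hparams_search.sh.
grid = {"num_layers": [1, 2], "num_units": [12, 24], "class_weight": [1, 2]}
for cmd in build_commands(grid):
    print(cmd)
```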
To visualize the results of the training, you can use TensorBoard:
tensorboard --logdir Dataset-Analysis/lstm_hparams/logs
Once the model has been generated, results for each run can be found either in TensorBoard under the hparams menu or by looking up the generated results in Dataset-Analysis/lstm_hparams/logs/complete/{model_name}. This includes the results for both the validation subset and the training subset of the data.
Now that we have the functional results of the model, we can move on to the interpretability analysis. To do this, use the Interpretability.ipynb notebook. (This notebook must be run using jupyter notebook and NOT jupyter lab due to limitations of the tools.)
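As a generic illustration of the kind of question such tools answer (which input features drive the model's predictions), here is a model-agnostic permutation-importance sketch. This is not one of the specific XAI tools applied in the notebook, and the toy model below stands in for the trained LSTM's prediction function:

```python
import numpy as np

def permutation_importance(predict, X, y, seed=0):
    """Accuracy drop when each feature column is shuffled independently.

    A larger drop means the model relies more heavily on that feature.
    """
    rng = np.random.default_rng(seed)
    base = np.mean(predict(X) == y)
    drops = []
    for j in range(X.shape[1]):
        shuffled = X.copy()
        rng.shuffle(shuffled[:, j])  # break the feature/label link
        drops.append(base - np.mean(predict(shuffled) == y))
    return np.array(drops)

# Toy stand-in model that only looks at feature 0, so shuffling
# feature 0 should hurt accuracy far more than shuffling feature 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)
predict = lambda data: (data[:, 0] > 0).astype(int)
importance = permutation_importance(predict, X, y)
print(importance)
```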
The final notebook, Paper Charts.ipynb, has code to generate the various charts used in the paper.