Aug 26, 2020

Running distributed KernelSHAP

To run KernelSHAP in a distributed fashion with ray, you first need to create a virtual environment, which requires conda to be installed. You can then run the command:

conda env create -f environment.yml -p /home/user/anaconda3/envs/env_name

to create the environment, and then activate it with conda activate shap. If you do not wish to change the installation path, you can skip the -p option. You are now ready to run the experiments. The steps involved are:

1. data processing
2. running the experiments

To process the data, it is sufficient to run python preprocess_data.py with the default options. This outputs a preprocessed version of the Adult dataset and a partition of it that is used to initialise the KernelSHAP explainer. However, if you don't intend to change the default parameters, you can proceed directly to step 2, as the same data will be downloaded automatically.

You can run an experiment with the command python experiment.py. By default, this runs the explainer sequentially on 2560 examples from the Adult dataset, using a background dataset of 100 samples (5 times if the -benchmark 1 option is passed to it). The results are saved in the results/ folder. If you wish to run the same explanations in parallel, then run the command

python experiment.py -cores 3

which will use ray to perform explanations across multiple cores.
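As a rough illustration of what running the explanations across several cores involves, the sketch below splits the instances into one chunk per worker and maps a placeholder explain function over the chunks. It uses a thread pool from Python's standard library as a stand-in for ray, so that it runs without extra dependencies. The names explain_chunk and run_parallel are hypothetical and are not taken from experiment.py.

```python
from concurrent.futures import ThreadPoolExecutor


def explain_chunk(chunk):
    """Placeholder for the explainer: returns one dummy value per instance."""
    return [float(sum(row)) for row in chunk]


def run_parallel(instances, cores):
    """Split `instances` into at most `cores` contiguous chunks and process them concurrently."""
    # Ceiling division, with a floor of 1 so an empty input does not give a zero step.
    chunk_size = max(1, -(-len(instances) // cores))
    chunks = [instances[i:i + chunk_size] for i in range(0, len(instances), chunk_size)]
    with ThreadPoolExecutor(max_workers=cores) as pool:
        # map() preserves chunk order, so the results line up with the inputs.
        per_chunk = list(pool.map(explain_chunk, chunks))
    # Flatten the per-chunk results back into one list.
    return [value for chunk in per_chunk for value in chunk]


# 2560 toy instances, mirroring the default number of explained examples.
instances = [[i, i + 1] for i in range(2560)]
results = run_parallel(instances, cores=3)
```

With a real explainer the per-chunk work would be CPU-bound, which is why a process-based framework such as ray is used rather than threads.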

Other options for the script are:

-benchmark: if set to 1, -cores is treated as the upper bound on the number of cores used to compute the explanations. The lower bound is 2, and the explanations are computed 5 times (by default) to provide runtime averages. The number of repetitions can be controlled using the -nruns argument.
-batch_size: controls how many instances are explained by a core at once. This parameter has an important bearing on the runtime performance of the code.
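The benchmark behaviour described above can be sketched as follows: the number of cores is swept from 2 up to the value passed as -cores, each configuration is timed a fixed number of times, and the runtimes are averaged. This is a minimal sketch under stated assumptions. The function names, the explain_fn callback and the way batches are formed are hypothetical and do not reproduce the logic of experiment.py.

```python
import time
from statistics import mean


def make_batches(instances, batch_size):
    """Group instances into consecutive batches of at most `batch_size` items."""
    return [instances[i:i + batch_size] for i in range(0, len(instances), batch_size)]


def benchmark(explain_fn, instances, max_cores, batch_size, nruns=5):
    """Return {cores: average runtime in seconds} for cores = 2..max_cores."""
    batches = make_batches(instances, batch_size)
    averages = {}
    for cores in range(2, max_cores + 1):
        runtimes = []
        for _ in range(nruns):
            start = time.perf_counter()
            # Hypothetical callback: explains all batches using `cores` workers.
            explain_fn(batches, cores)
            runtimes.append(time.perf_counter() - start)
        averages[cores] = mean(runtimes)
    return averages


def dummy_explain(batches, cores):
    """Stand-in explainer that only counts the instances in each batch."""
    return [len(batch) for batch in batches]


timings = benchmark(dummy_explain, list(range(2560)), max_cores=3, batch_size=10)
```

In general, a small batch size produces many short tasks and more scheduling overhead, while a large one produces few tasks and may leave some cores idle, so the best value is worth measuring rather than guessing.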
