A model-agnostic tool for explaining machine learning models using a rule surrogate and a matrix-style visualization.
See the paper "RuleMatrix: Visualizing and Understanding Classifiers with Rules" for detailed information and technical references. The preprint PDF can be found on arXiv. The published version can be found on IEEE Xplore.
RuleMatrix can be used to extract a human-readable rule list that approximates a given classifier. For example, suppose we have a neural network trained on the Iris dataset to classify iris plants into three classes ['setosa', 'versicolor', 'virginica']. A rule list surrogate of the neural network could be (prob is the probability of each of the three classes):
IF (petal length (cm) in (-inf, 2.9799)) THEN prob: [0.9375, 0.0500, 0.0125]
ELSE IF (petal width (cm) in (2.0558, inf)) THEN prob: [0.0200, 0.0200, 0.9600]
ELSE IF (petal length (cm) in (2.9799, 4.7345)) THEN prob: [0.0164, 0.9508, 0.0328]
ELSE DEFAULT prob: [0.0222, 0.1778, 0.8000]
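To make the semantics of the rule list above concrete, here is a minimal sketch of how such a list could be evaluated in plain Python: rules are checked in order and the first matching rule decides the output. The names RULES, DEFAULT_PROB, predict_proba and predict are hypothetical and are not part of the rulematrix API. Treating the intervals as open on both ends is an assumption based on how they are printed.

```python
import math

CLASSES = ["setosa", "versicolor", "virginica"]

# Each rule: (feature name, (low, high) interval, class probabilities).
# Values are copied from the example rule list above.
RULES = [
    ("petal length (cm)", (-math.inf, 2.9799), [0.9375, 0.0500, 0.0125]),
    ("petal width (cm)", (2.0558, math.inf), [0.0200, 0.0200, 0.9600]),
    ("petal length (cm)", (2.9799, 4.7345), [0.0164, 0.9508, 0.0328]),
]
DEFAULT_PROB = [0.0222, 0.1778, 0.8000]


def predict_proba(sample):
    """Return the probabilities of the first rule whose condition matches."""
    for feature, (low, high), prob in RULES:
        # Assumption: intervals are open on both ends, as printed.
        if low < sample[feature] < high:
            return prob
    return DEFAULT_PROB  # no rule matched: fall through to the default


def predict(sample):
    """Return the name of the most probable class for one sample."""
    prob = predict_proba(sample)
    return CLASSES[prob.index(max(prob))]


flower = {"petal length (cm)": 4.5, "petal width (cm)": 1.5}
print(predict(flower), predict_proba(flower))
```

Because the rules are ordered, a sample with a large petal width is caught by the second rule before the third rule is ever checked.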
Besides the basic rule surrogate algorithm, RuleMatrix also provides a visualization toolkit to help you analyze the rules, and the relation between the rules and the original model.
The code is still under development and has not yet been published on PyPI.
You can download the code or clone the repository from GitHub:
https://github.com/rulematrix/rule-matrix-py.git
Then run pip install -e . from the repository root to install the package.
The core of the package is rulematrix.Surrogate, which takes a trained teacher model (only its predict function is used) and a student model (RuleList by default).
You can then use the scikit-learn style API to fit the surrogate model to the teacher model on a given training dataset (only X is needed).
import rulematrix
from sklearn.neural_network import MLPClassifier
teacher = MLPClassifier()
# ...code to train the neural net teacher model
surrogate = rulematrix.Surrogate(teacher.predict, student=None, is_continuous=None, is_categorical=None, is_integer=None,
ranges=None, cov_factor=1.0, sampling_rate=2.0, seed=None, verbose=False)
surrogate.fit(train_x)
print(surrogate.student)
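The general idea behind a surrogate can be sketched without any dependencies: the student is fitted to the teacher's predictions rather than to true labels, and its quality is measured by how often the two agree (often called fidelity). The code below is only an illustration of that idea, not the rulematrix implementation. teacher_predict, StumpStudent, fit_surrogate and fidelity are all hypothetical names, and the one-threshold student is a toy stand-in for a rule list.

```python
import random


def teacher_predict(X):
    """Hypothetical black-box teacher: only its predictions are visible."""
    return [1 if 0.9 * x0 + 0.1 * x1 > 0.5 else 0 for x0, x1 in X]


class StumpStudent:
    """Toy student: a single threshold on a single feature."""

    def fit(self, X, y):
        best = (-1, 0, 0.0)  # (number of matches, feature index, threshold)
        for j in range(len(X[0])):
            for t in sorted({x[j] for x in X}):
                matches = sum((x[j] > t) == bool(label) for x, label in zip(X, y))
                if matches > best[0]:
                    best = (matches, j, t)
        _, self.feature, self.threshold = best
        return self

    def predict(self, X):
        return [int(x[self.feature] > self.threshold) for x in X]


def fit_surrogate(predict_fn, student, X):
    """Fit the student to the teacher's predictions on X (no true labels)."""
    return student.fit(X, predict_fn(X))


def fidelity(predict_fn, student, X):
    """Fraction of samples on which student and teacher agree."""
    teacher_y, student_y = predict_fn(X), student.predict(X)
    return sum(a == b for a, b in zip(teacher_y, student_y)) / len(X)


rng = random.Random(0)
train_x = [(rng.random(), rng.random()) for _ in range(200)]
student = fit_surrogate(teacher_predict, StumpStudent(), train_x)
print(student.feature, round(student.threshold, 3), fidelity(teacher_predict, student, train_x))
```

Note that only X is passed to the fitting step, which mirrors the way surrogate.fit(train_x) is called above.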
You can also check out the helper function rulematrix.rule_surrogate, which makes life easier.
A usage example can be found in this notebook.
RuleList is the default student model used by Surrogate.
It inherits from the BayesianRuleList of pysbrl and adds functions to handle numeric data automatically.
It discretizes numeric data using an MDLP (Minimum Description Length Principle) discretizer, because BayesianRuleList can only take discretized data as input.
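As an illustration of what an MDLP discretizer does, the following is a minimal, unoptimized sketch of the Fayyad-Irani procedure for a single numeric feature. It recursively picks the cut point with the lowest class entropy and keeps it only if the information gain passes the MDL-based threshold. This is not the code used by mdlp-discretization or RuleList. The function names entropy, mdlp_cut_points and discretize are hypothetical.

```python
import bisect
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def _split(xs, ys, cuts):
    """Recursively add accepted cut points for sorted values xs with labels ys."""
    n = len(ys)
    base = entropy(ys)
    best = None  # (weighted entropy, split index, cut value)
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # a cut is only possible between distinct values
        weighted = (i * entropy(ys[:i]) + (n - i) * entropy(ys[i:])) / n
        if best is None or weighted < best[0]:
            best = (weighted, i, (xs[i - 1] + xs[i]) / 2)
    if best is None:
        return
    weighted, i, cut = best
    left, right = ys[:i], ys[i:]
    gain = base - weighted
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * base - k1 * entropy(left) - k2 * entropy(right))
    # MDL acceptance criterion: keep the cut only if the gain pays for its cost.
    if gain <= (math.log2(n - 1) + delta) / n:
        return
    cuts.append(cut)
    _split(xs[:i], left, cuts)
    _split(xs[i:], right, cuts)


def mdlp_cut_points(values, labels):
    """Return the sorted list of cut points for one numeric feature."""
    pairs = sorted(zip(values, labels))
    cuts = []
    _split([v for v, _ in pairs], [y for _, y in pairs], cuts)
    return sorted(cuts)


def discretize(value, cuts):
    """Map a numeric value to the index of the interval it falls in."""
    return bisect.bisect_right(cuts, value)


values = [1.0, 1.2, 1.4, 1.5, 1.3, 1.1, 4.0, 4.5, 4.2, 4.8, 4.4, 4.1]
labels = ["a"] * 6 + ["b"] * 6
cuts = mdlp_cut_points(values, labels)
print(cuts, [discretize(v, cuts) for v in values])
```

The resulting interval indices are the kind of categorical input a rule list learner can consume, and the cut points correspond to interval bounds such as those shown in the example rules.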
There are two ways to render the visualization. The first is to use it in a Jupyter notebook (recommended); the second is to run a server-client application, which offers more powerful functions.
After installing the package, you can create a Jupyter notebook and try out surrogate rules and the RuleMatrix visualization easily. You can check this example notebook.
If you are working in a Jupyter notebook offline, make sure you set local=True when calling the render function rulematrix.render.
The server code was migrated from the draft repo and is currently being refactored. You can check the online demo hosted here.
The package has a few Python dependencies:
pysbrl: A Python wrapper for SBRL (Scalable Bayesian Rule List). SBRL is written in C and uses fast bit operations to speed up the training of rule lists.
mdlp-discretization: A Cython package for discretizing numeric data using MDLP.
pyfim: A package that implements different frequent itemset mining algorithms.
Since pysbrl and mdlp-discretization are still under development, compatibility issues may arise between rulematrix and these packages.
If you find a bug, raise an issue here or in the repositories of these packages.
A draft version was originally hosted at:
https://github.com/myaooo/x-rule
The code in this repo is under active development, and the APIs are being redesigned.
@ARTICLE{ming18,
author={Yao Ming and Huamin Qu and Enrico Bertini},
journal={IEEE Transactions on Visualization and Computer Graphics},
title={RuleMatrix: Visualizing and Understanding Classifiers with Rules},
year={2018},
volume={},
number={},
pages={1-1},
keywords={Machine learning;Data visualization;Visualization;Neural networks;Decision trees;Data models;Support vector machines;explainable machine learning;rule visualization;visual analytics},
doi={10.1109/TVCG.2018.2864812},
ISSN={1077-2626},
month={},}