make force_plot compatible with shap >= 0.36.0
bgreenwell committed Feb 21, 2021
1 parent c96733e commit c593285
Showing 8 changed files with 112 additions and 123 deletions.

This file was deleted.

3 changes: 0 additions & 3 deletions .Rproj.user/shared/notebooks/paths
Original file line number Diff line number Diff line change
@@ -1,3 +0,0 @@
/Users/bgreenwell/Dropbox/devel/fastshap/R/gen_pkg_bib.R="E1B8FA98"
/Users/bgreenwell/Dropbox/devel/fastshap/rjournal/greenwell.Rmd="44C6DFCC"
/Users/bgreenwell/Dropbox/devel/fastshap/rjournal/greenwell.bib="6C1920D6"
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -36,4 +36,4 @@ Suggests:
LinkingTo:
Rcpp,
RcppArmadillo
-RoxygenNote: 7.0.2
+RoxygenNote: 7.1.1
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,3 +1,9 @@
# fastshap 0.0.5.9000

## Bug fixes

* The `force_plot()` function should now be compatible with **shap** (>=0.36.0); thanks to @hfshr and @jbwoillard for reporting [(#12)](https://github.com/bgreenwell/fastshap/issues/12).

# fastshap 0.0.5

## Bug fixes
10 changes: 9 additions & 1 deletion R/force_plot.R
@@ -110,6 +110,7 @@ force_plot.explain <- function(object, baseline = NULL, feature_values = NULL,

# Import shap module and construct HTML for plot
shap <- reticulate::import("shap")

# shap$initjs()
fp <- shap$force_plot(
base_value = if (is.null(baseline)) 0 else baseline,
@@ -121,8 +122,15 @@ force_plot.explain <- function(object, baseline = NULL, feature_values = NULL,
)

# Display results
+  # FIXME: Is this the best way to determine/compare Python package versions?
+  shap_version <- reticulate::py_get_attr(shap, name = "__version__")
+  shap_version <- package_version(as.character(shap_version))
   tfile <- tempfile(fileext = ".html")
-  shap$save_html(tfile, plot_html = fp)
+  if (shap_version < "0.36.0") {
+    shap$save_html(tfile, plot_html = fp)
+  } else {
+    shap$save_html(tfile, plot = fp)
+  }
display <- match.arg(display)
# Check for dependencies
if (display == "viewer") {
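The FIXME in this hunk asks whether `package_version()` is a sound way to compare Python package versions. The comparison can be sketched in isolation as follows, assuming the version string is a plain dotted release such as `0.36.0`. The helper name `save_html_arg()` is hypothetical and not part of fastshap.

```r
# Hypothetical helper (not part of fastshap): pick the keyword argument that the
# hunk above passes to shap$save_html(), given a shap version string.
save_html_arg <- function(version_string) {
  # package_version() parses the string into integer components, so the
  # comparison below is numeric per component rather than character based
  v <- package_version(version_string)
  if (v < "0.36.0") "plot_html" else "plot"
}

save_html_arg("0.35.0")   # older shap
save_html_arg("0.36.0")   # first release with the renamed argument
save_html_arg("0.100.0")  # a made-up future release with a three-digit component
```

Comparing the raw strings instead would sort `"0.100.0"` before `"0.36.0"`, which is the main reason to parse the version first. One caveat, stated as an assumption rather than tested against shap: `package_version()` accepts only integer components, so a development version string with a non-numeric suffix would likely need to be cleaned before parsing.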
3 changes: 2 additions & 1 deletion inst/tinytest/test_force_plot.R
@@ -22,7 +22,8 @@ trn <- gen_friedman(500, seed = 101)
X <- subset(trn, select = -y)

# Fit a MARS model to the simulated Friedman benchmark data
-mars <- eartth::earth(y ~ ., data = trn, degree = 2)
+mars <- earth::earth(y ~ ., data = trn, degree = 2)
+preds <- predict(mars, newdata = trn)[, 1L, drop = TRUE]

# Prediction wrapper
pfun <- function(object, newdata) {
7 changes: 4 additions & 3 deletions rjournal/greenwell.Rmd
@@ -59,10 +59,11 @@ In particular, the Shapley contribution of the $i$-th feature to an instance $x$
\nonumber
\phi_i\left(x\right) = \frac{1}{p!} \sum_{\mathcal{O} \in \pi\left(p\right)} \left[\Delta\left(Pre^i\left(\mathcal{O}\right) \cup \left\{i\right\}\right) - \Delta\left(Pre^i\left(\mathcal{O}\right)\right)\right], \quad i = 1, 2, \dots, p,
\end{equation}
-where ...
+where $\pi\left(p\right)$ is the set of all permutations of feature indices $\left\{1, 2, \dots, p\right\}$, and $Pre^i\left(\mathcal{O}\right)$ is the set of all feature indices that appear before $i$ in $\mathcal{O} \in \pi\left(p\right)$.
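
The permutation form of $\phi_i$ can be evaluated directly when $p$ is small. The following is a minimal R sketch, assuming $\Delta\left(S\right)$ denotes the value attributed to a feature subset $S$ (the excerpt does not define it). The functions `permutations()` and `shapley()`, and the toy value function `delta()` with its weights, are hypothetical and not part of fastshap.

```r
# All permutations of a vector, using base R only
permutations <- function(x) {
  if (length(x) <= 1L) return(list(x))
  out <- list()
  for (k in seq_along(x)) {
    for (rest in permutations(x[-k])) {
      out[[length(out) + 1L]] <- c(x[k], rest)
    }
  }
  out
}

# Hypothetical value function Delta(S): additive weights plus a bonus of 6
# that is only earned when features 1 and 2 are both present
delta <- function(S) {
  w <- c(10, 20, 30)
  val <- sum(w[S])
  if (all(c(1, 2) %in% S)) val <- val + 6
  val
}

# Exact Shapley values: average the marginal contribution of each feature
# over all p! orderings
shapley <- function(p, delta) {
  phi <- numeric(p)
  for (O in permutations(seq_len(p))) {
    for (i in seq_len(p)) {
      pre <- O[seq_len(match(i, O) - 1L)]  # indices that appear before i in O
      phi[i] <- phi[i] + delta(c(pre, i)) - delta(pre)
    }
  }
  phi / factorial(p)
}

phi <- shapley(3, delta)
phi
```

For this toy value function the contributions are 13, 23, and 30: the bonus of 6 is split evenly between features 1 and 2, and the contributions sum to `delta(1:3)`. The enumeration costs $p!$ orderings, which is why sampling-based approximations are used in practice.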

A simple example may help clarify the main ideas.


### Fairly splitting a bar tab

Alex, Brad, and Brandon decide to go out for drinks after work. They shared a few pitchers of beer, but nobody paid attention to how much each person drank. What's a fair way to split the tab? Suppose we knew the following information, perhaps based on historical data:
@@ -111,7 +112,7 @@ So the next time the bartender asks how you want to split the tab, whip out a pe

## Estimating Shapley values via Monte Carlo simulation: SampleSHAP

-A single estimate of the contribution of $x_i$ to $f\left(x\right)$ is nothing more than the difference between two predictions, where each prediction is based on a sort of Frankenstein instance that is constructed by swapping out values between the instance being explained ($x$) and an instance selected at random from the training data. To help stabilize the results, the procedure is repeated a large number of times, say $R$, and the results are averaged together.
+If we assume the features are independent, then $\phi_i\left(x\right)$ can be approximated via a simple Monte Carlo simulation. In particular, a single estimate of the contribution of $x_i$ to $f\left(x\right)$ is nothing more than the difference between two predictions, where each prediction is based on a sort of Frankenstein instance that is constructed by swapping out values between the instance being explained ($x$) and an instance selected at random from the training data. To help stabilize the results, the procedure is repeated a large number of times, say $R$, and the results are averaged together.
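
The Monte Carlo procedure described in this paragraph can be sketched in a few lines of R. This is an illustrative reading of the prose, not fastshap's implementation. The function `sample_shap()`, the simulated data, and the linear prediction function `f()` are all hypothetical, and the choice to take the features that precede $i$ in a random ordering from $x$ is an assumption about how the swapping is done.

```r
# Hypothetical sketch of the Monte Carlo estimate for a single feature i
sample_shap <- function(f, x, X, i, R = 1000L) {
  p <- length(x)
  diffs <- numeric(R)
  for (r in seq_len(R)) {
    w <- X[sample.int(nrow(X), 1L), ]       # random training instance
    O <- sample.int(p)                      # random ordering of the features
    before <- O[seq_len(match(i, O) - 1L)]  # features that precede i in O
    from_x <- seq_len(p) %in% c(before, i)
    b1 <- ifelse(from_x, x, w)              # Frankenstein instance with x_i from x
    from_x[i] <- FALSE
    b2 <- ifelse(from_x, x, w)              # same instance, but x_i from w
    diffs[r] <- f(b1) - f(b2)               # one estimate of the contribution
  }
  mean(diffs)                               # average over the R repetitions
}

# Simulated training data with independent features and a linear "model"
set.seed(101)
X <- matrix(rnorm(600), ncol = 3)
beta <- c(2, -1, 0.5)
f <- function(z) sum(beta * z)

x <- c(1, 1, 1)
est <- sample_shap(f, x, X, i = 1L, R = 5000L)
exact <- beta[1] * (x[1] - mean(X[, 1]))
c(estimate = est, exact = exact)
```

For a linear prediction function each single estimate reduces to $\beta_i\left(x_i - w_i\right)$, so the average should land close to $\beta_i\left(x_i - \bar{X}_i\right)$; the two numbers printed at the end give a quick sanity check.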

\begin{algorithm}
\begin{enumerate}
@@ -157,7 +158,7 @@ Recall that the contribution of the $i$-th feature to the prediction $f\left(X\r
&= \beta_i \left(X_i - \E\left(X_i\right)\right)
\end{split},
\end{equation}
-where we estimate $\E\left(X_i\right)$ with the corresponding sample mean $\bar{X}_i$. The quantity $\phi_i\left(X\right)$ is also referred to as the \emph{situational importance of $X_i$} \citep{achen-1982-interpreting}.
+where we estimate $\E\left(X_i\right)$ with the corresponding sample mean $\bar{X}_i$. The quantity $\phi_i\left(X\right)$ is also referred to as the \emph{situational importance of $X_i$} \citep{achen-1982-interpreting}. So, for additive linear models, the contribution of feature $X_i$ is the difference between the feature's effect and the average effect.
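
The closed form $\beta_i\left(X_i - \bar{X}_i\right)$ is easy to check numerically. The following R sketch uses simulated data and a plain `lm()` fit; the data, coefficients, and object names are made up for illustration and do not come from the article.

```r
# Simulated data for an additive linear model (hypothetical example)
set.seed(101)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 - 3 * d$x2 + 0.5 * d$x3 + rnorm(n, sd = 0.1)

fit <- lm(y ~ x1 + x2 + x3, data = d)
beta <- coef(fit)[-1L]               # drop the intercept
feats <- as.matrix(d[, names(beta)])

# phi[j, i] = beta_i * (x_ji - mean(x_i)): feature effect minus average effect
centered <- sweep(feats, 2, colMeans(feats), "-")
phi <- sweep(centered, 2, beta, "*")

# The contributions for each row add up to that row's prediction minus the
# average prediction
pred <- predict(fit)
all.equal(unname(rowSums(phi)), unname(pred - mean(pred)))
```

The final comparison illustrates the additivity of the contributions: for every observation they sum to the gap between its prediction and the average prediction.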


### Tree-based models: TreeSHAP
203 changes: 90 additions & 113 deletions rjournal/greenwell.bib
@@ -1,135 +1,112 @@
% Books ------------------------------------------------------------------------
@book{achen-1982-interpreting,
title = {Interpreting and Using Regression},
author = {Achen, Christopher H.},
isbn = {9780803900004},
lccn = {82042675},
series = {Interpreting and Using Regression},
year = {1982},
publisher = {Sage Publications}
@Manual{R-Rcpp,
title = {Rcpp: Seamless R and C++ Integration},
author = {Dirk Eddelbuettel and Romain Francois and JJ Allaire and Kevin Ushey and Qiang Kou and Nathan Russell and Douglas Bates and John Chambers},
year = {2020},
note = {R package version 1.0.5},
url = {https://CRAN.R-project.org/package=Rcpp},
}


% Articles ---------------------------------------------------------------------
@article{strumbelj-2014-explaining,
author = {{Š}trumbelj, Erik and Kononenko, Igor},
title = {Explaining prediction models and individual predictions with feature contributions},
journal = {Knowledge and Information Systems},
year = {2014},
volume = {31},
number = {3},
pages = {647--665},
url = {https://doi.org/10.1007/s10115-013-0679-x}
@Manual{R-SHAPforxgboost,
title = {SHAPforxgboost: SHAP Plots for XGBoost},
author = {Yang Liu and Allan Just},
year = {2021},
note = {R package version 0.1.0},
url = {https://github.com/liuyanguu/SHAPforxgboost},
}

@article{gosiewska-2019-iBreakDown,
author = {Alicja Gosiewska and Przemyslaw Biecek},
title = {iBreakDown: Uncertainty of Model Explanations for Non-additive Predictive
Models},
journal = {CoRR},
volume = {abs/1903.11420},
year = {2019},
url = {http://arxiv.org/abs/1903.11420},
archivePrefix = {arXiv},
eprint = {1903.11420},
timestamp = {Tue, 02 Apr 2019 11:16:55 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-1903-11420.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
@Manual{R-fastshap,
title = {fastshap: Fast Approximate Shapley Values},
author = {Brandon Greenwell},
note = {R package version 0.0.5},
url = {https://github.com/bgreenwell/fastshap},
year = {2020},
}

@article{veronika-2017-catboost,
author = {Anna Veronika Dorogush and Andrey Gulin and Gleb Gusev and Nikita Kazeev and Liudmila Ostroumova Prokhorenkova and
Aleksandr Vorobev},
title = {Fighting biases with dynamic boosting},
journal = {CoRR},
volume = {abs/1706.09516},
year = {2017},
url = {http://arxiv.org/abs/1706.09516},
archivePrefix = {arXiv},
eprint = {1706.09516},
timestamp = {Mon, 13 Aug 2018 16:46:23 +0200},
biburl = {https://dblp.org/rec/journals/corr/DorogushGGKPV17.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
@Manual{R-iBreakDown,
title = {iBreakDown: Model Agnostic Instance Level Variable Attributions},
author = {Przemyslaw Biecek and Alicja Gosiewska and Hubert Baniecki and Adam Izdebski},
year = {2020},
note = {R package version 1.3.1},
url = {https://CRAN.R-project.org/package=iBreakDown},
}

@article{chen-2016-xgboost,
author = {Tianqi Chen and Carlos Guestrin},
title = {XGBoost: {A} Scalable Tree Boosting System},
journal = {CoRR},
volume = {abs/1603.02754},
year = {2016},
url = {http://arxiv.org/abs/1603.02754},
archivePrefix = {arXiv},
eprint = {1603.02754},
timestamp = {Mon, 13 Aug 2018 16:47:00 +0200},
biburl = {https://dblp.org/rec/journals/corr/ChenG16.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
@Manual{R-iml,
title = {iml: Interpretable Machine Learning},
author = {Christoph Molnar and Patrick Schratz},
year = {2020},
note = {R package version 0.10.1},
url = {https://CRAN.R-project.org/package=iml},
}

@article{R-sellereite,
doi = {10.21105/joss.02027},
url = {https://doi.org/10.21105/joss.02027},
year = {2019},
publisher = {The Open Journal},
volume = {5},
number = {46},
pages = {2027},
author = {Nikolai Sellereite and Martin Jullum},
title = {shapr: An R-package for explaining machine learning models with dependence-aware Shapley values},
journal = {Journal of Open Source Software}
@Manual{R-reticulate,
title = {reticulate: Interface to Python},
author = {Kevin Ushey and JJ Allaire and Yuan Tang},
year = {2020},
note = {R package version 1.18},
url = {https://github.com/rstudio/reticulate},
}

@Manual{R-shapper,
title = {shapper: Wrapper of Python Library shap},
author = {Szymon Maksymiuk and Alicja Gosiewska and Przemyslaw Biecek},
year = {2020},
note = {R package version 0.1.3},
url = {https://github.com/ModelOriented/shapper},
}

% Other ------------------------------------------------------------------------
@Article{Rcpp2011,
title = {{Rcpp}: Seamless {R} and {C++} Integration},
author = {Dirk Eddelbuettel and Romain Fran\c{c}ois},
journal = {Journal of Statistical Software},
year = {2011},
volume = {40},
number = {8},
pages = {1--18},
url = {http://www.jstatsoft.org/v40/i08/},
doi = {10.18637/jss.v040.i08},
}

@misc{aas-2019-explaining,
title = {Explaining individual predictions when features are dependent: More accurate approximations to Shapley values},
author = {Kjersti Aas and Martin Jullum and Anders Løland},
year = {2019},
eprint = {1903.10464},
archivePrefix = {arXiv},
primaryClass = {stat.ML}
@Book{Rcpp2013,
title = {Seamless {R} and {C++} Integration with {Rcpp}},
author = {Dirk Eddelbuettel},
publisher = {Springer},
address = {New York},
year = {2013},
note = {ISBN 978-1-4614-6867-7},
doi = {10.1007/978-1-4614-6868-4},
}

@misc{R-shapFlex,
title = {Shapley Decomposition of R-Squared in Machine Learning Models},
author = {Nickalus Redell},
year = {2019},
eprint = {1908.09718},
archivePrefix = {arXiv},
primaryClass = {stat.ME}
@Article{Rcpp2017,
title = {{Extending \textit{R} with \textit{C++}: A Brief Introduction to \textit{Rcpp}}},
author = {Dirk Eddelbuettel and James Joseph Balamuta},
journal = {PeerJ Preprints},
year = {2017},
month = {aug},
volume = {5},
pages = {e3188v1},
issn = {2167-9843},
url = {https://doi.org/10.7287/peerj.preprints.3188v1},
doi = {10.7287/peerj.preprints.3188v1},
}

@misc{frye-2019-asymmetric,
title = {Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability},
author = {Christopher Frye and Ilya Feige and Colin Rowat},
@Misc{iBreakDown2019,
title = {Do Not Trust Additive Explanations},
author = {Alicja Gosiewska and Przemyslaw Biecek},
year = {2019},
eprint = {1910.06358},
archivePrefix = {arXiv},
primaryClass = {stat.ML}
eprint = {arXiv:1903.11420},
url = {https://arxiv.org/abs/1903.11420},
}

@incollection{lundberg-2017-KernelSHAP,
Author = {Lundberg, Scott M and Lee, Su-In},
Booktitle = {Advances in Neural Information Processing Systems 30},
Editor = {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
Pages = {4765--4774},
Publisher = {Curran Associates, Inc.},
Title = {A Unified Approach to Interpreting Model Predictions},
Url = {http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf},
Year = {2017},
Bdsk-Url-1 = {http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf}}
@Article{iml2018,
author = {Christoph Molnar and Bernd Bischl and Giuseppe Casalicchio},
title = {iml: An R package for Interpretable Machine Learning},
doi = {10.21105/joss.00786},
url = {https://joss.theoj.org/papers/10.21105/joss.00786},
year = {2018},
publisher = {Journal of Open Source Software},
volume = {3},
number = {26},
pages = {786},
journal = {JOSS},
}

@incollection{ke-2017-lightgbm,
Author = {Ke, Guolin and Meng, Qi and Finley, Thomas and Wang, Taifeng and Chen, Wei and Ma, Weidong and Ye, Qiwei and Liu, Tie-Yan},
Booktitle = {Advances in Neural Information Processing Systems 30},
Editor = {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
Pages = {3146--3154},
Publisher = {Curran Associates, Inc.},
Title = {LightGBM: A Highly Efficient Gradient Boosting Decision Tree},
Url = {http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf},
Year = {2017},
Bdsk-Url-1 = {http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf}}
