Can Autosklearn be used with SHAP? #1272
Comments
Hi @roger-yu-ds, looking at shap, I would suggest using their model-agnostic Explainer.
estimator = AutoSklearnClassifier(...)
explainer = shap.KernelExplainer(estimator.predict_proba, shap.sample(X_train, 128))
I would also advise updating
You can also find an example in this notebook: https://github.com/automl/auto-sklearn-talks/blob/main/2021_07_28_EuroPython/Tutorial-Regression.ipynb
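For anyone wondering what a model-agnostic explainer needs from the estimator: only a prediction function and a background sample. The sketch below is not shap's implementation. It computes exact Shapley values by brute force for a hypothetical `toy_predict` function standing in for one column of `estimator.predict_proba`, and it is only practical for a handful of features. As I understand it, `shap.KernelExplainer` approximates the same quantity by sampling feature coalitions.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, background):
    """Exact Shapley values of `predict` at `x`.

    Features outside a coalition are filled in from each background row
    and the predictions are averaged.
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_predict(row):
    # Hypothetical stand-in for a fitted model's prediction function
    return 0.5 * row[0] + 0.25 * row[1] * row[2]


background = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
x = [1.0, 1.0, 1.0]

phi = exact_shapley(toy_predict, x, background)
baseline = sum(toy_predict(r) for r in background) / len(background)

# The attributions sum to the prediction minus the average background prediction
print(phi, sum(phi), toy_predict(x) - baseline)
```

The background rows play the role that `shap.sample(X_train, 128)` plays in the snippet above: a larger background gives a more representative baseline, at the cost of more model calls.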
We should give an example, also inspired by this AG example that demonstrates usage with categorical data.
Unfortunately, your method won't work with categorical data. Here I described my problem:
Hey @Unn20, thanks for sharing the issue with us and raising it with them. It would be good to have test suites for external tool usage with auto-sklearn at some point so we can catch them and point to them too.
I ran into the same issue when trying to use a model trained on a pandas DataFrame with categorical data in shap. This is the workaround I landed on, which amounts to encoding the categorical columns as floats before passing the data to autosklearn. One thing that could simplify this is placing everything in an sklearn Pipeline (i.e. ColumnTransformer -> FunctionTransformer for remapping columns -> autosklearn model); I'm not sure whether that would put you back at square one with the shap issues, though.
from autosklearn.classification import AutoSklearnClassifier
from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from category_encoders import OrdinalEncoder
import numpy as np
import shap


def main():
    # Note that this dataset has categorical features that are already numerically encoded,
    # so we technically don't need to do any of this. However it still works as an illustrative example
    bunch = fetch_openml(data_id=40981, as_frame=True)
    y = bunch["target"]
    X = bunch["data"]

    # Fit and transform our data using category_encoders.OrdinalEncoder. This is a more convenient implementation than
    # sklearn's, especially given the outdated version of sklearn required for auto-sklearn
    cat_cols = [c for c in X.select_dtypes(['category']).columns]
    enc = OrdinalEncoder(cols=cat_cols)
    X_trans = enc.fit_transform(X)

    # Now we can transform our dataframes to numpy arrays to pass to autosklearn
    Xnp = X_trans.to_numpy(dtype=np.float64)
    ynp = y.to_numpy(dtype=np.float64)
    X_train, X_test, y_train, y_test = train_test_split(Xnp, ynp, random_state=1)

    # List to tell autosklearn which columns are categorical features. This may change depending on
    # if you reorder columns post encoding
    feat_type = ["Categorical" if x.name == "category" else "Numerical" for x in X.dtypes]

    cls = AutoSklearnClassifier(
        time_left_for_this_task=120,
        per_run_time_limit=30,
        # Required on OSX otherwise autosklearn crashes
        memory_limit=None,
    )
    cls.fit(X_train, y_train, X_test, y_test, feat_type=feat_type)
    yhat = cls.predict(X_test)
    print('Model accuracy: ', accuracy_score(y_test, yhat))

    # Now to show that it works in shap. This is not the optimal way to explain this dataset
    # with shap as it took ~10 minutes to run. You'll want to adjust based on your use case
    # and dataset size
    explainer = shap.KernelExplainer(
        cls.predict_proba,
        shap.kmeans(X_train, k=10),
        feature_names=X.columns.values,
    )
    shap_values = explainer.shap_values(X_test[:50])
    shap.summary_plot(shap_values, X_test[:50], feature_names=X.columns.values)


if __name__ == '__main__':
    main()
This was run with the following versions on macOS 12.2.1 on the M1 chipset:
Edit: Updated code to use |
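For reference, the encoding step in the workaround above boils down to two things: replacing each category with a float code, and recording which columns were categorical so they can be passed as `feat_type`. Here is a dependency-free sketch of that idea. The helper `ordinal_encode` and the toy rows are made up for illustration, and the codes it assigns are not guaranteed to match what `category_encoders.OrdinalEncoder` produces.

```python
def ordinal_encode(rows, categorical_idx):
    """Map categories to float codes, column by column.

    Returns the encoded rows and the per-column category -> code mappings.
    """
    mappings = {i: {} for i in categorical_idx}
    encoded = []
    for row in rows:
        new_row = []
        for i, value in enumerate(row):
            if i in mappings:
                # Assign the next unused code the first time a category is seen
                code = mappings[i].setdefault(value, float(len(mappings[i])))
                new_row.append(code)
            else:
                new_row.append(float(value))
        encoded.append(new_row)
    return encoded, mappings


# Toy data: columns 0 and 2 are categorical, column 1 is numerical
rows = [["red", 1.5, "yes"], ["blue", 2.0, "no"], ["red", 0.5, "no"]]
categorical_idx = [0, 2]

encoded, mappings = ordinal_encode(rows, categorical_idx)

# Same shape of list that the workaround passes to fit(..., feat_type=feat_type)
feat_type = [
    "Categorical" if i in categorical_idx else "Numerical"
    for i in range(len(rows[0]))
]

print(encoded)
print(feat_type)
```

Keeping `mappings` around matters if you want to translate the encoded values shown in shap plots back to the original category labels.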
Can the model be used with SHAP?
Currently
results in
System Details (if relevant)
auto-sklearn 0.12.7
shap 0.38.1
Running on Linux?