Can Autosklearn be used with SHAP? #1272

Open
roger-yu-ds opened this issue Oct 28, 2021 · 6 comments
Labels: bug, documentation

@roger-yu-ds

roger-yu-ds commented Oct 28, 2021

Can the model be used with SHAP?

Currently

import shap
explainer = shap.Explainer(model)

results in

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-361-5c713ec694d6> in <module>
----> 1 explainer = shap.Explainer(model)

~/anaconda3/envs/python3/lib/python3.6/site-packages/shap/explainers/_explainer.py in __init__(self, model, masker, link, algorithm, output_names, feature_names, **kwargs)
    145                 # if we get here then we don't know how to handle what was given to us
    146                 else:
--> 147                     raise Exception("The passed model is not callable and cannot be analyzed directly with the given masker! Model: " + str(model))
    148 
    149             # build the right subclass

Exception: The passed model is not callable and cannot be analyzed directly with the given masker! Model: AutoSklearn2Classifier(delete_output_folder_after_terminate=False,
                       ensemble_size=1, memory_limit=7000, metric=f1, n_jobs=8,
                       output_folder='automl4_preds', per_run_time_limit=480,
                       time_left_for_this_task=600)

System Details (if relevant)

auto-sklearn 0.12.7
shap 0.38.1
Running on Linux?

eddiebergman changed the title from "[Question] My Question?" to "Can Autosklearn be used with SHAP?" on Oct 28, 2021
@eddiebergman
Contributor

eddiebergman commented Oct 28, 2021

Hi @roger-yu-ds,

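Looking at the shap source code for Explainer.__init__(), it seems that passing the estimator directly, as in your snippet, will not work out of the box.

I would suggest using their model-agnostic explainer, KernelExplainer, and passing it the estimator's predict_proba method: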

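import shap
from autosklearn.classification import AutoSklearnClassifier
estimator = AutoSklearnClassifier(...)  # must be fitted before it is explained
explainer = shap.KernelExplainer(estimator.predict_proba, shap.sample(X_train, 128))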

I would also advise updating auto-sklearn to 0.14.0, as we fixed some issues with probability outputs in some scenarios. Hopefully it won't matter, but if you face issues with incorrect probability sizes, this should fix them.
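The suggestion above can be sketched as a small helper. This is only an illustration under assumptions: the function names prediction_function and build_kernel_explainer are hypothetical and not part of shap or auto-sklearn, and the model is assumed to be fitted already. The shap import is deferred so that the first helper can be used without shap installed.

```python
def prediction_function(model):
    """Return a callable that SHAP can invoke for a fitted estimator.

    shap.Explainer(model) fails for estimators it does not recognise and
    that are not callable, so we hand SHAP a bound prediction method instead.
    """
    for name in ("predict_proba", "predict"):
        method = getattr(model, name, None)
        if callable(method):
            return method
    if callable(model):
        return model
    raise TypeError(f"{type(model).__name__} exposes no prediction method")


def build_kernel_explainer(model, X_background, n_background=128):
    """Wrap a fitted estimator in the model-agnostic shap.KernelExplainer."""
    import shap  # deferred import: only needed when an explainer is built

    # Subsample the background data to keep KernelExplainer affordable.
    background = shap.sample(X_background, n_background)
    return shap.KernelExplainer(prediction_function(model), background)
```

With a fitted classifier this would be called as build_kernel_explainer(estimator, X_train), which mirrors the two-line snippet in the comment above.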

@mfeurer
Contributor

mfeurer commented Oct 29, 2021

You can also find an example in this notebook: https://github.com/automl/auto-sklearn-talks/blob/main/2021_07_28_EuroPython/Tutorial-Regression.ipynb

mfeurer added the documentation label and removed the Feedback-Required label on Nov 9, 2021
@mfeurer
Contributor

mfeurer commented Nov 11, 2021

We should give an example, also inspired by this AG example that demonstrates usage with categorical data.

@Unn20

Unn20 commented May 2, 2022

Hi @eddiebergman

Unfortunately, your method won't work with categorical data.

I described my problem here:
shap/shap#2530

@eddiebergman
Contributor

Hey @Unn20, thanks for sharing the issue with us and raising it with them. It would be good to have test suites for external tool usage with auto-sklearn at some point, so we can catch problems like this and point people to them too.

@eliwoods

eliwoods commented Jan 13, 2023

I ran into the same issue when trying to use a model trained on a pandas DataFrame with categorical data in shap. The workaround I landed on amounts to encoding the categorical columns as floats before passing the data to auto-sklearn. One thing that could simplify this is placing everything in an sklearn Pipeline (i.e. ColumnTransformer -> FunctionTransformer for remapping columns -> auto-sklearn model), though I'm not sure whether that would put you back at square one with the shap issues.

from autosklearn.classification import AutoSklearnClassifier
from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from category_encoders import OrdinalEncoder
import numpy as np
import shap


def main():
    # Note that this dataset has categorical features that are already numerically encoded,
    # so we technically don't need to do any of this. However it still works as an illustrative example
    bunch = fetch_openml(data_id=40981, as_frame=True)
    y = bunch["target"]
    X = bunch["data"]

    # Fit and transform our data using category_encoders.OrdinalEncoder. This is a more convenient implementation than
    # sklearn's, especially given the outdated version of sklearn required for auto-sklearn
    cat_cols = [c for c in X.select_dtypes(['category']).columns]
    enc = OrdinalEncoder(cols=cat_cols)
    X_trans = enc.fit_transform(X)

    # Now we can transform our dataframes to numpy arrays to pass to autosklearn
    Xnp = X_trans.to_numpy(dtype=np.float64)
    ynp = y.to_numpy(dtype=np.float64)

    X_train, X_test, y_train, y_test = train_test_split(Xnp, ynp, random_state=1)

    # List telling autosklearn which columns are categorical features. It must follow the
    # column order of the encoded data, so update it if you reorder columns after encoding
    feat_type = ["Categorical" if x.name == "category" else "Numerical" for x in X.dtypes]
    cls = AutoSklearnClassifier(
        time_left_for_this_task=120,
        per_run_time_limit=30,
        # Required on OSX otherwise autosklearn crashes
        memory_limit=None,
    )
    cls.fit(X_train, y_train, X_test, y_test, feat_type=feat_type)

    yhat = cls.predict(X_test)
    print('Model accuracy: ', accuracy_score(y_test, yhat))

    # Now to show that it works in shap. This is not the optimal way to explain this dataset
    # with shap as it took ~10 minutes to run. You'll want to adjust based on your use case
    # and dataset size
    explainer = shap.KernelExplainer(
        cls.predict_proba,
        shap.kmeans(X_train, k=10),
        feature_names=X.columns.values,
    )
    shap_values = explainer.shap_values(X_test[:50])
    shap.summary_plot(shap_values, X_test[:50], feature_names=X.columns.values)


if __name__ == '__main__':
    main()

The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
Model accuracy:  0.8728323699421965
100%|██████████| 50/50 [10:05<00:00, 12.11s/it]

Process finished with exit code 0
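
The roughly ten-minute runtime above is driven by how many rows the model has to score. A rough back-of-the-envelope sketch follows. It rests on two assumptions about shap.KernelExplainer that should be checked against the installed shap version: that the default nsamples="auto" resolves to 2 * n_features + 2048, and that every perturbed sample is scored once per background row. It also ignores the cap shap applies when there are very few features. The function names are hypothetical.

```python
def kernel_shap_default_nsamples(n_features):
    """Perturbed samples per explained row, ASSUMING shap's "auto" rule."""
    return 2 * n_features + 2048


def kernel_shap_model_rows(n_explained, n_features, n_background, nsamples=None):
    """Approximate number of rows passed to the model's predict function.

    ASSUMPTION: each perturbed sample is combined with every background row,
    so the cost scales as n_explained * nsamples * n_background.
    """
    if nsamples is None:
        nsamples = kernel_shap_default_nsamples(n_features)
    return n_explained * nsamples * n_background


# Example: 50 explained rows, 14 features, 10 k-means background points.
rows_default = kernel_shap_model_rows(50, 14, 10)
rows_reduced = kernel_shap_model_rows(50, 14, 10, nsamples=200)
print(rows_default, rows_reduced)
```

Under these assumptions, the two practical knobs are the size of the background set (k in shap.kmeans) and the nsamples argument of explainer.shap_values, at the price of noisier attributions.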

image

This was run with the following versions on macOS 12.2.1 on the M1 chipset:

auto-sklearn==0.14.7
shap==0.41.0
category-encoders==2.5.1.post0

Edit: Updated code to use category_encoders.OrdinalEncoder which simplifies the transformation step.
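
The ColumnTransformer idea mentioned at the top of this comment could look roughly like the sketch below. It is an untested illustration, not something verified in this thread: the helper names are hypothetical, scikit-learn's OrdinalEncoder stands in for category_encoders, and whether wrapping everything in a full Pipeline avoids the shap problem remains an open question. The point it shows is that ColumnTransformer places the listed columns first and the passthrough remainder last, so feat_type has to follow the new column order.

```python
def split_by_dtype(dtype_names):
    """Return (categorical_positions, numerical_positions) from pandas dtype names."""
    cat = [i for i, name in enumerate(dtype_names) if name == "category"]
    num = [i for i, name in enumerate(dtype_names) if name != "category"]
    return cat, num


def feat_type_after_encoding(dtype_names):
    """feat_type list in the column order produced by build_encoder.

    The encoded categorical columns come first, followed by the untouched
    numerical columns, so the labels are grouped the same way.
    """
    cat, num = split_by_dtype(dtype_names)
    return ["Categorical"] * len(cat) + ["Numerical"] * len(num)


def build_encoder(dtype_names):
    """Ordinal-encode categorical columns and pass the rest through unchanged."""
    # Deferred imports: only this function needs scikit-learn.
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OrdinalEncoder

    cat, _ = split_by_dtype(dtype_names)
    return ColumnTransformer([("cat", OrdinalEncoder(), cat)], remainder="passthrough")
```

Usage would be along the lines of dtype_names = [d.name for d in X.dtypes], then X_enc = build_encoder(dtype_names).fit_transform(X), and finally fitting auto-sklearn on X_enc with feat_type=feat_type_after_encoding(dtype_names).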
