SHAP explanation for XGBSEKaplanTree or bootstrapestimator. #58

Open
hellorp1990 opened this issue Aug 2, 2022 · 6 comments
Labels
documentation Improvements or additions to documentation

Comments

@hellorp1990

Hi,
Is it possible to use SHAP with XGBSEKaplanTree or XGBSEBootstrapEstimator?
SHAP's TreeExplainer does not work with them. The Permutation explainer starts evaluating but ends with the error "ValueError: max_evals=1785 is too low for the Permutation explainer, it must be at least 2 * num_features + 1 = 1799!"

I am not sure how to fix this error.
If anyone can point me in the right direction, that would be really helpful.
Thank you in advance.

@hellorp1990 hellorp1990 added the enhancement New feature or request label Aug 2, 2022
@davivieirab
Contributor

davivieirab commented Aug 2, 2022

Hi @hellorp1990. Could you provide a code example showing how you are trying to use XGBSE with SHAP?
Are you using the whole survival curve as your target, or have you transformed the predict function to output a single-value response?

@davivieirab davivieirab added question Further information is requested and removed enhancement New feature or request labels Aug 2, 2022
@hellorp1990
Author

@davivieirab
My model:
xgbse_model = XGBSEKaplanTree(params)
bootstrap_estimator = XGBSEBootstrapEstimator(xgbse_model, n_estimators=100)

SHAP model:

shap_values = shap.Explainer(bootstrap_estimator.predict, data, feature_names=feature_names, max_evals=2000)
shaps = shap_values(data)

@hellorp1990
Author

@davivieirab If I don't pass max_evals to shap.Explainer, it won't run at all. With max_evals=2000, SHAP runs, but it shows a projected time of 10 hours to finish.

My dataset has 330 rows and 900 columns, and I used a train-test split (25% for test).
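
For reference, the arithmetic behind the error quoted in the first comment can be checked directly. The sketch below only encodes the rule stated in that error message (`max_evals` must be at least `2 * num_features + 1`). The helper name `min_permutation_evals` is made up for illustration, and the feature count of 899 is inferred from the 1799 in the message rather than taken from the data.

```python
def min_permutation_evals(num_features: int) -> int:
    """Smallest max_evals accepted, per the rule quoted in the error message."""
    return 2 * num_features + 1


# The 1799 in the error message corresponds to 899 features (inferred, not measured).
num_features = (1799 - 1) // 2
required = min_permutation_evals(num_features)
print(num_features, required)
```

Any `max_evals` at or above this value passes the check, but each explained row then needs at least that many masked evaluations of the model, which is consistent with the long projected runtime reported above.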

@davivieirab
Contributor

davivieirab commented Aug 7, 2022

@hellorp1990, XGBSEBootstrapEstimator produces a multi-output prediction: for each sample you get a whole survival function, with a probability of survival for each time bucket evaluated.
Consequently, for each sample you will have an array of SHAP values (one value per feature) for each time bucket.

Find a code example below. References: SHAP values for multi-output problems, using KernelSHAP with XGBoost:

import pandas as pd
import shap
from xgbse import XGBSEKaplanTree, XGBSEBootstrapEstimator

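xgbse_model = XGBSEKaplanTree(your_params)
bootstrap_estimator = XGBSEBootstrapEstimator(xgbse_model, n_estimators=100)

## the estimator must be fitted before it can predict
## (assumption: X_train and y_train are already defined)
bootstrap_estimator.fit(X_train, y_train)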

columns = X_train.columns

## kernel shap sends data as numpy array which has no column names, so we fix it
## source: https://gist.github.com/noleto/05dfa4a691ebbc8816c035b86d2d00d4#file-shap_xgboost-py-L46
def xgbse_predict(data_asarray):
    data_asframe =  pd.DataFrame(data_asarray, columns=columns)
    return bootstrap_estimator.predict(data_asframe)


#### Kernel SHAP
shap_kernel_explainer = shap.KernelExplainer(xgbse_predict, X_train.head(100))

# Explain a single instance - output: (1, n_time_buckets, n_features)
shap_one = shap_kernel_explainer.shap_values(X_train.iloc[0])

# Get explanations for the first time bucket
first_time_bucket_shap_values = pd.Series(shap_one[0])

# Print shap values for the first time bucket and the corresponding features
print(pd.concat([first_time_bucket_shap_values, pd.Series(columns)], axis=1))

You will get something like (for the first time bucket):

shap_value feature
0.001919 x0
0.006411 x1
0.000411 x2
0.002464 x3
0.000239 x4
0.000893 x5
0.002441 x6
0.000117 x7
0.009901 x8
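
If only one time bucket is of interest, an alternative is the second option raised earlier in the thread: transform the predict function so it returns a single value per sample. The following is a minimal sketch of that idea, not code from the XGBSE library. `make_single_time_predict` and `fake_survival_predict` are hypothetical names, and the fake predictor is a stand-in so the example runs without a fitted model.

```python
import numpy as np


def make_single_time_predict(predict_fn, bucket_index):
    """Wrap a multi-output predict function so it returns one time bucket."""
    def predict_one_bucket(data):
        curves = np.asarray(predict_fn(data))  # shape: (n_samples, n_time_buckets)
        return curves[:, bucket_index]         # shape: (n_samples,)
    return predict_one_bucket


def fake_survival_predict(data):
    """Stand-in for xgbse_predict: NOT a real model, just decreasing curves."""
    data = np.asarray(data, dtype=float)
    risk = np.abs(data.sum(axis=1, keepdims=True))  # (n_samples, 1)
    times = np.array([[1.0, 2.0, 3.0]])             # 3 hypothetical time buckets
    return np.exp(-0.1 * risk * times)              # (n_samples, 3)


X_demo = np.array([[1.0, 2.0], [0.5, 0.5]])
predict_bucket_2 = make_single_time_predict(fake_survival_predict, bucket_index=2)
single_output = predict_bucket_2(X_demo)
print(single_output.shape)
```

In the thread's setting, the wrapped function would presumably take the place of `xgbse_predict` when building the explainer, so that SHAP sees a single-output model and returns one value per feature for the chosen bucket.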

@davivieirab
Contributor

As an action item, we will add a notebook with brief documentation on how to use SHAP with the XGBSE library.

@davivieirab davivieirab added documentation Improvements or additions to documentation and removed question Further information is requested labels Aug 7, 2022
@yangwei1993

Hello @davivieirab, have you added documentation on how to use SHAP with XGBSE? When I run my code in the way you described above, it raises an error. The following is my code:
from xgbse import XGBSEDebiasedBCE

# fitting xgbse model
xgbse_model = XGBSEDebiasedBCE()
xgbse_model.fit(X_train, y_train, time_bins=TIME_BINS)

# predicting
y_pred = xgbse_model.predict(X_test)

import shap
from xgbse import XGBSEKaplanTree, XGBSEBootstrapEstimator

## kernel shap sends data as numpy array which has no column names, so we fix it
## source: https://gist.github.com/noleto/05dfa4a691ebbc8816c035b86d2d00d4#file-shap_xgboost-py-L46
bootstrap_estimator = XGBSEBootstrapEstimator(xgbse_model, n_estimators=100)
def xgbse_predict(data_asarray):
    data_asframe = pd.DataFrame(data_asarray, columns=columns)
    return bootstrap_estimator.predict(data_asframe)
columns = X_train.columns
shap_kernel_explainer = shap.KernelExplainer(xgbse_predict, X_train)

#### Kernel SHAP
# Explain a single instance - output: (1, n_time_buckets, n_features)
shap_one = shap_kernel_explainer.shap_values(X_train.iloc[0])

# Get explanations for the first time bucket
first_time_bucket_shap_values = pd.Series(shap_one[0])
print(pd.concat([first_time_bucket_shap_values, pd.Series(columns)], axis=1))

Error report:
Provided model function fails when applied to the provided data set.
'XGBSEBootstrapEstimator' object has no attribute 'estimators_'
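
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: in the code above `fit` is called on `xgbse_model` but never on `bootstrap_estimator`, and scikit-learn-style estimators usually create trailing-underscore attributes such as `estimators_` only inside `fit`. The sketch below illustrates that convention with a stand-in class (`ToyBootstrap`, hypothetical and not the XGBSE implementation) and a small guard, `require_fitted`, that fails early with a clearer message.

```python
class ToyBootstrap:
    """Stand-in for a bagging-style wrapper; NOT the xgbse class."""

    def __init__(self, base_value):
        self.base_value = base_value

    def fit(self, X, y):
        # Fitted state only exists after fit(), mirroring the sklearn convention.
        self.estimators_ = [self.base_value for _ in range(3)]
        return self

    def predict(self, X):
        # Average the "estimators" for every input row.
        return [sum(self.estimators_) / len(self.estimators_) for _ in X]


def require_fitted(estimator, attribute="estimators_"):
    """Raise a clear error if the estimator has not been fitted yet."""
    if not hasattr(estimator, attribute):
        raise RuntimeError(
            f"{type(estimator).__name__} has no '{attribute}'; call fit() first."
        )
    return estimator


toy = ToyBootstrap(base_value=0.5)
try:
    require_fitted(toy)
    fitted_before = True
except RuntimeError:
    fitted_before = False

toy.fit([[0], [1]], [0, 1])
predictions = require_fitted(toy).predict([[0], [1]])
```

In the thread's code, the analogous step would be calling `bootstrap_estimator.fit(...)` with the same arguments passed to `xgbse_model.fit` before building the explainer. Whether that resolves the error was not confirmed in this thread.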
