SHAP explanation for XGBSEKaplanTree or bootstrapestimator. #58

hellorp1990 · 2022-08-02T13:56:58Z

Hi,
Is it possible to use SHAP with XGBSEKaplanTree or XGBSEBootstrapEstimator?
SHAP's TreeExplainer does not work with them. The Permutation explainer starts evaluating but then fails with the error "ValueError: max_evals=1785 is too low for the Permutation explainer, it must be at least 2 * num_features + 1 = 1799!"

I am not sure how to fix this error.
If anyone can point me in the right direction, it would be really helpful.
Thank you in advance.
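
The error message itself states the constraint: the Permutation explainer wants at least 2 * num_features + 1 model evaluations for each explained row. A minimal sketch of that arithmetic follows; the helper name is hypothetical and not part of SHAP, and the feature count of 899 is inferred from the threshold in the message.

```python
def min_permutation_max_evals(num_features: int) -> int:
    """Smallest max_evals accepted, per the formula quoted in the error message."""
    if num_features < 1:
        raise ValueError("num_features must be positive")
    return 2 * num_features + 1


# 899 features reproduces the threshold quoted in the error message.
required = min_permutation_max_evals(899)
print(required)  # 1799
```

Any max_evals at or above this value clears the check, but it does not by itself make the run fast.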

davivieirab · 2022-08-02T17:24:40Z

Hi @hellorp1990. Could you provide a code example showing how you are trying to use XGBSE with SHAP?
Are you using the whole survival curve as your target, or have you transformed the predict function to output a single value?

hellorp1990 · 2022-08-02T17:27:02Z

@davivieirab
My model:
xgbse_model = XGBSEKaplanTree(params)
bootstrap_estimator = XGBSEBootstrapEstimator(xgbse_model, n_estimators=100)

SHAP model:

shap_values = shap.Explainer(bootstrap_estimator.predict, data, feature_names=feature_names, max_evals=2000)
shaps = shap_values(data)

hellorp1990 · 2022-08-02T17:29:27Z

@davivieirab if I don't pass max_evals to shap.Explainer, it won't run at all. With max_evals=2000, SHAP was running, but it showed a projected time of 10 hours to finish.

My dataset had 330 rows and 900 columns, and I was doing a train-test split (25% for test).
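
A rough way to see where a 10-hour projection can come from is to count how many rows the model may be asked to predict. The sketch below is a back-of-the-envelope upper bound, not a description of SHAP internals: it assumes every one of the max_evals masked samples for a row is evaluated against every background row, and the background size of 100 is an assumed value. The function name is hypothetical.

```python
def estimate_model_rows(n_explained_rows: int, max_evals: int, n_background: int) -> int:
    """Upper bound on rows sent to predict(): explained rows x evals x background rows."""
    return n_explained_rows * max_evals * n_background


# 330 rows and max_evals=2000 come from the thread; n_background=100 is an assumption.
rows = estimate_model_rows(330, 2000, 100)
print(f"{rows:,}")  # 66,000,000
```

Each of those rows also passes through all 100 bootstrap estimators, so explaining fewer rows, using a smaller background sample, or reducing the number of features are the obvious levers to try.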

davivieirab · 2022-08-07T23:01:03Z

@hellorp1990, XGBSEBootstrapEstimator produces a multi-output prediction: for each sample you get a whole survival function, with a survival probability for each time bucket evaluated.
Consequently, for each sample you will have one array of SHAP values (one value per feature) for each time bucket.

Find a code example below (references: SHAP values for multi-output problems; using KernelSHAP with XGBoost):

import pandas as pd
import shap
from xgbse import XGBSEKaplanTree, XGBSEBootstrapEstimator

xgbse_model = XGBSEKaplanTree(your_params)
bootstrap_estimator = XGBSEBootstrapEstimator(xgbse_model, n_estimators=100)

# The estimator must be fitted before predict() is called.
# Assumption: X_train (a DataFrame) and y_train (the survival target) already exist.
bootstrap_estimator.fit(X_train, y_train)

columns = X_train.columns

# Kernel SHAP sends data as a numpy array, which has no column names, so we restore them
# source: https://gist.github.com/noleto/05dfa4a691ebbc8816c035b86d2d00d4#file-shap_xgboost-py-L46
def xgbse_predict(data_asarray):
    data_asframe = pd.DataFrame(data_asarray, columns=columns)
    return bootstrap_estimator.predict(data_asframe)


# Kernel SHAP, using the first 100 training rows as background data
shap_kernel_explainer = shap.KernelExplainer(xgbse_predict, X_train.head(100))

# Explain a single instance: one array of n_features SHAP values per time bucket
# (the exact container and axis order may vary between SHAP versions)
shap_one = shap_kernel_explainer.shap_values(X_train.iloc[0])

# Get explanations for the first time bucket
first_time_bucket_shap_values = pd.Series(shap_one[0], name="shap_value")

# Print SHAP values for the first time bucket next to the corresponding features
print(pd.concat([first_time_bucket_shap_values, pd.Series(columns, name="feature")], axis=1))

You will get something like (for the first time bucket):

shap_value	feature
0.001919	x0
0.006411	x1
0.000411	x2
0.002464	x3
0.000239	x4
0.000893	x5
0.002441	x6
0.000117	x7
0.009901	x8
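
The container returned by shap_values for a multi-output model appears to depend on the SHAP version: older releases return a list with one array per output, while newer ones are understood to return a single array with the outputs on the last axis. A minimal sketch, assuming only those two layouts, that puts the time-bucket axis first either way. The helper name is hypothetical and the data is synthetic, not real XGBSE output.

```python
import numpy as np


def shap_by_time_bucket(shap_values, n_outputs):
    """Return SHAP values as an array whose first axis indexes the time buckets.

    Accepts either a list with one array per output, or a single array
    whose last axis indexes the outputs.
    """
    if isinstance(shap_values, list):
        stacked = np.stack([np.asarray(v) for v in shap_values], axis=0)
    else:
        stacked = np.moveaxis(np.asarray(shap_values), -1, 0)
    if stacked.shape[0] != n_outputs:
        raise ValueError(f"expected {n_outputs} outputs, got {stacked.shape[0]}")
    return stacked


# Synthetic stand-ins for one explained instance: 3 time buckets, 4 features.
rng = np.random.default_rng(0)
values = rng.normal(size=(3, 4))
as_list = [values[i] for i in range(3)]  # list layout: one array per time bucket
as_array = values.T                      # array layout: (n_features, n_outputs)

a = shap_by_time_bucket(as_list, n_outputs=3)
b = shap_by_time_bucket(as_array, n_outputs=3)

# One overall importance score per feature: mean |SHAP| across time buckets.
mean_abs = np.abs(a).mean(axis=0)
print(a.shape, b.shape, mean_abs.shape)
```

With the time bucket on the first axis, indexing with [0] selects the first bucket regardless of which layout SHAP returned.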

davivieirab · 2022-08-07T23:04:29Z

As an action item, we will add a notebook with brief documentation on how to use SHAP with the XGBSE library.

yangwei1993 · 2022-10-10T01:28:58Z

Hello @davivieirab,
have you added documentation on how to use SHAP with XGBSE? When I run my code the way you described above, it fails with an error. Here is my code:
from xgbse import XGBSEDebiasedBCE

# fitting xgbse model
xgbse_model = XGBSEDebiasedBCE()
xgbse_model.fit(X_train, y_train, time_bins=TIME_BINS)

# predicting
y_pred = xgbse_model.predict(X_test)

import shap
from xgbse import XGBSEKaplanTree, XGBSEBootstrapEstimator

# kernel shap sends data as numpy array which has no column names, so we fix it
# source: https://gist.github.com/noleto/05dfa4a691ebbc8816c035b86d2d00d4#file-shap_xgboost-py-L46
bootstrap_estimator = XGBSEBootstrapEstimator(xgbse_model, n_estimators=100)
def xgbse_predict(data_asarray):
    data_asframe = pd.DataFrame(data_asarray, columns=columns)
    return bootstrap_estimator.predict(data_asframe)
columns = X_train.columns
shap_kernel_explainer = shap.KernelExplainer(xgbse_predict, X_train)

# Kernel SHAP
# Explain a single instance - output: (1, n_time_buckets, n_features)
shap_one = shap_kernel_explainer.shap_values(X_train.iloc[0])

# Get explanations for the first time bucket
first_time_bucket_shap_values = pd.Series(shap_one[0])
print(pd.concat([first_time_bucket_shap_values, pd.Series(columns)], axis=1))

Error report:
Provided model function fails when applied to the provided data set.
'XGBSEBootstrapEstimator' object has no attribute 'estimators_'
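
The error suggests the bootstrap wrapper was never fitted: in the code above, fit is called on xgbse_model but not on bootstrap_estimator, so predict likely finds no fitted members. Below is a minimal sketch of a guard that surfaces this earlier with a clearer message. It assumes estimators_ is created during fit(), in line with the scikit-learn naming convention; require_fitted and DummyEnsemble are hypothetical stand-ins, not part of XGBSE.

```python
def require_fitted(estimator, attribute: str = "estimators_"):
    """Raise a readable error if the estimator lacks the attribute set by fit()."""
    if not hasattr(estimator, attribute):
        raise RuntimeError(
            f"{type(estimator).__name__} has no '{attribute}'; call fit() before predict()"
        )
    return estimator


class DummyEnsemble:
    """Stand-in for an ensemble that creates estimators_ during fit()."""

    def fit(self, X, y):
        self.estimators_ = ["fitted-member"]
        return self


model = DummyEnsemble()
try:
    require_fitted(model)  # not fitted yet, so this raises
except RuntimeError as err:
    print(err)

require_fitted(model.fit([[0.0]], [0]))  # passes once fit() has run
```

In the thread's setting, the analogous step would be calling bootstrap_estimator.fit(X_train, y_train) before building the KernelExplainer.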

hellorp1990 added the enhancement label · Aug 2, 2022

davivieirab added the question label and removed the enhancement label · Aug 2, 2022

davivieirab added the documentation label and removed the question label · Aug 7, 2022
