SHAP explanation for XGBSEKaplanTree or bootstrapestimator. #58
Hi @hellorp1990. Could you provide a code example showing how you are trying to use XGBSE with SHAP?
@davivieirab SHAP model: `shap_values = shap.Explainer(bootstrap_estimator.predict, data, feature_names=feature_names, max_evals=2000)`
@davivieirab If I don't pass max_evals to shap.Explainer, it won't run at all. With max_evals=2000, SHAP was running, but it showed a projected 10 hours to finish. My dataset has 330 rows and 900 columns, and I was doing a train-test split (25% for test).
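A rough sketch of why this is slow, under stated assumptions: the Permutation explainer error quoted in this issue says max_evals must be at least 2 * num_features + 1, so with roughly 900 features a budget of 2000 covers only one forward/backward permutation pass. The helper names below are hypothetical, and `approx_model_rows` assumes each masked evaluation is scored against every background row (the background size of 100 is also an assumption, not a value from this thread).

```python
def min_max_evals(num_features: int) -> int:
    """Smallest max_evals the Permutation explainer accepts, per its error message."""
    return 2 * num_features + 1


def full_permutations(max_evals: int, num_features: int) -> int:
    """Whole forward/backward permutation passes that fit in the max_evals budget."""
    return max_evals // min_max_evals(num_features)


def approx_model_rows(max_evals: int, n_background: int) -> int:
    """Rough number of rows sent to the model for ONE explained sample.

    Assumption: every masked evaluation is scored against every background row.
    """
    return max_evals * n_background


num_features = 899  # implied by the reported "2 * num_features + 1 = 1799"
print(min_max_evals(num_features))                # 1799
print(full_permutations(2000, num_features))      # 1
print(full_permutations(1785, num_features))      # 0, i.e. the reported error
print(approx_model_rows(2000, n_background=100))  # 200000
```

Each of those rows goes through `bootstrap_estimator.predict`, which with `n_estimators=100` averages the predictions of many underlying models, and the total is multiplied again by the number of samples being explained.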
@hellorp1990, the output of XGBSEBootstrapEstimator is multi-output: for each sample you get a whole survival function, i.e. a survival probability for every time bucket evaluated, so SHAP produces one set of feature attributions per time bucket. Find a code example below (references: SHAP values for multi-output problems; using KernelSHAP with XGBoost):

```python
import pandas as pd
import shap
from xgbse import XGBSEKaplanTree, XGBSEBootstrapEstimator

# your_params: your XGBoost parameter dict
xgbse_model = XGBSEKaplanTree(your_params)
bootstrap_estimator = XGBSEBootstrapEstimator(xgbse_model, n_estimators=100)
# the estimator must be fitted before predict is called (y_train: structured survival target)
bootstrap_estimator.fit(X_train, y_train)

columns = X_train.columns

## kernel shap sends data as numpy array which has no column names, so we fix it
## source: https://gist.github.com/noleto/05dfa4a691ebbc8816c035b86d2d00d4#file-shap_xgboost-py-L46
def xgbse_predict(data_asarray):
    data_asframe = pd.DataFrame(data_asarray, columns=columns)
    return bootstrap_estimator.predict(data_asframe)

#### Kernel SHAP
shap_kernel_explainer = shap.KernelExplainer(xgbse_predict, X_train.head(100))

# Explain a single instance: one SHAP value per (time bucket, feature)
shap_one = shap_kernel_explainer.shap_values(X_train.iloc[0])

# Get explanations for the first time bucket.
# Older SHAP releases return a list with one (n_features,) array per output;
# newer releases return a single (n_features, n_outputs) array.
if isinstance(shap_one, list):
    first_time_bucket_shap_values = pd.Series(shap_one[0], index=columns)
else:
    first_time_bucket_shap_values = pd.Series(shap_one[:, 0], index=columns)

# Print shap values for the first time bucket with the corresponding feature names
print(first_time_bucket_shap_values)
```

This prints one SHAP value per feature for the first time bucket.
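To make the "one explanation per time bucket" point concrete: KernelExplainer estimates Shapley values by sampling feature coalitions, but with only a handful of features they can be enumerated exactly. The minimal sketch below does that for a hypothetical toy survival model (3 features, 4 time buckets; not an XGBSE model, and `exact_shap` is an illustrative name, not a SHAP function), using a background dataset the way KernelExplainer uses `X_train.head(100)`. It returns one column of SHAP values per time bucket and checks the efficiency property: per bucket, the values sum to the prediction minus the mean background prediction.

```python
from itertools import combinations
from math import factorial

import numpy as np

# Hypothetical toy survival model: 3 features, 4 time buckets (not an XGBSE model).
TIME_BUCKETS = np.array([1.0, 2.0, 3.0, 4.0])


def toy_survival(X):
    # Positive hazard per sample; survival in bucket t is exp(-hazard * t)
    hazard = np.exp(-2.0 + 0.5 * X[:, 0] + 0.2 * X[:, 1] * X[:, 2])
    return np.exp(-np.outer(hazard, TIME_BUCKETS))  # (n_samples, n_buckets)


def exact_shap(f, x, background):
    """Exact interventional Shapley values for one instance x.

    Returns an array of shape (n_features, n_outputs): one column per time bucket.
    """
    m = x.shape[0]

    def value(subset):
        # Features in `subset` are fixed to x; the others keep each background row's values.
        idx = list(subset)
        data = background.copy()
        data[:, idx] = x[idx]
        return f(data).mean(axis=0)  # average prediction per bucket

    phi = np.zeros((m, f(x[None, :]).shape[1]))
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M!
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))
x = np.array([1.0, -0.5, 2.0])

phi = exact_shap(toy_survival, x, background)
print(phi.shape)  # (3, 4)

# Efficiency: per bucket, the SHAP values sum to f(x) minus the mean background prediction
expected_value = toy_survival(background).mean(axis=0)
print(np.allclose(phi.sum(axis=0), toy_survival(x[None, :])[0] - expected_value))  # True
```

Exact enumeration costs 2^M coalitions per feature, which is why it is only a teaching device here; with 900 features you need a sampling explainer such as KernelExplainer.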
As an action item, we will add a notebook with brief documentation on how to use SHAP with the XGBSE library.
Hello,

```python
# fitting xgbse model
xgbse_model = XGBSEDebiasedBCE()

# predicting
y_pred = xgbse_model.predict(X_test)

import shap

# kernel shap sends data as numpy array which has no column names, so we fix it
# source: https://gist.github.com/noleto/05dfa4a691ebbc8816c035b86d2d00d4#file-shap_xgboost-py-L46
bootstrap_estimator = XGBSEBootstrapEstimator(xgbse_model, n_estimators=100)

# Kernel SHAP
# Explain a single instance - output: (1, n_time_buckets, n_features)
shap_one = shap_kernel_explainer.shap_values(X_train.iloc[0])

# Get explanations for the first time bucket
first_time_bucket_shap_values = pd.Series(shap_one[0])
```

Error report:
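As pasted, the snippet above does not show where `shap_kernel_explainer` or the column-name wrapper is defined, so it would raise a NameError if run as shown; the wrapper also has to call the predict method of whichever fitted model you want to explain. A minimal, reusable sketch of that wrapper, assuming a predict function that takes a DataFrame and returns one column per time bucket. `make_frame_predict` and `ToySurvivalModel` (with its `age`/`dose` columns) are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd


def make_frame_predict(predict_fn, columns):
    """Return a function that accepts bare NumPy arrays (as KernelSHAP passes them)
    and calls predict_fn with a DataFrame carrying the original column names."""
    columns = list(columns)

    def predict_from_array(data_asarray):
        data = np.asarray(data_asarray)
        if data.ndim == 1:  # defensively accept a single 1-D row
            data = data.reshape(1, -1)
        frame = pd.DataFrame(data, columns=columns)
        return np.asarray(predict_fn(frame))  # (n_samples, n_time_buckets)

    return predict_from_array


class ToySurvivalModel:
    """Hypothetical stand-in for a fitted XGBSE model that needs named columns."""

    def predict(self, X):
        hazard = np.exp(-2.0 + 0.5 * X["age"] - 0.3 * X["dose"])
        return pd.DataFrame({t: np.exp(-hazard * t) for t in (1, 2, 3)})


model = ToySurvivalModel()
predict = make_frame_predict(model.predict, ["age", "dose"])
out = predict(np.array([[0.1, 1.0], [0.5, 0.0]]))
print(out.shape)  # (2, 3)
```

With XGBSE you would pass the fitted `bootstrap_estimator.predict` and `X_train.columns`, then hand the returned function to `shap.KernelExplainer` as in the earlier example.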
Hi,
Is it possible to use SHAP with XGBSEKaplanTree or XGBSEBootstrapEstimator?
SHAP's TreeExplainer does not work with them. PermutationExplainer starts evaluating but ends with the error "ValueError: max_evals=1785 is too low for the Permutation explainer, it must be at least 2 * num_features + 1 = 1799!"
I am not sure how to fix this error.
If anyone can point me in the right direction, it would be really helpful.
Thank you in advance.