
[REVIEW] Kernel shap improvements #5187

Merged
merged 7 commits into rapidsai:branch-23.04 on Feb 8, 2023

Conversation

vinaydes
Contributor

@vinaydes vinaydes commented Feb 1, 2023

Removed the slow modulo operator via a minor change in index arithmetic. This gave me the following performance improvement for a test case:

|                     | branch-23.02 | kernel-shap-improvments | Gain |
|---------------------|--------------|-------------------------|------|
| sampled_rows_kernel | 663          | 193                     | 3.4x |
| exact_rows_kernel   | 363          | 236                     | 1.5x |

All times in microseconds.
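For context, the gain comes from avoiding a per-element `%` in the kernels' index computations. The snippet below is only a minimal CUDA sketch of that general pattern, maintaining a wrapping column counter instead of taking a modulo on every iteration of a grid-stride loop; it is not the actual `sampled_rows_kernel`/`exact_rows_kernel` code, and all names in it are hypothetical.

```cuda
// Hypothetical before/after illustration of dropping a per-iteration modulo
// from a grid-stride loop; not the cuML kernel itself.
__global__ void mark_first_column_mod(const float* in, float* out,
                                      int nrows, int ncols)
{
  int start  = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = blockDim.x * gridDim.x;
  for (int i = start; i < nrows * ncols; i += stride) {
    int col = i % ncols;              // '%' executed for every element
    out[i]  = (col == 0) ? 1.0f : in[i];
  }
}

__global__ void mark_first_column_nomod(const float* in, float* out,
                                        int nrows, int ncols)
{
  int start  = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = blockDim.x * gridDim.x;
  int col    = start % ncols;         // one modulo per thread, up front
  int step   = stride % ncols;        // how far 'col' advances each iteration
  for (int i = start; i < nrows * ncols; i += stride) {
    out[i] = (col == 0) ? 1.0f : in[i];
    col += step;
    if (col >= ncols) col -= ncols;   // wrap with a compare instead of '%'
  }
}
```

Integer division and modulo are comparatively expensive on GPUs, so trading them for an add and a compare is a common micro-optimization when an index advances by a fixed stride.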

Code used for benchmarking:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor as rf
from cuml.explainer import KernelExplainer

import numpy as np

data, labels = make_classification(n_samples=1000, n_features=20, n_informative=20, random_state=42,
                                   n_redundant=0, n_repeated=0)

X_train, X_test, y_train, y_test = train_test_split(data, labels, train_size=998,
                                                    random_state=42) #sklearn train_test_split
y_train = np.ravel(y_train)
y_test = np.ravel(y_test)

model = rf(random_state=42).fit(X_train, y_train)
cu_explainer = KernelExplainer(model=model.predict, data=X_train, is_gpu_model=False, random_state=42, nsamples=100)
cu_shap_values = cu_explainer.shap_values(X_test)
print('cu_shap:', cu_shap_values)

@vinaydes vinaydes requested a review from a team as a code owner February 1, 2023 09:57
@vinaydes
Contributor Author

vinaydes commented Feb 1, 2023

I'll take a look at the CI failures.

@vinaydes vinaydes changed the title Kernel shap improvements [WIP] Kernel shap improvements Feb 1, 2023
@vinaydes vinaydes changed the title [WIP] Kernel shap improvements [REVIEW] Kernel shap improvements Feb 2, 2023
@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (branch-23.04@b26c212).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-23.04    #5187   +/-   ##
===============================================
  Coverage                ?   67.12%           
===============================================
  Files                   ?      192           
  Lines                   ?    12396           
  Branches                ?        0           
===============================================
  Hits                    ?     8321           
  Misses                  ?     4075           
  Partials                ?        0           


@vinaydes
Contributor Author

vinaydes commented Feb 3, 2023

CI is successful, ready to merge.

@dantegd
Member

dantegd commented Feb 7, 2023

Changes look great! Just merging branch-23.04 into the PR, and I will merge after CI runs. There's a new CI check that fails if a PR is 5+ commits behind the target branch; we might increase that threshold.

@dantegd dantegd added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 7, 2023
@dantegd
Member

dantegd commented Feb 8, 2023

/merge

@rapids-bot rapids-bot bot merged commit bd138d8 into rapidsai:branch-23.04 Feb 8, 2023
AyodeAwe pushed a commit to AyodeAwe/cuml that referenced this pull request Feb 13, 2023
Authors:
  - Vinay Deshpande (https://github.com/vinaydes)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#5187
Labels
CUDA/C++, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)