sklearnex crashes using GridSearch + SVR with small datasets. #1046

Closed
dguijo opened this issue Sep 23, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@dguijo

dguijo commented Sep 23, 2022

First of all, thank you for your hard work. It is a great package.

Describe the bug
Using GridSearchCV with SVR as the base estimator crashes for some small datasets when intelex is enabled.
I have run other regressors, such as RandomForestRegressor, and everything works fine, so the bug appears to be specific to GridSearchCV + SVR.

If intelex is not enabled, it works perfectly.

To Reproduce
python 3.10.4 h12debd9_0
scikit-learn 1.1.1 py310h6a678d5_0
scikit-learn-intelex 2021.6.3 pypi_0 pypi

How to reproduce
Minimal code to reproduce the crash is below, together with the dataset used (attached as dataset.csv). I have tried bigger datasets (more samples and more features), and the code works perfectly with intelex enabled.

I enable intelex using:

from sklearnex import patch_sklearn
patch_sklearn()

as can be seen in the commented-out lines at the top of the script.

import numpy as np

from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error
#from sklearnex import patch_sklearn
#patch_sklearn()
from sklearn.svm import SVR

dataset = np.genfromtxt('dataset.csv', delimiter=',')
x_train = dataset[:140, :-1]
x_test = dataset[140:, :-1]
y_train = dataset[:140, -1]
y_test = dataset[140:, -1]

cv_params = {"kernel": ["rbf", "sigmoid"], "gamma": [0.0001, 0.001, 0.01, 0.1, 1], "C": np.logspace(-4, 4, 10)}

search = GridSearchCV(SVR(), cv_params, n_jobs=-1, cv=5, scoring="neg_mean_squared_error")

search.fit(x_train, y_train)
y_pred = search.predict(x_test)

print(mean_squared_error(y_test, y_pred))

Expected behavior
The expected behavior is for the GridSearchCV fit with SVR to finish. If intelex is not enabled, the code works perfectly and the result obtained (MSE) is 0.0043348505108373745. However, when intelex is enabled, it crashes with the following output.

First a warning is issued:

warnings.warn(
/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/_device_offload.py", line 182, in wrapper
    result = func(self, *args, **kwargs)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/svm/svr.py", line 106, in predict
    return dispatch(self, 'svm.SVR.predict', {
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/_device_offload.py", line 159, in dispatch
    return branches[backend](obj, *hostargs, **hostkwargs, queue=q)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/svm/svr.py", line 139, in _onedal_predict
    return self._onedal_estimator.predict(X, queue=queue)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/onedal/svm/svm.py", line 366, in predict
    y = super()._predict(X, _backend.svm.regression, queue)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/onedal/svm/svm.py", line 281, in _predict
    result = module.infer(policy, params, model, to_table(X))
ValueError: Input model support vectors are empty

  warnings.warn(
/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearn/model_selection/_search.py:953: UserWarning: One or more of the test scores are non-finite: [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan]
  warnings.warn(

followed by a ValueError:

Traceback (most recent call last):
  File "/home/dguijo/time-series-regression/error.py", line 20, in <module>
    y_pred = search.predict(x_test)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearn/model_selection/_search.py", line 500, in predict
    return self.best_estimator_.predict(X)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/_device_offload.py", line 182, in wrapper
    result = func(self, *args, **kwargs)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/svm/svr.py", line 106, in predict
    return dispatch(self, 'svm.SVR.predict', {
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/_device_offload.py", line 159, in dispatch
    return branches[backend](obj, *hostargs, **hostkwargs, queue=q)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/svm/svr.py", line 139, in _onedal_predict
    return self._onedal_estimator.predict(X, queue=queue)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/onedal/svm/svm.py", line 366, in predict
    y = super()._predict(X, _backend.svm.regression, queue)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/onedal/svm/svm.py", line 281, in _predict
    result = module.infer(policy, params, model, to_table(X))
ValueError: Input model support vectors are empty

It appears the predictions cannot be computed because model fit did not produce any support vectors.

[dataset.csv](https://github.com/intel/scikit-learn-intelex/files/9633836/dataset.csv)
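
One way an epsilon-SVR fit can end with no support vectors: if all training targets lie within a band of width 2 * epsilon, a constant function already has zero epsilon-insensitive loss, so no sample needs to become a support vector. The following is a rough diagnostic sketch in plain Python. The helper name and the sample values are made up for illustration, and nothing in this thread confirms that this is what happens with the attached dataset. The default epsilon=0.1 mirrors the default of sklearn's SVR.

```python
def targets_fit_in_epsilon_tube(y, epsilon=0.1):
    """Return True if all targets fit inside a band of width 2 * epsilon.

    When this holds, a constant prediction has zero epsilon-insensitive
    loss, so an epsilon-SVR can be optimal with no support vectors.
    """
    return (max(y) - min(y)) < 2 * epsilon


# Hypothetical target values, not taken from dataset.csv.
narrow_targets = [0.02, 0.05, 0.09, 0.15]
wide_targets = [0.0, 0.3, 0.9]

print(targets_fit_in_epsilon_tube(narrow_targets))
print(targets_fit_in_epsilon_tube(wide_targets))
```

If the check returns True for the training targets, adding smaller epsilon values to the parameter grid might let the fit produce support vectors. That is an untested assumption here, not a confirmed fix.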

Environment:

Checked on Linux Mint.

@MMYusuf

MMYusuf commented Sep 27, 2022

I have the same ValueError with skopt.BayesSearchCV and SVR. sklearnex crashes even though my dataset is larger (~1e4 samples with 2 features).

@syakov-intel

@dguijo my apologies for the late response and thank you for a very detailed issue description! We'll investigate it and keep you updated.

@Alexsandruss
Contributor

This issue comes from applying SVM to this dataset in general: both default sklearn and sklearnex produce empty support vectors for any kernel. The difference is in how this situation is handled: sklearn outputs a constant prediction equal to svr._intercept_, while sklearnex fails with an error.

svr = SVR().fit(x_train, y_train)
svr.predict(x_test)

Default sklearn output:

array([0.08823529, 0.08823529, 0.08823529, 0.08823529, 0.08823529,
...

Sklearnex output:

ValueError: Input model support vectors are empty
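
The sklearn side of this behavior can be reproduced without the attached dataset. Below is a minimal sketch on synthetic data, assuming stock scikit-learn with no patching. The arrays and the seed are made up for illustration and are not taken from dataset.csv. The targets span much less than 2 * epsilon, so the fit ends with no support vectors and predict returns the intercept for every sample.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data, made up for illustration (not the attached dataset.csv).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
# Target range is far below 2 * epsilon (epsilon defaults to 0.1 in SVR).
y = rng.uniform(0.0, 0.05, size=50)

svr = SVR().fit(X, y)
pred = svr.predict(X[:5])

# With no support vectors, the kernel term vanishes and only the intercept remains.
print("number of support vectors:", svr.support_vectors_.shape[0])
print("intercept:", svr.intercept_[0])
print("predictions:", pred)
```

Checking svr.support_vectors_.shape[0] after fitting is a quick way to tell whether a given parameter combination ended up in this degenerate case.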

MSE and R2 metrics for sklearn

> If intelex is not enabled, the code works perfectly and the result obtained (MSE) is 0.0043348505108373745

An MSE of 0.0043 is extremely high for the provided dataset, given the following distribution of y:

[image: distribution of y]

An R2 score of -1.1768929781264066 for the sklearn-trained SVM shows that no useful result is achieved.
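
For context on the negative R2: the score compares the model's squared error with that of always predicting the mean of the test targets, so any value below 0 means the model does worse than that baseline. A small plain-Python sketch of the standard definition follows. The numbers are made up for illustration and are not the thread's data.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical test targets and two constant predictors.
y_true = [0.00, 0.01, 0.02, 0.03]
mean_baseline = [sum(y_true) / len(y_true)] * len(y_true)
off_constant = [0.088] * len(y_true)  # a constant far from the target mean

print(r2(y_true, mean_baseline))  # the mean baseline scores 0 by construction
print(r2(y_true, off_constant))   # any other constant scores below 0
```

A constant predictor can therefore never score above 0, which is consistent with the negative value reported for the SVR whose predictions are all equal to the intercept.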

RandomForest produces a more meaningful result for this dataset:

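from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
rfr = RandomForestRegressor(n_estimators=1000, max_features='sqrt', random_state=42).fit(x_train, y_train)
r2_score(y_test, rfr.predict(x_test)), mean_squared_error(y_test, rfr.predict(x_test))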

Output:

(0.16760525001675575, 0.0016575490129463269)
