sklearnex crashes using GridSearch + SVR with small datasets. #1046

dguijo · 2022-09-23T13:10:22Z

First of all, thank you for your hard work. It is a great package.

Describe the bug
Using GridSearchCV with SVR as the base estimator crashes on some datasets (only small ones) when intelex is enabled.
I have run other regressors, such as RandomForestRegressor, and everything works fine, so the bug seems to come from the GridSearchCV + SVR combination.

If intelex is not enabled, it works perfectly.

To Reproduce
python 3.10.4 h12debd9_0
scikit-learn 1.1.1 py310h6a678d5_0
scikit-learn-intelex 2021.6.3 pypi_0 pypi

How to reproduce
A minimal script to reproduce the crash is below, together with the dataset used (attached as [dataset.csv](https://github.com/intel/scikit-learn-intelex/files/9633836/dataset.csv)). With bigger datasets (more samples and more features) the code works perfectly, even with intelex enabled.

I enable intelex using:

from sklearnex import patch_sklearn
patch_sklearn()

as can be seen in the commented-out lines at the top of the script.

import numpy as np

from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error
# Uncomment the next two lines to enable intelex (the patch must be applied before importing SVR):
#from sklearnex import patch_sklearn
#patch_sklearn()
from sklearn.svm import SVR

dataset = np.genfromtxt('dataset.csv', delimiter=',')
x_train = dataset[:140, :-1]
x_test = dataset[140:, :-1]
y_train = dataset[:140, -1]
y_test = dataset[140:, -1]

cv_params = {"kernel": ["rbf", "sigmoid"], "gamma": [0.0001, 0.001, 0.01, 0.1, 1], "C": np.logspace(-4, 4, 10)}

search = GridSearchCV(SVR(), cv_params, n_jobs=-1, cv=5, scoring="neg_mean_squared_error")

search.fit(x_train, y_train)
y_pred = search.predict(x_test)

print(mean_squared_error(y_test, y_pred))
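For scale, the parameter grid above contains 2 kernels × 5 gamma values × 10 values of C = 100 candidate combinations, which matches the 100 nan entries in the non-finite-score warning reported below. With cv=5 that is 500 fits, plus one refit of the best candidate on the full training set (GridSearchCV's default refit=True). The plain-Python sketch below enumerates the grid the same way; the `logspace` helper is a hypothetical stand-in for `np.logspace`, used only to keep the example free of NumPy.

```python
from itertools import product


def logspace(start, stop, num):
    # Hypothetical stand-in for np.logspace(start, stop, num):
    # num values evenly spaced in log10 space, endpoints included.
    step = (stop - start) / (num - 1)
    return [10 ** (start + i * step) for i in range(num)]


cv_params = {
    "kernel": ["rbf", "sigmoid"],
    "gamma": [0.0001, 0.001, 0.01, 0.1, 1],
    "C": logspace(-4, 4, 10),
}

# A single-dict grid is the Cartesian product of all parameter lists.
candidates = [dict(zip(cv_params, values)) for values in product(*cv_params.values())]
n_folds = 5

print(len(candidates))            # 100 candidate parameter sets
print(len(candidates) * n_folds)  # 500 cross-validation fits (before the final refit)
```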

Expected behavior
The expected behavior is that the GridSearchCV fit with SVR finishes. Without intelex, the code works perfectly and the resulting MSE is 0.0043348505108373745. With intelex enabled, however, it crashes with the following output.

First, a warning is issued:

warnings.warn(
/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:776: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/_device_offload.py", line 182, in wrapper
    result = func(self, *args, **kwargs)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/svm/svr.py", line 106, in predict
    return dispatch(self, 'svm.SVR.predict', {
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/_device_offload.py", line 159, in dispatch
    return branches[backend](obj, *hostargs, **hostkwargs, queue=q)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/svm/svr.py", line 139, in _onedal_predict
    return self._onedal_estimator.predict(X, queue=queue)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/onedal/svm/svm.py", line 366, in predict
    y = super()._predict(X, _backend.svm.regression, queue)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/onedal/svm/svm.py", line 281, in _predict
    result = module.infer(policy, params, model, to_table(X))
ValueError: Input model support vectors are empty

  warnings.warn(
/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearn/model_selection/_search.py:953: UserWarning: One or more of the test scores are non-finite: [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan]
  warnings.warn(

followed by a ValueError:

Traceback (most recent call last):
  File "/home/dguijo/time-series-regression/error.py", line 20, in <module>
    y_pred = search.predict(x_test)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearn/model_selection/_search.py", line 500, in predict
    return self.best_estimator_.predict(X)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/_device_offload.py", line 182, in wrapper
    result = func(self, *args, **kwargs)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/svm/svr.py", line 106, in predict
    return dispatch(self, 'svm.SVR.predict', {
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/_device_offload.py", line 159, in dispatch
    return branches[backend](obj, *hostargs, **hostkwargs, queue=q)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/sklearnex/svm/svr.py", line 139, in _onedal_predict
    return self._onedal_estimator.predict(X, queue=queue)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/onedal/svm/svm.py", line 366, in predict
    y = super()._predict(X, _backend.svm.regression, queue)
  File "/home/dguijo/anaconda3/envs/TSR/lib/python3.10/site-packages/onedal/svm/svm.py", line 281, in _predict
    result = module.infer(policy, params, model, to_table(X))
ValueError: Input model support vectors are empty

It appears that the predictions cannot be computed because the model fit did not produce any support vectors.
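One plausible reason for the empty support vectors (an assumption, not confirmed in this thread) is the ε-insensitive loss of SVR. sklearn's SVR uses epsilon=0.1 by default, and any training point whose residual lies strictly inside the ε-tube gets a zero dual coefficient. If all targets fit in a band of width 2ε, the constant f(x) = (max(y) + min(y)) / 2 already has zero loss and zero weight norm, so a solution with no support vectors is optimal regardless of kernel, gamma or C. The hypothetical helper below only checks that condition on a list of targets:

```python
def fits_in_epsilon_tube(y, epsilon=0.1):
    """Return True if one constant prediction keeps every target within +/- epsilon.

    Hypothetical helper: the default 0.1 mirrors the default `epsilon` of
    sklearn's SVR. When this holds, f(x) = (max(y) + min(y)) / 2 has zero
    epsilon-insensitive loss and zero weight norm, so an all-zero dual
    solution (no support vectors) is already optimal.
    """
    return max(y) - min(y) <= 2 * epsilon


# Illustrative targets only, not the attached dataset.
print(fits_in_epsilon_tube([0.05, 0.08, 0.12, 0.20]))  # True: range 0.15 <= 0.2
print(fits_in_epsilon_tube([0.0, 0.5]))                # False: range 0.5 > 0.2
```

With the reproduction script, `fits_in_epsilon_tube(y_train)` would test this hypothesis; if it returns True, adding smaller values of SVR's `epsilon` parameter to the grid would be one way to confirm it.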

Environment:

Checked on Linux Mint.


MMYusuf · 2022-09-27T09:55:08Z

I get the same ValueError with skopt.BayesSearchCV and SVR. sklearnex crashes even though my dataset is larger (~1e4 samples with 2 features).

syakov-intel · 2023-04-20T10:28:50Z

@dguijo my apologies for the late response and thank you for a very detailed issue description! We'll investigate it and keep you updated.

Alexsandruss · 2023-07-13T12:43:30Z

This issue is related to applying SVM to this dataset in general: both stock sklearn and sklearnex produce empty support vectors for any kernel. The difference is in how this situation is handled: sklearn outputs a constant prediction equal to svr._intercept_, while sklearnex fails with an error.

from sklearn.svm import SVR
svr = SVR().fit(x_train, y_train)
svr.predict(x_test)

Default sklearn output:

array([0.08823529, 0.08823529, 0.08823529, 0.08823529, 0.08823529,
...

Sklearnex output:

ValueError: Input model support vectors are empty
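The constant sklearn prediction follows from the ε-SVR decision function f(x) = Σ_i (α_i − α_i*) K(x_i, x) + b, where the sum runs over the support vectors x_i: with no support vectors the sum is empty and every prediction equals the intercept b. The plain-Python sketch below evaluates that formula with an RBF kernel K(u, v) = exp(−γ‖u − v‖²); `svr_predict` and `rbf_kernel` are hypothetical names, and this is an illustration of the formula, not sklearn's or oneDAL's code.

```python
import math


def rbf_kernel(u, v, gamma):
    # K(u, v) = exp(-gamma * ||u - v||^2)
    return math.exp(-gamma * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def svr_predict(X, support_vectors, dual_coef, intercept, gamma=1.0):
    """Evaluate f(x) = sum_i dual_coef[i] * K(sv_i, x) + intercept for each row x of X."""
    return [
        sum(c * rbf_kernel(sv, x, gamma) for c, sv in zip(dual_coef, support_vectors)) + intercept
        for x in X
    ]


X_test = [[0.1, 0.2], [0.5, 0.9], [1.0, 1.0]]
# No support vectors: the kernel sum is empty, so each prediction is just the intercept.
print(svr_predict(X_test, support_vectors=[], dual_coef=[], intercept=0.08823529))
# [0.08823529, 0.08823529, 0.08823529]
```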

MSE and R2 metrics for sklearn

Regarding the reported result ("If intelex is not enabled, the code works perfectly and the result obtained (MSE) is 0.0043348505108373745"): an MSE of 0.0043 is extremely high for the provided dataset, given the distribution of y in this dataset.

The R2 score of -1.1768929781264066 for the sklearn-trained SVM shows that no useful result is achieved.
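For reference, R² = 1 − SS_res / SS_tot, with SS_tot computed around the mean ȳ of the test targets. A constant prediction c gives SS_res = SS_tot + n(ȳ − c)², so R² = −n(ȳ − c)² / SS_tot ≤ 0: it is 0 only when c equals the test mean, and a negative value such as −1.18 means the model does worse than simply predicting that mean. The plain-Python sketch below uses made-up targets; `r_squared` is a hypothetical helper that follows the same definition as `sklearn.metrics.r2_score` for 1-D targets.

```python
def r_squared(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]           # made-up targets, mean 2.5
print(r_squared(y_true, [2.5] * 4))     # 0.0: constant equal to the mean
print(r_squared(y_true, [0.0] * 4))     # -5.0: constant far from the mean, -4 * 2.5**2 / 5
```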

RandomForestRegressor gives a more meaningful result on this dataset:

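from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
rfr = RandomForestRegressor(n_estimators=1000, max_features='sqrt', random_state=42).fit(x_train, y_train)
r2_score(y_test, rfr.predict(x_test)), mean_squared_error(y_test, rfr.predict(x_test))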

Output:

(0.16760525001675575, 0.0016575490129463269)

dguijo mentioned this issue on Sep 23, 2022 in #1027, "Crashing when running sklearnex in a GridSearchCV fit of a SVR model with TransformedTargetRegressor" (open).

Alexsandruss added the bug (Something isn't working) label on Sep 23, 2022.

Alexsandruss closed this as completed on Jul 13, 2023.
