[BUG] SVR performance problem with linear model #1664

Closed
tfeher opened this issue Feb 12, 2020 · 2 comments · Fixed by #4382
Labels
? - Needs Triage (Need team to review and classify), bug (Something isn't working)

Comments

@tfeher
Contributor

tfeher commented Feb 12, 2020

Describe the bug
Support vector regression has a performance problem if:

  • we use linear kernel, and
  • it is possible to fit the data with a linear model (e.g. created by make_regression)

The execution time can become very long in these cases, longer than Sklearn's single core LIBSVM method.

Steps/Code to reproduce bug

import cuml.svm 
from sklearn.datasets import make_regression
from timeit import default_timer

X, y = make_regression(n_samples=1300, n_features=200, n_informative=200, random_state=1378)
cumlSVR = cuml.svm.SVR(kernel='linear', gamma='scale', verbose=True)
start = default_timer()
cumlSVR.fit(X, y)
print('Time to fit {:4.1f} s'.format(default_timer()-start))

Output:

SMO solver finished after 237 outer iterations, 2304255 total inner iterations, and diff 0.00099707
Time to fit 27.9 s

The execution time was measured on a V100. The iteration counts show that most inner loops run until they reach the max_inner_iter limit in SmoBlockSolve.
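To make the iteration counts concrete, the short sketch below parses the solver summary line quoted above and computes the average number of inner iterations per outer iteration. The helper name `parse_smo_log` is hypothetical and not part of cuML, and the value of max_inner_iter is not stated in this issue, so the sketch only reports the average rather than comparing it with the limit.

```python
import re

# Log line copied from the solver output above.
log_line = ("SMO solver finished after 237 outer iterations, "
            "2304255 total inner iterations, and diff 0.00099707")


def parse_smo_log(line):
    """Extract (outer, inner) iteration counts from an SMO summary line.

    Hypothetical helper for illustration; it is not a cuML function.
    """
    match = re.search(
        r"after (\d+) outer iterations, (\d+) total inner iterations", line
    )
    if match is None:
        raise ValueError("unrecognized SMO summary line")
    return int(match.group(1)), int(match.group(2))


outer, inner = parse_smo_log(log_line)
avg_inner = inner / outer
print(f"{avg_inner:.0f} inner iterations per outer iteration on average")
```

For the run reported here, the average is roughly 9,700 inner iterations per outer iteration.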

Expected behavior
For the given problem size, a significantly faster execution time is expected. For example, fitting the following dataset

from sklearn.datasets import make_friedman1
X, y = make_friedman1(n_samples=1250, n_features=200, random_state=13745)

with the 'poly' kernel takes around 0.1 s; with the linear kernel it takes around 1 s. (Note, however, that this dataset has only 5 informative features.)

Environment details (please complete the following information):

  • Environment location: Bare-metal
  • Linux Distro/Architecture: Ubuntu 18.04 amd64
  • GPU Model/Driver: V100 and driver 440.33.01
  • CUDA: 10.1
  • Method of cuDF & cuML install: source
    • If method of install is from source, using cmake 3.14.5 & gcc/g++ 7.3.0 and commit hash 1b6f141 (branch-0.13)

Additional context

  • Comparing the accuracy with Sklearn's SVR, the results agree up to (or close to) machine precision.
  • Even with a linear kernel, the fit can be fast if the data is not linearly separable.
  • SVC seems to work fine: all the examples that I tested run fast, even with a large number of features.
@tfeher added the "? - Needs Triage" and "bug" labels on Feb 12, 2020
@beckernick
Member

Adding a more recent data point: in the 2020-07-23 nightly (from about 2 PM EDT), fit on 506 records seems to take 35+ seconds on a V100.

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
from cuml.svm import SVR
import cupy as cp

X, y = load_boston(return_X_y=True)

X = cp.asarray(X)
y = cp.asarray(y)

clf = SVR(kernel='linear')
%time clf.fit(X, y)
[W] [13:58:53.561866] Expected column ('F') major order, but got the opposite. Converting data, this will result in additional memory utilization.
CPU times: user 26.5 s, sys: 10 s, total: 36.5 s
Wall time: 36.5 s

@tfeher
Contributor Author

tfeher commented Oct 2, 2020

Note that the following comment applies here (just replace SVC with SVR): #2857 (comment)

In short, this is not a bug but a corner case where the SMO solver is slow. Issue #2857 lists a few ideas for improving that. IMHO the best option would be a dedicated LinearSVR solver. I have added this point to #2773.

Thanks @beckernick for providing the reproducer with the boston dataset, it was very helpful while investigating the problem.

rapids-bot bot pushed a commit that referenced this issue Mar 2, 2021
closes #947 

If the input data for SVM is not normalized correctly, convergence can be very slow, and the solver can even fail to converge. This PR detects such cases and prints a debug message with suggestions for fixing the problem.

Such problems were reported in #947, #1664, #2857, #3233. The threshold for reporting is set so that the message is printed in those cases. I have tested several properly normalized cases to confirm that the message is not shown. Still, the threshold for printing the message does not have a proper theoretical justification, and false positives might occur. Therefore only a debug message is shown instead of a warning.

Authors:
  - Tamas Bela Feher (@tfeher)

Approvers:
  - Dante Gama Dessavre (@dantegd)

URL: #3562
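
The commit message above does not spell out what "normalized correctly" means. As a minimal sketch, assuming it refers to per-feature standardization (zero mean, unit variance), the snippet below shows one way to rescale the inputs before fitting. The helper `standardize` and the toy values are illustrative assumptions, not code from the PR.

```python
import numpy as np


def standardize(X, eps=1e-12):
    """Scale each column to zero mean and unit variance.

    Hypothetical helper for illustration; not part of cuML.
    """
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Leave constant columns unscaled to avoid division by zero.
    std = np.where(std < eps, 1.0, std)
    return (X - mean) / std


# Toy data with features on very different scales (illustrative values only).
X = np.array([[1.0, 1000.0, 5.0],
              [2.0, 3000.0, 5.0],
              [3.0, 2000.0, 5.0]])
X_scaled = standardize(X)
print(X_scaled.mean(axis=0))  # column means after scaling
print(X_scaled.std(axis=0))   # column standard deviations after scaling
```

The scaled array could then be passed to the estimator's fit method in place of the raw features. Whether this removes the slowdown for a particular dataset is not established in this thread.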
@rapids-bot rapids-bot bot closed this as completed in #4382 Dec 4, 2021
rapids-bot bot pushed a commit that referenced this issue Dec 4, 2021
Suggest using LinearSVM when the user chooses to use the linear kernel in SVM. The reason is that LinearSVM uses a specialized faster solver.

Closes #1664
Also partially addresses #2857

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4382
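
A rough sketch of the suggestion in the commit message above: pick `cuml.svm.LinearSVR` when the kernel is linear, and fall back to `cuml.svm.SVR` otherwise. The helper names `svr_class_name` and `make_svr` are hypothetical. The cuML import is deferred so the selection logic can be read and exercised without a GPU. Default hyperparameters may differ between the two classes, so the fitted models are not guaranteed to match.

```python
def svr_class_name(kernel):
    """Return the cuml.svm class name this sketch would use for a kernel.

    Hypothetical helper for illustration; not part of cuML.
    """
    return "LinearSVR" if kernel == "linear" else "SVR"


def make_svr(kernel="linear", **kwargs):
    """Build a cuML regressor, preferring the specialized linear solver.

    Hypothetical helper. Requires a RAPIDS cuML install and a GPU,
    so the import happens only when this function is called.
    """
    import cuml.svm  # lazy import: needs cuML and a CUDA device

    if svr_class_name(kernel) == "LinearSVR":
        # kwargs must be valid LinearSVR parameters (for example C, epsilon).
        return cuml.svm.LinearSVR(**kwargs)
    return cuml.svm.SVR(kernel=kernel, **kwargs)
```

On a machine with cuML installed, `make_svr("linear").fit(X, y)` would take the place of `cuml.svm.SVR(kernel='linear').fit(X, y)` from the original reproducer.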
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this issue Oct 9, 2023
…#4382)
