[BUG] SVR performance problem with linear model #1664
Comments
Adding a more recent data point, from the 2020-07-23 nightly (built at about 2 PM EDT):
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
from cuml.svm import SVR
import cupy as cp
X, y = load_boston(return_X_y=True)
X = cp.asarray(X)
y = cp.asarray(y)
clf = SVR(kernel='linear')
%time clf.fit(X, y)
[W] [13:58:53.561866] Expected column ('F') major order, but got the opposite. Converting data, this will result in additional memory utilization.
CPU times: user 26.5 s, sys: 10 s, total: 36.5 s
Wall time: 36.5 s
Note that the following comment applies here (just replace SVC with SVR): #2857 (comment). In short, this is not a bug but a corner case where the SMO solver is slow. Issue #2857 lists a few ideas on how to improve that. IMHO the best option would be a dedicated LinearSVR solver; I have added this point to #2773. Thanks @beckernick for providing the reproducer with the Boston dataset; it was very helpful while investigating the problem.
Closes #947.
If the input data for SVM is not normalized correctly, convergence can be very slow, and the solver can even fail to converge. This PR detects such cases and prints a debug message with suggestions on how to fix the problem. Such problems were reported in #947, #1664, #2857, and #3233. The threshold for reporting is set so that the message is printed in those cases. I have tested several properly normalized cases to confirm that the message is not shown. Still, the threshold for printing the message does not have a proper theoretical justification, and false positives might occur; therefore only a debug message is shown instead of a warning.
Authors: Tamas Bela Feher (@tfeher)
Approvers: Dante Gama Dessavre (@dantegd)
URL: #3562
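The PR text above does not say which normalization it recommends. As a minimal sketch, assuming plain per-feature standardization (zero mean, unit variance) is appropriate for the data, the scaling step before `fit` could look like the following. The helper name `standardize_columns` is hypothetical and not part of cuML; it uses only the Python standard library so that it runs without a GPU.

```python
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Scale each column to zero mean and unit variance.

    Hypothetical helper for illustration; constant columns map to 0.
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Guard against zero variance so constant columns do not divide by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two features on very different scales, as in un-normalized tabular data.
raw = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]]
scaled = standardize_columns(raw)
```

The same arithmetic carries over to NumPy or CuPy arrays (subtract the column mean, divide by the column standard deviation). Whether this removes the slow convergence for a given dataset is not guaranteed; it is only the kind of preprocessing the debug message points toward.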
Suggest using LinearSVM when the user chooses to use the linear kernel in SVM. The reason is that LinearSVM uses a specialized, faster solver.
Closes #1664. Also partially addresses #2857.
Authors: Artem M. Chirkin (https://github.com/achirkin)
Approvers: Tamas Bela Feher (https://github.com/tfeher), Dante Gama Dessavre (https://github.com/dantegd)
URL: #4382
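For the regression case in this issue, following that suggestion on the user side might look like the sketch below. It assumes a cuML version that ships `cuml.svm.LinearSVR` alongside `cuml.svm.SVR`; the helper names `pick_svr_class_name` and `make_svr` are hypothetical and only illustrate the dispatch. `cuml` is imported lazily inside `make_svr` because it requires a GPU installation.

```python
def pick_svr_class_name(kernel):
    """Return the cuML estimator name this sketch would use for a kernel."""
    return "LinearSVR" if kernel == "linear" else "SVR"


def make_svr(kernel="rbf", **params):
    """Build a regressor, preferring the specialized linear solver.

    Assumption: the installed cuML provides cuml.svm.LinearSVR.
    """
    import cuml.svm as svm  # lazy import: needs a GPU-enabled install

    if pick_svr_class_name(kernel) == "LinearSVR":
        return svm.LinearSVR(**params)
    return svm.SVR(kernel=kernel, **params)
```

With the reproducer from the comments, `make_svr(kernel="linear").fit(X, y)` would then use the linear solver instead of SMO. The two estimators use different solvers and do not necessarily accept the same parameters, so the fitted models may not be identical.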
Describe the bug
Support vector regression has a performance problem if:
The execution time can become very long in these cases, longer than Sklearn's single core LIBSVM method.
Steps/Code to reproduce bug
Output:
The execution time was measured on a V100. From the number of iterations we can see that most of the inner iterations run until they reach the max_inner_iter limit in SmoBlockSolve.
Expected behavior
For the given problem size a significantly faster execution time is expected. For example, fitting the following dataset with the 'poly' kernel takes around 0.1 sec; with the linear kernel it takes around 1 sec. (Note, however, that this dataset has only 5 informative features.)
Environment details (please complete the following information):
cmake 3.14.5 & gcc/g++ 7.3.0 and commit hash 1b6f141 (branch-0.13)
Additional context