How to configure n_jobs in SVC or RandomizedSearchCV? #281

Open
ZYSK0 opened this issue Aug 4, 2024 · 1 comment

ZYSK0 commented Aug 4, 2024

I'm using CPU-only ThunderSVM on a CentOS cluster where each compute node has 28 cores. I ran into a problem when tuning my parameters with RandomizedSearchCV, because I don't know how to set the 'n_jobs' value in SVC() and in RandomizedSearchCV(). The former seems to let each SVM model use n_jobs threads, and the latter seems to let n_jobs parameter combinations run in parallel, each in its own process.

How should I set these two n_jobs values to stay within the core limit of my compute nodes? Do the two n_jobs parameters mean different things? By the way, I noticed another issue saying that ThunderSVM does not support the n_jobs parameter of scikit-learn's GridSearchCV. Is this true?
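The two settings multiply. By default, scikit-learn's joblib backend runs the search's candidates in separate worker processes, and each of those processes trains a ThunderSVM model with up to its own n_jobs threads, so roughly (search n_jobs) × (SVC n_jobs) threads compete for the node's cores. The sketch below splits a core budget between the two levels. `split_core_budget` is a hypothetical helper, not part of ThunderSVM or scikit-learn, and it assumes every fit is CPU-bound and about the same size.

```python
from typing import Tuple


def split_core_budget(total_cores: int, outer_jobs: int) -> Tuple[int, int]:
    """Hypothetical helper: split cores between the search and each model.

    outer: value for RandomizedSearchCV(n_jobs=...), i.e. parallel fits.
    inner: value for ThunderSVM's SVC(n_jobs=...), i.e. threads per fit.
    The product outer * inner never exceeds total_cores.
    """
    outer = max(1, min(outer_jobs, total_cores))  # at least 1, at most all cores
    inner = max(1, total_cores // outer)          # remaining threads per fit
    return outer, inner


# Example for a 28-core node: try a few search-level settings.
for requested in (1, 4, 5, 7, 28):
    outer, inner = split_core_budget(28, requested)
    print(f"search n_jobs={outer:2d}, SVC n_jobs={inner:2d}, "
          f"threads in use={outer * inner}")
```

On a 28-core node, pairs such as 4 × 7 or 7 × 4 keep every core busy, while the 5 × 5 used in the snippet below leaves three cores idle but does not oversubscribe the node.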

My code is as follows:

```python
# ThunderSVM
from thundersvm import SVC
from sklearn.model_selection import RandomizedSearchCV
import joblib

parameters = {
    'C': [1, 5, 9],
    'gamma': [0.00001, 0.0001, 0.001, 0.1],
    'kernel': ['rbf'],
}

# I just set both of them to 5
svm_model = SVC(kernel='rbf', probability=False, random_state=42,
                max_iter=1000000, n_jobs=5, verbose=1)

rdsearch = RandomizedSearchCV(estimator=svm_model, param_distributions=parameters,
                              n_iter=10, cv=3, n_jobs=5, verbose=1, random_state=42)

# Train the model (X_train and y_train are loaded earlier)
rdsearch.fit(X_train, y_train)

print(f"Best parameters: {rdsearch.best_params_}")
print(f"Best score: {rdsearch.best_score_}")

# Save the fitted search object
joblib.dump(rdsearch, "ThunderSVM_model.joblib")
```
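When choosing the search-level n_jobs, it can help to know how many fits the search actually runs. The sketch below counts them for the grid in the snippet above. `count_fits` and `min_rounds` are hypothetical helpers, and the round count assumes every fit takes the same time, which SVM fits with different C and gamma values generally will not.

```python
import math

# Search settings copied from the snippet above.
parameters = {
    "C": [1, 5, 9],
    "gamma": [0.00001, 0.0001, 0.001, 0.1],
    "kernel": ["rbf"],
}
n_iter = 10
cv = 3


def count_fits(param_lists: dict, n_iter: int, cv: int) -> int:
    """Cross-validation fits performed by RandomizedSearchCV.

    When every parameter is given as a list, scikit-learn samples candidates
    without replacement, so at most len(grid) candidates are evaluated.
    The final refit on the full training set (refit=True) is not counted.
    """
    grid_size = math.prod(len(values) for values in param_lists.values())
    return min(n_iter, grid_size) * cv


def min_rounds(n_fits: int, search_jobs: int) -> int:
    """Lower bound on sequential rounds if all fits took equal time."""
    return math.ceil(n_fits / search_jobs)


fits = count_fits(parameters, n_iter, cv)
print(fits)  # 30 (10 of the 12 grid points x 3 folds)
for jobs in (5, 6, 10):
    print(f"search n_jobs={jobs}: at least {min_rounds(fits, jobs)} rounds")
```

Raising the search-level n_jobs to 10 would cut the idealized rounds to 3, but combined with SVC(n_jobs=5) it would request about 50 threads on a 28-core node.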


DeltaGa commented Jan 9, 2025

I used to set it to the number of threads on my system (26 cores, 56 threads), and it maxed out every core on my computer. The same happened when I set n_jobs to the number of cores. Ultimately, it depends on how well your system handles heavy workloads: setting n_jobs to the maximum number of threads is likely to cause an I/O bandwidth bottleneck and then CPU throttling unless your system is capable enough, so setting it to the number of physical cores is the safer option.
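To size n_jobs by physical cores rather than hardware threads, a sketch like the one below reads both counts. It assumes the optional psutil package for the physical count (the standard library only reports logical CPUs) and falls back to the logical count when psutil is missing or cannot tell. The helper names are hypothetical.

```python
import os
from typing import Optional


def usable_logical_cpus() -> int:
    """Logical CPUs (hardware threads) this process is allowed to use.

    On Linux, sched_getaffinity reflects CPU pinning such as taskset or a
    scheduler's cpuset; other platforms fall back to os.cpu_count().
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def physical_cores() -> Optional[int]:
    """Physical core count from psutil, or None if unavailable."""
    try:
        import psutil  # optional third-party dependency
    except ImportError:
        return None
    return psutil.cpu_count(logical=False)  # may itself return None


logical = usable_logical_cpus()
physical = physical_cores()
# psutil counts physical cores for the whole machine, not the affinity mask,
# so cap the budget by the logical CPUs this process may actually use.
budget = min(physical or logical, logical)
print(f"logical={logical}, physical={physical}, thread budget={budget}")
```

The resulting budget is the total to divide between the search-level and SVC-level n_jobs, rather than a value to give to each.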
