
Parallelize cross validation as a provisional optimization #302

Closed
ghgr opened this issue Nov 4, 2016 · 5 comments
@ghgr
ghgr commented Nov 4, 2016

I propose setting the n_jobs parameter to num_cv_folds to get a sort of quick parallelism. When better solutions with dask are implemented, we could set it back to 1.

In base.py, in the _evaluate_individual method (line 575), change

```python
cv_scores = cross_val_score(self, sklearn_pipeline, features, classes, cv=self.num_cv_folds, scoring=self.scoring_function)
```

to

```python
cv_scores = cross_val_score(self, sklearn_pipeline, features, classes, cv=self.num_cv_folds, scoring=self.scoring_function, n_jobs=self.num_cv_folds)
```
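For readers unfamiliar with the proposal, here is a minimal standalone sketch (not TPOT code) of how `n_jobs` parallelizes cross-validation in plain scikit-learn; the pipeline and dataset are illustrative stand-ins:

```python
# Sketch of the n_jobs proposal using plain scikit-learn.
# The estimator and data are placeholders, not TPOT's pipeline.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

features, classes = make_classification(n_samples=200, n_features=10, random_state=0)
pipeline = LogisticRegression(max_iter=1000)

# cv sets the number of folds; n_jobs sets how many CPUs evaluate folds in
# parallel. Setting n_jobs equal to the fold count (3 here) mirrors the
# suggestion above; n_jobs=-1 would use all available cores instead.
cv_scores = cross_val_score(pipeline, features, classes, cv=3, n_jobs=3)
print(len(cv_scores))  # 3 scores, one per fold
```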

@weixuanfu
Contributor

Thank you for sharing this nice tip. According to the scikit-learn User Guide for cross_val_score, the n_jobs parameter determines the number of CPUs to use during cross-validation, while the cv parameter determines the number of folds. Maybe we should add n_jobs as another parameter in TPOT for parallelizing cross-validation, with a default of 1, since this approach may use much more system resources. @rhiever

@ghgr
Author

ghgr commented Nov 4, 2016

Indeed, that would be more precise. I proposed making n_jobs == num_cv_folds since the default number of CV folds in TPOT is 3, and most machines used for machine learning have more than 3 cores. Just to make @minimumnz feel better about not having idle cores [1] ;-)

[1] #177

@rhiever
Contributor

rhiever commented Nov 4, 2016

We've been talking about adding an n_jobs parameter to TPOT for quite some time, which would do basically this. Perhaps we should just do that.

@s-udhaya

s-udhaya commented Nov 17, 2016

Wouldn't it be better to use the multiprocessing capabilities of DEAP? Each combination (preprocessor, algorithm, postprocessor, etc.) is an individual in TPOT's population of combinations, so exploiting DEAP's multiprocessing feature could help TPOT parallelize by evaluating different individuals on different cores.

@rhiever
Contributor

rhiever commented Dec 19, 2016

We looked into using the multiprocessing capabilities of DEAP, but ran into issues with pickling lambda functions and a few other tricks we use in TPOT. Maybe @weixuanfu2016 can provide full details.
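The pickling issue mentioned above can be reproduced in isolation: Python's multiprocessing sends work to workers via pickle, and lambdas cannot be pickled by the standard pickle module, while module-level functions can. A small illustration (independent of TPOT or DEAP):

```python
# Why multiprocessing chokes on lambdas: pickle serializes functions by
# reference (module + name), and a lambda has no importable name.
import pickle

square = lambda x: x * x
try:
    pickle.dumps(square)
except Exception as e:
    print("lambda is not picklable:", e)


def square_fn(x):  # a module-level def pickles fine
    return x * x


data = pickle.dumps(square_fn)
assert pickle.loads(data)(4) == 16
```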

In the meantime, I've merged the PR that exposes n_jobs for the cross-validation procedure into the development branch.
