tpot freezing jupyter notebook #645

Open
davidbp opened this issue Jan 2, 2018 · 17 comments

@davidbp

davidbp commented Jan 2, 2018

Hi,

When training a regressor with n_jobs=-1, the process freezes after about a minute: CPU usage drops to zero and the algorithm never finishes.

Code to reproduce the issue:

import sklearn
import numpy as np
import tpot

X_train = np.random.random((1000,10))
y_train = np.random.random(1000)

def RMSLE(p,a):
    return np.sqrt(np.mean( (np.log(p+1) - np.log(a+1))**2 ))

rmsle_score = sklearn.metrics.make_scorer(RMSLE,greater_is_better=False)

reg1 = tpot.TPOTRegressor(verbosity=1, 
                          n_jobs=-1, 
                          scoring= rmsle_score, 
                          cv=10, 
                          max_time_mins=3)

reg1.fit(X_train, y_train)

I get the following output:

/Users/davidbuchacagmail.com/anaconda3/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Optimization Progress: 100%|██████████| 200/200 [01:05<00:00,  1.20pipeline/s]
Generation 1 - Current best internal CV score: 0.19893855301909574
Optimization Progress: 100%|██████████| 300/300 [01:31<00:00,  1.37pipeline/s]
Generation 2 - Current best internal CV score: 0.19893855301909574
Optimization Progress: 100%|██████████| 400/400 [01:58<00:00,  1.46pipeline/s]
Generation 3 - Current best internal CV score: 0.19893855301909574
Optimization Progress:  80%|████████  | 401/500 [02:10<00:47,  2.07pipeline/s]

And the process freezes even though max_time_mins=3.

tpot.__version__
'0.9.1'
sklearn.__version__
'0.19.1'
@weixuanfu

Please check the alternative solution for this issue.

@davidbp

davidbp commented Jan 2, 2018

Hello,

After inserting (at the first cell of the notebook)

import multiprocessing
multiprocessing.set_start_method('forkserver')

I then get quite a long error that ends with:

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-8-84c7f7a33512> in <module>()
----> 1 reg1.fit(X_train, y_train[target[0]])

~/anaconda3/lib/python3.6/site-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
    660                     # raise the exception if it's our last attempt
    661                     if attempt == (attempts - 1):
--> 662                         raise e
    663             return self
    664 

~/anaconda3/lib/python3.6/site-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
    651                         self._pbar.close()
    652 
--> 653                     self._update_top_pipeline()
    654                     self._summary_of_best_pipeline(features, target)
    655                     # Delete the temporary cache before exiting

~/anaconda3/lib/python3.6/site-packages/tpot/base.py in _update_top_pipeline(self)
    725             # If user passes CTRL+C in initial generation, self._pareto_front (halloffame) shoule be not updated yet.
    726             # need raise RuntimeError because no pipeline has been optimized
--> 727             raise RuntimeError('A pipeline has not yet been optimized. Please call fit() first.')
    728 
    729     def _summary_of_best_pipeline(self, features, target):

RuntimeError: A pipeline has not yet been optimized. Please call fit() first.

Could it be a notebook-related error?

@davidbp

davidbp commented Jan 2, 2018

I should add that I executed the same code in a test.py script, and it seems to work sometimes (it does not freeze).

import multiprocessing
import numpy as np
import tpot
import sklearn

if __name__ == '__main__':
	X_train = np.random.random((1000,10))
	y_train = np.random.random(1000)+10

	def RMSLE(p,a):
	    return np.sqrt(np.mean( (np.log(p+1) - np.log(a+1))**2 ))

	rmsle_score = sklearn.metrics.make_scorer(RMSLE,greater_is_better=False)

	reg1 = tpot.TPOTRegressor(verbosity=2, 
	                          n_jobs=-1, 
	                          scoring= rmsle_score, 
	                          cv=10, 
	                          max_time_mins=2)


	reg1.fit(X_train, y_train)

@weixuanfu

weixuanfu commented Jan 2, 2018

I rechecked the issue. I think there is a bug in the new scoring API. Check PR #626.

Try reinstalling TPOT with this fix via the command below:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/weixuanfu/tpot.git@scoring_api_bug

I think your original code, without resetting the start method in multiprocessing, will work in Jupyter. I tested it on my macOS.

@weixuanfu

Hmm, now I think it is a notebook-related issue, and it is also related to the scoring API for customized scoring functions. I will look into it and refine the API. Thank you for reporting this issue here.

@weixuanfu

weixuanfu commented Jan 9, 2018

I had another look at this issue. I think it is related to whether the customized scorer is picklable for parallel computing with joblib.

I can reproduce this issue using GridSearchCV from sklearn instead of tpot (example below). It seems that the scorer is somehow not picklable.

from sklearn import linear_model, metrics
from sklearn.model_selection import GridSearchCV
import numpy as np
np.random.seed(42)
X_train = np.random.random((1000,10))
y_train = np.random.random(1000)

def RMSLE(p,a):
    return np.sqrt(np.mean( (np.log(p+1) - np.log(a+1))**2 ))

rmsle_score = metrics.make_scorer(RMSLE,greater_is_better=False)
parameters = {'fit_intercept':(True, False), 'normalize':[True, False]}
regr = linear_model.LinearRegression()
reg1 = GridSearchCV(regr, parameters, verbose=2,scoring=rmsle_score,n_jobs=-1)
reg1.fit(X_train, y_train)

Maybe it is an issue for sklearn's repo.
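
If picklability really is the problem, one possible workaround is to define the metric in a separate importable module instead of in the notebook's __main__, so that worker processes can locate it by reference. A minimal sketch (the file name my_scorers.py is hypothetical, not part of TPOT):

# my_scorers.py -- hypothetical helper module, so the metric pickles by reference
import numpy as np

def RMSLE(p, a):
    # root mean squared log error between predictions p and targets a
    return np.sqrt(np.mean((np.log(p + 1) - np.log(a + 1)) ** 2))

# in the notebook / script
from sklearn import metrics
from my_scorers import RMSLE

rmsle_score = metrics.make_scorer(RMSLE, greater_is_better=False)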

@saddy001

I see it freezing after ~3 generations, with and without forkserver, for different scorers. A workaround seems to be setting backend='threading' as the default kwarg for Parallel in sklearn/externals/joblib/parallel.py.
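
A less invasive sketch of the same idea, rather than editing the vendored joblib source, is joblib's parallel_backend context manager (whether TPOT's internal Parallel calls pick it up depends on the TPOT/joblib versions installed; reg1, X_train and y_train refer to the earlier example):

# with a standalone joblib, this is just `from joblib import parallel_backend`
from sklearn.externals.joblib import parallel_backend

# run the fit under the threading backend instead of multiprocessing
with parallel_backend('threading'):
    reg1.fit(X_train, y_train)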

@HamedMP

HamedMP commented Jun 12, 2018

I found that this happens when setting n_jobs to anything other than 1.

@weixuanfu

I think this issue is notebook-only. I will try to find a workaround for it.

@rhiever

rhiever commented Jun 12, 2018

@HamedMP, have you tried running TPOT with n_jobs!=1 on the command line?

@HamedMP

HamedMP commented Jun 12, 2018 via email

@davidbp

davidbp commented Jun 23, 2018

This still happens when n_jobs is set to something bigger than 1; I have tested it several times today.
The same code with n_jobs=1 works, but with n_jobs=n it stays at Optimization Progress: 0% forever.

I have tried to add the multiprocessing forkserver as explained in the documentation, but I actually get an error.

This

import multiprocessing
from tpot import TPOTRegressor
multiprocessing.set_start_method('forkserver')

if __name__ == '__main__':
    #mycode

returns

Traceback (most recent call last):
  File "test_tpot_santander.py", line 3, in <module>
    multiprocessing.set_start_method('forkserver')
  File "/Users/davidbuchaca1/anaconda3/lib/python3.6/multiprocessing/context.py", line 242, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set

Nevertheless

import multiprocessing
multiprocessing.set_start_method('forkserver')
from tpot import TPOTRegressor
if __name__ == '__main__':
    #mycode

This does not return any error, but the same behaviour occurs: nothing happens (even though CPU usage goes to 100% on all threads for a long time).

Probably there is something weird with multiprocessing on OSX, because on Ubuntu it works fine.
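
For reference, a sketch of one more variant (an assumption on my part, not verified on OSX): forcing the start method after the imports, since force=True overrides a context that has already been set:

import multiprocessing
from tpot import TPOTRegressor

if __name__ == '__main__':
    # force=True overrides the context that was already set during import
    multiprocessing.set_start_method('forkserver', force=True)
    # ... build and fit the TPOTRegressor here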

@jaksmid

jaksmid commented Jul 10, 2018

Setting the forkserver worked for me, as described in the documentation.

@davidbp

davidbp commented Jul 11, 2018

@jaksmid did it work with forkserver on a Linux or an OSX machine?

@jaksmid

jaksmid commented Jul 12, 2018

OSX machine

@shaunstoltz

Could someone commit some example code? What is the point if this cannot be scaled? I have tried every combination of forkserver and nothing works. I have 32 processors and see no progress after 30 minutes at verbosity 3. Garbage.

I have wasted 4 hours trying to get this to do anything with more than 1 CPU.

@weixuanfu

It is a documented open issue. We are trying to use the dask backend to solve it. Related to #730.
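
For reference, a rough sketch of the general dask/joblib pattern (an assumption, not TPOT's eventual implementation; it presumes dask.distributed and a standalone joblib are installed, and reuses reg1, X_train and y_train from the earlier example; whether TPOT's internal parallel calls honor the backend depends on the version):

import joblib
from dask.distributed import Client

if __name__ == '__main__':
    client = Client()  # start a local dask scheduler and workers

    # dispatch joblib's parallel work to the dask workers
    with joblib.parallel_backend('dask'):
        reg1.fit(X_train, y_train)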
