tpot freezing jupyter notebook #645

Open
davidbp opened this issue Jan 2, 2018 · 17 comments

@davidbp

davidbp commented Jan 2, 2018

Hi,

When training a regressor with n_jobs=-1, the process freezes after about a minute: CPU usage drops to zero and the algorithm never finishes.

Code to reproduce the issue:

import sklearn
import numpy as np
import tpot

X_train = np.random.random((1000,10))
y_train = np.random.random(1000)

def RMSLE(p,a):
    return np.sqrt(np.mean( (np.log(p+1) - np.log(a+1))**2 ))

rmsle_score = sklearn.metrics.make_scorer(RMSLE,greater_is_better=False)

reg1 = tpot.TPOTRegressor(verbosity=1, 
                          n_jobs=-1, 
                          scoring= rmsle_score, 
                          cv=10, 
                          max_time_mins=3)

reg1.fit(X_train, y_train)

I get the following output:

/Users/davidbuchacagmail.com/anaconda3/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Optimization Progress: 100%|██████████| 200/200 [01:05<00:00,  1.20pipeline/s]
Generation 1 - Current best internal CV score: 0.19893855301909574
Optimization Progress: 100%|██████████| 300/300 [01:31<00:00,  1.37pipeline/s]
Generation 2 - Current best internal CV score: 0.19893855301909574
Optimization Progress: 100%|██████████| 400/400 [01:58<00:00,  1.46pipeline/s]
Generation 3 - Current best internal CV score: 0.19893855301909574
Optimization Progress:  80%|████████  | 401/500 [02:10<00:47,  2.07pipeline/s]

And the process freezes even though max_time_mins=3.

tpot.__version__
'0.9.1'
sklearn.__version__
'0.19.1'
@weixuanfu

Please check the alternative solution for this issue.

@davidbp

davidbp commented Jan 2, 2018

Hello,

After inserting (at the first cell of the notebook)

import multiprocessing
multiprocessing.set_start_method('forkserver')

I then get quite a long error that ends with:

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-8-84c7f7a33512> in <module>()
----> 1 reg1.fit(X_train, y_train[target[0]])

~/anaconda3/lib/python3.6/site-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
    660                     # raise the exception if it's our last attempt
    661                     if attempt == (attempts - 1):
--> 662                         raise e
    663             return self
    664 

~/anaconda3/lib/python3.6/site-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
    651                         self._pbar.close()
    652 
--> 653                     self._update_top_pipeline()
    654                     self._summary_of_best_pipeline(features, target)
    655                     # Delete the temporary cache before exiting

~/anaconda3/lib/python3.6/site-packages/tpot/base.py in _update_top_pipeline(self)
    725             # If user passes CTRL+C in initial generation, self._pareto_front (halloffame) shoule be not updated yet.
    726             # need raise RuntimeError because no pipeline has been optimized
--> 727             raise RuntimeError('A pipeline has not yet been optimized. Please call fit() first.')
    728 
    729     def _summary_of_best_pipeline(self, features, target):

RuntimeError: A pipeline has not yet been optimized. Please call fit() first.

Could it be a notebook-related error?

@davidbp

davidbp commented Jan 2, 2018

I should add that I executed the same code in a test.py script, and it seems to work sometimes (it does not freeze).

import multiprocessing
import numpy as np
import tpot
import sklearn

if __name__ == '__main__':
	X_train = np.random.random((1000,10))
	y_train = np.random.random(1000)+10

	def RMSLE(p,a):
	    return np.sqrt(np.mean( (np.log(p+1) - np.log(a+1))**2 ))

	rmsle_score = sklearn.metrics.make_scorer(RMSLE,greater_is_better=False)

	reg1 = tpot.TPOTRegressor(verbosity=2, 
	                          n_jobs=-1, 
	                          scoring= rmsle_score, 
	                          cv=10, 
	                          max_time_mins=2)


	reg1.fit(X_train, y_train)

@weixuanfu

weixuanfu commented Jan 2, 2018

I rechecked the issue. I think there is a bug in the new scoring API. Check PR #626.

Try reinstalling TPOT with this fix via the command below:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/weixuanfu/tpot.git@scoring_api_bug

I think your original code, without resetting the start method in multiprocessing, will work in Jupyter. I tested it on my macOS.

@weixuanfu

Hmm, now I think it is a notebook-related issue, and it is also related to the scoring API for customized scoring functions. I will look into it and refine the API. Thank you for reporting this issue here.

@weixuanfu

weixuanfu commented Jan 9, 2018

I had another look at this issue. I think it is related to whether the customized scorer is picklable for parallel computing with joblib.

I can reproduce this issue using GridSearchCV from sklearn instead of tpot (example below). It seems that the scorer is somehow not picklable.

from sklearn import linear_model, metrics
from sklearn.model_selection import GridSearchCV
import numpy as np
np.random.seed(42)
X_train = np.random.random((1000,10))
y_train = np.random.random(1000)

def RMSLE(p,a):
    return np.sqrt(np.mean( (np.log(p+1) - np.log(a+1))**2 ))

rmsle_score = metrics.make_scorer(RMSLE,greater_is_better=False)
parameters = {'fit_intercept':(True, False), 'normalize':[True, False]}
regr = linear_model.LinearRegression()
reg1 = GridSearchCV(regr, parameters, verbose=2,scoring=rmsle_score,n_jobs=-1)
reg1.fit(X_train, y_train)

Maybe it is an issue for sklearn's repo.
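
If picklability really is the problem, one possible workaround is to define the metric in a separate importable module instead of in the notebook's __main__, so that worker processes can locate it by reference. A minimal sketch (the file name my_scorers.py is hypothetical, not part of TPOT):

# my_scorers.py -- hypothetical helper module, so the metric pickles by reference
import numpy as np

def RMSLE(p, a):
    # root mean squared log error between predictions p and targets a
    return np.sqrt(np.mean((np.log(p + 1) - np.log(a + 1)) ** 2))

# in the notebook / script
from sklearn import metrics
from my_scorers import RMSLE

rmsle_score = metrics.make_scorer(RMSLE, greater_is_better=False)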

@saddy001

I see it freezing after ~3 generations, with and without forkserver, for different scorers. A workaround seems to be setting backend='threading' as the default kwarg for Parallel in sklearn/externals/joblib/parallel.py.
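
A less invasive sketch of the same idea, rather than editing the vendored joblib source, is joblib's parallel_backend context manager (whether TPOT's internal Parallel calls pick it up depends on the TPOT/joblib versions installed; reg1, X_train and y_train refer to the earlier example):

# with a standalone joblib, this is just `from joblib import parallel_backend`
from sklearn.externals.joblib import parallel_backend

# run the fit under the threading backend instead of multiprocessing
with parallel_backend('threading'):
    reg1.fit(X_train, y_train)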

@HamedMP

HamedMP commented Jun 12, 2018

I found that this happens when setting n_jobs to anything other than 1.

@weixuanfu

I think this issue is notebook-only. I will try to find a workaround for it.

@rhiever

rhiever commented Jun 12, 2018

@HamedMP, have you tried running TPOT with n_jobs!=1 on the command line?

@HamedMP

HamedMP commented Jun 12, 2018 via email

@davidbp

davidbp commented Jun 23, 2018

This still happens when n_jobs is set to something bigger than 1; I have tested it several times today.
The same code with n_jobs=1 works, but with n_jobs=n it stays at Optimization Progress: 0% forever.

I have tried to add the multiprocessing forkserver as explained in the documentation, but I actually get an error.

This

import multiprocessing
from tpot import TPOTRegressor
multiprocessing.set_start_method('forkserver')

if __name__ == '__main__':
    #mycode

returns

Traceback (most recent call last):
  File "test_tpot_santander.py", line 3, in <module>
    multiprocessing.set_start_method('forkserver')
  File "/Users/davidbuchaca1/anaconda3/lib/python3.6/multiprocessing/context.py", line 242, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set

Nevertheless

import multiprocessing
multiprocessing.set_start_method('forkserver')
from tpot import TPOTRegressor
if __name__ == '__main__':
    #mycode

This does not return any error, but the same behaviour occurs: nothing happens (even though CPU usage goes to 100% on all threads for a long time).

Probably there is something weird with multiprocessing on OSX, because on Ubuntu it works fine.
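
For reference, a sketch of one more variant (an assumption on my part, not verified on OSX): forcing the start method after the imports, since force=True overrides a context that has already been set:

import multiprocessing
from tpot import TPOTRegressor

if __name__ == '__main__':
    # force=True overrides the context that was already set during import
    multiprocessing.set_start_method('forkserver', force=True)
    # ... build and fit the TPOTRegressor here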

@jaksmid

jaksmid commented Jul 10, 2018

Setting the forkserver worked for me, as described in the documentation.

@davidbp

davidbp commented Jul 11, 2018

@jaksmid did it work with forkserver on a Linux or an OSX machine?

@jaksmid

jaksmid commented Jul 12, 2018

OSX machine

@shaunstoltz

Could someone commit some example code? What is the point if this cannot be scaled? I have tried every combination of forkserver and nothing works. I have 32 processors and see no progress after 30 minutes at verbosity 3. Garbage.

I have wasted 4 hours trying to get this to do anything with more than 1 CPU.

@weixuanfu

It is a documented open issue. We are trying to use the dask backend to solve it. Related to #730.
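
For reference, a rough sketch of the general dask/joblib pattern (an assumption, not TPOT's eventual implementation; it presumes dask.distributed and a standalone joblib are installed, and reuses reg1, X_train and y_train from the earlier example; whether TPOT's internal parallel calls honor the backend depends on the version):

import joblib
from dask.distributed import Client

if __name__ == '__main__':
    client = Client()  # start a local dask scheduler and workers

    # dispatch joblib's parallel work to the dask workers
    with joblib.parallel_backend('dask'):
        reg1.fit(X_train, y_train)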
