
TPOT stuck at 0% #542

Closed
Nirvana2211 opened this issue Aug 5, 2017 · 11 comments

@Nirvana2211

I am using TPOTClassifier on a smallish dataset with 20,000 rows and 68 features. I ran the following code:

import numpy as np
from tpot import TPOTClassifier

pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
                                    random_state=0, verbosity=2, n_jobs=10)
X_train = np.nan_to_num(X_train)
pipeline_optimizer.fit(X_train, dataY_train)
Warning: Although parallelization is currently supported in TPOT for Windows, pressing Ctrl+C will freeze the optimization process without saving the best pipeline! Thus, Please DO NOT press Ctrl+C during the optimization procss if n_jobs is not equal to 1. For quick test in Windows, please set n_jobs to 1 for saving the best pipeline in the middle of the optimization process via Ctrl+C.
Optimization Progress: 0%| | 0/120 [00:00<?, ?pipeline/s]

The optimization process has been stuck at 0% for the last 14 hours. Is this normal? Any help would be appreciated. Thank you!

@weixuanfu
Contributor

I think this is related to #508. Please try running TPOT like the demo below:

import multiprocessing

if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')
    # Note: the sklearn/tpot imports must be moved inside __main__;
    # otherwise a RuntimeError ("context has already been set") is raised.
    import numpy as np
    from sklearn.datasets import make_classification
    from tpot import TPOTClassifier
    # your TPOT code
    pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
                                        random_state=0, verbosity=2, n_jobs=10)
    X_train = np.nan_to_num(X_train)
    pipeline_optimizer.fit(X_train, dataY_train)

Please let me know if this solves the issue.
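For context, a minimal stdlib-only sketch of the start-method behaviour the demo relies on (this is generic Python, not TPOT code): 'forkserver' is only available on POSIX systems, and calling set_start_method a second time raises the "context has already been set" RuntimeError unless force=True is passed.

```python
import multiprocessing

# 'forkserver' only appears on POSIX systems; Windows offers 'spawn' alone.
available = multiprocessing.get_all_start_methods()

if __name__ == '__main__':
    method = 'forkserver' if 'forkserver' in available else 'spawn'
    # Calling set_start_method twice normally raises
    # "RuntimeError: context has already been set"; force=True avoids that.
    multiprocessing.set_start_method(method, force=True)
    print(multiprocessing.get_start_method())
```

Checking `multiprocessing.get_all_start_methods()` first is a portable way to fall back to 'spawn' on platforms where 'forkserver' is unavailable.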

@Nirvana2211
Author

Nirvana2211 commented Aug 5, 2017

@weixuanfu Thank you for the prompt reply. I should have mentioned that I am using Windows; sorry about that. 'forkserver' doesn't work on Windows. How can I make it work there? I have also set n_jobs=1, and even that doesn't seem to work. Thanks again!

@weixuanfu
Contributor

Oh, I just saw that. Decreasing n_jobs to 1 may help on Windows, or you could try the latest dev branch, which has better timeout control. If neither solution works, please let me know and include more environment info (the versions of TPOT and its dependencies); I will need to double-check it.

@Nirvana2211
Author

Nirvana2211 commented Aug 6, 2017

@weixuanfu n_jobs=1 worked for Windows. I am also running it on a Linux box with n_jobs=20, and it seems to be working there.

@deo1

deo1 commented Sep 17, 2017

I have the exact same issue, running on Windows. Even with the params below on the tiny Titanic dataset (hundreds of rows), the optimizer simply never makes progress.

model = tp.TPOTClassifier(generations=1, population_size=1, cv=5, verbosity=2, n_jobs=8, config_dict=config_dict)

Optimization Progress: 0%| | 0/2 [00:00<?, ?pipeline/s]

That said, CPU usage is around 100% and python processes are constantly getting spun up and torn down, but no progress is made. n_jobs=1 works as expected (< 15 sec).

Have any of the devs tried multiprocessing on a Windows machine? I suspect it just doesn't work.

multiprocessing.cpu_count() == 12

            platform : win-64
       conda version : 4.3.25
    conda is private : False
   conda-env version : 4.3.25
 conda-build version : 3.0.14
      python version : 3.5.4.final.0
    requests version : 2.13.0
                TPOT : 0.8.3
               numpy : 1.12.1
               scipy : 0.19.1
        scikit-learn : 0.19.0
                deap : 1.0.2

@rhiever
Contributor

rhiever commented Sep 18, 2017

Multiprocessing simply doesn't work in Windows with Python, so we had to drop support for it.
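To illustrate the constraint (a generic stdlib sketch, not TPOT code): Windows only supports the 'spawn' start method, which re-imports the main module in every worker process, so any multiprocessing code not placed behind an `if __name__ == '__main__':` guard re-executes in each child instead of making progress.

```python
import multiprocessing

def evaluate(n):
    # Stand-in for one pipeline evaluation running in a worker process.
    return n * n

if __name__ == '__main__':
    # Without this guard, each spawned child re-imports the module, tries to
    # create its own pool, and the program never advances past 0%.
    ctx = multiprocessing.get_context('spawn')
    with ctx.Pool(processes=2) as pool:
        print(pool.map(evaluate, [1, 2, 3, 4]))  # prints [1, 4, 9, 16]
```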

@rhiever
Contributor

rhiever commented Oct 10, 2017

Closing this issue. Please feel free to re-open or file a new issue if you have any further questions or comments.

@rhiever rhiever closed this as completed Oct 10, 2017
@OhMyGodness

> I think this is related to #508. Please try to run TPOT like this demo below: […]

I have just rewritten my code like this demo, but it is still stuck at 0% after two days. My data has 1,000,000 rows and 64 columns; is it too large, and could that be causing the problem? I am running the code on an AWS Linux instance with 16 cores and 120 GB of RAM, with n_jobs=10.


@weixuanfu
Contributor

@OhMyGodness could you please try Parallel Training with Dask for this big dataset?

@OhMyGodness

> @OhMyGodness could you please try Parallel Training with Dask for this big dataset?

Yes, I have tried using Dask just like that demo, but I ran into some other errors when running it as a script on my AWS Linux instance. Could you give me details on using Dask from a script?
