
TPOT optimization gets stuck #1107

Open

hanshupe opened this issue Aug 19, 2020 · 3 comments

Comments

@hanshupe

I am experiencing that the optimization gets stuck and stops improving after many hours, while a fresh start sometimes gives a better score already in the initial population (!). The dataset is not very complex (50 variables, regression problem, 5000 rows). I used default settings but also played around with the parameters of the genetic optimization.

What I noticed is that, especially when performance is poor, my log gets flooded with the following messages:

```
_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler..
_pre_test decorator: _random_mutation_operator: num_test=1 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler..
_pre_test decorator: _random_mutation_operator: num_test=2 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler..
_pre_test decorator: _random_mutation_operator: num_test=3 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler..
_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler..
```

Sometimes I get these warnings 80 times for a population of 100 pipelines. What exactly does the message mean, and how can I improve the optimization?

Btw: is it possible to pass some initial pipelines to TPOT to start with a better initial population?

@weixuanfu
Contributor

TPOT may randomly generate invalid pipelines, for example ones using invalid hyperparameter combinations (e.g., calling LogisticRegression with dual=True and penalty='l1'). To avoid this, the _pre_test decorator evaluates each such pipeline on a small test set with a maximum sample size of 50.
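For illustration, the kind of failure the decorator catches looks like this in plain scikit-learn (a minimal sketch; the exact error text depends on the scikit-learn version):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.rand(50, 5)
y = np.random.randint(0, 2, size=50)

# dual=True is only supported for penalty='l2' with the liblinear solver,
# so this hyperparameter combination cannot be fitted.
clf = LogisticRegression(penalty='l1', dual=True, solver='liblinear')
try:
    clf.fit(X, y)  # scikit-learn validates the combination at fit time
except ValueError as e:
    print(f"invalid pipeline: {e}")
```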

For now, you may try a different random_state to get a better initial population; the result should be reproducible with the same random_state.
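For example (a minimal sketch with synthetic data; the same seed reproduces the same initial population):

```python
from tpot import TPOTRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=5000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Try a few seeds and keep the best run.
for seed in (1, 2, 3):
    tpot = TPOTRegressor(generations=5, population_size=50,
                         random_state=seed, verbosity=2)
    tpot.fit(X_train, y_train)
    print(seed, tpot.score(X_test, y_test))
```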

I think it is a great idea to pass some good initial pipelines. There is a related issue #296, but we did not implement it in an ideal way (there was a related PR #502, but we reverted the changes there). Any contributions are welcome for this new feature.

@hanshupe
Author

Is a possible reason for "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler." that a feature selection hyperparameter causes 0 features to be passed to the next step? I will try changing the minimum threshold; maybe that helps.
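For reference, the exact error can be reproduced outside TPOT when a selector keeps zero features; this minimal sketch (SelectFwe with an extreme alpha is just an illustrative stand-in for whatever selector the pipeline used) shows it:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectFwe, f_regression
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)
X = rng.random((50, 50))
y = rng.random(50)

# An extremely strict alpha selects zero features; the resulting empty
# (50, 0) array is then handed to RobustScaler, which raises the
# "minimum of 1 is required" error seen in the log.
pipe = make_pipeline(SelectFwe(f_regression, alpha=1e-12), RobustScaler())
try:
    pipe.fit(X, y)
except ValueError as e:
    print(e)
```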

So if a pipeline is detected as invalid, is it still kept in the population with a low score? In my case it sometimes looks like 80% of the population consists of invalid pipelines; ideally they should not propagate through so many generations.

@weixuanfu
Contributor

weixuanfu commented Aug 19, 2020

> Is a possible reason for "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler." that a feature selection hyperparameter causes 0 features to be passed to the next step? I will try changing the minimum threshold; maybe that helps.

Yes, I think so. Changing the configuration of those feature selection operators may help.
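One way (a sketch, not a full recommendation) is to pass a custom config_dict. Note that a custom config_dict replaces TPOT's default search space entirely, so it must also list the estimators you still want; the operators and value ranges below are only illustrative:

```python
from tpot import TPOTRegressor

custom_config = {
    # Keep SelectPercentile from retaining fewer than ~20% of the features.
    'sklearn.feature_selection.SelectPercentile': {
        'percentile': range(20, 100, 10),
        'score_func': {'sklearn.feature_selection.f_regression': None},
    },
    'sklearn.preprocessing.RobustScaler': {},
    'sklearn.linear_model.ElasticNetCV': {
        'l1_ratio': [0.25, 0.5, 0.75],
        'tol': [1e-4, 1e-3],
    },
}

tpot = TPOTRegressor(config_dict=custom_config, generations=5,
                     population_size=50, random_state=1, verbosity=2)
```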

> So if a pipeline is detected as invalid, is it still kept in the population with a low score? In my case it sometimes looks like 80% of the population consists of invalid pipelines; ideally they should not propagate through so many generations.

Invalid pipelines caught by the _pre_test decorator should not be passed into the population unless the newly generated pipeline from one GP alteration (crossover, mutation, or random initial generation) fails ten times in _pre_test. So the population should not contain those invalid pipelines in most cases.
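Roughly, the guard behaves like this simplified sketch (not TPOT's actual code; make_candidate is a hypothetical stand-in for one GP alteration):

```python
def safe_offspring(make_candidate, X, y, max_attempts=10):
    """_pre_test-style guard (simplified sketch, not TPOT's actual code).

    make_candidate is a hypothetical stand-in for one GP alteration
    (crossover / mutation / random initialization) returning a pipeline.
    """
    X_small, y_small = X[:50], y[:50]  # _pre_test uses at most 50 samples
    candidate = None
    for attempt in range(max_attempts):
        candidate = make_candidate()
        try:
            candidate.fit(X_small, y_small)  # cheap validity check
            return candidate                 # valid pipeline joins the population
        except Exception as e:
            print(f"_pre_test sketch: num_test={attempt} {e}")
    return candidate  # only after ten failures can an invalid one slip through
```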
