
TPOT optimization gets stuck #1107

Open

hanshupe opened this issue Aug 19, 2020 · 3 comments

Comments

@hanshupe

I am experiencing that the optimization gets stuck and stops improving after many hours, while a fresh start sometimes gives a better score already in the initial population (!). The dataset is not very complex (50 variables, regression problem, 5000 rows). I used default settings but also played around with the parameters of the genetic optimization.

What I noticed is that, especially when performance is poor, my log gets flooded with the following messages:

```
_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler..
_pre_test decorator: _random_mutation_operator: num_test=1 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler..
_pre_test decorator: _random_mutation_operator: num_test=2 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler..
_pre_test decorator: _random_mutation_operator: num_test=3 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler..
_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler..
```

Sometimes I get these warnings 80 times for a population of 100 pipelines. What exactly does the message mean, and how can I improve the optimization?

Btw: is it possible to pass some initial pipelines to TPOT to start with a better initial population?

@weixuanfu
Contributor

TPOT may randomly generate invalid pipelines, for example ones using invalid hyperparameter combinations (e.g., calling LogisticRegression with dual=True and penalty='l1'). To avoid this, the _pre_test decorator evaluates each such pipeline on a small test set with a maximum sample size of 50.
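For illustration, the kind of failure the decorator catches looks like this in plain scikit-learn (a minimal sketch; the exact error text depends on the scikit-learn version):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.rand(50, 5)
y = np.random.randint(0, 2, size=50)

# dual=True is only supported for penalty='l2' with the liblinear solver,
# so this hyperparameter combination cannot be fitted.
clf = LogisticRegression(penalty='l1', dual=True, solver='liblinear')
try:
    clf.fit(X, y)  # scikit-learn validates the combination at fit time
except ValueError as e:
    print(f"invalid pipeline: {e}")
```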

For now, you may try a different random_state to get a better initial population; the result should be reproducible with the same random_state.
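For example (a minimal sketch with synthetic data; the same seed reproduces the same initial population):

```python
from tpot import TPOTRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=5000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Try a few seeds and keep the best run.
for seed in (1, 2, 3):
    tpot = TPOTRegressor(generations=5, population_size=50,
                         random_state=seed, verbosity=2)
    tpot.fit(X_train, y_train)
    print(seed, tpot.score(X_test, y_test))
```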

I think it is a great idea to pass some good initial pipelines. There is a related issue #296, but we did not implement it in an ideal way (there was a related PR #502, but we reverted the changes there). Any contributions are welcome for this new feature.

@hanshupe
Author

Is a possible reason for "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler." that a feature selection hyperparameter causes 0 features to be passed to the next step? I will try changing the minimum threshold; maybe that helps.
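For reference, the exact error can be reproduced outside TPOT when a selector keeps zero features; this minimal sketch (SelectFwe with an extreme alpha is just an illustrative stand-in for whatever selector the pipeline used) shows it:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectFwe, f_regression
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)
X = rng.random((50, 50))
y = rng.random(50)

# An extremely strict alpha selects zero features; the resulting empty
# (50, 0) array is then handed to RobustScaler, which raises the
# "minimum of 1 is required" error seen in the log.
pipe = make_pipeline(SelectFwe(f_regression, alpha=1e-12), RobustScaler())
try:
    pipe.fit(X, y)
except ValueError as e:
    print(e)
```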

So if a pipeline is detected as invalid, is it still kept in the population with a low score? In my case it sometimes looks like 80% of the population consists of invalid pipelines; ideally they should not propagate through so many generations.

@weixuanfu
Contributor

weixuanfu commented Aug 19, 2020

> Is a possible reason for "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler." that a feature selection hyperparameter causes 0 features to be passed to the next step? I will try changing the minimum threshold; maybe that helps.

Yes, I think so. Changing the configuration of those feature selection operators may help.
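One way (a sketch, not a full recommendation) is to pass a custom config_dict. Note that a custom config_dict replaces TPOT's default search space entirely, so it must also list the estimators you still want; the operators and value ranges below are only illustrative:

```python
from tpot import TPOTRegressor

custom_config = {
    # Keep SelectPercentile from retaining fewer than ~20% of the features.
    'sklearn.feature_selection.SelectPercentile': {
        'percentile': range(20, 100, 10),
        'score_func': {'sklearn.feature_selection.f_regression': None},
    },
    'sklearn.preprocessing.RobustScaler': {},
    'sklearn.linear_model.ElasticNetCV': {
        'l1_ratio': [0.25, 0.5, 0.75],
        'tol': [1e-4, 1e-3],
    },
}

tpot = TPOTRegressor(config_dict=custom_config, generations=5,
                     population_size=50, random_state=1, verbosity=2)
```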

> So if a pipeline is detected as invalid, is it still kept in the population with a low score? In my case it sometimes looks like 80% of the population consists of invalid pipelines; ideally they should not propagate through so many generations.

Invalid pipelines caught by the _pre_test decorator should not be passed into the population unless the newly generated pipeline from one GP alteration (crossover, mutation, or random initial generation) fails ten times in _pre_test. So the population should not contain those invalid pipelines in most cases.
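Roughly, the guard behaves like this simplified sketch (not TPOT's actual code; make_candidate is a hypothetical stand-in for one GP alteration):

```python
def safe_offspring(make_candidate, X, y, max_attempts=10):
    """_pre_test-style guard (simplified sketch, not TPOT's actual code).

    make_candidate is a hypothetical stand-in for one GP alteration
    (crossover / mutation / random initialization) returning a pipeline.
    """
    X_small, y_small = X[:50], y[:50]  # _pre_test uses at most 50 samples
    candidate = None
    for attempt in range(max_attempts):
        candidate = make_candidate()
        try:
            candidate.fit(X_small, y_small)  # cheap validity check
            return candidate                 # valid pipeline joins the population
        except Exception as e:
            print(f"_pre_test sketch: num_test={attempt} {e}")
    return candidate  # only after ten failures can an invalid one slip through
```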
