TPOT optimization gets stuck #1107
TPOT may randomly generate invalid pipelines, for example ones that use invalid hyperparameter combinations (e.g. calling LogisticRegression with dual=True and penalty='l1'). To avoid this, the _pre_test decorator evaluates such a pipeline on a small test set with a maximum sample size of 50.
I think it is a great idea to pass some good initial pipelines. There is a related issue #296, but we did not implement it in an ideal way (there was a related PR #502, but we reverted the changes there). Any contributions are welcome for this new feature.
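The kind of invalid combination mentioned above can be reproduced directly in scikit-learn; a minimal sketch (with made-up toy data) showing that fitting LogisticRegression with dual=True and an L1 penalty raises a ValueError, which is exactly the sort of failure the _pre_test decorator is meant to catch:

```python
# Reproducing an invalid hyperparameter combination of the kind TPOT
# can generate: liblinear does not support dual=True with penalty='l1',
# so fitting raises a ValueError.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.rand(50, 5)       # toy data: 50 samples, 5 features
y = np.array([0, 1] * 25)       # two balanced classes

clf = LogisticRegression(penalty="l1", dual=True, solver="liblinear")
try:
    clf.fit(X, y)
except ValueError as err:
    print("invalid pipeline rejected:", err)
```

Pre-testing a candidate pipeline on a tiny sample surfaces this error cheaply, before the pipeline is ever evaluated on the full dataset.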
Is a possible reason for "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by RobustScaler." that, due to some feature-selection hyperparameter, 0 features are passed to the next step? I will try to change the minimum threshold; maybe that helps. So if a pipeline is detected as invalid, is it still kept in the population with a low score? In my case it sometimes looks like 80% of the population consists of invalid pipelines; ideally they should not propagate through so many generations.
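The error text in that log line comes from scikit-learn's input validation: if an upstream feature-selection step drops every column, the next operator receives an array with zero features and refuses it. A minimal sketch reproducing the message directly:

```python
# If feature selection removes all columns, the downstream operator
# (here RobustScaler) is handed a (n_samples, 0) array and its input
# check raises the "Found array with 0 feature(s)" ValueError.
import numpy as np
from sklearn.preprocessing import RobustScaler

X_empty = np.empty((50, 0))  # 50 samples, 0 features left after selection
try:
    RobustScaler().fit(X_empty)
except ValueError as err:
    print(err)  # mentions "0 feature(s)" and the (50, 0) shape
```

So the warning means a candidate pipeline's selection step was configured aggressively enough to eliminate every feature, and the pipeline was rejected in pre-testing.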
Yes, I think so. Changing the configuration of those feature-selection operators may help.
The invalid pipelines caught by the _pre_test decorator should not pass into the population unless the newly generated pipeline from one GP alteration (crossover, mutation, or random initial generation) fails ten times in _pre_test. So the population should not contain those invalid pipelines in most cases.
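The retry behaviour described above can be sketched as follows. This is a hypothetical simplification, not TPOT's actual implementation: the function names, the callback signatures, and the toy usage at the bottom are all made up for illustration; only the "retry up to ten times on a small sample" logic comes from the comment above.

```python
# Minimal sketch (hypothetical, not TPOT's actual code) of the pre-test
# retry logic: each candidate from a GP alteration is sanity-checked on
# a small sample; the operator retries up to 10 times before giving up.
import random

MAX_ATTEMPTS = 10  # matches the "ten times" described above


def pre_test(make_pipeline, evaluate_on_sample):
    """Return a candidate that passes a quick sanity check, or None."""
    for _ in range(MAX_ATTEMPTS):
        candidate = make_pipeline()
        try:
            evaluate_on_sample(candidate)  # e.g. fit on <= 50 rows
            return candidate               # valid: admit to population
        except Exception:
            continue                       # invalid: draw a new candidate
    return None                            # every attempt failed


# Toy usage: a "pipeline generator" whose outputs fail the check about
# half the time still almost always yields a valid candidate.
def flaky_check(p):
    if p >= 0.5:
        raise ValueError("invalid hyperparameters")


random.seed(0)
print(pre_test(random.random, flaky_check))
```

With a 50% failure rate per draw, ten attempts fail together only about 0.1% of the time, which is why invalid pipelines rarely reach the population.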
I find that the optimization gets stuck and stops improving after many hours, while a fresh start sometimes gives a better score already in the initial population (!). The dataset is not very complex (50 variables, regression problem, 5,000 rows), and I used the default settings but also played around with the parameters of the genetic optimization.
What I noticed is that, especially when I have poor performance, my log gets flooded with the following messages:
Sometimes I get those warnings 80 times for a population of 100 pipelines. What exactly does the message mean, and how can I improve the optimization?
Btw.: Is it possible to pass some initial pipelines to TPOT to start with a better initial population?