Pipeline Produced Before Generations Completed #1308

Closed
gokhanonderaksu opened this issue Jul 11, 2023 · 3 comments

@gokhanonderaksu

gokhanonderaksu commented Jul 11, 2023

Hello,

I am running the following code with TPOT version 0.12.0 to get a pipeline:

from tpot import TPOTRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import pandas as pd
import numpy as np

df = pd.read_excel('C:/Users/OneDrive/Desktop/KodSystems/TPOT/abc.xlsx')

X = df.iloc[:, :-1]
y = df.iloc[:, -1]

print(X.shape, y.shape)

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.80, test_size=0.20, random_state=42)

tpot = TPOTRegressor(generations=10, population_size=50, verbosity=2, random_state=42, n_jobs=-2, cv=10)
...

# perform the search
tpot.fit(X_train, y_train)

# export the best model
tpot.export('abc.py')

# refit the final step of the best pipeline and print its feature importances
extracted_best_model = tpot.fitted_pipeline_.steps[-1][1]
extracted_best_model.fit(X_train, y_train)
print(extracted_best_model.feature_importances_)

However, it returns a pipeline before the 10 generations are completed, with the following output:

(7478, 5) (7478,)

Best pipeline: RandomForestRegressor(input_matrix, bootstrap=True, max_features=0.7500000000000001, min_samples_leaf=11, min_samples_split=9, n_estimators=100)
[0.06836239 0.08344129 0.18414733 0.25859585 0.40545313]

When I change random_state from 42 to 1, it does give me a pipeline after running all 10 generations, but the same thing happens with another dataset of shape (7478, 2061). I ran the same datasets with version 0.11.7 and did not get any problem. What could be the reason for this problem, and what is the solution?

Thanks in advance!

@Lingepumpe

Lingepumpe commented Aug 4, 2023

I have a similar issue: there is an exception that is not caught, and it terminates the training loop. I am not sure why this exception is not raised with a stack trace by default, but when I extended the "try" block at base.py:813 to also catch other exceptions, I got:

Traceback (most recent call last):
  File "/home/myusername/.cache/pypoetry/virtualenvs/myproject--nQ0R-Yy-py3.11/lib/python3.11/site-packages/tpot/base.py", line 817, in fit
    self._pop, _ = eaMuPlusLambda(
                   ^^^^^^^^^^^^^^^
  File "/home/myusername/.cache/pypoetry/virtualenvs/myproject--nQ0R-Yy-py3.11/lib/python3.11/site-packages/tpot/gp_deap.py", line 255, in eaMuPlusLambda
    population[:] = toolbox.select(population + offspring, mu)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/myusername/.cache/pypoetry/virtualenvs/myproject--nQ0R-Yy-py3.11/lib/python3.11/site-packages/deap/tools/emo.py", line 41, in selNSGA2
    assignCrowdingDist(front)
  File "/home/myusername/.cache/pypoetry/virtualenvs/myproject--nQ0R-Yy-py3.11/lib/python3.11/site-packages/deap/tools/emo.py", line 132, in assignCrowdingDist
    crowd.sort(key=lambda element: element[0][i])
  File "/home/myusername/.cache/pypoetry/virtualenvs/myproject--nQ0R-Yy-py3.11/lib/python3.11/site-packages/deap/tools/emo.py", line 132, in <lambda>
    crowd.sort(key=lambda element: element[0][i])
                                   ~~~~~~~~~~^^^
IndexError: tuple index out of range
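
The debugging step described above, widening the exception handling so that a hidden failure becomes visible, can be sketched generically. This is only an illustration of the idea, not TPOT's code: run_search and run_with_visible_errors are hypothetical names, and the failing body merely imitates indexing an empty fitness tuple.

```python
import traceback


def run_search():
    """Hypothetical stand-in for a search loop that fails partway through."""
    fitness = ()       # imitates an individual that was never evaluated
    return fitness[0]  # raises IndexError: tuple index out of range


def run_with_visible_errors(func):
    """Call func; on any exception, print the stack trace and re-raise."""
    try:
        return func()
    except Exception:
        # Print the full trace so the failure is not silently swallowed.
        print(traceback.format_exc())
        raise


try:
    run_with_visible_errors(run_search)
    outcome = "completed"
except IndexError:
    outcome = "failed with IndexError"

print(outcome)
```

Re-raising after printing keeps the original behaviour of the caller intact while still exposing where the loop stopped.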

So ind.fitness.values is a larger tuple for the first individual in the population, and a later individual has a smaller tuple, which leads to the IndexError. Indeed, the reason is that some individuals in the population have ind.fitness.valid == False, with an empty tuple for ind.fitness.values.

Not sure why this is.
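
The failure mode described in this comment can be reproduced in a few lines without DEAP. The sketch below only imitates the sort-by-objective step seen in the traceback using plain tuples; sort_by_objective is a hypothetical helper, and the filter at the end is an illustrative guard, not the fix that was shipped in TPOT.

```python
def sort_by_objective(fitness_values, i):
    """Sort (values, index) pairs by the i-th objective value."""
    crowd = [(values, idx) for idx, values in enumerate(fitness_values)]
    crowd.sort(key=lambda element: element[0][i])
    return crowd


# Two evaluated individuals and one with an empty (invalid) fitness tuple.
population_fitness = [(-1.0, 0.88), (-3.0, 0.91), ()]

try:
    sort_by_objective(population_fitness, 0)
    error = None
except IndexError as exc:
    # The empty tuple cannot be indexed, so the sort key raises.
    error = str(exc)

print(error)

# Illustrative guard (an assumption): drop individuals with no fitness values.
valid_only = [values for values in population_fitness if values]
sorted_valid = sort_by_objective(valid_only, 0)
print(sorted_valid)
```

With the unevaluated individual left in, the sort raises the same IndexError as in the traceback; with it filtered out, the remaining individuals sort normally.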

@perib
Contributor

perib commented Aug 11, 2023

I believe this may be the same thing happening in #1313

@gokhanonderaksu
Author

Hello, I've tried a couple of datasets that crashed early with 0.12.0, this time using version 0.12.1, and so far they run smoothly. Thanks so much for the quick response!
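
Since the early termination in this thread was reported with 0.12.0 and not with 0.11.7 or 0.12.1, it may help to confirm which version is installed before rerunning. A minimal sketch using the standard library's importlib.metadata; installed_version is a hypothetical helper name, and "tpot" is assumed to be the installed distribution name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


tpot_version = installed_version("tpot")
if tpot_version is None:
    print("TPOT is not installed in this environment")
elif tpot_version == "0.12.0":
    print("TPOT 0.12.0 found; this thread reports early termination with it")
else:
    print(f"TPOT {tpot_version} found")
```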
