TPOT stuck at 75th generation with no errors #1214

aecomesana · 2021-06-09T20:32:09Z

I am running TPOT (version 0.11.7) in its GPU-accelerated configuration (using Dask) on a few different datasets, with the `TPOT cuML` configuration. I am using Python 3 with Anaconda.

On every dataset, TPOT gets stuck at generation 74 or 75, regardless of the dataset size (they range from 480 rows and 10 columns up to 9000 rows and 83 columns). No error is raised: the periodic checkpoint folder simply stops updating and no new messages appear. I left it running, but nothing new appeared after 8 hours.
I changed the random seed of the TPOT regressor to see whether a specific model architecture was the problem, but with a different seed it still gets stuck at generation 75.

My TPOT regressor looks as follows:

```python
from tpot import TPOTRegressor

tpot = TPOTRegressor(verbosity=2,
                     use_dask=True,
                     n_jobs=-1,
                     cv=5,
                     random_state=42,  # this was changed, as mentioned above
                     template='Regressor',
                     config_dict='TPOT cuML',
                     periodic_checkpoint_folder='../checkpoints/{}/'.format(target),
                     max_time_mins=None
                     )
```

Any idea how to solve this issue, or why it happens every time across different datasets?
Thank you!

beckernick · 2021-07-13T02:18:45Z

You should use `n_jobs=1` (the default). cuML is currently designed for the "one process per GPU" paradigm. Additionally, how are you setting up your Dask cluster?

It might be valuable to test your system and environment with this example gist or confirm your configuration is similar.

rhamnett · 2023-01-11T17:52:46Z

I just got caught by this. There needs to be a better error message when using cuML with `n_jobs` left at -1.

beckernick · 2023-01-12T16:41:43Z

If the maintainers are open to it, perhaps we could open a PR that validates the n_jobs parameter when the cuML configuration is used.

rhamnett · 2023-01-12T16:43:55Z

> If the maintainers are open to it, perhaps we could open a PR that validates the n_jobs parameter when the cuML configuration is used.

Yes, probably just check that `n_jobs > 0`, as I think you can use multiple GPUs.
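A minimal sketch of the proposed validation, assuming it only needs to inspect the `'TPOT cuML'` config string and follows the "> 0" rule suggested here. `validate_cuml_n_jobs` is a hypothetical helper, not part of TPOT's API, and where such a check would live inside TPOT is left open; whether values above 1 are actually safe with cuML is also not confirmed in this thread.

```python
def validate_cuml_n_jobs(config_dict, n_jobs):
    """Hypothetical check: reject non-positive n_jobs with the cuML config.

    Not part of TPOT; it follows the "n_jobs > 0" rule proposed in this issue.
    """
    # Only the built-in cuML configuration string is checked; other configs pass.
    if config_dict != "TPOT cuML":
        return
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(n_jobs, bool) or not isinstance(n_jobs, int) or n_jobs <= 0:
        raise ValueError(
            f"n_jobs={n_jobs!r} is not supported with config_dict='TPOT cuML'; "
            "cuML expects one process per GPU, so use n_jobs=1 (the default) "
            "or another positive integer."
        )


# Usage: the default n_jobs=1 passes silently, while n_jobs=-1 fails fast
# with a clear message instead of hanging.
validate_cuml_n_jobs("TPOT cuML", 1)
try:
    validate_cuml_n_jobs("TPOT cuML", -1)
except ValueError as err:
    print(err)
```

Raising at construction time (rather than letting the search start) is the point of the design: the user gets an explicit error before any generations run.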

perib mentioned this issue on Sep 21, 2023 in "TPOT2 and the future of TPOT development -- From the Devs" #1322 (Open).
