
Slow/Freeze with powerful configuration #905

Open
lschneidpro opened this issue Aug 17, 2019 · 6 comments

lschneidpro commented Aug 17, 2019

Hi there,

On my university's cluster, I'm trying to run TPOT for a classification problem in a Jupyter notebook with a dataset of size (166,158 × 193). I cannot share the dataset for confidentiality reasons.

Despite:

  • powerful instance (36 cores, 180 GB RAM)
  • tuning parameters ("TPOT light", reducing generations and population_size)
  • consulting other issues (reinstalling the TPOT development branch, ...)
  • using Dask

the fitting is very slow and the progress bar only advances when I interrupt the kernel:
[screenshot: stalled TPOT progress bar]

Here is my configuration:
[screenshot: TPOT configuration]

Here is my dask cluster:
[screenshot: Dask cluster dashboard]

Here is my TPOT code:

pipeline_optimizer = TPOTClassifier(generations=20,
                                    population_size=50,
                                    scoring='neg_log_loss',
                                    cv=ut.RepeatedHoldout(n_iter=30, train_size=0.6, test_size=0.1),
                                    random_state=42,
                                    verbosity=2,
                                    n_jobs=-1,
                                    warm_start=True,
                                    use_dask=True)
pipeline_optimizer.fit(df_cv[xvars].values, df_cv[yvars].values, sample_weight=weight)

where ut.RepeatedHoldout() is a custom cross-validation generator for dealing with time-dependent data.
So I'm wondering if I missed any important parameters to make TPOT run smoothly.
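(ut.RepeatedHoldout itself is not shown in this issue; below is a minimal sketch of what such a splitter might look like, assuming it follows scikit-learn's splitter interface (split / get_n_splits) and draws contiguous train/test windows so training data always precedes test data in time. The class body is an illustration, not the actual ut code.)

```python
import numpy as np

class RepeatedHoldout:
    """Hypothetical repeated-holdout splitter for time-ordered data,
    usable anywhere scikit-learn accepts a CV object (e.g. TPOTClassifier(cv=...))."""

    def __init__(self, n_iter=30, train_size=0.6, test_size=0.1, random_state=None):
        self.n_iter = n_iter
        self.train_size = train_size
        self.test_size = test_size
        self.random_state = random_state

    def split(self, X, y=None, groups=None):
        n = len(X)
        n_train = int(n * self.train_size)
        n_test = int(n * self.test_size)
        rng = np.random.RandomState(self.random_state)
        for _ in range(self.n_iter):
            # Pick a random window start; train indices always precede test indices,
            # so the model never trains on data from the test period's future.
            start = rng.randint(0, n - n_train - n_test + 1)
            train = np.arange(start, start + n_train)
            test = np.arange(start + n_train, start + n_train + n_test)
            yield train, test

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_iter
```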

Thank you for any help

@weixuanfu
Contributor

Did this configuration pass on a smaller dataset? If yes, I think the issue may be caused by the high number of n_iter in RepeatedHoldout combined with the default max_eval_time_mins=5 in TPOT. Increasing max_eval_time_mins in TPOTClassifier, to allow a larger time budget for evaluating a single pipeline, should help.
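(To make the budget arithmetic concrete: the default max_eval_time_mins=5 is the time allowance for evaluating one candidate pipeline across all of its CV iterations, so with n_iter=30 each individual fit-and-score gets only about 10 seconds on this 166k-row dataset. Back-of-envelope only:)

```python
# Per-fit time budget under TPOT's default per-pipeline evaluation timeout.
max_eval_time_mins = 5      # TPOT default time budget for one pipeline evaluation
n_cv_iterations = 30        # RepeatedHoldout n_iter from the issue
seconds_per_fit = max_eval_time_mins * 60 / n_cv_iterations
print(seconds_per_fit)      # 10.0 seconds per fit-and-score
```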

@lschneidpro
Author

Thanks for your input.
I'll give a go.
Should I still keep using Dask? I'm developing in a Jupyter notebook.

@weixuanfu
Contributor

I think using Dask should be fine.

@lschneidpro
Author

It became worse.

I'm wondering if something is not working with the parallel execution.

[screenshot: Dask dashboard showing idle workers]

@weixuanfu
Contributor

weixuanfu commented Aug 19, 2019

Did this configuration pass a test with a smaller dataset? It seems the dataset is too large to finish cross-validation even within a 15-minute time budget.
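(For scale, the settings in the original post imply a very large number of individual model fits, which is consistent with evaluations timing out. TPOT evaluates roughly population_size × (generations + 1) pipelines, i.e. the initial population plus one offspring population per generation; the exact count varies with TPOT's internals, so treat this as a rough lower bound:)

```python
generations = 20
population_size = 50
n_cv_iterations = 30   # RepeatedHoldout n_iter

# Rough count: initial population plus one offspring population per generation.
pipeline_evaluations = population_size * (generations + 1)
total_model_fits = pipeline_evaluations * n_cv_iterations
print(pipeline_evaluations, total_model_fits)  # 1050 31500
```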

@lschneidpro
Author

Yes, it ran fine on the example provided in your documentation.

[attachment: Untitled.pdf]
