Fit best model on new data in Optuna mode #429

huanvo88 · 2021-07-05T10:41:31Z

Hello,

Is there a way to fit the best AutoML model (using the best hyperparameters from the Optuna search, the ensemble, etc.) on new data? To give a concrete example, say I run AutoML in Optuna mode with a custom validation set to obtain the best model. I would now like to fit that model on the train + validation sets and look at the results on an independent test set.

pplonski · 2021-07-05T11:07:14Z

In Optuna mode it should be possible. You need to create AutoML with the optuna_init_params argument pointing to the file with Optuna parameters. For example:

Train AutoML on first data:

automl_1 = AutoML(mode='Optuna', results_path='AutoML_1')
automl_1.fit(X, y)

The best parameters from Optuna tuning will be saved in the file AutoML_1/optuna/optuna.json.

Train on the second data but with the parameters from step 1:

automl_2 = AutoML(mode='Optuna', results_path='AutoML_2', optuna_init_params='AutoML_1/optuna/optuna.json')
automl_2.fit(X_new, y_new)

In this step, the models will be trained with the parameters from step 1 but on the new data. All models from this step will be saved in the AutoML_2 path.

Please let me know if it works.

It is good to test it with small data and a small optuna_time_budget, for example 60 seconds. The default tuning time for an Optuna model is 3600 seconds.

This is only available in Optuna mode. In other modes you will need to train the model from scratch.

huanvo88 · 2021-07-05T13:50:26Z

Thanks @pplonski. I don't think I can pass the path directly to optuna_init_params; I have to do something like this:

optuna_init = json.loads(open('previous_AutoML_training/optuna/optuna.json').read())

Also, when I follow your suggestion, it runs another Optuna search for automl_2 on the new data, which is not what I was asking for. I just want to fit the best AutoML model found in the previous Optuna search (with validation) on the new data.

pplonski · 2021-07-05T14:31:30Z

Yes, you are right: you need to load the params and pass them as a dict. Sorry for the confusion.

I just ran a simple toy example and it works as expected:

import numpy as np
import pandas as pd
from supervised import AutoML

# some toy data
X = np.random.rand(100,10)
y = np.random.rand(100)

# first training
# tuning with Optuna + training
automl = AutoML(mode='Optuna', results_path='automl_first_run', optuna_time_budget=1)
automl.fit(X, y)

# load the tuned params saved by the first run
import json
with open('automl_first_run/optuna/optuna.json') as f:
    my_params = json.load(f)

# second training
# just train with best params from the first run (no tuning)
automl_2 = AutoML(mode='Optuna', results_path='automl_second_run', optuna_init_params=my_params)
automl_2.fit(X, y)

Is your code similar?
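The loading step in the example above can also be written a little more defensively. The following is only a minimal sketch: the helper name load_optuna_params is hypothetical (it is not part of the AutoML API), and it assumes the file layout shown in this thread, results_path/optuna/optuna.json.

```python
import json
from pathlib import Path


def load_optuna_params(results_path):
    """Return the tuned parameters saved by an earlier Optuna-mode run.

    Hypothetical helper. Assumes the layout seen in this thread:
    <results_path>/optuna/optuna.json
    """
    params_file = Path(results_path) / "optuna" / "optuna.json"
    # Fail early with a clear message if the first run did not write the file.
    if not params_file.is_file():
        raise FileNotFoundError(f"No Optuna parameters found at {params_file}")
    with params_file.open() as f:
        params = json.load(f)
    # optuna_init_params is passed as a dict, so reject anything else.
    if not isinstance(params, dict):
        raise ValueError(f"Expected a JSON object in {params_file}")
    return params
```

The returned dict would then be passed as optuna_init_params when constructing the second AutoML, exactly as in the toy example.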

huanvo88 · 2021-07-05T16:07:29Z

Thanks @pplonski, I think I figured out the mistake in my code. In the first run I set n_jobs = 40, so it did not run the Optuna search for the neural network. In the second run I did not specify n_jobs, which is why it started an Optuna search for neural networks. When I specify n_jobs for both runs it is fine :)
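One way to avoid this mismatch is to keep the settings that both runs must share in a single place. This is a sketch only: SHARED_SETTINGS and automl_kwargs are hypothetical names, and the n_jobs value is the one mentioned above. It builds plain dicts of constructor arguments and does not call AutoML itself.

```python
# Hypothetical helper: settings that should be identical in the tuning run
# and in the refit run, so the same set of algorithms is enabled in both.
SHARED_SETTINGS = {"mode": "Optuna", "n_jobs": 40}


def automl_kwargs(results_path, optuna_init_params=None):
    """Build constructor arguments for one run from the shared settings."""
    kwargs = dict(SHARED_SETTINGS, results_path=results_path)
    if optuna_init_params is not None:
        kwargs["optuna_init_params"] = optuna_init_params
    return kwargs


# First run: tune with Optuna. Second run: reuse the tuned parameters.
first_run = automl_kwargs("automl_first_run")
tuned = {}  # placeholder; in practice the dict loaded from optuna.json
second_run = automl_kwargs("automl_second_run", optuna_init_params=tuned)

# Both runs share mode and n_jobs; only the output path and the
# initial parameters differ.
print(first_run)
print(second_run)
```

Each dict can then be unpacked into the constructor, for example AutoML(**first_run), so a setting such as n_jobs cannot silently differ between the two runs.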

pplonski · 2021-07-05T17:46:08Z

Great that it works. However, Neural Network should still be trained when n_jobs is specified; it simply doesn't use it (the sklearn MLP implementation doesn't have n_jobs). Maybe it is a bug if Optuna disables NN when n_jobs is specified?

huanvo88 · 2021-07-05T18:59:38Z

Right, when training with Optuna, the first line of output is "Neural Network algorithm was disabled because it does not support n_jobs parameters".

Maybe we should fix it to include Neural Network algorithm when n_jobs is specified as well.

On the other hand, I'm not sure how useful an MLP is for tabular data; maybe we are just wasting time tuning it. Also, it is faster to train an MLP on a GPU anyway.

pplonski · 2021-07-05T20:02:15Z

@huanvo88 thanks for pasting the output. I remember now that I disabled it on purpose. Let's leave it as it is. I will update the docs with your use case.

pplonski added the docs label (Add or update documentation) Jul 5, 2021

pplonski self-assigned this Jul 5, 2021

pplonski changed the title ~~Fit best model on new data~~ Fit best model on new data in Optuna mode Jul 5, 2021

bsaldivaremc2 mentioned this issue Feb 13, 2025

Replicate result of cv splits with the solution of a fit #791

Open
