Fit best model on new data in Optuna mode #429

Open
huanvo88 opened this issue Jul 5, 2021 · 7 comments

huanvo88 commented Jul 5, 2021

Hello,

Is there a way to fit the best AutoML model (using the best hyperparameters from the Optuna search, the ensemble, etc.) on new data? To give a concrete example: say I run AutoML in Optuna mode with a custom validation set to obtain the best model. I would now like to fit that model on the train + validation sets and look at the results on an independent test set.

Contributor

pplonski commented Jul 5, 2021

For Optuna mode it should be possible. You need to create AutoML with the optuna_init_params argument pointing to the file with the Optuna parameters. For example:

  1. Train AutoML on the first dataset:
automl_1 = AutoML(mode='Optuna', results_path='AutoML_1')
automl_1.fit(X, y)

The best parameters from Optuna tuning will be saved in the file AutoML_1/optuna/optuna.json.

  2. Train on the second dataset, but with the parameters from step 1:
automl_2 = AutoML(mode='Optuna', results_path='AutoML_2', optuna_init_params='AutoML_1/optuna/optuna.json')
automl_2.fit(X_new, y_new)

In this step, the models will be trained with the parameters from step 1, but on the new data. All models from this step will be saved in the AutoML_2 path.

Please let me know if it works.

It is a good idea to test this with a small dataset and a small optuna_time_budget, for example 60 seconds. The default tuning time per model in Optuna mode is 3600 seconds.

This is only available in Optuna mode. In other modes you will need to train the model from scratch.

pplonski added the docs label (Add or update documentation) on Jul 5, 2021
pplonski self-assigned this on Jul 5, 2021
pplonski changed the title from "Fit best model on new data" to "Fit best model on new data in Optuna mode" on Jul 5, 2021
Author

huanvo88 commented Jul 5, 2021

Thanks @pplonski. I don't think I can pass the path directly to optuna_init_params; I have to do something like this:

optuna_init = json.loads(open('previous_AutoML_training/optuna/optuna.json').read())

Also, when I follow your suggestion, it runs another Optuna search for automl_2 on the new data, which is not what I was asking for. I just want to fit the best AutoML model found in the previous Optuna search (with validation) on the new data.
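
The loading step above can be wrapped in a small helper. This is a minimal sketch, not part of mljar-supervised: the function name load_optuna_params is hypothetical, and the optuna/optuna.json location under the results path is taken from the earlier comment in this thread.

```python
import json
from pathlib import Path


def load_optuna_params(results_path):
    """Return the tuned parameters saved by a previous Optuna-mode run as a dict.

    Hypothetical helper: assumes the file lives at
    <results_path>/optuna/optuna.json, as described earlier in this thread.
    """
    params_file = Path(results_path) / "optuna" / "optuna.json"
    if not params_file.is_file():
        raise FileNotFoundError(f"No Optuna parameters found at {params_file}")
    # The context manager closes the file handle once the JSON is parsed.
    with params_file.open() as f:
        return json.load(f)
```

The returned dict is what would be passed as optuna_init_params, for example optuna_init_params=load_optuna_params('AutoML_1').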

Contributor

pplonski commented Jul 5, 2021

Yes, you are right: you need to load the params and pass them as a dict. Sorry for the confusion.

I just ran a simple toy example, and it should work as expected:

import json

import numpy as np
from supervised import AutoML

# some toy data: 100 rows, 10 features, continuous target
X = np.random.rand(100, 10)
y = np.random.rand(100)

# first training
# tuning with Optuna + training
automl = AutoML(mode='Optuna', results_path='automl_first_run', optuna_time_budget=1)
automl.fit(X, y)

# load the tuned params saved by the first run
with open('automl_first_run/optuna/optuna.json') as f:
    my_params = json.load(f)

# second training
# just train with best params from the first run (no tuning)
automl_2 = AutoML(mode='Optuna', results_path='automl_second_run', optuna_init_params=my_params)
automl_2.fit(X, y)

Is your code similar?
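
For the use case in the original question (tune on one split, then refit on train + validation), the same pattern could be wrapped in a function. This is a sketch under assumptions: refit_with_saved_params is a hypothetical helper, not part of mljar-supervised, and any extra constructor arguments are assumed to match those used in the first run.

```python
import json
from pathlib import Path


def refit_with_saved_params(first_run_path, second_run_path, X_new, y_new, **automl_kwargs):
    """Train a second Optuna-mode AutoML on new data, reusing tuned parameters.

    Hypothetical helper. Pass the same extra constructor arguments
    (for example n_jobs) that the first run used.
    """
    # Imported lazily so the helper can be defined without the package installed.
    from supervised import AutoML

    params_file = Path(first_run_path) / "optuna" / "optuna.json"
    with params_file.open() as f:
        params = json.load(f)

    automl = AutoML(
        mode="Optuna",
        results_path=str(second_run_path),
        optuna_init_params=params,  # a dict, not a file path
        **automl_kwargs,
    )
    automl.fit(X_new, y_new)
    return automl
```

With NumPy arrays, the combined training set could be built with np.concatenate([X_train, X_val]) and np.concatenate([y_train, y_val]). Results on the independent test set would then come from automl.predict(X_test) on the returned object. The second run still applies whatever validation strategy it is configured with to the combined data.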

Author

huanvo88 commented Jul 5, 2021

Thanks @pplonski, I think I figured out the mistake in my code. In the first run I set n_jobs=40, so it did not run the Optuna search for the neural network. In the second run I did not specify n_jobs, which is why it spun up an Optuna search for neural networks. When I specify n_jobs for both runs, it is fine :)
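
One way to avoid this kind of mismatch is to keep the constructor settings shared by both runs in a single place. The sketch below is only an illustration: SHARED_SETTINGS and run_kwargs are hypothetical names, and the values are the ones mentioned in this thread.

```python
# Settings that must be identical in the tuning run and the refit run.
SHARED_SETTINGS = {"mode": "Optuna", "n_jobs": 40}


def run_kwargs(results_path, **overrides):
    """Build AutoML constructor kwargs for one run from the shared settings.

    Shared keys cannot be overridden per run, so both runs stay consistent.
    """
    clash = set(overrides) & set(SHARED_SETTINGS)
    if clash:
        raise ValueError(f"Per-run override of shared settings: {sorted(clash)}")
    return {**SHARED_SETTINGS, "results_path": results_path, **overrides}


# Only run-specific values differ between the two dictionaries.
first_run = run_kwargs("automl_first_run", optuna_time_budget=60)
second_run = run_kwargs("automl_second_run")
```

The dictionaries would then be unpacked into the constructor, for example AutoML(**first_run) for tuning and AutoML(**run_kwargs("automl_second_run", optuna_init_params=my_params)) for the refit.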

Contributor

pplonski commented Jul 5, 2021

Great that it works. However, Neural Network should still be trained when n_jobs is specified; it just doesn't use it (the sklearn MLP implementation has no n_jobs parameter). Maybe it is a bug if Optuna mode disables NN when n_jobs is specified?

Author

huanvo88 commented Jul 5, 2021

Right, when training with Optuna, the first line it prints is "Neural Network algorithm was disabled because it does not support n_jobs parameters".

Maybe we should fix it to include the Neural Network algorithm when n_jobs is specified as well.

On the other hand, I'm not sure how useful an MLP is for tabular data; maybe we are just wasting time on the hyperparameter search. Also, it is faster to train an MLP on a GPU anyway.

Contributor

pplonski commented Jul 5, 2021

@huanvo88 thanks for pasting the output. I remember now that I disabled it on purpose. Let's leave it as it is. I will update the docs with your use case.
