Fit best model on new data in Optuna mode #429
For Optuna mode it should be possible. First, create AutoML and train it:

```python
automl_1 = AutoML(mode='Optuna', results_path='AutoML_1')
automl_1.fit(X, y)
```

The best parameters from Optuna tuning will be saved in a file, which is passed to `optuna_init_params` in the next step:

```python
automl_2 = AutoML(mode='Optuna', results_path='AutoML_2', optuna_init_params='AutoML_1/optuna/optuna.json')
automl_2.fit(X_new, y_new)
```

In this step, the models will be trained with the parameters from step 1, but on the new data. All models from this step will be saved in its `results_path`. Please let me know if it works. It is good to test it first with small data and small …

This is only available in Optuna mode. In other modes you will need to train the models from scratch.
Thanks @pplonski. I don't think I can pass the path directly to `optuna_init_params`; I have to load the file first, something like this:

```python
import json

with open('previous_AutoML_training/optuna/optuna.json') as f:
    optuna_init = json.load(f)
```

Also, when I follow your suggestion, it runs another Optuna search for `automl_2` on the new data, which is not what I was asking. I just want to fit the best AutoML model found in the previous Optuna search (with validation) on the new data.
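A minimal sketch of that loading step as a reusable helper, assuming the file layout `<results_path>/optuna/optuna.json` used in this thread. The name `load_optuna_params` is hypothetical and not part of mljar-supervised; it closes the file via a context manager and checks that the result is a dict before it is handed to `optuna_init_params`.

```python
import json
import os


def load_optuna_params(results_path):
    """Load tuned Optuna parameters saved by a previous AutoML run.

    Hypothetical helper: assumes the parameters live in
    <results_path>/optuna/optuna.json, as in this thread.
    """
    params_file = os.path.join(results_path, "optuna", "optuna.json")
    with open(params_file) as f:
        params = json.load(f)
    # optuna_init_params expects a dict, not a file path
    if not isinstance(params, dict):
        raise ValueError(f"expected a JSON object in {params_file}")
    return params
```

For the path in this comment, that would be `optuna_init = load_optuna_params('previous_AutoML_training')`.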
Yes, you are right, you need to load the params and pass them as a dict. Sorry for the confusion. I just ran a simple toy example and it should work as expected:

```python
import json

import numpy as np
from supervised import AutoML

# some toy data
X = np.random.rand(100, 10)
y = np.random.rand(100)

# first training
# tuning with Optuna + training
automl = AutoML(mode='Optuna', results_path='automl_first_run', optuna_time_budget=1)
automl.fit(X, y)

# load params (as a dict, not a path)
with open('automl_first_run/optuna/optuna.json') as f:
    my_params = json.load(f)

# second training
# just train with best params from the first run (no tuning)
automl_2 = AutoML(mode='Optuna', results_path='automl_second_run', optuna_init_params=my_params)
automl_2.fit(X, y)
```

Is your code similar?
Thanks @pplonski, I think I figured out the mistake in my code. In the first run I set `n_jobs=40`, so it did not tune the Neural Network. In the second run I did not specify `n_jobs`, which is why it started an Optuna search for the Neural Network. When I specify `n_jobs` in both runs, it works fine :)
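One way to avoid that mismatch is to build both `AutoML(...)` calls from a single shared settings dict, so `n_jobs` (and `mode`) cannot silently differ between the tuning run and the refit run. A minimal sketch: `make_automl_kwargs` and `SHARED_SETTINGS` are hypothetical names, and only the `AutoML` parameters already used in this thread (`mode`, `results_path`, `n_jobs`, `optuna_init_params`) are assumed.

```python
# Settings shared by the tuning run and the refit run.
# Per this thread, setting n_jobs disables the Neural Network, so a
# different n_jobs in the second run changes which algorithms get tuned.
SHARED_SETTINGS = {"mode": "Optuna", "n_jobs": 40}


def make_automl_kwargs(results_path, optuna_init_params=None):
    """Build keyword arguments for AutoML(...) from the shared settings."""
    # dict(...) copies SHARED_SETTINGS, so the shared dict is never mutated
    kwargs = dict(SHARED_SETTINGS, results_path=results_path)
    if optuna_init_params is not None:
        # Second run: reuse the tuned parameters instead of tuning again
        kwargs["optuna_init_params"] = optuna_init_params
    return kwargs
```

With mljar-supervised installed, the two runs would then be `AutoML(**make_automl_kwargs('automl_first_run')).fit(X, y)` and `AutoML(**make_automl_kwargs('automl_second_run', optuna_init_params=my_params)).fit(X_new, y_new)`, where `my_params` is the dict loaded from `optuna.json`.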
Great that it works. However, the Neural Network should be trained with …
Right, when training with Optuna, the first line of the output is "Neural Network algorithm was disabled because it does not support n_jobs parameters". Maybe we should change it to include the Neural Network algorithm when `n_jobs` is specified as well. On the other hand, I'm not sure how useful an MLP is for tabular data; maybe we are just wasting time tuning it. Also, it is faster to train an MLP on a GPU anyway.
@huanvo88 thanks for pasting the output. I remember now that I disabled it on purpose. Let's leave it as it is. I will update the docs with your use case.
Hello,
Is there a way to fit the best AutoML model (using the best hyperparameters from the Optuna search, plus the ensemble etc.) on new data? To give a concrete example, let's say I run AutoML in Optuna mode with a custom validation set to obtain the best model, and now I would like to refit that model on the train + validation sets and look at the results on an independent test set.
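Following the approach worked out in this thread, a minimal sketch of that workflow: stack the train and validation sets, refit with the parameters saved by the tuning run via `optuna_init_params`, then score on the held-out test set. Assumptions: the tuning run's `results_path` was `automl_first_run`, the task is regression (hence RMSE), the helper names `combine_train_val`, `rmse` and `refit_and_evaluate` are hypothetical, and `supervised` is imported inside the function so the data-handling helpers run without it.

```python
import json
import os

import numpy as np


def combine_train_val(X_train, y_train, X_val, y_val):
    """Stack the train and validation sets into one training set."""
    X_full = np.concatenate([X_train, X_val], axis=0)
    y_full = np.concatenate([y_train, y_val], axis=0)
    return X_full, y_full


def rmse(y_true, y_pred):
    """Root mean squared error, used here to score the test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def refit_and_evaluate(X_train, y_train, X_val, y_val, X_test, y_test,
                       tuned_results_path="automl_first_run",
                       refit_results_path="automl_refit",
                       **automl_kwargs):
    """Refit with tuned Optuna params on train + val, then score on test."""
    # Imported here so the helpers above work without mljar-supervised
    from supervised import AutoML

    params_file = os.path.join(tuned_results_path, "optuna", "optuna.json")
    with open(params_file) as f:
        tuned_params = json.load(f)  # pass as a dict, not a path

    X_full, y_full = combine_train_val(X_train, y_train, X_val, y_val)
    # automl_kwargs should repeat the tuning run's extra settings
    # (e.g. n_jobs=40); do not pass mode here, it is set explicitly
    automl = AutoML(
        mode="Optuna",
        results_path=refit_results_path,
        optuna_init_params=tuned_params,
        **automl_kwargs,
    )
    automl.fit(X_full, y_full)
    return rmse(y_test, automl.predict(X_test))
```

If your data are pandas DataFrames, `pd.concat([X_train, X_val], ignore_index=True)` keeps the column names, which `np.concatenate` would drop.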