A note to investigate further: we noticed extremely long refit times for AutoSklearn models. For example, with `time_left_for_this_task` set to just 1 hour, so that `per_run_time_limit` is only 6 minutes, the refit could somehow take hours in some cases.
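For context, the pattern looks roughly like the sketch below (not our exact script; the dataset, seed, and time budgets are placeholders), where `refit()` is the step that blows far past the per-run limit:

```python
# Minimal sketch of the setup where we see this (illustrative values only).
import time

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from autosklearn.classification import AutoSklearnClassifier

X, y = fetch_openml("adult", version=2, return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

automl = AutoSklearnClassifier(
    time_left_for_this_task=3600,   # 1 hour total budget
    per_run_time_limit=360,         # 6 minutes per candidate run
    resampling_strategy="cv",
    resampling_strategy_arguments={"folds": 5},
)

t0 = time.time()
automl.fit(X_train, y_train)
print(f"fit:   {time.time() - t0:.0f}s")

# refit() retrains the selected models on the whole training set; this is the
# step that sometimes takes hours despite the 6-minute per-run limit above.
t0 = time.time()
automl.refit(X_train, y_train)
print(f"refit: {time.time() - t0:.0f}s")
```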
We didn't investigate it thoroughly, but I recall finding that the HistGradientBoosting models would take a very long time to fit¹. I know ensembling adds another layer to this, but I think we observed this even with `ensemble_size=1`. @eddiebergman pointed out in #1677 that the ensemble also includes all the models from the different cross-validation folds.
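If it helps, here is a rough way to probe what the final ensemble actually holds (it assumes the fitted `automl` from the sketch above; I haven't verified the exact structure under cv resampling):

```python
# Hypothetical probe: list the models (and weights) kept in the final
# ensemble. Per #1677, with cv resampling these cover every fold, and
# refit() has to retrain all of them on the full training set.
for weight, model in automl.get_models_with_weights():
    print(f"weight={weight:.3f}  model={model}")
```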
I'm wondering whether you've also noticed performance issues with that, and whether the new updates address it (perhaps the upgrade to a newer sklearn will take care of it).
Footnotes
Their prediction time may also be slow. We attempted a permutation feature importance analysis on an AutoSklearnClassifier, and it was extremely slow when the underlying classifier was a HistGradientBoosting model, compared to a raw sklearn pipeline. ↩
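Roughly what we ran (a sketch, not our exact code; it assumes the fitted `automl` and the train/test split from the snippet above, and the `scoring`/`n_repeats` values are just illustrative):

```python
# Permutation importance over the fitted auto-sklearn model; every repeat
# re-scores the model once per feature, so a slow predict() multiplies fast.
from sklearn.inspection import permutation_importance

result = permutation_importance(
    automl,
    X_test,
    y_test,
    n_repeats=10,
    random_state=0,
    scoring="accuracy",
)
print(result.importances_mean)
```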
I am noticing this as well. Even for small models where the per-run time limit is 2 minutes and the total task time is 20 minutes, it takes over an hour (sometimes multiple hours) to refit. Here are the relevant versions: