Is my code below the correct approach to extend AutoSklearn with a wrapper class for the scikit-learn StandardScaler as a feature preprocessor (particularly the settings in get_properties and in get_hyperparameter_search_space)?
How can I best double-check in the final output that the features were indeed standard-scaled within AutoSklearn, i.e. whether the StandardScaler works as it is supposed to (better than just comparing two runs with and without standard scaling)?
get_properties looks correct, at the very least. However, the copy hyperparameter looks dubious; I'm not sure why you would want to search over this parameter. It's probably best to just leave it at True, i.e. don't modify any data in place.
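For illustration only (a minimal sketch, not taken from the original code), pinning copy instead of tuning it could look like this inside the wrapper's fit:

```python
from sklearn.preprocessing import StandardScaler

# Pin copy=True instead of exposing it as a hyperparameter: searching over it
# cannot change predictive performance, it only decides whether the input
# array is modified in place.
scaler = StandardScaler(copy=True)
```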
The issue mentioned seems to suggest it should always be applied, but I can't really verify that properly from here. One possible way to check is automl.leaderboard(ensemble_only=False, detailed=True): ensemble_only=False means every run is included, while detailed=True adds more columns, one of which is the feature preprocessor.
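For instance (assuming automl is an already fitted estimator; the exact column names may differ slightly between versions):

```python
# Inspect every evaluated run, not only the ensemble members.
lb = automl.leaderboard(ensemble_only=False, detailed=True)
print(lb.columns)                       # see which detailed columns are available
print(lb.filter(like="preprocessor"))   # show the preprocessor-related columns
```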
I don't really have a good suggestion for you on this one. I imagine you could use the answer here to get the actual model out and pass data through it, but there are limitations to that approach. I'm sorry, it's the best I can do.
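A rough sketch of that idea (the step name "feature_preprocessor" and the pipeline layout are assumptions about the current internals and may differ between versions):

```python
# Iterate over the fitted pipelines in the final ensemble and report which
# feature preprocessor each one was fitted with.
for weight, pipeline in automl.get_models_with_weights():
    print(weight, pipeline.named_steps.get("feature_preprocessor"))
```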
That sounds like a very logical suggestion! Will do so.
It actually should always be applied, but in the sense of a pipeline within AutoSklearn: not as a replacement for the default preprocessing chosen for a certain algorithm, but as a first step, after which other (and differing) preprocessing steps could still follow. I checked the leaderboard as you suggested and compared it with a vanilla run where no preprocessing was specified. Unfortunately, adding the preprocessor replaces ALL default preprocessing. Is there an option to set one specific preprocessing step for all algorithms but still keep the defaults chosen afterwards (just like in a pipeline)?
Thanks for the informative and useful linked issue!
Short Question Description
Is my code below the correct approach to extend AutoSklearn with a wrapper class for the scikit-learn StandardScaler as a feature preprocessor (particularly the settings in get_properties and in get_hyperparameter_search_space)?
How did this question come about?:
As clarified in the question [Question] How can I make sure AutoSklearn is always using StandardScaler for feature preprocessing? #1548, it is currently not possible to standard-scale the features in a CV-consistent manner within a cross-validation in AutoSklearn. Therefore, standard scaling would currently need to be applied to the features before calling AutoSklearn, which would lead to data leakage.
What have you already looked at?
I checked the documentation here: https://automl.github.io/auto-sklearn/master/extending.html
I checked out the ConfigSpace documentation: https://automl.github.io/ConfigSpace/main/api/conditions.html
I had a look at the source code of other implementations of components.
Thank you very much!
The code I wrote in an attempt to create a wrapper class and register it with auto-sklearn (runs without errors)
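The original snippet is not reproduced here. As a minimal sketch of what such a wrapper might look like, modeled on auto-sklearn's extending examples (the class name StandardScalerComponent, the exposed hyperparameters, and the dense-only input restriction are illustrative assumptions, not necessarily the original code):

```python
from ConfigSpace.configuration_space import ConfigurationSpace
from ConfigSpace.hyperparameters import CategoricalHyperparameter

import autosklearn.pipeline.components.feature_preprocessing
from autosklearn.pipeline.components.base import AutoSklearnPreprocessingAlgorithm
from autosklearn.pipeline.constants import DENSE, SIGNED_DATA, UNSIGNED_DATA, INPUT


class StandardScalerComponent(AutoSklearnPreprocessingAlgorithm):
    """Thin wrapper exposing sklearn's StandardScaler to auto-sklearn."""

    def __init__(self, with_mean=True, with_std=True, random_state=None):
        # Hyperparameter values may arrive as the strings "True"/"False".
        self.with_mean = with_mean
        self.with_std = with_std
        self.random_state = random_state
        self.preprocessor = None

    def fit(self, X, y=None):
        from sklearn.preprocessing import StandardScaler

        # copy is deliberately pinned to True: never modify data in place.
        self.preprocessor = StandardScaler(
            copy=True,
            with_mean=self.with_mean in (True, "True"),
            with_std=self.with_std in (True, "True"),
        )
        self.preprocessor.fit(X)
        return self

    def transform(self, X):
        if self.preprocessor is None:
            raise NotImplementedError()
        return self.preprocessor.transform(X)

    @staticmethod
    def get_properties(dataset_properties=None):
        return {
            "shortname": "StandardScaler",
            "name": "Standard Scaler",
            "handles_regression": True,
            "handles_classification": True,
            "handles_multiclass": True,
            "handles_multilabel": True,
            "handles_multioutput": True,
            "is_deterministic": True,
            # Dense only: centering sparse matrices would densify them.
            "input": (DENSE, UNSIGNED_DATA, SIGNED_DATA),
            "output": (INPUT,),
        }

    @staticmethod
    def get_hyperparameter_search_space(dataset_properties=None):
        cs = ConfigurationSpace()
        cs.add_hyperparameters([
            CategoricalHyperparameter("with_mean", ["True", "False"], default_value="True"),
            CategoricalHyperparameter("with_std", ["True", "False"], default_value="True"),
        ])
        return cs


# Make the component available to auto-sklearn under its class name.
autosklearn.pipeline.components.feature_preprocessing.add_preprocessor(
    StandardScalerComponent
)
```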
Additional example code to test-run the wrapper
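A minimal test-run sketch along those lines (dataset, time budgets, and the include value are assumptions; the name in include must match the registered class name):

```python
from autosklearn.classification import AutoSklearnClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Restrict the search to the custom component so every pipeline must use it.
automl = AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    include={"feature_preprocessor": ["StandardScalerComponent"]},
)
automl.fit(X_train, y_train)

print(automl.leaderboard(ensemble_only=False, detailed=True))
print("test accuracy:", automl.score(X_test, y_test))
```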