[Question] How can I make sure AutoSklearn is always using StandardScaler for feature preprocessing? #1548
Thank you very much for this really nice library! To add to the question: We are aware that the We are not sure how to derive from the error message, for example the one resulting from option 1 above. Thank you very much for your help!
So the short answer is that if you do the pre-processing yourself, it will always be part of the dataset. The long answer is that the We do hope to fix this soon, though; this has come up quite a few times. With regards to Best,
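A minimal sketch of the "do the pre-processing yourself" route, written in plain scikit-learn: the scaler is fitted on the training split only and then reused for the test split, so the held-out data never influences the scaling statistics. The synthetic dataset, the split sizes, and the commented-out auto-sklearn call (including its time budget) are illustrative assumptions, not the maintainers' code. Note that auto-sklearn's internal validation splits would still see statistics computed on the whole training set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the real dataset (assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit mean/std on the training split only, then apply to both splits.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Assumed hand-off to auto-sklearn (not executed here; budget is arbitrary):
# import autosklearn.classification
# automl = autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=120)
# automl.fit(X_train_scaled, y_train)
# predictions = automl.predict(X_test_scaled)
```

The training columns come out with mean 0 and unit variance; the test columns are only approximately standardized, which is the intended behaviour.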
Hi @eddiebergman, For Thank you very much!
Hi @verakye,
If your overarching question is can you ensure "standardize" is applied to each feature through autosklearn, then no it can't currently be done by You can find here the components for Best,
As a hack, you can remove the other components by deleting all files besides But in any case, could you maybe describe your use case? I'm wondering what you are trying to achieve and whether there's a better solution for this.
Thanks a lot for the quick reply! "If your overarching question is can you ensure "standardize" is applied to each feature through autosklearn, then no it can't currently be done by data_preprocessing or feature_preprocessing." This pretty much answers it, I would say. We wanted to ensure that "standardize" is applied to all (numerical) features, while avoiding preprocessing before handing the data to auto-sklearn to avoid "data leakage" in its internal evaluations. It is likely standardisation won't have a huge "leakage" effect, but we thought it best to ensure standardisation in a cv-consistent way within auto-sklearn.
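To make "standardisation in a cv-consistent way" concrete, here is a sketch in plain scikit-learn rather than auto-sklearn (it does not show how auto-sklearn handles this internally). Wrapping `StandardScaler` in a `Pipeline` means `cross_val_score` refits the scaler on the training folds of every split; scaling the full matrix once beforehand lets each validation fold influence its own mean and standard deviation. The synthetic data and the `LogisticRegression` model are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data and model choice are assumptions for illustration.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Leak-free: the scaler is refit on the training folds of every split.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
fold_scores = cross_val_score(pipe, X, y, cv=cv)

# Leaky variant: the scaler sees all rows once, including every
# validation fold, before cross-validation starts.
X_leaky = StandardScaler().fit_transform(X)
leaky_scores = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=cv)

print("per-fold scaling:", fold_scores.mean())
print("global scaling:  ", leaky_scores.mean())
```

As the comment above anticipates, the two scores are usually close for plain standardisation; the per-fold version is simply the one whose estimate is not contaminated by the validation data.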
Thank you very much @eddiebergman and @mfeurer for clarification!
Hi!
First of all, thanks for this nice tool for the community. It is very useful in finding good models quickly without too much effort.
Short Question Description
My question is this: I would like to make sure that the autosklearn model only evaluates pipelines that scale features using scikit-learn's StandardScaler. It is not entirely clear to me how this can be done. I have tried different argument configurations using "include" and "exclude", but all of my inputs seem to be invalid.
Some useful information to help us with your question:
I am just trying to make sure that autosklearn is always using the StandardScaler on the input features.
Yes, very much.
I have been through the API documentation, the examples, and the code on GitHub. I am not sure whether what I want is still possible. One particular thing I don't really understand about the API is the distinction between the "data_preprocessor" and the "feature_preprocessor".
System Details (if relevant)
Which version of auto-sklearn are you using?
auto-sklearn==0.14.7
Are you running this on Linux / Mac / ... ?
Debian 10.11