Question - Support for different types of categorical variable encoding #1237

Open
SSMK-wq opened this issue Jan 15, 2022 · 2 comments

Comments

SSMK-wq commented Jan 15, 2022

Hi,

Does TPOT offer any automated way to convert categorical features into encoded variables?

Context of the issue

I have an input dataset with more than 100 variables where around 80% of the variables are categorical in nature.

Some variables, like gender and country, can be one-hot encoded, but I also have a few variables with an inherent order in their values, such as a rating: very good, good, bad, etc.

Is there any approach or option in TPOT that we can use to apply the encoding based on the variable type?

For example, I would like to pass the two lists below as arguments to TPOT:

one_hot_list = ['Gender', 'Country']  # one-hot encoding
ordinal_list = ['Feedback', 'Level_of_interest']  # ordinal encoding

Is there any option in the package that can do this for us?

Or is there any other efficient way to do this, given that I have 80 categorical columns?

@fjpa121197

Hi @SSMK-wq,

Did you find a workaround for this? I don't see any documentation saying that TPOT handles encoding of categorical features, or supports different or predefined encodings, for example ordinal vs. one-hot encoding.
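
For readers looking for a workaround in the meantime: a minimal sketch, assuming scikit-learn and pandas are available, of encoding the columns before the data reaches TPOT. This is not a TPOT feature. The column names come from the question, while the toy values, the `Age` column, and the category orders are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "Gender": ["F", "M", "F", "M"],
    "Country": ["IN", "US", "IN", "DE"],
    "Feedback": ["bad", "good", "very good", "good"],
    "Level_of_interest": ["low", "high", "medium", "low"],
    "Age": [31, 45, 27, 52],  # hypothetical numeric column, passed through as-is
})

one_hot_list = ["Gender", "Country"]
ordinal_list = ["Feedback", "Level_of_interest"]

# Assumed order of each ordinal column, lowest to highest,
# listed in the same order as ordinal_list.
ordinal_categories = [
    ["bad", "good", "very good"],
    ["low", "medium", "high"],
]

encoder = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), one_hot_list),
        ("ordinal", OrdinalEncoder(categories=ordinal_categories), ordinal_list),
    ],
    remainder="passthrough",  # keep columns not named in either list
    sparse_threshold=0,       # always return a dense NumPy array
)

# Purely numeric array: one-hot columns, then ordinal codes, then the rest.
X_encoded = encoder.fit_transform(df)
print(X_encoded.shape)
```

The resulting `X_encoded` is fully numeric, so it could then be handed to `TPOTClassifier.fit` or `TPOTRegressor.fit` together with the target. With 80 columns, the two lists only need to be written once, and the same fitted `encoder` can be reused on new data via `encoder.transform`.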

@spenceforce

Bumping this, as it would be nice to pass categorical features to TPOT. TPOT includes OneHotEncoder in its default estimator set for regression, but as it stands it is only usable with integer features. I see that the fit method throws an error at an np.isnan call. I'm sure there's more to it than changing that, though.
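
To illustrate the `np.isnan` point: the snippet below is not TPOT's code, only a small sketch of how NumPy itself behaves. `np.isnan` works on float arrays but raises a `TypeError` on string arrays, which is one plausible reason string-valued categorical columns fail in a `fit` method that calls it. The array contents are made up.

```python
import numpy as np

# Float data: np.isnan works and finds the missing value.
numeric = np.array([[0.0, 1.0], [np.nan, 2.0]])
has_nan = bool(np.isnan(numeric).any())

# String data: np.isnan is not defined for this dtype.
strings = np.array([["F", "IN"], ["M", "US"]])
try:
    np.isnan(strings)
    isnan_failed = False
except TypeError as exc:
    isnan_failed = True
    print(f"np.isnan rejected the string array: {exc}")
```

Encoding string columns to numbers before calling `fit` sidesteps this particular check, though, as noted above, there may be more to proper categorical support than that.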
