-
To convert categorical features to such integer codes, we can use the OrdinalEncoder. This estimator transforms each categorical feature into one new feature of integers (0 to n_categories - 1).
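- Suited to ordinal data, whose categories can be ranked or ordered. For example, education levels (high school, bachelor's, master's, Ph.D.) or temperature categories (cold, warm, hot). By default the categories are sorted (alphabetically for strings), so to retain the intended order it has to be passed explicitly through the categories parameter.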
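As a minimal sketch of ordinal encoding, the snippet below encodes the temperature example. The toy data and variable names are made up for illustration; the explicit category order is an assumption about how the data should be ranked.

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data: a single ordinal column of temperature categories.
X_temp = [["cold"], ["hot"], ["warm"], ["cold"]]

# Passing categories explicitly fixes the order cold < warm < hot.
# Without it, the categories would be sorted alphabetically (cold, hot, warm).
ordinal_encoder = OrdinalEncoder(categories=[["cold", "warm", "hot"]])
temp_codes = ordinal_encoder.fit_transform(X_temp)

print(temp_codes.ravel().tolist())  # [0.0, 2.0, 1.0, 0.0]
```

Note that the codes come back as floats by default, even though they represent integer ranks.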
-
Another way to convert categorical features into features that can be used with scikit-learn estimators is one-of-K encoding, also known as one-hot or dummy encoding. This type of encoding can be obtained with the OneHotEncoder, which transforms each categorical feature with n_categories possible values into n_categories binary features, with one of them set to 1 and all others set to 0.
- Considers the presence or absence of a feature when encoding nominal data, which has categories with no intrinsic order or ranking. For example, colors (red, blue, green), types of animals (mammal, fish, reptile, amphibian, or bird), brand names (Coca-Cola, Pepsi, Sprite), or pizza toppings (pepperoni, mushrooms, onions).
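A small sketch of one-hot encoding on the colors example might look like this. The data and variable names are illustrative only.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: a single nominal column of colors.
X_color = [["red"], ["green"], ["blue"], ["green"]]

onehot_encoder = OneHotEncoder()
# The encoder returns a sparse matrix by default; convert it for display.
color_onehot = onehot_encoder.fit_transform(X_color).toarray()

# Learned categories are sorted, and each one becomes a binary column.
print(onehot_encoder.categories_[0].tolist())  # ['blue', 'green', 'red']
print(color_onehot.tolist())
# [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```

Each row contains exactly one 1, in the column of the category that row belongs to.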
-
LabelEncoder encodes target labels with values between 0 and n_classes - 1. It is meant for the target y, not for the input features X.
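A minimal sketch of encoding a target vector, assuming a made-up list of class labels:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical target labels (y), not input features.
y_labels = ["dog", "cat", "bird", "cat"]

label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y_labels)

# Classes are sorted, so bird -> 0, cat -> 1, dog -> 2.
print(label_encoder.classes_.tolist())  # ['bird', 'cat', 'dog']
print(y_encoded.tolist())               # [2, 1, 0, 1]

# inverse_transform maps integer codes back to the original labels.
print(label_encoder.inverse_transform([0, 2]).tolist())  # ['bird', 'dog']
```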
- ColumnTransformer: this estimator allows different columns or column subsets of the input to be transformed separately, and the features generated by each transformer are concatenated to form a single feature space. This is useful for heterogeneous or columnar data, to combine several feature extraction mechanisms or transformations into a single transformer.
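The per-column behaviour described above can be sketched as follows, assuming the estimator in question is scikit-learn's ColumnTransformer (the description matches its documentation). The two-column toy array, the transformer names, and the education ranking are illustrative assumptions.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical data: column 0 is ordinal (education), column 1 is nominal (color).
X_mixed = np.array(
    [["bachelor's", "red"], ["high school", "blue"], ["Ph.D.", "red"]],
    dtype=object,
)

education_order = ["high school", "bachelor's", "master's", "Ph.D."]

column_transformer = ColumnTransformer(
    transformers=[
        # Ordinal codes for column 0, using the explicit ranking above.
        ("ordinal", OrdinalEncoder(categories=[education_order]), [0]),
        # One binary column per color seen in column 1.
        ("onehot", OneHotEncoder(), [1]),
    ],
    sparse_threshold=0,  # always return a dense array
)
X_encoded = column_transformer.fit_transform(X_mixed)

# Output columns: [education code, color=blue, color=red]
print(X_encoded.tolist())
# [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
```

The outputs of the two transformers are concatenated side by side in the order they are listed, so the ordinal code comes first and the one-hot columns follow.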