CatBoost Estimators should be able to handle float categories  #3965

@tamargrey

Description

  • As a user, I wish I could pass categorical data whose categories are floats into the CatBoostRegressor and CatBoostClassifier estimators. With the code snippet below, they currently raise: "Invalid type for cat_feature category for [feature_idx=312]=1.0 : cat_features must be integer or string, real number values and NaN values should be converted to string."

Code Example

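    import pandas as pd
    import woodwork as ww
    # Assumed import path: the issue does not show where CatBoostRegressor comes from.
    from evalml.pipelines.components import CatBoostRegressor

    # A single categorical column whose categories are floats.
    X = pd.DataFrame({"double_cats": pd.Series([1.0, 2.0, 3.0, 4.0, 5.0] * 20)})
    y = pd.Series(range(100))
    y.ww.init()
    X.ww.init(logical_types={"double_cats": "Categorical"})

    clf = CatBoostRegressor(
        n_estimators=1,
        max_depth=1,
    )
    # Raises the "Invalid type for cat_feature" error quoted above.
    clf.fit(X, y)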

In the case above, the floats can be converted to ints without loss. We should also handle the case where the floats cannot be converted to ints without truncating the data; in that case we can consider converting to the object dtype.
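The two cases could be sketched roughly as follows. This is only an illustration of the proposed behavior, not the estimators' actual implementation: `normalize_float_categories` is a hypothetical helper name, and it works on plain Python values rather than on a Woodwork-typed DataFrame.

```python
def normalize_float_categories(values):
    """Return category values as ints when lossless, otherwise as strings.

    Hypothetical helper for illustration; not part of any library.
    """
    values = list(values)
    # Lossless case: every value is a float with no fractional part.
    # NaN and infinity report is_integer() == False, so they take the fallback.
    if all(isinstance(v, float) and v.is_integer() for v in values):
        return [int(v) for v in values]
    # Fallback: stringify so that no value is truncated.
    return [str(v) for v in values]


print(normalize_float_categories([1.0, 2.0, 3.0]))
print(normalize_float_categories([1.5, 2.0, 3.25]))
```

The first call yields ints and the second yields strings, both of which are types the error message says CatBoost accepts for categorical features. A real fix would presumably apply the same check to each categorical column before fitting.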

Labels: new feature (Features which don't yet exist.)
