Add support for pandas nullable types in feature matrix #6851
Comments
Took a quick look. I think a better way is to work on #5667 so that we can have masked arrays.
Thanks for taking a look, @trivialfis! So a feature matrix with pandas nullable types will be converted to a masked array? What's the status of #5667?
That's what I'm hoping, and I suspect that's how it's implemented for nullable arrays.
I assigned myself to work on it based on the existing work by @zhangzhang10. But Arrow can chunk the data in an arbitrary way, so it's a bit messy. I will try to follow up.
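To make the masked-array idea above concrete, here is a minimal sketch of how a pandas nullable column can be viewed as a values array plus a missing-value mask. It uses only public pandas and NumPy calls. It is an illustration of the concept, not a description of how XGBoost or #5667 represents the data internally.

```python
import numpy as np
import pandas as pd

# A nullable integer column like the one discussed in this thread.
col = pd.Series([4, 5, 6, None], dtype="Int64")

# Boolean mask marking the missing entries (True where the value is pd.NA).
mask = col.isna().to_numpy()

# The values as plain float64, with pd.NA replaced by NaN.
values = col.to_numpy(dtype="float64", na_value=np.nan)

# Pair values and mask as a NumPy masked array (illustration only).
masked = np.ma.masked_array(values, mask=mask)

print(masked.count())  # number of non-missing entries
```

The point of the pairing is that the mask carries the missingness separately, so the values array does not need a sentinel to mark missing entries.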
@trivialfis @hcho3 Are there any updates on this? I see that xgboost can handle missing data. If I have a dataframe with nullable integers, e.g.

from xgboost import XGBClassifier
import pandas as pd
import pytest

xgb = XGBClassifier()
# ValueError if only features use nullable types
df = pd.DataFrame({"a": pd.Series([True, False, True, False]),
                   "b": pd.Series([4, 5, 6, None], dtype='Int64'),
                   "c": pd.Series([1.23, 4.58, 3.14, 83.12])})
y = pd.Series([1, 0, 1, 0])
with pytest.raises(ValueError):
    xgb.fit(df, y)

Thank you!
XGBoost still fails when encountering nullable floating-point dtypes.
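For XGBoost versions that reject nullable dtypes, one possible workaround is to cast the feature matrix to plain `float64` before fitting, so that `pd.NA` becomes `NaN`, which XGBoost treats as missing by default. The sketch below shows only the conversion step. The helper name `to_plain_float` is hypothetical, and the approach assumes every feature column is numeric or boolean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": pd.Series([True, False, True, False], dtype="boolean"),
    "b": pd.Series([4, 5, 6, None], dtype="Int64"),
    "c": pd.Series([1.23, 4.58, 3.14, 83.12], dtype="Float64"),
})


def to_plain_float(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast all columns to NumPy float64.

    Nullable pandas dtypes map pd.NA to NaN during this cast.
    """
    return frame.astype("float64")


X = to_plain_float(df)
print(X.dtypes)
```

The resulting `X` can then be passed to `fit` in place of `df`. The cost is that integer and boolean columns lose their original dtype.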
@Ark-kun I believe this now works (xgboost supports nullable types).

from xgboost import XGBClassifier
import pandas as pd

xgb = XGBClassifier()
# All three features use pandas nullable dtypes; fit is expected to succeed
df = pd.DataFrame({"a": pd.Series([True, False, True, False], dtype='boolean'),
                   "b": pd.Series([4, 5, 6, None], dtype='Int64'),
                   "c": pd.Series([1.23, 4.58, 3.14, 83.12], dtype='Float64')})
y = pd.Series([1, 0, 1, 0])
xgb.fit(df, y)
I would like to use pandas nullable types with xgboost. Currently, xgboost will raise a ValueError if any of the features use the new nullable pandas types.
Repro with xgb 1.2.1 and pandas 1.2.3.
As a user, I would like this feature so I can build a data processing pipeline using the latest pandas capabilities.
Thank you!