[enc] Expose categories container into Python. #11591
Pull Request Overview
This PR exposes the categories container to Python by adding a Python wrapper class and making pyarrow optional for categorical data handling. It refactors the categories API to return a container object instead of pyarrow arrays directly, and keeps backwards compatibility through the export_to_arrow parameter.
Key changes:
- Refactor C API to use opaque handles for category containers with separate memory management
- Add a new Python `Categories` class that wraps the C++ container and provides controlled access to arrow exports
- Make pyarrow dependency optional by allowing category containers to be created without immediate arrow conversion
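
For readers unfamiliar with the pattern, here is a minimal sketch of a handle-owning wrapper with a lazily imported, optional pyarrow export. It is not the code from this PR: the class name `CategoriesSketch`, the `free_fn` and `export_fn` callables, and the `to_arrow` and `close` methods are all hypothetical stand-ins for whatever the real `Categories` class and C API functions look like.

```python
from typing import Any, Callable, List, Optional


class CategoriesSketch:
    """Hypothetical wrapper around an opaque native handle (not the xgboost class)."""

    def __init__(
        self,
        handle: Optional[int],
        free_fn: Callable[[int], None],
        export_fn: Callable[[int], List[str]],
    ) -> None:
        # In a real binding the handle would be a ctypes pointer and the two
        # callables would be C API functions; plain Python objects stand in here.
        self._handle = handle
        self._free_fn = free_fn
        self._export_fn = export_fn

    def to_arrow(self) -> Any:
        """Export the categories as a pyarrow array; pyarrow is imported only here."""
        if self._handle is None:
            raise ValueError("The category container has already been released.")
        try:
            import pyarrow as pa  # optional dependency, needed only for export
        except ImportError as exc:
            raise ImportError("pyarrow is required to export categories.") from exc
        return pa.array(self._export_fn(self._handle))

    def close(self) -> None:
        """Release the native handle exactly once."""
        if self._handle is not None:
            self._free_fn(self._handle)
            self._handle = None

    def __del__(self) -> None:
        # Free the handle when the wrapper is garbage collected.
        self.close()
```

Because the import sits inside `to_arrow`, the wrapper can be created, passed around, and released on a system without pyarrow; only callers that ask for the arrow view pay for the dependency.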
Reviewed Changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/c_api/c_api.cc | Adds new C API functions for category container creation, export, and cleanup with handle-based memory management |
| src/data/data.cc | Updates function signature for SparsePage::Push method with type changes |
| include/xgboost/data.h | Updates header declaration to match implementation changes |
| python-package/xgboost/core.py | Refactors get_categories methods to return Categories objects and handle optional arrow export |
| python-package/xgboost/_data_utils.py | Implements new Categories class with handle management and arrow export functionality |
| python-package/xgboost/_typing.py | Adds new type aliases for arrow category data structures |
| python-package/xgboost/testing/ordinal.py | Updates all test code to use new Categories API with export_to_arrow parameter |

cc @rongou.
Ref #11088