@trivialfis (Member) commented Jul 26, 2025

  • Make pyarrow optional.
  • Add a Python class for the container.

Ref #11088

@trivialfis requested a review from Copilot July 26, 2025 15:08

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR exposes the categories container to Python through a new wrapper class and makes pyarrow optional for categorical data handling. The categories API now returns a container object instead of pyarrow arrays directly, and the export_to_arrow parameter maintains backward compatibility.

Key changes:

  • Refactor C API to use opaque handles for category containers with separate memory management
  • Add a new Python Categories class that wraps the C++ container and provides controlled access to arrow exports
  • Make pyarrow dependency optional by allowing category containers to be created without immediate arrow conversion
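The pattern described in the key changes can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `CategoriesSketch`, `lazy_import_pyarrow`, `free_fn`, and `raw` are hypothetical names, and the integer handle merely stands in for an opaque C handle. The sketch assumes the wrapper owns the handle, releases it exactly once, and imports pyarrow only when an arrow export is requested.

```python
import importlib
from typing import Any, Callable, Dict, List, Optional, Tuple


def lazy_import_pyarrow() -> Any:
    """Import pyarrow only at the point where an export is requested."""
    try:
        return importlib.import_module("pyarrow")
    except ImportError as e:
        raise ImportError("pyarrow is required to export categories.") from e


class CategoriesSketch:
    """Hypothetical wrapper around an opaque container handle.

    This is an illustration only, not the XGBoost `Categories` class.
    """

    def __init__(
        self,
        handle: int,
        free_fn: Callable[[int], None],
        raw: Optional[Dict[str, List[str]]],
    ) -> None:
        self._handle: Optional[int] = handle
        self._free_fn = free_fn
        # `raw` is None when the container was created without arrow export.
        self._raw = raw

    def to_arrow(self) -> List[Tuple[str, Any]]:
        """Return (feature name, arrow array) pairs; needs pyarrow."""
        if self._raw is None:
            raise ValueError("The container was created without arrow export.")
        pa = lazy_import_pyarrow()
        return [(name, pa.array(values)) for name, values in self._raw.items()]

    def close(self) -> None:
        """Release the underlying handle; safe to call more than once."""
        if getattr(self, "_handle", None) is not None:
            self._free_fn(self._handle)
            self._handle = None

    def __del__(self) -> None:
        self.close()
```

With this layout, constructing and freeing the container never touches pyarrow, so users who only pass categories between models would not need it installed; only `to_arrow()` triggers the import.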

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/c_api/c_api.cc | Adds new C API functions for category container creation, export, and cleanup with handle-based memory management |
| src/data/data.cc | Updates function signature for SparsePage::Push method with type changes |
| include/xgboost/data.h | Updates header declaration to match implementation changes |
| python-package/xgboost/core.py | Refactors get_categories methods to return Categories objects and handle optional arrow export |
| python-package/xgboost/_data_utils.py | Implements new Categories class with handle management and arrow export functionality |
| python-package/xgboost/_typing.py | Adds new type aliases for arrow category data structures |
| python-package/xgboost/testing/ordinal.py | Updates all test code to use new Categories API with export_to_arrow parameter |

@trivialfis (Member, Author)

cc @rongou.

@trivialfis merged commit 86a9809 into dmlc:master Jul 27, 2025
80 of 82 checks passed
@trivialfis deleted the enc-expose-cat branch July 27, 2025 08:57