[FEA] Upgrade to Arrow 14 #14370

Closed
bdice opened this issue Nov 7, 2023 · 5 comments · Fixed by #14506
Labels: feature request

Comments

@bdice
Contributor

bdice commented Nov 7, 2023

This issue tracks the migration of cudf to Arrow major version 14.

The conda-forge packaging of libarrow has been split up (conda-forge/arrow-cpp-feedstock#1201), resulting in this list of packages split from libarrow:

  • libarrow-all is a metapackage which depends on all the following packages
  • libarrow
  • libarrow-acero
  • libarrow-dataset
  • libarrow-flight
  • libarrow-flight-sql
  • libarrow-gandiva
  • libarrow-substrait
  • libparquet

The pyarrow package has run dependencies on all of the above packages (excluding libarrow-all).

As a first attempt at migration, I think we can use libarrow-all in place of libarrow in our conda builds, and if that works, we can try just using the components that we need. My best guess is that this includes only libarrow and libarrow-dataset, and maybe libparquet.
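The two-step plan above can be sketched as data. This is only an illustration in Python: the package names come from the list earlier in this comment, while the helper `arrow_requirements`, the default `14.*` version string, and the assumption that every component shares one version pin are hypothetical and not taken from cudf's recipes.

```python
# Packages split out of libarrow on conda-forge, as listed in this issue.
LIBARROW_COMPONENTS = (
    "libarrow",
    "libarrow-acero",
    "libarrow-dataset",
    "libarrow-flight",
    "libarrow-flight-sql",
    "libarrow-gandiva",
    "libarrow-substrait",
    "libparquet",
)


def arrow_requirements(components=None, version="14.*"):
    """Return conda-style requirement strings for an Arrow pin.

    components=None  -> first attempt: depend on the libarrow-all metapackage.
    components=[...] -> later step: depend only on the named components.
    The shared version pin is an assumption made for this sketch.
    """
    if components is None:
        return [f"libarrow-all=={version}"]
    unknown = sorted(set(components) - set(LIBARROW_COMPONENTS))
    if unknown:
        raise ValueError(f"not a libarrow component: {unknown}")
    return [f"{name}=={version}" for name in components]


# Step 1: swap libarrow for the metapackage.
first_attempt = arrow_requirements()
# Step 2: narrow to the best-guess subset named above.
narrowed = arrow_requirements(["libarrow", "libarrow-dataset", "libparquet"])

print(first_attempt)
print(narrowed)
```

Keeping the component list in one place means a later narrowing step only changes the argument, not the pin format.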

cc: @galipremsagar

@bdice added the "feature request" and "Needs Triage" labels and removed the "Needs Triage" label on Nov 7, 2023
@galipremsagar self-assigned this on Nov 7, 2023
@bdice
Contributor Author

bdice commented Nov 7, 2023

This update is important because if an environment already contains a package that depends on (or solves with) pyarrow or libarrow version 14, the conda/mamba solver cannot downgrade from libarrow 14 to libarrow 13 due to the differences in package structure. We're starting to see this issue pop up across RAPIDS. Pinning Arrow 13 in every project where this problem shows up is not a good strategy; the fix belongs in cudf's own pinnings.

@bdice
Contributor Author

bdice commented Nov 8, 2023

#14371 updated cudf to use Arrow 14. The next step I would take before closing this is to identify which subpackages we rely on and only install those subpackages where needed, rather than libarrow-all.

Based on this list:

GLOBAL_TARGETS arrow_shared parquet_shared arrow_acero_shared arrow_dataset_shared arrow_static parquet_static arrow_acero_static arrow_dataset_static

I think we want libarrow, libparquet, libarrow-acero, libarrow-dataset. We probably won't need libarrow-flight, libarrow-flight-sql, libarrow-gandiva, or libarrow-substrait.
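The reasoning from the target list to the package list can be sketched in Python. The target names are copied from the `GLOBAL_TARGETS` line above; the `TARGET_TO_PACKAGE` mapping and the helper `packages_for_targets` are assumptions made for illustration, based only on the naming similarity between the CMake targets and the conda-forge packages.

```python
GLOBAL_TARGETS = (
    "arrow_shared parquet_shared arrow_acero_shared arrow_dataset_shared "
    "arrow_static parquet_static arrow_acero_static arrow_dataset_static"
)

# Assumed correspondence between CMake target stems and conda-forge packages.
TARGET_TO_PACKAGE = {
    "arrow": "libarrow",
    "parquet": "libparquet",
    "arrow_acero": "libarrow-acero",
    "arrow_dataset": "libarrow-dataset",
}


def packages_for_targets(targets):
    """Map a whitespace-separated CMake target list to conda package names."""
    packages = set()
    for target in targets.split():
        # Split off the trailing linkage suffix, e.g. "arrow_acero_shared".
        stem, _, linkage = target.rpartition("_")
        if linkage not in ("shared", "static"):
            raise ValueError(f"unexpected target name: {target}")
        packages.add(TARGET_TO_PACKAGE[stem])
    return sorted(packages)


needed = packages_for_targets(GLOBAL_TARGETS)
print(needed)
```

Under that assumed mapping, the shared and static variants collapse to the same four packages, which matches the list proposed above.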

@galipremsagar
Contributor

galipremsagar commented Nov 8, 2023

> We probably won't need libarrow-flight, libarrow-flight-sql, libarrow-gandiva, or libarrow-substrait.

Just a note, pyarrow=14 will still depend on these and fetch them. The change we make would benefit someone who depends only on libcudf and not on cudf.

@h-vetinari

> Just a note, pyarrow=14 will still depend on these and fetch them.

This is something we want to change, and which may well still happen in the 14 series. The first iteration only split the libraries, but getting the installation size of pyarrow down (where the whole enchilada isn't needed) has been something arrow wanted to do for a while, and a bunch of the necessary work already landed in 14.

@jakirkham
Member

Glad to hear that! Thanks Axel 🙏

rapids-bot bot pushed a commit that referenced this issue Nov 28, 2023
This PR splits the libarrow build dependencies, rather than using `libarrow-all`. This implements the proposal in #14370 (comment) and closes #14370.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Ray Douglass (https://github.com/raydouglass)

URL: #14506

4 participants