ARROW-17838: [Python] Unify CMakeLists.txt in python/ #14925
Conversation
This also moves the copying code from setup.py into CMakeLists.txt. setup.py now uses `cmake --build --target install` to put artifacts in a suitable location.
@github-actions crossbow submit -g python
@github-actions crossbow submit -g python
Hmm. Crossbow doesn't work...
@github-actions crossbow -g python
@github-actions crossbow submit -g python
@github-actions crossbow submit test-fedora-35-python-3 test-ubuntu-20.04-python-3
@github-actions crossbow submit -g wheel
This looks great!
I did a shallow review of the changes, but also tested locally and for my development setup this seems to be working nicely.
@@ -1,28 +0,0 @@
# Licensed to the Apache Software Foundation (ASF) under one
Is there a consequence of those pc files being removed? (are those actually usable right now? I don't think they actually get installed, so this is just cleaning up?)
Because we can't provide `.pc` files in a portable way for a Python package. A `.pc` file requires a fixed install location, but a Python package is relocatable. Users can install it to `/usr/local/lib/python*/dist-packages/pyarrow/`, `~/.local/lib/python*/site-packages/pyarrow/`, and so on.
If we want to keep `.pc` support, we need to rewrite the `.pc` file after pyarrow is installed. (This is the approach used by the MSYS2 package.)
FYI: `.pc` files aren't installed since our `cpp/src/arrow/python/` -> `python/` migration. It means that users already can't use `arrow-python.pc` since 10.0.0.
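For context on why a fixed install location matters: a pkg-config file hard-codes its prefix, which is exactly what clashes with a relocatable package. A hypothetical `arrow-python.pc` might look like this (illustrative values only, not a file shipped by the project):

```
prefix=/usr/local/lib/python3.10/dist-packages/pyarrow
libdir=${prefix}
includedir=${prefix}/include

Name: Apache Arrow Python
Description: Python integration library for Apache Arrow
Version: 10.0.0
Libs: -L${libdir} -larrow_python
Cflags: -I${includedir}
```

The `prefix=` line is the part that would have to be rewritten after installation, since the actual site-packages path differs per environment.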
@h-vetinari what you point to are the .pc files for libarrow, and that doesn't include the arrow_python library (anymore, since 10.0). This PR only affects the latter, and not the pc files for libarrow.
I have a test lined up for pkg-config metadata in conda builds that I can either push into #15014 or, more likely, put in a separate PR after that one is merged (or into this one, if desired).
In any case, I think this PR should ideally be tested to not break the conda build setup (I'm happy to help with eventual adaptations).
[gh hadn't updated the page so I didn't see your comment while writing my follow-up, apologies]
Sorry, I overlooked which `.pc` files this was pointing at. It seems there are still some `.pc` files being installed for pyarrow (or rather `.pc.in`, so probably not functional).
I don't have very strong feelings about `.pc` files for the Python libs, but in general I think if we can create them easily (and it's not an undue maintenance burden), we should keep them for the conda builds.
> but in general I think if we can create them easily (and it's not an undue maintenance burden), we should keep them for the conda builds.

If we prepare `arrow-python.pc` only for conda, only conda users can use it. It's not portable. It means that developers who use pyarrow's C++ API would need to support both an environment with `arrow-python.pc` and one without it.
I think that all developers who use pyarrow's C++ API should use `pyarrow.get_include()`/`pyarrow.get_library_dirs()`/`pyarrow.get_libraries()` instead of `arrow-python.pc`. The `pyarrow.*` helpers are available in both conda and non-conda environments.
And to complement what @kou said: this is very similar for NumPy. There one also needs to use `np.get_include()`.
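For comparison, NumPy's equivalent helper works the same way (a quick check, assuming NumPy is installed):

```python
import os
import numpy as np

# Directory containing NumPy's C headers (e.g. numpy/arrayobject.h),
# valid wherever the package happens to be installed.
include_dir = np.get_include()
print(include_dir)
print(os.path.isdir(include_dir))
```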
if(PYARROW_BUILD_PARQUET)
  # Parquet
  find_package(Parquet REQUIRED)
This hasn't yet been done at this point in the file (if you don't use parquet encryption), so I would expect this needs to stay? (although testing locally with my development setup (where I don't enable parquet encryption), it seems to work fine ..)
Good catch! `Parquet` is found implicitly by `find_package(ArrowDataset REQUIRED)`, but we should call `find_package(Parquet REQUIRED)` explicitly. I'll fix it.
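A minimal sketch of the fix being discussed (the assumed shape, not the PR's exact CMakeLists.txt):

```cmake
if(PYARROW_BUILD_PARQUET)
  # Find Parquet explicitly instead of relying on
  # find_package(ArrowDataset REQUIRED) locating it transitively,
  # which only happens when parquet encryption is enabled earlier.
  find_package(Parquet REQUIRED)
endif()
```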
This is great, thanks so much!
@github-actions crossbow submit -g python -g wheel
Revision: 75a7574 Submitted crossbow builds: ursacomputing/crossbow @ actions-8f625a77be
Can we merge this?
I have tested it locally on M1, no issues 👍
The CI and crossbow failures are seen elsewhere too, so I am happy to approve this. Thank you so much for such a great optimisation of the pyarrow build process 🙏
Thanks for testing it locally! I'll merge this.
Benchmark runs are scheduled for baseline = 9ed98bf and contender = df4cb95. df4cb95 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
['Python', 'R'] benchmarks have a high level of regressions.
@kou @AlenkaF @jorisvandenbossche Benchmark builds on […] failed. I am reproducing this issue on […].
Any advice on this? Thank you!
Could you show […]?
Thanks.
@kou Thank you! Yes, we can set `ARROW_INSTALL_NAME_RPATH=ON`. I am going to create a new PR now and confirm that this will work for benchmark builds on all machines.