ARROW-17838: [Python] Unify CMakeLists.txt in python/ #14925

Merged: 13 commits merged Dec 21, 2022


kou
Member

@kou kou commented Dec 13, 2022

This also moves the file-copying code from setup.py into CMakeLists.txt. setup.py now uses "cmake --build --target install" to put the artifacts in a suitable location.
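A minimal sketch of what "let CMake do the copying" can look like on the setup.py side. This is not the actual pyarrow setup.py; the helpers `cmake_install_command()` and `run_cmake_install()` are hypothetical names. Building the `install` target makes CMake run the `install()` rules from CMakeLists.txt, so the Python script no longer needs to know where each artifact goes.

```python
import subprocess
from pathlib import Path


def cmake_install_command(build_dir, config="Release"):
    """Return the argv that builds build_dir and runs its install() rules.

    Hypothetical helper for illustration; not taken from pyarrow's setup.py.
    """
    return [
        "cmake",
        "--build", str(build_dir),
        # Only used by multi-configuration generators (e.g. Visual Studio).
        "--config", config,
        # The special "install" target executes the install() rules.
        "--target", "install",
    ]


def run_cmake_install(build_dir, config="Release"):
    # check=True raises CalledProcessError if the build or install fails.
    subprocess.run(cmake_install_command(build_dir, config), check=True)


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    print(" ".join(cmake_install_command(Path("build"))))
```

Single-configuration generators such as Ninja or Makefiles ignore `--config`, so passing it unconditionally keeps one code path for all platforms.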

@kou
Member Author

kou commented Dec 13, 2022

@github-actions crossbow submit -g python

@github-actions

@github-actions

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@github-actions

Wrong oauth personal access token
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/3683723044

@kou
Member Author

kou commented Dec 13, 2022

@github-actions crossbow submit -g python

@github-actions

Wrong oauth personal access token
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/3683737026

@kou
Member Author

kou commented Dec 13, 2022

Hmm. Crossbow doesn't work...

@kou
Member Author

kou commented Dec 14, 2022

@github-actions crossbow -g python

@github-actions

No such option: -g
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/3698591563

@kou
Member Author

kou commented Dec 15, 2022

@github-actions crossbow submit -g python

@github-actions

This comment was marked as outdated.

@kou
Member Author

kou commented Dec 15, 2022

@github-actions crossbow submit test-fedora-35-python-3 test-ubuntu-20.04-python-3

@github-actions

This comment was marked as outdated.

@jorisvandenbossche
Member

@github-actions crossbow submit -g wheel

Member

@jorisvandenbossche jorisvandenbossche left a comment


This looks great!
I did a shallow review of the changes, but I also tested it locally, and it seems to work nicely with my development setup.

@@ -1,28 +0,0 @@
# Licensed to the Apache Software Foundation (ASF) under one
Member


Is there a consequence of those .pc files being removed? (Are those actually usable right now? I don't think they actually get installed, so is this just cleaning up?)

Member Author


We're removing them because we can't provide .pc files in a portable way for the Python package.
A .pc file requires a fixed install location, but the Python package is relocatable: users can install it to /usr/local/lib/python*/dist-packages/pyarrow/, ~/.local/lib/python*/site-packages/pyarrow/, and so on.
If we want to keep .pc support, we need to rewrite the .pc files after pyarrow is installed. (This is the approach used by the MSYS2 package.)

FYI: the .pc files haven't been installed since our cpp/src/arrow/python/ -> python/ migration. This means users already can't use arrow-python.pc as of 10.0.0.
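To illustrate the "rewrite .pc after pyarrow is installed" option: in a typical pkg-config file the other paths are written relative to a `prefix` variable, so relocating the file mostly means rewriting that one line once the real install location is known. The sketch below assumes that layout; `relocate_pc()` and the sample file are hypothetical, and this is not how the MSYS2 package actually implements it.

```python
import re

# Matches the "prefix=..." variable definition at the start of a line.
_PREFIX_LINE = re.compile(r"^prefix=.*$", re.MULTILINE)

# Hypothetical .pc file generated at build time with a placeholder prefix.
SAMPLE_PC = """prefix=/placeholder
libdir=${prefix}/lib
includedir=${prefix}/include

Name: example
Description: Hypothetical package used for illustration
Version: 1.0.0
Libs: -L${libdir} -lexample
Cflags: -I${includedir}
"""


def relocate_pc(pc_text, new_prefix):
    """Return pc_text with its prefix variable set to new_prefix.

    Variables such as libdir=${prefix}/lib follow automatically because
    pkg-config expands ${prefix} when it reads the file.
    """
    if not _PREFIX_LINE.search(pc_text):
        raise ValueError("no 'prefix=' line found")
    # A function replacement stops re from interpreting backslashes
    # in Windows paths as escape sequences.
    return _PREFIX_LINE.sub(lambda _m: "prefix=" + new_prefix, pc_text, count=1)


if __name__ == "__main__":
    print(relocate_pc(SAMPLE_PC, "/home/user/.local/lib/python3.8/site-packages/pyarrow"))
```

A real post-install step would also need to locate each installed .pc file and write the result back to disk, which is the extra machinery this PR avoids by dropping arrow-python.pc.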

Contributor


The conda builds don't currently test for the .pc files, but they are being installed even for v>=10.

I think they should continue to be installed at least on conda, where the location is well-specified (and conda will take care of fixing the .pc files automatically).

Member


@h-vetinari what you point to are the .pc files for libarrow, and that doesn't include the arrow_python library (anymore, since 10.0). This PR only affects the latter, and not the pc files for libarrow.

Contributor


I have a test lined up for pkg-config metadata in conda builds. I could push it into #15014, but more likely I would put it in a separate PR after that one is merged (or into this one, if desired).

In any case, I think this PR should ideally be tested to make sure it doesn't break the conda build setup (I'm happy to help with any needed adaptations).

Contributor


[gh hadn't updated the page, so I didn't see your comment while writing my follow-up, apologies]

Sorry, I overlooked which .pc files this was pointing at. It seems there are still some .pc files being installed for pyarrow (or rather .pc.in files, so probably not functional).

I don't have very strong feelings about .pc files for the python libs, but in general I think if we can create them easily (and it's not an undue maintenance burden), we should keep them for the conda builds.

Member Author


> but in general I think if we can create them easily (and it's not an undue maintenance burden), we should keep them for the conda builds.

If we provide arrow-python.pc only for conda, only conda users can use it. It's not portable: developers who use pyarrow's C++ API would need to support both environments that have arrow-python.pc and environments that don't.

I think that developers who use pyarrow's C++ API all use pyarrow.get_include()/pyarrow.get_library_dirs()/pyarrow.get_libraries() instead of arrow-python.pc. These pyarrow.* functions are available in both conda and non-conda environments.

Member


And to complement what @kou said: this is very similar to numpy, where one also needs to use np.get_include().
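A minimal sketch of the approach described in this thread: ask the installed packages for their paths at build time instead of reading a .pc file. `pyarrow.get_include()`, `pyarrow.get_library_dirs()`, `pyarrow.get_libraries()` and `numpy.get_include()` are the packages' own helpers; `compiler_flags()` and `pyarrow_build_info()` are hypothetical names for this example, and the flags are GCC/Clang style (MSVC uses a different syntax).

```python
def compiler_flags(include_dirs, library_dirs, libraries):
    """Turn directory and library lists into GCC/Clang-style flags."""
    flags = ["-I" + d for d in include_dirs]
    flags += ["-L" + d for d in library_dirs]
    flags += ["-l" + lib for lib in libraries]
    return flags


def pyarrow_build_info():
    """Ask the installed pyarrow and numpy packages where their files live.

    The answers come from the packages themselves, so this works wherever
    pyarrow was installed (conda, system site-packages, ~/.local, a
    virtualenv, ...), unlike a .pc file with a fixed prefix.
    """
    # Imported lazily so compiler_flags() works without these packages.
    import numpy as np
    import pyarrow as pa

    return {
        "include_dirs": [pa.get_include(), np.get_include()],
        "library_dirs": pa.get_library_dirs(),
        "libraries": pa.get_libraries(),
    }


if __name__ == "__main__":
    try:
        info = pyarrow_build_info()
    except ImportError as exc:
        print("pyarrow/numpy not installed:", exc)
    else:
        print(" ".join(compiler_flags(**info)))
```

In a setuptools build, the same three lists would typically be passed as the `include_dirs`, `library_dirs` and `libraries` arguments of `setuptools.Extension` rather than formatted as flags.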

if(PYARROW_BUILD_PARQUET)
# Parquet
find_package(Parquet REQUIRED)
Member


This hasn't been done yet at this point in the file (if you don't use Parquet encryption), so I would expect this needs to stay? (Although when testing locally with my development setup, where I don't enable Parquet encryption, it seems to work fine.)

Member Author


Good catch! Parquet is found implicitly by find_package(ArrowDataset REQUIRED), but we should call find_package(Parquet REQUIRED) explicitly. I'll fix it.

@github-actions

This comment was marked as outdated.

cpp/cmake_modules/BuildUtils.cmake (resolved)
cpp/src/arrow/symbols.map (resolved)
python/CMakeLists.txt (resolved)
python/CMakeLists.txt (resolved)
@AlenkaF
Member

AlenkaF commented Dec 16, 2022

This is great, thanks so much!
I haven't tested it locally on M1 yet. Will do that asap.

@kou
Member Author

kou commented Dec 18, 2022

@github-actions crossbow submit -g python -g wheel

@github-actions

Revision: 75a7574

Submitted crossbow builds: ursacomputing/crossbow @ actions-8f625a77be

Task Status
test-conda-python-3.10 Github Actions
test-conda-python-3.11 Github Actions
test-conda-python-3.7 Github Actions
test-conda-python-3.7-hdfs-2.9.2 Github Actions
test-conda-python-3.7-hdfs-3.2.1 Github Actions
test-conda-python-3.7-pandas-1.0 Github Actions
test-conda-python-3.7-pandas-latest Github Actions
test-conda-python-3.7-spark-v3.1.2 Github Actions
test-conda-python-3.8 Github Actions
test-conda-python-3.8-hypothesis Github Actions
test-conda-python-3.8-pandas-latest Github Actions
test-conda-python-3.8-pandas-nightly Github Actions
test-conda-python-3.8-spark-v3.2.0 Github Actions
test-conda-python-3.9 Github Actions
test-conda-python-3.9-dask-latest Github Actions
test-conda-python-3.9-dask-upstream_devel Github Actions
test-conda-python-3.9-pandas-upstream_devel Github Actions
test-conda-python-3.9-spark-master Github Actions
test-cuda-python Github Actions
test-debian-11-python-3 Azure
test-fedora-35-python-3 Azure
test-ubuntu-20.04-python-3 Azure
wheel-macos-big-sur-cp310-arm64 Github Actions
wheel-macos-big-sur-cp311-arm64 Github Actions
wheel-macos-big-sur-cp38-arm64 Github Actions
wheel-macos-big-sur-cp39-arm64 Github Actions
wheel-macos-mojave-cp310-amd64 Github Actions
wheel-macos-mojave-cp311-amd64 Github Actions
wheel-macos-mojave-cp37-amd64 Github Actions
wheel-macos-mojave-cp38-amd64 Github Actions
wheel-macos-mojave-cp39-amd64 Github Actions
wheel-manylinux2014-cp310-amd64 Github Actions
wheel-manylinux2014-cp310-arm64 Travis CI
wheel-manylinux2014-cp311-amd64 Github Actions
wheel-manylinux2014-cp311-arm64 Travis CI
wheel-manylinux2014-cp37-amd64 Github Actions
wheel-manylinux2014-cp37-arm64 Travis CI
wheel-manylinux2014-cp38-amd64 Github Actions
wheel-manylinux2014-cp38-arm64 Travis CI
wheel-manylinux2014-cp39-amd64 Github Actions
wheel-manylinux2014-cp39-arm64 Travis CI
wheel-windows-cp310-amd64 Github Actions
wheel-windows-cp311-amd64 Github Actions
wheel-windows-cp37-amd64 Github Actions
wheel-windows-cp38-amd64 Github Actions
wheel-windows-cp39-amd64 Github Actions

@kou
Member Author

kou commented Dec 21, 2022

Can we merge this?

Member

@AlenkaF AlenkaF left a comment


I have tested it locally on M1, no issues 👍

The CI and crossbow failures are also seen elsewhere, so I am happy to approve this. Thank you so much for such a great optimisation of the pyarrow build process 🙏

@kou
Member Author

kou commented Dec 21, 2022

Thanks for testing it locally!

I'll merge this.

@kou kou merged commit df4cb95 into apache:master Dec 21, 2022
@kou kou deleted the python-cmake branch December 21, 2022 20:27
@ursabot

ursabot commented Dec 21, 2022

Benchmark runs are scheduled for baseline = 9ed98bf and contender = df4cb95. df4cb95 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.07% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] df4cb958 ec2-t3-xlarge-us-east-2
[Failed] df4cb958 test-mac-arm
[Finished] df4cb958 ursa-i9-9960x
[Finished] df4cb958 ursa-thinkcentre-m75q
[Finished] 9ed98bf8 ec2-t3-xlarge-us-east-2
[Finished] 9ed98bf8 test-mac-arm
[Finished] 9ed98bf8 ursa-i9-9960x
[Finished] 9ed98bf8 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@ursabot

ursabot commented Dec 21, 2022

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

@ElenaHenderson
Contributor

ElenaHenderson commented Dec 28, 2022

@kou @AlenkaF @jorisvandenbossche Benchmark builds on test-mac-arm (https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm) have been failing since this change was merged into the main branch:

I can reproduce this issue on test-mac-arm with these steps:


cd ~
rm -rf arrow
git clone https://github.com/apache/arrow.git
cd arrow

conda create -y -n arrow-commit -c conda-forge \
  --file ci/conda_env_unix.txt \
  --file ci/conda_env_cpp.txt \
  --file ci/conda_env_python.txt \
  compilers \
  python=3.8 \
  pandas \
  aws-sdk-cpp \
  r

conda activate arrow-commit
pip install -r python/requirements-build.txt -r python/requirements-test.txt
source dev/conbench_envs/hooks.sh set_arrow_build_and_run_env_vars
export RANLIB=`which $RANLIB`
export AR=`which $AR`
export ARROW_JEMALLOC=OFF
ci/scripts/cpp_build.sh $(pwd) $(pwd)
ci/scripts/python_build.sh $(pwd) $(pwd)

(arrow-commit) voltrondata@m1mini01 arrow % python
Python 3.8.15 | packaged by conda-forge | (default, Nov 22 2022, 08:49:06) 
[Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
ImportError: dlopen(/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/pyarrow/lib.cpython-38-darwin.so, 2): Library not loaded: /Users/voltrondata/arrow/python/build/lib.macosx-11.0-arm64-cpython-38/pyarrow/lib/libarrow_python.1100.dylib
  Referenced from: /Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/pyarrow/lib.cpython-38-darwin.so
  Reason: image not found
>>> 

Any advice on this? Thank you!

@kou
Member Author

kou commented Dec 28, 2022

Could you show otool -L /Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/pyarrow/lib.cpython-38-darwin.so?

@ElenaHenderson
Contributor

@kou

(arrow-commit) voltrondata@m1mini01 arrow % otool -L /Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/pyarrow/lib.cpython-38-darwin.so
/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/pyarrow/lib.cpython-38-darwin.so:
	/Users/voltrondata/arrow/python/build/lib.macosx-11.0-arm64-cpython-38/pyarrow/lib/libarrow_python.1100.dylib (compatibility version 1100.0.0, current version 1100.0.0)
	/Users/voltrondata/miniconda3/envs/arrow-commit/lib/libarrow.1100.dylib (compatibility version 1100.0.0, current version 1100.0.0)
	@rpath/libc++.1.dylib (compatibility version 1.0.0, current version 1.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
(arrow-commit) voltrondata@m1mini01 arrow % 

@kou
Member Author

kou commented Dec 29, 2022

Thanks.
It seems that ARROW_INSTALL_NAME_RPATH=OFF is specified explicitly: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2099#01855a81-39fc-4639-b82c-df63b9db8f78/34-986
Can we stop setting it?

@ElenaHenderson
Copy link
Contributor

@kou Thank you! Yes, we can set ARROW_INSTALL_NAME_RPATH=ON. I am going to create a new PR now and confirm that this will work for benchmark builds on all machines.
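The `otool -L` output in this thread shows the problem: libarrow_python.1100.dylib is recorded with an absolute install name inside the build tree rather than an `@rpath/...` name, so dyld looks for it at that exact path and fails when no file exists there. Per the thread, the fix is to stop forcing ARROW_INSTALL_NAME_RPATH=OFF. As a hypothetical diagnostic helper (not part of Arrow's tooling), the sketch below parses `otool -L` text and flags dependencies that point into a given build directory.

```python
def linked_libraries(otool_output):
    """Return the library paths listed in `otool -L <binary>` output.

    The first line names the binary itself; every following non-empty line
    looks like "<path> (compatibility version X, current version Y)".
    For a .dylib, the first listed entry is the library's own install name.
    """
    deps = []
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if line:
            # Keep everything before the version info (paths may contain spaces).
            deps.append(line.split(" (compatibility version", 1)[0])
    return deps


def suspicious_install_names(otool_output, build_dir):
    """Return dependencies recorded with an absolute path inside build_dir."""
    prefix = build_dir.rstrip("/") + "/"
    return [dep for dep in linked_libraries(otool_output) if dep.startswith(prefix)]


if __name__ == "__main__":
    # The otool -L output posted in this thread.
    sample = (
        "/Users/voltrondata/miniconda3/envs/arrow-commit/lib/python3.8/site-packages/pyarrow/lib.cpython-38-darwin.so:\n"
        "\t/Users/voltrondata/arrow/python/build/lib.macosx-11.0-arm64-cpython-38/pyarrow/lib/libarrow_python.1100.dylib (compatibility version 1100.0.0, current version 1100.0.0)\n"
        "\t/Users/voltrondata/miniconda3/envs/arrow-commit/lib/libarrow.1100.dylib (compatibility version 1100.0.0, current version 1100.0.0)\n"
        "\t@rpath/libc++.1.dylib (compatibility version 1.0.0, current version 1.0.0)\n"
        "\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)\n"
    )
    print(suspicious_install_names(sample, "/Users/voltrondata/arrow/python/build"))
```

An `@rpath/...` entry such as libc++ is resolved through the binary's rpath entries instead, which is why it survives being installed into a different location.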


6 participants