-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
python: Use meson-python instead of setuptools #644
base: main
Are you sure you want to change the base?
Conversation
python/src/nanoarrow/meson.build
Outdated
sources: [cyf], | ||
cython_args: cython_args, | ||
include_directories: nanoarrow_src_dir, | ||
dependencies: [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little too broad to use every dependency for every Cython file; can refine so that only IPC files need the IPC dep, Device files need the device dep, etc...
7245021
to
7c0e44a
Compare
@@ -103,7 +105,7 @@ if get_option('ipc') | |||
) | |||
nanoarrow_ipc_dep = declare_dependency(include_directories: [incdir], | |||
link_with: nanoarrow_ipc_lib, | |||
dependencies: [nanoarrow_dep, flatcc_dep]) | |||
dependencies: [nanoarrow_dep]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was a mistake to include the flatcc_dep
transitively here; I think its only required to build the nanoarrow_ipc
lib itself, but shouldn't be required by libraries that want to use nanoarrow_ipc
(?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unclear on the difference but yes, the flatcc headers are deliberately not included in nanoarrow's public headers such that a library using nanoarrow does not need flatcc/*
files in the include path.
I know you're still working here, but just making sure you know about https://github.com/apache/arrow-nanoarrow/blob/main/ci/scripts/test-python-wheels.sh Will this work with CUDA? I imagine it's possible to add I do think we have to support building from sdist since I believe that's how the conda-forge setup currently works. (In general I think meson is way better than setuptools provided it has the same level of Python version and platform support...thank you for working on this!) |
# Replaced by version-bumping scripts at release time | ||
version = "0.6.0.dev0" | ||
[wrap-file] | ||
directory = arrow-nanoarrow-b78c0395a6fe74e776daf07933583f06567fb99c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just pointing to a single commit here that includes some of the changes made to the top level meson.build
configuration. In a follow up ideally point to main or even a wrapdb release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maintaining the commit-level integration is the real trick of this particular Python package. It is probably difficult, but solving that is essential if this going to be the approach we use (i.e., if we are going to update the Python build system, we can't do it in such a way that it removes features or increases complexity).
Does the approach you use in the examples work?
arrow-nanoarrow/examples/meson-minimal/subprojects/nanoarrow.wrap
Lines 18 to 24 in 4168119
[wrap-git] | |
# url should point to the project root with the top-level meson.build file | |
url = ../../.. | |
revision = head | |
[provide] | |
nanoarrow = nanoarrow_dep |
...and maybe if building a source distribution you could copy arrow-nanoarrow/src
into python/vendor/arrow-nanoarrow/arc
and update the subprojects file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit level integration is managed through the symlink to the subprojects folder. A wrap file doesn't work like that - instead it just copies the library at a point in time to subprojects, and you would have to manually run meson wrap update
constantly to get that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To try and bring that all together...I wanted to have a subprojects folder that looked like:
python/
subprojects/
arrow-nanoarrow/ -- this is a symlink to the project root
arrow-nanoarrow.wrap
Ideally, if meson finds the arrow-nanoarrow
subproject it should not need the wrap file at all, and just happily use the symlink for a live integration. When it comes to sdists where the symlink will not exist, the wrap file should be responsible for resolving and unpacking a particular arrow-nanoarrow
distribution to the subprojects folder (probably from the wrap db, but in this PR I have it pointing to this branch to get the config updates)
Due to the "bug" I linked elsewhere in this PR, Meson confusingly still downloads the contents of arrow-nanoarrow.wrap
even when the arrow-nanoarrow
symlink exists, so to work around that the local directory structure instead looks like:
python/
subprojects/
arrow-nanoarrow-dev/ -- this is a symlink to the project root
arrow-nanoarrow.wrap
And the configuration looks like:
nanoarrow_proj = subproject('arrow-nanoarrow-dev', required: false)
if not nanoarrow_proj.found()
nanoarrow_proj = subproject('arrow-nanoarrow')
endif
When developing locally, the arrow-nanoarrow-dev symlink will always be prioritized. Of course, when we create source distributions we end up removing symlinks, so the wrap file comes into play in those instances.
Hope that makes sense but let me know if anything is unclear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I get it, but I would like the sdist to have a full copy of the nanoarrow C library rather than download something. I can experiment a bit to see if I can get that to work (as well as fire up my Windows and CUDA test machines locally to ensure those both work too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK. In that case we can probably get rid of the wrap file then; just need to figure out the right way/hook to install a copy into subprojects when generating the sdist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe in bootstrap.py
? ADBC sets an environment variable when building a source distribution and we probably could too:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meson-python and meson both actually have a dist option to include subprojects, which I think is where our best bet is
https://meson-python.readthedocs.io/en/stable/how-to-guides/meson-args.html
Of course, it doesn't seem to work right now (I think the symlink behavior tricks it into failure) but let me research that a bit more. Would rather let the build system take care of this if it can
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meson.add_dist_script
may be useful here? It will run when building an sdist, and you can invoke a Python script to copy exactly the files/dirs you want under subprojects/
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK I think that's a great idea. The --include-subprojects
flag I was looking at seems to just copy the symlink, but not deeply copy the subproject. So a custom script is probably the safest bet
0943ede
to
c0d3bc5
Compare
Definitely not an area of expertise for me, but I don't think Meson provides that many abstractions over CUDA at the moment
Yep should still be a part of this - when cloning the git repo if the arrow-nanoarrow subproject is there, meson should reference that. For source distributions, they should include the |
c0d3bc5
to
30914ec
Compare
if: ${{ matrix.config.label == "windows" }} | ||
with: | ||
python-version: "3.12" | ||
architecture: x64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like setup-python still installs a 32 bit Python by default on the windows runner, while cibuildwheel is expected 64 bit on that host
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still get 32-bit wheels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so...I think cibuildwheel controls that, and since we don't override anything in the pyproject.toml I think it should still generate the same types of wheels as specified here:
https://cibuildwheel.pypa.io/en/stable/options/#build-skip
Admittedly my knowledge of Windows and how it builds for different architectures is limited, so definitely something to verify when wheels actually get build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might have to update the python-wheels.yaml
workflow to fire on python/meson.build
instead of python/setup.py
to get it to run on this PR (in which case we can inspect the results and find out!)
Putting back in draft mode for now as there might be an upstream bug in meson we need to resolve mesonbuild/meson#13746 ...or my understanding is wrong. Either way, I think the way things stand now are a very close approximation of what this solution could look like |
Another outlet of this (or something in addition to this) could be something in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few thoughts! If we can solve the commit-level integration issue I think we should merge just after the release 🙂
# Replaced by version-bumping scripts at release time | ||
version = "0.6.0.dev0" | ||
[wrap-file] | ||
directory = arrow-nanoarrow-b78c0395a6fe74e776daf07933583f06567fb99c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maintaining the commit-level integration is the real trick of this particular Python package. It is probably difficult, but solving that is essential if this going to be the approach we use (i.e., if we are going to update the Python build system, we can't do it in such a way that it removes features or increases complexity).
Does the approach you use in the examples work?
arrow-nanoarrow/examples/meson-minimal/subprojects/nanoarrow.wrap
Lines 18 to 24 in 4168119
[wrap-git] | |
# url should point to the project root with the top-level meson.build file | |
url = ../../.. | |
revision = head | |
[provide] | |
nanoarrow = nanoarrow_dep |
...and maybe if building a source distribution you could copy arrow-nanoarrow/src
into python/vendor/arrow-nanoarrow/arc
and update the subprojects file?
if: ${{ matrix.config.label == "windows" }} | ||
with: | ||
python-version: "3.12" | ||
architecture: x64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still get 32-bit wheels?
I don't really have a stake in this, but just to mention it: for pyarrow we are considering to switch from setuptools to scikit-build-core, and for maintenance overhead there might be some value in using the same build backend for the various python packages in the project (but in the end, we should also use the best tool in each case, and here there is less existing cmake compared to pyarrow, so using meson-python might be the simpler option) |
Given the dependency-freeness, probably either would work, and probably it would not be too hard to transition to one or the other if we need to for some reason! (We definitely need something better than setuptools, though, since the day is coming soon when we need zstd and lz4 to support IPC compression). |
I spent some time working with this to see if there was any easy way out of the "meson python doesn't make it easy to build a Python package in a subdirectory" problem, and I wonder if a slightly lower barrier to entry would be to keep the existing |
Yea I think that or what @rgommers suggested (run a script during Meson's dist hook to vendor what we need) are the two options. I'll take another look at this next week |
9bae7f3
to
ae750f0
Compare
target_src_dir = subproj_dir / "src" | ||
shutil.copytree(src_dir / "src", target_src_dir) | ||
|
||
# this files are only needed by bootstrap.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The amount of scripting here is unfortunate...would still be nice to replace bootstrap.py and/or bundle.py, but not sure I want to tackle that in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR I think you will have the most success just renaming bootstrap.py
to generate_dist.py
and including a meson lib definition for nanoarrow in python/meson.build
(i.e., don't try to build nanoarrow C using a subproject).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "including a meson lib definition?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running bootstrap.py
will get you python/vendor/nanoarrow.c
and friends, and you should be able to put the appropriate nanoarrow_c_lib = library(...)
in python/meson.build
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK - so you want to run bootstrap.py continuously, not just when the distribution is being created?
The current form uses the subproject symlink for live integration with the C project and only vendors when the distribution is created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can get all the tests passing and all the wheels building with that go for it! I think you will have more success tackling those two battles separately but the time is yours 🙂
ae750f0
to
2b35a72
Compare
2b35a72
to
254202c
Compare
py_sources = [ | ||
'__init__.py', | ||
'array.py', | ||
'array_stream.py', | ||
'c_array.py', | ||
'c_array_stream.py', | ||
'c_buffer.py', | ||
'c_schema.py', | ||
'device.py', | ||
'ipc.py', | ||
'iterator.py', | ||
'_repr_utils.py', | ||
'schema.py', | ||
'visitor.py', | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to make this a glob like "*.py"
so that a new contributor doesn't have to remember to update this list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Here's some documentation in Meson as to why:
https://mesonbuild.com/FAQ.html#why-cant-i-specify-target-files-with-a-wildcard
target_src_dir = subproj_dir / "src" | ||
shutil.copytree(src_dir / "src", target_src_dir) | ||
|
||
# this files are only needed by bootstrap.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR I think you will have the most success just renaming bootstrap.py
to generate_dist.py
and including a meson lib definition for nanoarrow in python/meson.build
(i.e., don't try to build nanoarrow C using a subproject).
@@ -75,7 +77,8 @@ | |||
from nanoarrow.array import array, Array | |||
from nanoarrow.array_stream import ArrayStream | |||
from nanoarrow.visitor import nulls_as_sentinel, nulls_forbid, nulls_separate | |||
from nanoarrow._version import __version__ # noqa: F401 | |||
|
|||
__version__ = importlib.metadata.version("nanoarrow") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche is there any issues with versioning in this way? Is there any cost to importing importlib.metadata
by default?
(I am only skeptical because I haven't seen a Python package do this before)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I accidentally removed the comment, but the downside to this approach versus a dynamic version generator like miniver is that the git hash is not included as part of the project metadata.
In this case the project definition from pyproject.toml just imports whatever the build-backend provides, which is currently 0.7.0-SNAPSHOT
# Try to resolve the symlink to a subproject first and fall back | ||
# to the wrap entry if that is unsuccessful. Ideally Meson would | ||
# take care of this for us, but there is a bug upstream | ||
# https://github.com/mesonbuild/meson/issues/13746#issuecomment-2392510954 | ||
nanoarrow_proj = subproject('arrow-nanoarrow') | ||
nanoarrow_dep = nanoarrow_proj.get_variable('nanoarrow_dep') | ||
nanoarrow_ipc_dep = nanoarrow_proj.get_variable('nanoarrow_ipc_dep') | ||
nanoarrow_device_dep = nanoarrow_proj.get_variable('nanoarrow_device_dep') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I think you want to vendor the files using bootstrap.py
or whatever facility meson provides for that and not attempt to get the nanoarrow-as-a-subproject bit working (orthogonal battle to using meson-python vs. setuptools).
@@ -38,7 +38,11 @@ Changelog = "https://github.com/apache/arrow-nanoarrow/blob/main/CHANGELOG.md" | |||
|
|||
[build-system] | |||
requires = [ | |||
"setuptools >= 61.0.0", | |||
"meson-python", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need a version constraint here?
@@ -103,7 +105,7 @@ if get_option('ipc') | |||
) | |||
nanoarrow_ipc_dep = declare_dependency(include_directories: [incdir], | |||
link_with: nanoarrow_ipc_lib, | |||
dependencies: [nanoarrow_dep, flatcc_dep]) | |||
dependencies: [nanoarrow_dep]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unclear on the difference but yes, the flatcc headers are deliberately not included in nanoarrow's public headers such that a library using nanoarrow does not need flatcc/*
files in the include path.
This PR originally started as a nanobind POC to see if we could convert from Cython over to that, but I scaled that back just to focus on the possiblity of porting the build system to meson (you can see some nanobind components in the history, if you cared to look)
The build system port may have some merits on its own. It is more tightly integrated with the build system of the C code, and has the advantage of being built in parallel.
Technically it links to the main project by symlinking it as a subproject in thesubprojects
directory; if we cared to support source Python builds we could replace that symlink with a wrap file on distribution, or could even detect within the configuration if the symlink resolves. In the case it does not, we can fallback to a github releaseIf the user wanted to develop locally, they could simply symlink the project root into the subprojects folder of Python (the git repo already does this). For source distributions, the wrap system can provide a fallback project to download and reference
If we choose to try and move off of Cython incrementally and use something like nanobind or even C extensions, I think Meson can handle those more capably than the setup.py script