
python: Use meson-python instead of setuptools #644

Open
wants to merge 1 commit into
base: main

Conversation

Contributor

@WillAyd WillAyd commented Oct 3, 2024

This PR originally started as a nanobind POC to see if we could convert from Cython over to that, but I scaled that back to focus on the possibility of porting the build system to Meson (you can see some nanobind components in the history, if you care to look).

The build system port may have some merits on its own. It is more tightly integrated with the build system of the C code, and has the advantage of being built in parallel. Technically it links to the main project by symlinking it as a subproject in the subprojects directory; if we cared to support source Python builds we could replace that symlink with a wrap file on distribution, or could even detect within the configuration whether the symlink resolves. If it does not, we can fall back to a GitHub release.

If the user wanted to develop locally, they could simply symlink the project root into the subprojects folder of Python (the git repo already does this). For source distributions, the wrap system can provide a fallback project to download and reference

If we choose to try and move off of Cython incrementally and use something like nanobind or even C extensions, I think Meson can handle those more capably than the setup.py script

sources: [cyf],
cython_args: cython_args,
include_directories: nanoarrow_src_dir,
dependencies: [
Contributor Author

This is a little too broad to use every dependency for every Cython file; can refine so that only IPC files need the IPC dep, Device files need the device dep, etc...

@WillAyd WillAyd force-pushed the mesonpython-over-setuptools branch 2 times, most recently from 7245021 to 7c0e44a Compare October 3, 2024 17:29
@WillAyd WillAyd marked this pull request as ready for review October 3, 2024 17:33
@@ -103,7 +105,7 @@ if get_option('ipc')
 )
 nanoarrow_ipc_dep = declare_dependency(include_directories: [incdir],
 link_with: nanoarrow_ipc_lib,
-dependencies: [nanoarrow_dep, flatcc_dep])
+dependencies: [nanoarrow_dep])
Contributor Author

I think it was a mistake to include the flatcc_dep transitively here; I think it's only required to build the nanoarrow_ipc lib itself, but shouldn't be required by libraries that want to use nanoarrow_ipc (?)

Member

I'm unclear on the difference but yes, the flatcc headers are deliberately not included in nanoarrow's public headers such that a library using nanoarrow does not need flatcc/* files in the include path.

@paleolimbot
Member

I know you're still working here, but just making sure you know about https://github.com/apache/arrow-nanoarrow/blob/main/ci/scripts/test-python-wheels.sh

Will this work with CUDA? I imagine it's possible to add LDFLAGS to get around this if needed.

I do think we have to support building from sdist since I believe that's how the conda-forge setup currently works.

(In general I think meson is way better than setuptools provided it has the same level of Python version and platform support...thank you for working on this!)

# Replaced by version-bumping scripts at release time
version = "0.6.0.dev0"
[wrap-file]
directory = arrow-nanoarrow-b78c0395a6fe74e776daf07933583f06567fb99c
Contributor Author

This is just pointing to a single commit here that includes some of the changes made to the top level meson.build configuration. In a follow up ideally point to main or even a wrapdb release

Member

Maintaining the commit-level integration is the real trick of this particular Python package. It is probably difficult, but solving that is essential if this is going to be the approach we use (i.e., if we are going to update the Python build system, we can't do it in such a way that it removes features or increases complexity).

Does the approach you use in the examples work?

[wrap-git]
# url should point to the project root with the top-level meson.build file
url = ../../..
revision = head
[provide]
nanoarrow = nanoarrow_dep

...and maybe if building a source distribution you could copy arrow-nanoarrow/src into python/vendor/arrow-nanoarrow/arc and update the subprojects file?

Contributor Author

The commit level integration is managed through the symlink to the subprojects folder. A wrap file doesn't work like that - instead it just copies the library at a point in time to subprojects, and you would have to manually run meson wrap update constantly to get that

Contributor Author

@WillAyd WillAyd Oct 4, 2024

To try and bring that all together...I wanted to have a subprojects folder that looked like:

python/
  subprojects/
    arrow-nanoarrow/  -- this is a symlink to the project root
    arrow-nanoarrow.wrap

Ideally, if meson finds the arrow-nanoarrow subproject it should not need the wrap file at all, and just happily use the symlink for a live integration. When it comes to sdists where the symlink will not exist, the wrap file should be responsible for resolving and unpacking a particular arrow-nanoarrow distribution to the subprojects folder (probably from the wrap db, but in this PR I have it pointing to this branch to get the config updates)

Due to the "bug" I linked elsewhere in this PR, Meson confusingly still downloads the contents of arrow-nanoarrow.wrap even when the arrow-nanoarrow symlink exists, so to work around that the local directory structure instead looks like:

python/
  subprojects/
    arrow-nanoarrow-dev/  -- this is a symlink to the project root
    arrow-nanoarrow.wrap

And the configuration looks like:

nanoarrow_proj = subproject('arrow-nanoarrow-dev', required: false)
if not nanoarrow_proj.found()
  nanoarrow_proj = subproject('arrow-nanoarrow')
endif

When developing locally, the arrow-nanoarrow-dev symlink will always be prioritized. Of course, when we create source distributions we end up removing symlinks, so the wrap file comes into play in those instances.

Hope that makes sense but let me know if anything is unclear
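
The lookup order described above can be mirrored in plain Python to reason about which source a build would pick. This is only a sketch: `pick_nanoarrow_subproject` is a hypothetical helper that models the intended behaviour, not Meson's actual resolution code.

```python
from pathlib import Path


def pick_nanoarrow_subproject(subprojects_dir: Path) -> str:
    """Return the subproject name a build would use (hypothetical helper)."""
    dev_link = subprojects_dir / "arrow-nanoarrow-dev"
    # Path.exists() follows symlinks, so a dangling link counts as missing.
    if dev_link.exists() and (dev_link / "meson.build").is_file():
        return "arrow-nanoarrow-dev"
    # Without a usable symlink (e.g. in an sdist), fall back to the wrap file.
    if (subprojects_dir / "arrow-nanoarrow.wrap").is_file():
        return "arrow-nanoarrow"
    raise FileNotFoundError("no usable nanoarrow subproject found")
```

In a git checkout the symlink resolves and the first branch wins; in an unpacked sdist only the wrap file is present, so the second branch applies.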

Member

I think I get it, but I would like the sdist to have a full copy of the nanoarrow C library rather than download something. I can experiment a bit to see if I can get that to work (as well as fire up my Windows and CUDA test machines locally to ensure those both work too).

Contributor Author

Ah OK. In that case we can probably get rid of the wrap file then; just need to figure out the right way/hook to install a copy into subprojects when generating the sdist

Member

Maybe in bootstrap.py? ADBC sets an environment variable when building a source distribution and we probably could too:

https://github.com/apache/arrow-adbc/blob/47e6a6a5318e9f85ef4eed7903844a05b7614e4f/python/adbc_driver_postgresql/setup.py#L37-L38

Contributor Author

Meson-python and meson both actually have a dist option to include subprojects, which I think is where our best bet is

https://meson-python.readthedocs.io/en/stable/how-to-guides/meson-args.html

Of course, it doesn't seem to work right now (I think the symlink behavior tricks it into failure) but let me research that a bit more. Would rather let the build system take care of this if it can


meson.add_dist_script may be useful here? It will run when building an sdist, and you can invoke a Python script to copy exactly the files/dirs you want under subprojects/.

Contributor Author

Ah OK I think that's a great idea. The --include-subprojects flag I was looking at seems to just copy the symlink, but not deeply copy the subproject. So a custom script is probably the safest bet
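
A dist script along these lines could do the deep copy. This is a sketch under assumptions, not what the PR implements: `vendor_subproject` and the entry list are illustrative, and it relies on Meson exporting `MESON_SOURCE_ROOT` and `MESON_DIST_ROOT` to scripts registered with `meson.add_dist_script`. It copies selected entries rather than the whole tree because the symlink points at the repository root, which itself contains the python/subprojects directory.

```python
import os
import shutil
from pathlib import Path


def vendor_subproject(source_root, dist_root, name, entries):
    """Replace subprojects/<name> in the dist tree with real copies of `entries`."""
    src = (Path(source_root) / "subprojects" / name).resolve()  # follow the symlink
    dst = Path(dist_root) / "subprojects" / name
    # Drop whatever the dist step put there (a copied symlink or a stale tree).
    if dst.is_symlink() or dst.is_file():
        dst.unlink()
    elif dst.is_dir():
        shutil.rmtree(dst)
    dst.mkdir(parents=True)
    for entry in entries:
        item = src / entry
        target = dst / entry
        if item.is_dir():
            shutil.copytree(item, target)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, target)
    return dst


if __name__ == "__main__" and "MESON_DIST_ROOT" in os.environ:
    # Meson sets these variables for scripts registered with meson.add_dist_script.
    vendor_subproject(
        os.environ["MESON_SOURCE_ROOT"],
        os.environ["MESON_DIST_ROOT"],
        "arrow-nanoarrow",  # subproject name as used in this thread
        ["meson.build", "src"],  # placeholder list, an assumption
    )
```

A real entry list would also need the options file and anything else the top-level meson.build references.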

@WillAyd WillAyd force-pushed the mesonpython-over-setuptools branch from 0943ede to c0d3bc5 Compare October 3, 2024 17:59
@WillAyd
Contributor Author

WillAyd commented Oct 3, 2024

> Will this work with CUDA? I imagine it's possible to add LDFLAGS to get around this if needed.

Definitely not an area of expertise for me, but I don't think Meson provides that many abstractions over CUDA at the moment

> I do think we have to support building from sdist since I believe that's how the conda-forge setup currently works.

Yep should still be a part of this - when cloning the git repo if the arrow-nanoarrow subproject is there, meson should reference that. For source distributions, they should include the arrow-nanoarrow.wrap file which will download a copy of the project from a remote source (currently this GH branch, but ideally main or better yet the WrapDB)

@WillAyd WillAyd force-pushed the mesonpython-over-setuptools branch from c0d3bc5 to 30914ec Compare October 3, 2024 18:05
if: ${{ matrix.config.label == "windows" }}
with:
python-version: "3.12"
architecture: x64
Contributor Author

@WillAyd WillAyd Oct 3, 2024

Seems like setup-python still installs a 32-bit Python by default on the Windows runner, while cibuildwheel expects a 64-bit Python on that host
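
One quick way to confirm which interpreter a runner actually provides is to check the pointer size from within Python. This is a generic standard-library check, not something taken from the PR's workflow; `interpreter_bits` is a hypothetical helper name.

```python
import struct
import sys


def interpreter_bits() -> int:
    """Return 32 or 64 depending on the running interpreter's pointer size."""
    # "P" is the struct format code for a native pointer (void *).
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} is a {interpreter_bits()}-bit build")
```

Running it as an extra workflow step before cibuildwheel would show whether the `architecture: x64` setting took effect.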

Member

Do we still get 32-bit wheels?

Contributor Author

I believe so...I think cibuildwheel controls that, and since we don't override anything in the pyproject.toml I think it should still generate the same types of wheels as specified here:

https://cibuildwheel.pypa.io/en/stable/options/#build-skip

Admittedly my knowledge of Windows and how it builds for different architectures is limited, so this is definitely something to verify when wheels actually get built
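
Once the wheel job runs, the platform tag at the end of each wheel filename shows which architectures were produced. The helpers below are a sketch using only string handling; the function names are made up for illustration, and only the two Windows x86 tags (`win32`, `win_amd64`) are classified.

```python
from typing import Optional


def wheel_platform_tag(filename: str) -> str:
    """Return the platform tag, the last dash-separated field of a wheel name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    return filename[: -len(".whl")].split("-")[-1]


def windows_wheel_bits(filename: str) -> Optional[int]:
    """Return 32 or 64 for Windows x86 wheels, or None for anything else."""
    return {"win32": 32, "win_amd64": 64}.get(wheel_platform_tag(filename))
```

Applying `windows_wheel_bits` to the artifact names from the wheel job would answer the 32-bit question directly.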

Member

You might have to update the python-wheels.yaml workflow to fire on python/meson.build instead of python/setup.py to get it to run on this PR (in which case we can inspect the results and find out!)

@WillAyd WillAyd marked this pull request as draft October 3, 2024 21:02
@WillAyd
Contributor Author

WillAyd commented Oct 3, 2024

Putting back in draft mode for now as there might be an upstream bug in meson we need to resolve mesonbuild/meson#13746

...or my understanding is wrong. Either way, I think the way things stand now is a very close approximation of what this solution could look like

@paleolimbot
Member

Another outlet of this (or something in addition to this) could be something in examples: a Python package that uses nanoarrow using meson as the build system.

@WillAyd WillAyd marked this pull request as ready for review October 4, 2024 00:22
Member

@paleolimbot paleolimbot left a comment

Just a few thoughts! If we can solve the commit-level integration issue I think we should merge just after the release 🙂

@jorisvandenbossche
Member

I don't really have a stake in this, but just to mention it: for pyarrow we are considering switching from setuptools to scikit-build-core, and for maintenance overhead there might be some value in using the same build backend for the various Python packages in the project (but in the end, we should also use the best tool in each case, and here there is less existing CMake compared to pyarrow, so using meson-python might be the simpler option)

@paleolimbot
Member

Given the dependency-freeness, probably either would work, and probably it would not be too hard to transition to one or the other if we need to for some reason! (We definitely need something better than setuptools, though, since the day is coming soon when we need zstd and lz4 to support IPC compression).

@paleolimbot
Member

I spent some time working with this to see if there was any easy way out of the "meson python doesn't make it easy to build a Python package in a subdirectory" problem, and I wonder if a slightly lower barrier to entry would be to keep the existing bootstrap.py behaviour (i.e., generate the single-file exports and .pxd and put them in python/vendor) and use Meson to build those instead of attempting to have nanoarrow as a subproject (getting nanoarrow to be a proper subproject could be a follow-up). That way we would get the benefit of meson-python as a build system (e.g., make it easier to transition away from Cython) without bending the build system to do something it wasn't designed to do. Thoughts?

@WillAyd
Contributor Author

WillAyd commented Oct 11, 2024

Yea I think that or what @rgommers suggested (run a script during Meson's dist hook to vendor what we need) are the two options.

I'll take another look at this next week

@WillAyd WillAyd force-pushed the mesonpython-over-setuptools branch 2 times, most recently from 9bae7f3 to ae750f0 Compare October 15, 2024 22:04
target_src_dir = subproj_dir / "src"
shutil.copytree(src_dir / "src", target_src_dir)

# this files are only needed by bootstrap.py
Contributor Author

@WillAyd WillAyd Oct 15, 2024

The amount of scripting here is unfortunate...would still be nice to replace bootstrap.py and/or bundle.py, but not sure I want to tackle that in this PR

Member

For this PR I think you will have the most success just renaming bootstrap.py to generate_dist.py and including a meson lib definition for nanoarrow in python/meson.build (i.e., don't try to build nanoarrow C using a subproject).

Contributor Author

What do you mean by "including a meson lib definition?"

Member

Running bootstrap.py will get you python/vendor/nanoarrow.c and friends, and you should be able to put the appropriate nanoarrow_c_lib = library(...) in python/meson.build?

Contributor Author

Ah OK - so you want to run bootstrap.py continuously, not just when the distribution is being created?

The current form uses the subproject symlink for live integration with the C project and only vendors when the distribution is created

Member

If you can get all the tests passing and all the wheels building with that go for it! I think you will have more success tackling those two battles separately but the time is yours 🙂

Comment on lines +78 to +92
py_sources = [
'__init__.py',
'array.py',
'array_stream.py',
'c_array.py',
'c_array_stream.py',
'c_buffer.py',
'c_schema.py',
'device.py',
'ipc.py',
'iterator.py',
'_repr_utils.py',
'schema.py',
'visitor.py',
]
Member

Is there any way to make this a glob like "*.py" so that a new contributor doesn't have to remember to update this list?

Contributor Author

I don't think so. Here's some documentation in Meson as to why:

https://mesonbuild.com/FAQ.html#why-cant-i-specify-target-files-with-a-wildcard
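
Since Meson wants sources listed explicitly, one way to reduce the risk of a forgotten file is a small check that compares the list with what is on disk, for example in a test or CI step. This is a sketch only: `missing_from_build` is a hypothetical helper, and the listed names would have to be kept in sync with `meson.build` by hand or parsed from it.

```python
from pathlib import Path
from typing import Iterable, List


def missing_from_build(package_dir: Path, listed: Iterable[str]) -> List[str]:
    """Return .py files present in package_dir but absent from the build list."""
    on_disk = {p.name for p in package_dir.glob("*.py")}
    return sorted(on_disk - set(listed))
```

A non-empty result would mean a module exists in the source tree but would not be installed.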

@@ -75,7 +77,8 @@
 from nanoarrow.array import array, Array
 from nanoarrow.array_stream import ArrayStream
 from nanoarrow.visitor import nulls_as_sentinel, nulls_forbid, nulls_separate
-from nanoarrow._version import __version__ # noqa: F401
+
+__version__ = importlib.metadata.version("nanoarrow")
Member

@jorisvandenbossche are there any issues with versioning in this way? Is there any cost to importing importlib.metadata by default?

(I am only skeptical because I haven't seen a Python package do this before)

Contributor Author

I think I accidentally removed the comment, but the downside to this approach versus a dynamic version generator like miniver is that the git hash is not included as part of the project metadata.

In this case the project definition from pyproject.toml just imports whatever the build-backend provides, which is currently 0.7.0-SNAPSHOT
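
For reference, the pattern under discussion reads the version from the installed distribution's metadata at import time. The sketch below adds a fallback for the case where the package is imported without being installed; the `installed_version` name and the fallback string are assumptions, not part of the PR.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str, default: str = "unknown") -> str:
    """Look up a distribution's version, falling back when it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Raised when no installed distribution has this name.
        return default
```

In `__init__.py` this would be used as `__version__ = installed_version("nanoarrow")`, which keeps imports working from a source tree that has not been installed.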

Comment on lines +18 to +25
# Try to resolve the symlink to a subproject first and fall back
# to the wrap entry if that is unsuccessful. Ideally Meson would
# take care of this for us, but there is a bug upstream
# https://github.com/mesonbuild/meson/issues/13746#issuecomment-2392510954
nanoarrow_proj = subproject('arrow-nanoarrow')
nanoarrow_dep = nanoarrow_proj.get_variable('nanoarrow_dep')
nanoarrow_ipc_dep = nanoarrow_proj.get_variable('nanoarrow_ipc_dep')
nanoarrow_device_dep = nanoarrow_proj.get_variable('nanoarrow_device_dep')
Member

Again, I think you want to vendor the files using bootstrap.py or whatever facility meson provides for that and not attempt to get the nanoarrow-as-a-subproject bit working (orthogonal battle to using meson-python vs. setuptools).

@@ -38,7 +38,11 @@ Changelog = "https://github.com/apache/arrow-nanoarrow/blob/main/CHANGELOG.md"

 [build-system]
 requires = [
-"setuptools >= 61.0.0",
+"meson-python",
Member

We probably need a version constraint here?

4 participants