Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Jun 23, 2025

Which issue does this PR close?

Rationale for this change

At its core, if someone isn't using / modifying the pyarrow integration for arrow-rs they shouldn't have to install / configure python to get the tests working in arrow-rs

Running cargo test --workspace now also runs tests that require python to be setup and the pyarrow module to be installed. This is problematic because:

  1. Some people may not have that environment setup
  2. Apparently you can not use virtualenvs with py03 in Mac due to Can not link to virtual environment on OS X PyO3/pyo3#1741

The second item was very confusing for me while I tried to debug what going on as I ket getting an error about pyarrow not being installed, even though it was installed in my venv:

thread 'test_to_pyarrow' panicked at arrow-pyarrow/tests/pyarrow.rs:43:6:
called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'ModuleNotFoundError'>, value: ModuleNotFoundError("No module named 'pyarrow'"), traceback: None }

What changes are included in this PR?

  1. Move the tests that require pyarrow to be installed into arrow-pyarrow-testing, which is not part of the workspace and thus not run with cargo test --all
  2. Remove cargo test --exclude arrow-pyarrow
  3. Add documentation on rationale and hints about running the test

Frequently Asked Questions

Why not add --exclude arrow-pyarrow to verify_release_candidate.sh?

While the minimal fix would be to add --exclude arrow-pyarrow to verify_release_candidate.sh this requires all users of arrow to remember to add --exclude arrow-pyarrow to their tests even if they don't care about python

Why not in pyarrow-arrow-integration-testing ?

I did not put this test in pyarrow-arrow-integration-testing because that module doesn't compile for me with the stock python install

Somehow python needs to be installed with the ability to make dynamic libraries that I haven't figured out and don't really want to. It seems maybe related to https://pyo3.rs/v0.18.1/getting_started#python (thanks to @Xuanwo for the pointer in PyO3/pyo3#2136 / apache/opendal#1675)

(venv) root@5e8d0406fabe:/arrow-rs/arrow-pyarrow-integration-testing# cargo test --test pyarrow
warning: `/arrow-rs/arrow-pyarrow-integration-testing/.cargo/config` is deprecated in favor of `config.toml`
note: if you need to support cargo 1.38 or earlier, you can symlink `config` to `config.toml`
   Compiling target-lexicon v0.13.2
   Compiling flatbuffers v25.2.10
   Compiling pyo3-build-config v0.24.2
   Compiling arrow-ipc v55.2.0 (/arrow-rs/arrow-ipc)
   Compiling pyo3-macros-backend v0.24.2
   Compiling pyo3-ffi v0.24.2
   Compiling pyo3 v0.24.2
   Compiling pyo3-macros v0.24.2
   Compiling arrow-pyarrow v55.2.0 (/arrow-rs/arrow-pyarrow)
   Compiling arrow v55.2.0 (/arrow-rs/arrow)
   Compiling arrow-pyarrow-integration-testing v0.1.0 (/arrow-rs/arrow-pyarrow-integration-testing)
error: linking with `cc` failed: exit status: 1
  |
  = note:  "cc" "/tmp/rustc0jx15I/symbols.o" "<43 object files omitted>" "-Wl,--as-needed" "-Wl,-Bstatic" "<sysroot>/lib/rustlib/aarch64-unknown-linux-gnu/lib/{libtest-*,libgetopts-*,libunicode_width-*,librustc_std_workspace_std-*}.rlib" "/arrow-rs/arrow-pyarrow-integration-testing/target/debug/deps/{libarrow-7996898a6777f964.rlib,libarrow_row-63508de6e52f4d4d.rlib,libarrow_pyarrow-8b510eeadc952ad2.rlib,libpyo3-c463c3a2243eeab9.rlib,libmemoffset-836dc1ddd866c614.rlib,libpyo3_ffi-fbf18d9f712874be.rlib,libunindent-2b8a456e13fa9700.rlib,libarrow_json-a7b4960a4b4d1cb5.rlib,libsimdutf8-7e080cbee40e41cd.rlib,libserde_json-0288fe0f1ec1bcde.rlib,libindexmap-ebb707a4eec26692.rlib,libequivalent-4762261bc2781d11.rlib,libarrow_ipc-085ebaaded386ff8.rlib,libflatbuffers-1f88fdf138129305.rlib,libarrow_csv-e5d679eef2b85a1b.rlib,libcsv-2288f6dec5308d9c.rlib,libitoa-fa5c9b2503c605f5.rlib,libserde-33ccdec93d601cce.rlib,libcsv_core-122def45831e6a2c.rlib,libarrow_string-17ebd7a5409511da.rlib,libregex-97f4021e65bafbca.rlib,libregex_automata-b62a0db5ace54d45.rlib,libaho_corasick-547ec01718db652c.rlib,libregex_syntax-f3065c7bb7c4592a.rlib,libmemchr-547fa7a4048cbc2e.rlib,libarrow_cast-0b7117723b343c65.rlib,libatoi-c9a52adfe9dd2564.rlib,libryu-243c2c0ae3ed75b4.rlib,libbase64-1cab23258b68443b.rlib,liblexical_core-c2a41d0a6941285f.rlib,liblexical_write_float-9d65854ce5ab8f07.rlib,liblexical_write_integer-894216b914487c18.rlib,liblexical_parse_float-274078b1af50d567.rlib,liblexical_parse_integer-781bcb0a42285559.rlib,liblexical_util-4a71e416d58e0125.rlib,libstatic_assertions-4f12831487497211.rlib,libarrow_arith-00bcbf2ec2eb3322.rlib,libarrow_ord-fcdece9f7e87a9bf.rlib,libarrow_select-31b4c3cfa277427b.rlib,libarrow_array-c2dc23f827508dc6.rlib,libahash-d573f36c088b3179.rlib,libgetrandom-31b11224f8e2ea08.rlib,liblibc-045cef5bc264baa9.rlib,libonce_cell-83f4df333969eacb.rlib,libzerocopy-0db6330db7e4b762.rlib,libhashbrown-2c7527cd2fd4322d.rlib,libchrono-6d1bc7062186f166.rlib,libiana_time_zone-875b10f893e8f81d.rlib,libarrow_data-be090acb3cb83adb.rlib,libarrow_schema-25469a5878e8c886.rlib,libbitflags-2614952e3652d907.rlib,libarrow_buffer-8eb56dc26cbe25a3.rlib,libbytes-8b0f150f04d16150.rlib,libhalf-ed72603b54882276.rlib,libcfg_if-9dbfdc9eaf8f6a2d.rlib,libnum-436acb7880d5b290.rlib,libnum_iter-87f263003ea3e8dd.rlib,libnum_rational-d812b535c653cc6e.rlib,libnum_complex-c12c249f79450951.rlib,libnum_bigint-ff983ebd6646ce72.rlib,libnum_integer-f946a0e48063a631.rlib,libnum_traits-d0a5f363c632fb69.rlib}.rlib" "<sysroot>/lib/rustlib/aarch64-unknown-linux-gnu/lib/{libstd-*,libpanic_unwind-*,libobject-*,libmemchr-*,libaddr2line-*,libgimli-*,librustc_demangle-*,libstd_detect-*,libhashbrown-*,librustc_std_workspace_alloc-*,libminiz_oxide-*,libadler2-*,libunwind-*,libcfg_if-*,liblibc-*,liballoc-*,librustc_std_workspace_core-*,libcore-*,libcompiler_builtins-*}.rlib" "-Wl,-Bdynamic" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-L" "/tmp/rustc0jx15I/raw-dylibs" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "<sysroot>/lib/rustlib/aarch64-unknown-linux-gnu/lib" "-o" "/arrow-rs/arrow-pyarrow-integration-testing/target/debug/deps/pyarrow-00909cd9e7866a35" "-Wl,--gc-sections" "-pie" "-Wl,-z,relro,-z,now" "-nodefaultlibs"
  = note: some arguments are omitted. use `--verbose` to show all linker arguments
  = note: /usr/bin/ld: /arrow-rs/arrow-pyarrow-integration-testing/target/debug/deps/libarrow_pyarrow-8b510eeadc952ad2.rlib(arrow_pyarrow-8b510eeadc952ad2.8xxa5xo5oql7wlj24034o033n.rcgu.o): in function `<pyo3::instance::Borrowed<pyo3::types::tuple::PyTuple> as pyo3::call::PyCallArgs>::call_positional':
          /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/pyo3-0.24.2/src/call.rs:213: undefined reference to `PyObject_Call'

Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

@alamb alamb changed the title Alamb/fix pyarrow test Move arrow-pyarrow tests that require pyarrow to be installed into arrow-pyarrow-testing crate Jun 23, 2025
@alamb
Copy link
Contributor Author

alamb commented Jun 23, 2025

Here is an example of these tests running in CI passing: https://github.com/apache/arrow-rs/actions/runs/15824854202/job/44602338829?pr=7742#step:9:93

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/pyarrow.rs (target/debug/deps/pyarrow-67a3f10e3be5d4fb)

running 2 tests
test test_to_pyarrow_byte_view ... ok
test test_to_pyarrow ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.13s

   Doc-tests arrow_pyarrow_testing

source venv/bin/activate
cargo test -p arrow-pyarrow
- name: Run tests
cd arrow-pyarrow-testing
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the key change -- move the tests to a different module

//! pip3 install --break-system-packages pyarrow
//! ```
use arrow_array::builder::{BinaryViewBuilder, StringViewBuilder};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tests are unmodified -- I just added some comments about how to run the tests locally

@alamb alamb marked this pull request as ready for review June 23, 2025 13:08
alamb added a commit that referenced this pull request Jun 23, 2025
…yarrow` and `parquet-variant` (#7745)

Note this targets a release branch, not main

I have a different proposed fix for `main`:
- #7742

I will also make a fix for parquet-variant test failures

# Which issue does this PR close?

- Related to #7394
- Related to #7736
- Related to #7746

# Rationale for this change

`cargo test --all` requires python and pyarrow installed, where it did
not in previous versions of arrow due to breaking out `arrow-pyarrow`
into its own crate in
- #7694

Also the new `parquet-variant` crate tests fail as part of the
verification script too -- see
ttps://github.com//issues/7746


In order to get a script that can automatically verify a release
candidate, let's ignore this new module for now.

More details 
- #7736 
- #7742

Note that the arrow-pyarrow tests do run as part of CI and succeed on
this branch


# What changes are included in this PR?

1. Exclude running the `arrow-pyarrow` and `parquet-variant` tests as
part of `verify-release-acndidate`


# Testing
I verified locally that `./dev/release/verify-release-candidate.sh
55.2.0 1` passes with this script

# Are there any user-facing changes?
No this is a development process only change
@alamb alamb requested a review from Copilot June 24, 2025 16:17
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the testing setup for the pyarrow integration by moving tests that require the pyarrow package into a separate crate, thereby decoupling them from the main workspace tests.

  • Relocates pyarrow-dependent tests to the arrow-pyarrow-testing crate
  • Updates Cargo.toml and GitHub workflow files to reflect the new testing layout
  • Adds detailed documentation regarding the environment setup for these tests

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
arrow-pyarrow-testing/tests/pyarrow.rs Adds documentation for tests that require pyarrow to be installed
arrow-pyarrow-testing/src/lib.rs Provides a minimal library for the test crate
arrow-pyarrow-testing/Cargo.toml Defines the test crate and its dependencies
Cargo.toml Updates workspace exclude list to remove arrow-pyarrow-testing from the main tests
.github/workflows/rust.yml Adjusts test commands to run workspace tests without excluding arrow-pyarrow
.github/workflows/integration.yml Adds steps to run tests in arrow-pyarrow-testing and integration testing

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@alamb
Copy link
Contributor Author

alamb commented Jul 7, 2025

@mbrobbel I wonder if you might have time to review this PR?

Copy link
Member

@mbrobbel mbrobbel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks @alamb

@alamb
Copy link
Contributor Author

alamb commented Jul 7, 2025

Thank you @mbrobbel 🙏

@alamb alamb merged commit 58b897b into apache:main Jul 7, 2025
32 checks passed
@alamb alamb deleted the alamb/fix_pyarrow-test branch July 7, 2025 15:36
berkaysynnada pushed a commit to synnada-ai/arrow-rs that referenced this pull request Jul 17, 2025
…yarrow` and `parquet-variant` (apache#7745)

Note this targets a release branch, not main

I have a different proposed fix for `main`:
- apache#7742

I will also make a fix for parquet-variant test failures

# Which issue does this PR close?

- Related to apache#7394
- Related to apache#7736
- Related to apache#7746

# Rationale for this change

`cargo test --all` requires python and pyarrow installed, where it did
not in previous versions of arrow due to breaking out `arrow-pyarrow`
into its own crate in
- apache#7694

Also the new `parquet-variant` crate tests fail as part of the
verification script too -- see
ttps://github.com/apache/issues/7746


In order to get a script that can automatically verify a release
candidate, let's ignore this new module for now.

More details 
- apache#7736 
- apache#7742

Note that the arrow-pyarrow tests do run as part of CI and succeed on
this branch


# What changes are included in this PR?

1. Exclude running the `arrow-pyarrow` and `parquet-variant` tests as
part of `verify-release-acndidate`


# Testing
I verified locally that `./dev/release/verify-release-candidate.sh
55.2.0 1` passes with this script

# Are there any user-facing changes?
No this is a development process only change
berkaysynnada pushed a commit to synnada-ai/arrow-rs that referenced this pull request Aug 26, 2025
…yarrow` and `parquet-variant` (apache#7745)

Note this targets a release branch, not main

I have a different proposed fix for `main`:
- apache#7742

I will also make a fix for parquet-variant test failures

# Which issue does this PR close?

- Related to apache#7394
- Related to apache#7736
- Related to apache#7746

# Rationale for this change

`cargo test --all` requires python and pyarrow installed, where it did
not in previous versions of arrow due to breaking out `arrow-pyarrow`
into its own crate in
- apache#7694

Also the new `parquet-variant` crate tests fail as part of the
verification script too -- see
ttps://github.com/apache/issues/7746


In order to get a script that can automatically verify a release
candidate, let's ignore this new module for now.

More details 
- apache#7736 
- apache#7742

Note that the arrow-pyarrow tests do run as part of CI and succeed on
this branch


# What changes are included in this PR?

1. Exclude running the `arrow-pyarrow` and `parquet-variant` tests as
part of `verify-release-acndidate`


# Testing
I verified locally that `./dev/release/verify-release-candidate.sh
55.2.0 1` passes with this script

# Are there any user-facing changes?
No this is a development process only change
berkaysynnada pushed a commit to synnada-ai/arrow-rs that referenced this pull request Aug 26, 2025
…yarrow` and `parquet-variant` (apache#7745)

Note this targets a release branch, not main

I have a different proposed fix for `main`:
- apache#7742

I will also make a fix for parquet-variant test failures

# Which issue does this PR close?

- Related to apache#7394
- Related to apache#7736
- Related to apache#7746

# Rationale for this change

`cargo test --all` requires python and pyarrow installed, where it did
not in previous versions of arrow due to breaking out `arrow-pyarrow`
into its own crate in
- apache#7694

Also the new `parquet-variant` crate tests fail as part of the
verification script too -- see
ttps://github.com/apache/issues/7746


In order to get a script that can automatically verify a release
candidate, let's ignore this new module for now.

More details 
- apache#7736 
- apache#7742

Note that the arrow-pyarrow tests do run as part of CI and succeed on
this branch


# What changes are included in this PR?

1. Exclude running the `arrow-pyarrow` and `parquet-variant` tests as
part of `verify-release-acndidate`


# Testing
I verified locally that `./dev/release/verify-release-candidate.sh
55.2.0 1` passes with this script

# Are there any user-facing changes?
No this is a development process only change
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test_to_pyarrow tests fail during release verification

2 participants