Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-37381: [Python][CI][Packaging] Enable ORC in Windows wheels and Appveyor CI #40609

Merged
merged 2 commits into from
Mar 20, 2024

Conversation

wgtmac
Copy link
Member

@wgtmac wgtmac commented Mar 17, 2024

Rationale for this change

The pyarrow orc reader always crashes when it tries to create an internal orc reader. This is caused by failing to read tz database on the local host. This also disables windows wheel build when ARROW_ORC is turned on.

What changes are included in this PR?

Download IANA timezone database on the test host and explicitly setting TZDIR to make the CIs happy.

Are these changes tested?

Make sure all python wheel windows CIs pass.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the awaiting review Awaiting review label Mar 17, 2024
@wgtmac wgtmac force-pushed the fix_orc_segfault branch from b200344 to 0c91267 Compare March 17, 2024 06:28
@wgtmac

This comment was marked as outdated.

@kou

This comment was marked as outdated.

@wgtmac wgtmac marked this pull request as draft March 18, 2024 05:29
@wgtmac wgtmac force-pushed the fix_orc_segfault branch from 222db36 to 2318d9a Compare March 19, 2024 02:03
@wgtmac wgtmac force-pushed the fix_orc_segfault branch from 2318d9a to 8c1d0d4 Compare March 19, 2024 02:24
@wgtmac wgtmac changed the title WIP: Fix ORC segfault in pyarrow test GH-36026: Fix ORC test segfault in the python wheel windows test Mar 19, 2024
@wgtmac

This comment was marked as outdated.

This comment was marked as outdated.

@wgtmac

This comment was marked as outdated.

@apache apache deleted a comment from github-actions bot Mar 19, 2024
@apache apache deleted a comment from github-actions bot Mar 19, 2024
@apache apache deleted a comment from github-actions bot Mar 19, 2024
@apache apache deleted a comment from github-actions bot Mar 19, 2024
@apache apache deleted a comment from github-actions bot Mar 19, 2024
@apache apache deleted a comment from github-actions bot Mar 19, 2024

This comment was marked as outdated.

@apache apache deleted a comment from github-actions bot Mar 19, 2024
@kou kou changed the title GH-36026: Fix ORC test segfault in the python wheel windows test GH-36026: [Python] Fix ORC test segfault in the python wheel windows test Mar 19, 2024
@wgtmac
Copy link
Member Author

wgtmac commented Mar 19, 2024

C:\>curl https://cygwin.osuosl.org/noarch/release/tzdata/tzdata-2024a-1.tar.xz --output tzdata.tar.gz   || exit /B 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
 51  451k   51  234k    0     0   234k      0  0:00:01 --:--:--  0:00:01  300k
100  451k  100  451k    0     0   451k      0  0:00:01 --:--:--  0:00:01  466k
C:\>mkdir C:\Users\ContainerAdministrator\Downloads\test\tzdata 
C:\>tar --extract --file tzdata.tar.gz --directory C:\Users\ContainerAdministrator\Downloads\test\tzdata 
tar: Error opening archive: Can't initialize filter; unable to run program "xz -d -qq"

Do you know any approach to unzip tar.xz file on Windows in the CI build? @kou

@wgtmac wgtmac marked this pull request as ready for review March 19, 2024 20:19
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

dev/tasks/python-wheels/github.windows.yml Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting merge Awaiting merge awaiting review Awaiting review awaiting committer review Awaiting committer review and removed awaiting committer review Awaiting committer review awaiting review Awaiting review awaiting merge Awaiting merge labels Mar 20, 2024
@wgtmac
Copy link
Member Author

wgtmac commented Mar 20, 2024

@github-actions crossbow submit wheel-windows*

Copy link

Revision: 1a2634b

Submitted crossbow builds: ursacomputing/crossbow @ actions-2e59d7101d

Task Status
wheel-windows-cp310-amd64 GitHub Actions
wheel-windows-cp311-amd64 GitHub Actions
wheel-windows-cp312-amd64 GitHub Actions
wheel-windows-cp38-amd64 GitHub Actions
wheel-windows-cp39-amd64 GitHub Actions

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Mar 20, 2024
@wgtmac
Copy link
Member Author

wgtmac commented Mar 20, 2024

Revision: 1a2634b

Submitted crossbow builds: ursacomputing/crossbow @ actions-2e59d7101d

Task Status
wheel-windows-cp310-amd64 GitHub Actions
wheel-windows-cp311-amd64 GitHub Actions
wheel-windows-cp312-amd64 GitHub Actions
wheel-windows-cp38-amd64 GitHub Actions
wheel-windows-cp39-amd64 GitHub Actions

Step 6/8 : RUN choco install -r -y --no-progress python --version=%PYTHON_VERSION%
 ---> Running in 99f097760eef
Installing the following packages:
python
By installing you accept licenses for the packages.
python not installed. An error occurred during installation:
 The remote server returned an error: (504) Gateway Timeout. Gateway Time-out
python package files install completed. Performing other installation steps.
The install of python was NOT successful.
python not installed. An error occurred during installation:
 The remote server returned an error: (504) Gateway Timeout. Gateway Time-out

Chocolatey installed 0/1 packages. 1 packages failed.
 See the log for details (C:\ProgramData\chocolatey\logs\chocolatey.log).

Failures
 - python (exited 1) - python not installed. An error occurred during installation:
 The remote server returned an error: (504) Gateway Timeout. Gateway Time-out
The command 'cmd /S /C choco install -r -y --no-progress python --version=%PYTHON_VERSION%' returned a non-zero code: 1
Error: docker build --build-arg BUILDKIT_INLINE_CACHE=1 --build-arg python=3.9 -f D:\a\crossbow\crossbow\arrow\ci/docker/python-wheel-windows-test-vs2019.dockerfile -t ghcr.io/ursacomputing/arrow:python-3.9-wheel-windows-test-vs2019-2024-03-19 D:\a\crossbow\crossbow\arrow exited with non-zero exit code 1
docker pull ghcr.io/ursacomputing/arrow:python-3.9-wheel-windows-test-vs2019-2024-03-19 exited with non-zero exit code 1
Error: Process completed with exit code 1.

Failed with above logs. I don't think we need to rerun them.

@wgtmac
Copy link
Member Author

wgtmac commented Mar 20, 2024

Failure in AMD64 Ubuntu 20.04 R 4.3 Force-Tests true is also unrelated.

══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test-duckdb.R:286:5'): to_duckdb passing a connection ───────────────
Error in `unclass(unclass(table_four)$lazy_query$x)$table`: $ operator is invalid for atomic vectors

[ FAIL 1 | WARN 0 | SKIP 22 | PASS 8265 ]
Error: Test failures
Execution halted

1 error ✖ | 0 warnings ✔ | 2 notes ✖
1
Error: `docker-compose --file /home/runner/work/arrow/arrow/docker-compose.yml run --rm -e TZ=MART -e ARROW_R_FORCE_TESTS=true ubuntu-r` exited with a non-zero exit code 1, see the process log above.

@wgtmac
Copy link
Member Author

wgtmac commented Mar 20, 2024

@kou Do you want to take another pass?

@kou kou merged commit 0eefb07 into apache:main Mar 20, 2024
52 of 53 checks passed
@kou kou removed the awaiting changes Awaiting changes label Mar 20, 2024
@kou
Copy link
Member

kou commented Mar 20, 2024

Thanks!

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Mar 20, 2024

I am not sure that we should enable ORC in the wheels if this essentially always segfaults (on first use, or before you know what to do) for users.

At least we could add a check on our side, if on windows, that the path is available, and if not raise an informative error to the user instead of segfaulting

@pitrou
Copy link
Member

pitrou commented Mar 20, 2024

Right. It seems that the ORC project should first make it possible to use without the timezone data files.

@jorisvandenbossche jorisvandenbossche changed the title GH-36026: [Python] Fix ORC test segfault in the python wheel windows test GH-37381: [Python][CI][Packaging] Enable ORC in Windows wheels and Appveyor CI Mar 20, 2024
@wgtmac
Copy link
Member Author

wgtmac commented Mar 20, 2024

This PR aims to enable ORC in the wheels CI in case there is any new regression. I will work on the ORC side to make it robust as much as possible.

@jorisvandenbossche
Copy link
Member

For CI we also have Appveyor, which was enabled here as well. But the wheels is what we deliver to our users. If we enable ORC there, I think we need to make this more robust on our side on the short term (or disable it again in the wheels until this is fixed in ORC and we use that version)

@wgtmac
Copy link
Member Author

wgtmac commented Mar 20, 2024

But the wheels is what we deliver to our users.

Sorry I'm not familiar with that. It sounds that we need to revert this change at least for wheels.

Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 0eefb07.

There were 8 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants