GH-33697: [CI][Python] Nightly test for PySpark 3.2.0 fail with AttributeError on numpy.bool #33714

Merged: 11 commits merged into apache:main on Mar 1, 2023


AlenkaF (Member) commented Jan 17, 2023

Rationale for this change

Fix the failing nightly integration test with PySpark 3.2.0.

What changes are included in this PR?

A NumPy version pin in docker-compose.yml.

Are these changes tested?

They will be tested on this PR through the CI.

Are there any user-facing changes?

No.

github-actions bot commented:

⚠️ GitHub issue #33697 has been automatically assigned in GitHub to PR creator.

AlenkaF (Member, Author) commented Jan 17, 2023

@github-actions crossbow submit test-conda-python-3.8-spark-v3.2.0

github-actions bot commented:

Revision: 8e3ab26

Submitted crossbow builds: ursacomputing/crossbow @ actions-94e388e574

Task Status
test-conda-python-3.8-spark-v3.2.0 Github Actions

AlenkaF (Member, Author) commented Jan 17, 2023

@github-actions crossbow submit test-conda-python-3.8-spark-v3.2.0

github-actions bot commented:

Revision: 70430e9

Submitted crossbow builds: ursacomputing/crossbow @ actions-7fd30999ea

Task Status
test-conda-python-3.8-spark-v3.2.0 Github Actions

AlenkaF (Member, Author) commented Jan 17, 2023

@github-actions crossbow submit test-conda-python-3.8-spark-v3.2.0


github-actions bot commented:

Revision: 427191d

Submitted crossbow builds: ursacomputing/crossbow @ actions-431f284696

Task Status
test-conda-python-3.8-spark-v3.2.0 Github Actions

AlenkaF marked this pull request as ready for review January 17, 2023 12:39

raulcd (Member) left a comment

Thanks @AlenkaF for the PR! I have some questions, because the nightly builds with both Spark master and 3.1.2 are already working without pinning NumPy. I've pinged @kiszk, as he's a Spark committer, because I am not sure whether this fix will ever be backported to 3.2.0.

Comment on lines 25 to 28
# https://github.com/apache/arrow/issues/33697
# numpy version pin should be removed with new apache spark release
# that includes https://github.com/apache/spark/pull/37817
ARG numpy=1.23
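Pinning to 1.23 works because NumPy 1.24 removed the `np.bool` alias, which had been deprecated since NumPy 1.20. Code that still uses it, such as PySpark 3.2.0 in this issue, therefore fails with `AttributeError` on 1.24 and later 1.x releases. NumPy 2.0 later reintroduced `np.bool` as the name of its boolean scalar type. As a minimal sketch, that version rule can be written as a check that a CI helper could use to decide whether a pin is still needed. The function names `major_minor` and `has_np_bool_alias` are hypothetical and are not part of Arrow's CI scripts.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return the (major, minor) prefix of a version string such as '1.23.5'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_np_bool_alias(numpy_version: str) -> bool:
    """Return True if `numpy.bool` resolves in the given NumPy release.

    Rule encoded here (hypothetical helper): the deprecated alias still
    works before 1.24, raises AttributeError from 1.24 through the rest
    of the 1.x series, and exists again from 2.0 onwards.
    """
    version = major_minor(numpy_version)
    return version < (1, 24) or version >= (2, 0)


if __name__ == "__main__":
    for v in ["1.23.5", "1.24.0", "1.26.4", "2.0.0"]:
        print(v, has_np_bool_alias(v))
```

With the `1.23` pin from the snippet the check passes. For the 1.24.x through 1.26.x releases it fails, and those are the releases where PySpark 3.2.0's use of `np.bool` breaks.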
raulcd (Member) commented:

We already test Spark master nightly; this is the current test matrix:

{% for python_version, spark_version, test_pyarrow_only in [("3.7", "v3.1.2", "false"),
                                                            ("3.8", "v3.2.0", "false"),
                                                            ("3.9", "master", "false")] %}

And the build for Spark master is currently passing: https://github.com/ursacomputing/crossbow/actions/runs/3934958561/jobs/6730195747#step:5:10
Maybe we can add the NumPy version to the task definition, pinning it only for 3.2.0, and install the pinned version whenever it differs from "latest". I am thinking of something like:

{% for python_version, spark_version, test_pyarrow_only, numpy_version in [("3.7", "v3.1.2", "false", "latest"),
                                                                           ("3.8", "v3.2.0", "false", "1.23"),
                                                                           ("3.9", "master", "false", "latest")] %}

And a corresponding if-check to decide whether we have to install a pinned NumPy version or not?
@kiszk, you are a Spark committer. I suppose this fix won't be backported to Spark 3.2.0, so we will always have to pin NumPy for it? Should we update the tasks for our nightlies to test with Spark 3.3.0 and maybe remove 3.2.0?
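A minimal Python sketch of the per-task logic suggested above, assuming the matrix gains a fourth `numpy_version` field in which "latest" means "do not pin". In Arrow's CI the equivalent decision would live in a shell script or Dockerfile; the names `SPARK_MATRIX`, `task_name` and `numpy_install_command`, and the exact `conda install` invocation, are illustrative assumptions rather than the code that was merged.

```python
from typing import List, Optional, Tuple

# Hypothetical Python mirror of the suggested Jinja matrix:
# (python_version, spark_version, test_pyarrow_only, numpy_version)
SPARK_MATRIX: List[Tuple[str, str, str, str]] = [
    ("3.7", "v3.1.2", "false", "latest"),
    ("3.8", "v3.2.0", "false", "1.23"),
    ("3.9", "master", "false", "latest"),
]


def task_name(python_version: str, spark_version: str) -> str:
    """Build a task name such as 'test-conda-python-3.8-spark-v3.2.0'."""
    return f"test-conda-python-{python_version}-spark-{spark_version}"


def numpy_install_command(numpy_version: str) -> Optional[List[str]]:
    """Return an install command for a pinned NumPy, or None for 'latest'.

    'latest' means: keep whatever NumPy the image already provides.
    The conda spec 'numpy=1.23' matches any 1.23.x release.
    """
    if numpy_version == "latest":
        return None
    return ["conda", "install", "-q", "-y", f"numpy={numpy_version}"]


if __name__ == "__main__":
    for py, spark, _pyarrow_only, numpy_version in SPARK_MATRIX:
        cmd = numpy_install_command(numpy_version)
        action = " ".join(cmd) if cmd else "no NumPy pin"
        print(f"{task_name(py, spark)}: {action}")
```

Keeping "latest" as an explicit sentinel means only the Spark 3.2.0 task carries a pin. Removing the pin later then becomes a one-field change in the matrix.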

AlenkaF force-pushed the gh-33697-ci-pyspark branch from 427191d to d86c6a9 February 23, 2023 11:09

AlenkaF (Member, Author) commented Feb 23, 2023

@github-actions crossbow submit test-conda-python-3.8-spark-v3.2.0

github-actions bot commented:

Revision: d86c6a9

Submitted crossbow builds: ursacomputing/crossbow @ actions-94d502001c

Task Status
test-conda-python-3.8-spark-v3.2.0 Github Actions

AlenkaF (Member, Author) commented Feb 23, 2023

@github-actions crossbow submit test-conda-python-3.8-spark-v3.2.0

github-actions bot commented:

Revision: 7f76899

Submitted crossbow builds: ursacomputing/crossbow @ actions-b7b5dc6d5d

Task Status
test-conda-python-3.8-spark-v3.2.0 Github Actions

AlenkaF requested a review from raulcd February 23, 2023 16:17

AlenkaF (Member, Author) commented Feb 23, 2023

@github-actions crossbow submit test-conda-python-*-spark-*

github-actions bot commented:

Revision: 7f76899

Submitted crossbow builds: ursacomputing/crossbow @ actions-8c12467062

Task Status
test-conda-python-3.7-spark-v3.1.2 Github Actions
test-conda-python-3.8-spark-v3.2.0 Github Actions
test-conda-python-3.9-spark-master Github Actions

AlenkaF (Member, Author) commented Feb 27, 2023

@github-actions crossbow submit test-conda-python-*-spark-*

github-actions bot commented:

Revision: b1b776d

Submitted crossbow builds: ursacomputing/crossbow @ actions-5192a832e2

Task Status
test-conda-python-3.7-spark-v3.1.2 Github Actions
test-conda-python-3.8-spark-v3.2.0 Github Actions
test-conda-python-3.9-spark-master Github Actions

AlenkaF (Member, Author) commented Feb 27, 2023

@github-actions crossbow submit test-conda-python-*-spark-*

github-actions bot commented:

Revision: 9743842

Submitted crossbow builds: ursacomputing/crossbow @ actions-d30511e746

Task Status
test-conda-python-3.7-spark-v3.1.2 Github Actions
test-conda-python-3.8-spark-v3.2.0 Github Actions
test-conda-python-3.9-spark-master Github Actions

AlenkaF (Member, Author) commented Feb 27, 2023

@github-actions crossbow submit test-conda-python-*-spark-*

github-actions bot commented:

Revision: efdf9fc

Submitted crossbow builds: ursacomputing/crossbow @ actions-268559d178

Task Status
test-conda-python-3.7-spark-v3.1.2 Github Actions
test-conda-python-3.8-spark-v3.2.0 Github Actions
test-conda-python-3.9-spark-master Github Actions

jorisvandenbossche (Member) commented:

Looking at the logs, I think the line "#7 7.578 /bin/bash: /arrow/ci/scripts/install_numpy.sh: Permission denied" is the problem (the file wasn't copied, so installing NumPy using the non-existent file fails).

However, I don't see any difference from what we already do in conda-python-pandas.dockerfile (which also does such a copy), so I'm not immediately sure why this is going wrong.
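For reference, bash generally reports a script path that does not exist as "No such file or directory". "Permission denied" usually means the file is present but lacks the execute bit, for example a script committed to git without mode 755. A small check along these lines can tell the two cases apart. It is a hypothetical diagnostic (`diagnose_script` is not an Arrow helper) and assumes Python is available inside the image.

```python
import os
import sys


def diagnose_script(path: str) -> str:
    """Classify why running `path` directly from bash might fail.

    Hypothetical diagnostic: bash usually reports a missing file as
    'No such file or directory' and an existing file without the execute
    bit as 'Permission denied'.
    """
    if not os.path.exists(path):  # follows symlinks; broken links count as missing
        return "missing"
    if os.path.isdir(path):
        return "directory"
    if not os.access(path, os.X_OK):  # no execute permission for this user
        return "not executable"
    return "ok"


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/arrow/ci/scripts/install_numpy.sh"
    print(f"{target}: {diagnose_script(target)}")
```

If the check reports "not executable", two common remedies are to record the execute bit in git (`git update-index --chmod=+x <path>`) or to invoke the script through the interpreter (`bash <path>`), which does not need the bit.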

AlenkaF (Member, Author) commented Feb 28, 2023

Yeah, I am trying out different things locally, but none of them work 🤷‍♀️ I also asked Raul for help, in case he has any ideas about what the issue could be.

AlenkaF (Member, Author) commented Feb 28, 2023

@github-actions crossbow submit test-conda-python-*-spark-*

github-actions bot added the "awaiting review" label Feb 28, 2023

github-actions bot commented:

Revision: b1b3b99

Submitted crossbow builds: ursacomputing/crossbow @ actions-5f44f20302

Task Status
test-conda-python-3.7-spark-v3.1.2 Github Actions
test-conda-python-3.8-spark-v3.2.0 Github Actions
test-conda-python-3.9-spark-master Github Actions

AlenkaF (Member, Author) commented Feb 28, 2023

@github-actions crossbow submit test-conda-python-*-spark-*

github-actions bot commented:

Revision: d56f4b7

Submitted crossbow builds: ursacomputing/crossbow @ actions-cce4baca43

Task Status
test-conda-python-3.7-spark-v3.1.2 Github Actions
test-conda-python-3.8-spark-v3.2.0 Github Actions
test-conda-python-3.9-spark-master Github Actions

AlenkaF (Member, Author) commented Mar 1, 2023

@raulcd the fix is working now, thank you!
The remaining CI failure is unrelated (maybe I need to rebase?).

Review thread on ci/scripts/install_numpy.sh (outdated, resolved)
github-actions bot added the "awaiting changes" label and removed the "awaiting review" label Mar 1, 2023
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label Mar 1, 2023

raulcd (Member) left a comment

Thanks @AlenkaF! I am going to trigger a final run just to validate, and will merge once it finishes.

raulcd (Member) commented Mar 1, 2023

@github-actions crossbow submit test-conda-python-*-spark-*

github-actions bot added the "awaiting merge" label and removed the "awaiting change review" label Mar 1, 2023

github-actions bot commented Mar 1, 2023

Revision: 1ebe276

Submitted crossbow builds: ursacomputing/crossbow @ actions-fd4f9f54ae

Task Status
test-conda-python-3.7-spark-v3.1.2 Github Actions
test-conda-python-3.8-spark-v3.2.0 Github Actions
test-conda-python-3.9-spark-master Github Actions

raulcd merged commit 4c1448e into apache:main Mar 1, 2023
AlenkaF deleted the gh-33697-ci-pyspark branch March 1, 2023 14:07

ursabot commented Mar 2, 2023

Benchmark runs are scheduled for baseline = f9a1d19 and contender = 4c1448e. 4c1448e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.15% ⬆️0.06%] test-mac-arm
[Finished ⬇️0.51% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.16% ⬆️0.03%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 4c1448e8 ec2-t3-xlarge-us-east-2
[Finished] 4c1448e8 test-mac-arm
[Finished] 4c1448e8 ursa-i9-9960x
[Finished] 4c1448e8 ursa-thinkcentre-m75q
[Finished] f9a1d198 ec2-t3-xlarge-us-east-2
[Finished] f9a1d198 test-mac-arm
[Finished] f9a1d198 ursa-i9-9960x
[Finished] f9a1d198 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Labels: awaiting merge
Projects: None yet
Linked issue that merging this pull request may close: [CI][Python] Nightly test for PySpark 3.2.0 fail with AttributeError on numpy.bool
5 participants