[SPARK-53931][INFRA][PYTHON] Fix scheduled job for numpy 2.1.3 #52633
zhengruifeng wants to merge 2 commits into apache:master
Conversation
dongjoon-hyun
left a comment
+1, LGTM. Thank you, @zhengruifeng. I was also worried about that failed CI, but didn't get a chance to look into it.
For this one, do you think we need to document this incompatibility somewhere, because our minimum
@zhengruifeng, I have a silly question about Python deps management. I see that many Python deps are declared without a version, or with a half-bounded range (lower bound only). This means that if we do not specify the dependency version, or only specify its lower bound, PySpark may stop working once a new major version of the dependency is released. This becomes a problem when users want to create a venv for older PySpark versions (in practice, EOLed Spark versions are widely used and upgrades are not timely). I wonder if PySpark can pin all Python deps to a fixed version (or at least a bounded range).
@pan3793 the reason to use lower bounds: currently, most workflows test against the latest versions, and we have two workflows against the minimum versions, in which the versions of key packages (numpy/pyarrow/pandas) are pinned.

But I personally think maybe we should use fixed versions.
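As an illustrative aside (not part of this PR): the difference between a half-bounded and a bounded requirement can be sketched with the third-party `packaging` library, which pip uses for version matching. The version numbers below are hypothetical, loosely modeled on the pyarrow releases mentioned in this thread.

```python
# Sketch of half-bounded vs. bounded version requirements (PEP 440 specifiers).
# Version numbers are illustrative, not actual PySpark constraints.
from packaging.specifiers import SpecifierSet

# Half-bounded (lower bound only): any future release, including an
# untested new major version, still satisfies the requirement.
half_bounded = SpecifierSet(">=19.0.1")

# Bounded range: a future major release no longer satisfies it.
bounded = SpecifierSet(">=19.0.1,<20")

print("19.0.1" in half_bounded, "19.0.1" in bounded)  # True True
print("21.0.0" in half_bounded, "21.0.0" in bounded)  # True False
```

The trade-off: a half-bounded range picks up bug fixes automatically but also picks up breaking majors; a bounded or fixed pin stays reproducible for old releases at the cost of manual upgrades.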
@dongjoon-hyun I am not sure, since it is a pyarrow bug introduced in 19.0.0 and fixed in 19.0.1.
@zhengruifeng, that makes a lot of sense!
Merged to master to restore the CI.
Thank you. In that case, it looks okay to me, too. It doesn't require any further attention from us.
### What changes were proposed in this pull request?

Fix scheduled job for numpy 2.1.3

### Why are the changes needed?

To fix https://github.com/apache/spark/actions/runs/18538043179/job/52838303733

It was caused by a bug in pyarrow 19.0.0, see apache/arrow#45283

### Does this PR introduce _any_ user-facing change?

No, infra-only

### How was this patch tested?

PR builder with

```
default: '{"PYSPARK_IMAGE_TO_TEST": "numpy-213", "PYTHON_TO_TEST": "python3.11"}'
```

See https://github.com/zhengruifeng/spark/actions/runs/18527303212/job/52801019275

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#52633 from zhengruifeng/restore_numpy_213.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>