[SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs #33356

HyukjinKwon · 2021-07-15T07:16:25Z

What changes were proposed in this pull request?

This PR proposes to use Python 3.9 for the documentation build and the linter in GitHub Actions. It also contains fixes for the mypy errors below, which surfaced with the Python 3.9 upgrade:

python/pyspark/sql/pandas/_typing/protocols/frame.pyi:64: error: Name "np.ndarray" is not defined
python/pyspark/sql/pandas/_typing/protocols/frame.pyi:91: error: Name "np.recarray" is not defined
python/pyspark/sql/pandas/_typing/protocols/frame.pyi:165: error: Name "np.ndarray" is not defined
python/pyspark/pandas/categorical.py:82: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/categorical.py:109: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "ordered"
python/pyspark/ml/linalg/__init__.pyi:184: error: Return type "ndarray[Any, Any]" of "toArray" incompatible with return type "NoReturn" in supertype "Matrix"
python/pyspark/ml/linalg/__init__.pyi:217: error: Return type "ndarray[Any, Any]" of "toArray" incompatible with return type "NoReturn" in supertype "Matrix"
python/pyspark/pandas/typedef/typehints.py:163: error: Module has no attribute "bool"; maybe "bool_" or "bool8"?
python/pyspark/pandas/typedef/typehints.py:174: error: Module has no attribute "float"; maybe "float_", "cfloat", or "float96"?
python/pyspark/pandas/typedef/typehints.py:180: error: Module has no attribute "int"; maybe "uint", "rint", or "intp"?
python/pyspark/pandas/ml.py:81: error: Value of type variable "_DTypeScalar_co" of "dtype" cannot be "object"
python/pyspark/pandas/indexing.py:1649: error: Module has no attribute "int"; maybe "uint", "rint", or "intp"?
python/pyspark/pandas/indexing.py:1656: error: Module has no attribute "int"; maybe "uint", "rint", or "intp"?
python/pyspark/pandas/frame.py:4969: error: Function "numpy.array" is not valid as a type
python/pyspark/pandas/frame.py:4969: note: Perhaps you need "Callable[...]" or a callback protocol?
python/pyspark/pandas/frame.py:4970: error: Function "numpy.array" is not valid as a type
python/pyspark/pandas/frame.py:4970: note: Perhaps you need "Callable[...]" or a callback protocol?
python/pyspark/pandas/frame.py:7402: error: "List[Any]" has no attribute "tolist"
python/pyspark/pandas/series.py:1030: error: Module has no attribute "_NoValue"
python/pyspark/pandas/series.py:1031: error: Module has no attribute "_NoValue"
python/pyspark/pandas/indexes/category.py:159: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/indexes/category.py:180: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "ordered"
python/pyspark/pandas/namespace.py:2036: error: Argument 1 to "column_name" has incompatible type "float"; expected "str"
python/pyspark/pandas/mlflow.py:59: error: Incompatible types in assignment (expression has type "Type[floating[Any]]", variable has type "str")
python/pyspark/pandas/data_type_ops/categorical_ops.py:43: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/data_type_ops/categorical_ops.py:43: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "ordered"
python/pyspark/pandas/data_type_ops/categorical_ops.py:56: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/tests/test_typedef.py:70: error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py:77: error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py:85: error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py:100: error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py:108: error: Name "np.float" is not defined
python/pyspark/mllib/clustering.pyi:152: error: Incompatible types in assignment (expression has type "ndarray[Any, Any]", base class "KMeansModel" defined the type as "List[ndarray[Any, Any]]")
python/pyspark/mllib/classification.pyi:93: error: Signature of "predict" incompatible with supertype "LinearClassificationModel"
Found 32 errors in 15 files (checked 315 source files)
1

Why are the changes needed?

Python 3.6 was deprecated in SPARK-35938.

Does this PR introduce any user-facing change?

No. Static analysis that relies on some of the type hints may be affected, but the changes are non-breaking.

How was this patch tested?

I manually checked the GitHub Actions build in a forked repository.

HyukjinKwon · 2021-07-15T07:19:43Z

python/pyspark/mllib/classification.pyi

    def __init__(self, weights: Vector, intercept: float) -> None: ...
-    @overload
+    @overload  # type: ignore
    def predict(self, x: VectorLike) -> float64: ...


I can't follow why it complains:

python/pyspark/mllib/classification.pyi:93: error: Signature of "predict" incompatible with supertype "LinearClassificationModel"

Presumably because of a different variable name? We can't change it for compatibility reasons.

HyukjinKwon · 2021-07-15T07:20:20Z

python/pyspark/pandas/indexing.py

    ) -> Tuple[Optional[Column], Optional[int], Optional[int]]:
        sdf = self._internal.spark_frame

-        if any(isinstance(key, (int, np.int, np.int64, np.int32)) and key < 0 for key in rows_sel):


np.int is now the same as the Python built-in int.

The same goes for np.float == float, etc.
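
As a rough illustration of why listing the alias is redundant, here is a minimal sketch of an alias-free check for negative integer keys. The helper name `has_negative_int` is hypothetical, and this is not necessarily the change made in this PR (the replacement line is not shown in the diff above). It assumes that matching on `numbers.Integral` is acceptable for the caller.

```python
import numbers


def has_negative_int(keys):
    """Return True if any key is an integer-like value below zero.

    numbers.Integral covers the built-in int. NumPy also registers its
    integer scalar types with it, so no NumPy alias has to be listed.
    This is an illustrative helper, not code from the PR.
    """
    return any(isinstance(key, numbers.Integral) and key < 0 for key in keys)


print(has_negative_int([0, 3, -1]))    # True
print(has_negative_int([0, 3, 1.5]))   # False: no negative integer present
print(has_negative_int([-2.5, "a"]))   # False: floats and strings are skipped
```

With NumPy installed, a value such as `np.int64(-1)` should also match, since NumPy integer scalars are registered as `numbers.Integral`.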

HyukjinKwon · 2021-07-15T07:20:39Z

python/pyspark/pandas/ml.py

    # TODO, it should be more robust.
    accepted_types = {
-        np.dtype(dt)
+        np.dtype(dt)  # type: ignore


I couldn't figure out how to fix:

python/pyspark/pandas/ml.py:81: error: Value of type variable "_DTypeScalar_co" of "dtype" cannot be "object"

HyukjinKwon · 2021-07-15T07:22:01Z

cc @xinrong-databricks and @itholic too FYI

dongjoon-hyun · 2021-07-15T07:35:54Z

Thank you for making a PR, @HyukjinKwon !

.github/workflows/build_and_test.yml

dongjoon-hyun

+1, LGTM (with one minor comment; pending CIs).

SparkQA · 2021-07-15T07:50:47Z

Test build #141059 has finished for PR 33356 at commit e35f765.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-07-15T08:29:00Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45574/

dongjoon-hyun · 2021-07-15T08:35:25Z

Jenkins seems to fail because it's still running with python3.6.

Will test against the following Python executables: ['python3.6', 'pypy3']

HyukjinKwon · 2021-07-15T08:57:02Z

The test failure should be unrelated.

HyukjinKwon · 2021-07-15T08:57:28Z

Let me make sure the Jenkins tests pass before merging, though!

SparkQA · 2021-07-15T09:02:41Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45574/

SparkQA · 2021-07-15T09:30:25Z

Test build #141067 has finished for PR 33356 at commit 35f6ca7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-07-15T09:48:27Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45582/

SparkQA · 2021-07-15T10:21:34Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45582/

HyukjinKwon · 2021-07-15T11:43:51Z

++ python -S -c 'import random; print(random.randrange(100000, 999999))'
./dev/test-dependencies.sh: line 55: python: command not found

The test failure will be fixed in #33368. I am making a separate PR because that PR has to be backported.

dongjoon-hyun · 2021-07-15T15:00:37Z

I reviewed and merged #33368 first.

dongjoon-hyun · 2021-07-15T15:01:24Z

Merged to master for Apache Spark 3.3.

dongjoon-hyun · 2021-07-15T15:02:09Z

It looks fine~ I'll merge it~ :)

HyukjinKwon · 2021-07-16T02:41:02Z

Actually, let me merge this into the Spark 3.2 branch too, just to reduce conflicts. Python 3.6 is deprecated in branch-3.2, so upgrading should be fine, I guess.

…o 3.9 in GitHub Actions' linter/docs

Closes #33356 from HyukjinKwon/SPARK-36146.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a71dd6a)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

dongjoon-hyun · 2021-07-16T05:19:18Z

Ur, @HyukjinKwon . This requires [SPARK-36165][INFRA] Fix SQL doc generation in GitHub Action.
I'll backport it, too.

HyukjinKwon · 2021-07-16T05:24:03Z

Oops. Thank you, @dongjoon-hyun.

dongjoon-hyun · 2021-07-16T05:33:49Z

No problem~ Actually, it was my fault for not checking it carefully in this PR. ;)

HyukjinKwon added 2 commits July 15, 2021 13:57

Upgrade Python version from 3.6 to higher version in GitHub linter

36fc730

Upgrade Python version from 3.6 to higher version in GitHub linter

e35f765

github-actions bot added CORE INFRA ML MLLIB PYTHON SQL labels Jul 15, 2021

HyukjinKwon requested review from dongjoon-hyun, ueshin and zero323 and removed request for zero323 July 15, 2021 07:21

HyukjinKwon changed the title ~~[SPARK-36146][PYTHON][INFRA][TEST] Upgrade Python version from 3.6 to higher version in GitHub linter~~ [SPARK-36146][PYTHON][INFRA][TEST] Upgrade Python version from 3.6 to 3.9 in GitHub linter/docs Jul 15, 2021

HyukjinKwon changed the title ~~[SPARK-36146][PYTHON][INFRA][TEST] Upgrade Python version from 3.6 to 3.9 in GitHub linter/docs~~ [SPARK-36146][PYTHON][INFRA][TEST] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs Jul 15, 2021

dongjoon-hyun reviewed Jul 15, 2021

dongjoon-hyun approved these changes Jul 15, 2021

Address comments

35f6ca7

HyukjinKwon changed the title ~~[SPARK-36146][PYTHON][INFRA][TEST] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs~~ [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs Jul 15, 2021

dongjoon-hyun closed this in a71dd6a Jul 15, 2021

HyukjinKwon deleted the SPARK-36146 branch January 4, 2022 00:53
