[SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs #33356
Conversation
```
 def __init__(self, weights: Vector, intercept: float) -> None: ...
-@overload
+@overload  # type: ignore
 def predict(self, x: VectorLike) -> float64: ...
```
I can't follow why it complains:
python/pyspark/mllib/classification.pyi:93: error: Signature of "predict" incompatible with supertype "LinearClassificationModel"
Presumably because of a different variable name? We can't change it, for compatibility reasons.
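For context, a minimal hypothetical sketch (the class names below are made up, not the actual PySpark stubs) of what mypy objects to here: renaming a parameter in an override is enough to trigger this error, since a caller going through the supertype could pass the argument by keyword. That matches the guess above, and since the parameter name can't change for compatibility, a `# type: ignore` is the pragmatic workaround.

```
class LinearModelBase:
    def predict(self, x: float) -> float: ...

class SVMLikeModel(LinearModelBase):
    # mypy: Signature of "predict" incompatible with supertype "LinearModelBase"
    # The parameter was renamed, so model.predict(x=...) through the base type
    # would break on this subclass.
    def predict(self, value: float) -> float: ...
```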
```
) -> Tuple[Optional[Column], Optional[int], Optional[int]]:
    sdf = self._internal.spark_frame
    ...
    if any(isinstance(key, (int, np.int, np.int64, np.int32)) and key < 0 for key in rows_sel):
```
np.int is now the same as the Python built-in int
np.float == float, etc. too
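A small sketch of the point above, assuming a NumPy version where the deprecated aliases still exist (they warn from NumPy 1.20 and are removed in 1.24):

```
import numpy as np

# np.int and np.float are just aliases for the Python built-ins, so listing
# them next to int/float in isinstance checks is redundant.
print(np.int is int)      # True (DeprecationWarning on NumPy 1.20+)
print(np.float is float)  # True
print(isinstance(-1, (int, np.int64, np.int32)))  # True; plain int already covers it
```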
```
 # TODO, it should be more robust.
 accepted_types = {
-    np.dtype(dt)
+    np.dtype(dt) # type: ignore
```
I couldn't figure out how to fix:
python/pyspark/pandas/ml.py:81: error: Value of type variable "_DTypeScalar_co" of "dtype" cannot be "object"
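For what it's worth, a stripped-down hypothetical illustration of that error (not the actual pyspark/pandas/ml.py code): NumPy's stubs constrain the scalar type that np.dtype accepts, so plain object is rejected by mypy even though it is valid at runtime, which is presumably why the `# type: ignore` above is the pragmatic fix.

```
import numpy as np

# Fine at runtime: np.dtype(object) is the generic object dtype, dtype('O'),
# but mypy with NumPy's bundled stubs reports
#   Value of type variable "_DTypeScalar_co" of "dtype" cannot be "object"
object_dtype = np.dtype(object)  # type: ignore
print(object_dtype)  # dtype('O')
```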
cc @xinrong-databricks and @itholic too FYI

Thank you for making a PR, @HyukjinKwon!
+1, LGTM (with one minor comment. Pending CIs)
Test build #141059 has finished for PR 33356 at commit

Kubernetes integration test starting

Jenkins seems to fail because it's still running with
The test failure should be unrelated.

Let me make sure the Jenkins tests pass before merging, though!

Kubernetes integration test status success

Test build #141067 has finished for PR 33356 at commit

Kubernetes integration test starting

Kubernetes integration test status success
The test failure will be fixed in #33368. I am making a separate PR because that PR has to be backported.

I reviewed and merged #33368 first.

Merged to master for Apache Spark 3.3.

It looks fine~ I'll merge it~ :)

Actually, let me merge this into the Spark 3.2 branch too, just to reduce conflicts. Python 3.6 is deprecated in branch-3.2, so upgrading should be fine .. I guess ..
…o 3.9 in GitHub Actions' linter/docs

What changes were proposed in this pull request?
This PR proposes to use Python 3.9 in documentation and linter at GitHub Actions. This PR also contains the fixes for mypy check (introduced by Python 3.9 upgrade)

```
python/pyspark/sql/pandas/_typing/protocols/frame.pyi:64: error: Name "np.ndarray" is not defined
python/pyspark/sql/pandas/_typing/protocols/frame.pyi:91: error: Name "np.recarray" is not defined
python/pyspark/sql/pandas/_typing/protocols/frame.pyi:165: error: Name "np.ndarray" is not defined
python/pyspark/pandas/categorical.py:82: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/categorical.py:109: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "ordered"
python/pyspark/ml/linalg/__init__.pyi:184: error: Return type "ndarray[Any, Any]" of "toArray" incompatible with return type "NoReturn" in supertype "Matrix"
python/pyspark/ml/linalg/__init__.pyi:217: error: Return type "ndarray[Any, Any]" of "toArray" incompatible with return type "NoReturn" in supertype "Matrix"
python/pyspark/pandas/typedef/typehints.py:163: error: Module has no attribute "bool"; maybe "bool_" or "bool8"?
python/pyspark/pandas/typedef/typehints.py:174: error: Module has no attribute "float"; maybe "float_", "cfloat", or "float96"?
python/pyspark/pandas/typedef/typehints.py:180: error: Module has no attribute "int"; maybe "uint", "rint", or "intp"?
python/pyspark/pandas/ml.py:81: error: Value of type variable "_DTypeScalar_co" of "dtype" cannot be "object"
python/pyspark/pandas/indexing.py:1649: error: Module has no attribute "int"; maybe "uint", "rint", or "intp"?
python/pyspark/pandas/indexing.py:1656: error: Module has no attribute "int"; maybe "uint", "rint", or "intp"?
python/pyspark/pandas/frame.py:4969: error: Function "numpy.array" is not valid as a type
python/pyspark/pandas/frame.py:4969: note: Perhaps you need "Callable[...]" or a callback protocol?
python/pyspark/pandas/frame.py:4970: error: Function "numpy.array" is not valid as a type
python/pyspark/pandas/frame.py:4970: note: Perhaps you need "Callable[...]" or a callback protocol?
python/pyspark/pandas/frame.py:7402: error: "List[Any]" has no attribute "tolist"
python/pyspark/pandas/series.py:1030: error: Module has no attribute "_NoValue"
python/pyspark/pandas/series.py:1031: error: Module has no attribute "_NoValue"
python/pyspark/pandas/indexes/category.py:159: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/indexes/category.py:180: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "ordered"
python/pyspark/pandas/namespace.py:2036: error: Argument 1 to "column_name" has incompatible type "float"; expected "str"
python/pyspark/pandas/mlflow.py:59: error: Incompatible types in assignment (expression has type "Type[floating[Any]]", variable has type "str")
python/pyspark/pandas/data_type_ops/categorical_ops.py:43: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/data_type_ops/categorical_ops.py:43: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "ordered"
python/pyspark/pandas/data_type_ops/categorical_ops.py:56: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/tests/test_typedef.py:70: error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py:77: error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py:85: error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py:100: error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py:108: error: Name "np.float" is not defined
python/pyspark/mllib/clustering.pyi:152: error: Incompatible types in assignment (expression has type "ndarray[Any, Any]", base class "KMeansModel" defined the type as "List[ndarray[Any, Any]]")
python/pyspark/mllib/classification.pyi:93: error: Signature of "predict" incompatible with supertype "LinearClassificationModel"
Found 32 errors in 15 files (checked 315 source files)
```

Why are the changes needed?
Python 3.6 is deprecated at SPARK-35938

Does this PR introduce any user-facing change?
No. Maybe static analysis, etc. by some type hints but they are really non-breaking..

How was this patch tested?
I manually checked by GitHub Actions build in forked repository.

Closes #33356 from HyukjinKwon/SPARK-36146.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a71dd6a)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Ur, @HyukjinKwon. This requires

Oops. Thank you, @dongjoon-hyun.

No problem~ Actually, it was my bad for not checking it clearly in this PR. ;)
What changes were proposed in this pull request?
This PR proposes to use Python 3.9 in the documentation and linter jobs at GitHub Actions. It also contains fixes for the mypy errors introduced by the Python 3.9 upgrade.
Why are the changes needed?
Python 3.6 is deprecated as of SPARK-35938.
Does this PR introduce any user-facing change?
No. There may be minor effects on static analysis, etc. from some type hints, but they are really non-breaking.
How was this patch tested?
I manually checked via a GitHub Actions build in a forked repository.