[SPARK-29536][PYTHON] Upgrade cloudpickle to 1.1.1 to support Python 3.8 #26194
Test build #112392 has finished for PR 26194 at commit
Hm, let me investigate the test failures further.
Seems OK pending tests
Sounds like a good reason to upgrade. Would now be a good time to stop including the file and instead use the official package, or ship it as a zip file in pyspark/lib?
Should be fine. I've tried this locally with 3.8 beta before.
@BryanCutler, yes, I think we should. I will look into it separately if you don't mind; it will need fixes here and there. One side concern is that it is fairly difficult to use the official package directly. I suspect we can try shipping it as a zip.
retest this please
Test build #112421 has finished for PR 26194 at commit
Test build #112424 has finished for PR 26194 at commit
Test build #112429 has finished for PR 26194 at commit
Test build #112432 has finished for PR 26194 at commit
Test build #112438 has finished for PR 26194 at commit
The test failure is caused by cloudpipe/cloudpickle#278. Let me stick to 1.1.1 for now, since we haven't dropped Python 2 support yet.
Test build #112441 has finished for PR 26194 at commit
Test build #112443 has finished for PR 26194 at commit
The last commit only adds metadata, which does not affect the pip check. I double-checked that the Python linter passes. Merged to master. Thanks, @srowen, @BryanCutler and @viirya.
So, does this mean that I have to upgrade both Spark and PySpark to 3.0 to use Python 3.8? I tried running Spark 2.3.2 with PySpark 3.0.0, but it failed.
Yup.
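As a rough illustration of the constraint confirmed above, the following sketch flags the combination discussed in this thread (a Spark release older than 3.0 together with Python 3.8 or newer). The helper name `hits_python38_incompatibility` and its parameters are hypothetical and not part of PySpark; it only encodes what this thread states and says nothing about other version combinations.

```python
import sys


def hits_python38_incompatibility(spark_version, python_version=None):
    """Return True for the combination reported as broken in this thread.

    Hypothetical helper (not a PySpark API): Spark releases before 3.0
    bundle a cloudpickle that fails on Python 3.8 and newer.
    """
    if python_version is None:
        # Default to the interpreter running this check.
        python_version = sys.version_info[:2]
    spark_major = int(spark_version.split(".")[0])
    return spark_major < 3 and tuple(python_version) >= (3, 8)


# Spark 2.3.2 with Python 3.8 is the failing setup from the question above.
print(hits_python38_incompatibility("2.3.2", (3, 8)))
print(hits_python38_incompatibility("3.0.0", (3, 8)))
```

A check like this could run before a job is launched, so that an older interpreter is selected explicitly instead of failing later with a pickling error.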
I updated SPARK-29536 by adding
Thanks @dongjoon-hyun.
Python 2 reached EOL last year and should not be used anymore. This commit replaces all references to the "python" binary with the more explicit "python3" binary. If desired, the build can still be performed for Python 2 by setting the "PYTHON_EXECUTABLE" environment variable to an appropriate version. Additionally, Python wheels are the preferred way to distribute Python code (see https://packaging.python.org/discussions/wheel-vs-egg/), so this commit also builds the job-server-python wheel. Spark-2.4 does not support Python >= 3.8 (see apache/spark#26194), which leads to failed test cases (TypeError: an integer is required (got type bytes)). If you encounter these issues, try specifying a Python executable < 3.8 explicitly.
@HyukjinKwon But I am getting the error message below while initializing PySpark on Windows.
Hi, @Neeraj9697. The following is the result of Apache Spark 3.1.1 on Python 3.9.1 on Mac.
Could you file a JIRA issue with your specific information, please, @Neeraj9697?
Like the following, we officially documented that some features (like
Judging from the error message, it looks like you have multiple PySpark installations on your machine and the paths got mixed up for some reason. pyspark.cloudpickle is a package now, but your error message says that it is a module (as it was before this fix).
Thanks @HyukjinKwon and @dongjoon-hyun https://gist.github.com/Neeraj9697/f7aa1c6951bd3021eb48ccb919cbfd57
Great! Thank you for the confirmation, @Neeraj9697. For your new issue, please install
What changes were proposed in this pull request?
Upgrade the cloudpickle inlined in PySpark to cloudpickle 1.1.1. See https://github.com/cloudpipe/cloudpickle/blob/v1.1.1/cloudpickle/cloudpickle.py
cloudpipe/cloudpickle#269 added Python 3.8 support (fixed as of 1.1.0). Using 1.2.2 seems to break PyPy 2 due to cloudpipe/cloudpickle#278, so this PR currently uses 1.1.1.
Once we drop Python 2, we can switch to the latest version.
Why are the changes needed?
Positional-only parameters were introduced in Python 3.8 (see https://docs.python.org/3/whatsnew/3.8.html#positional-only-parameters).
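A minimal sketch of the feature, assuming Python 3.8 or newer: the function `scale` and its parameter names are made up for illustration. It shows that the code object records the number of positional-only parameters in `co_posonlyargcount`, and that `code.replace()` (also added in 3.8) can copy a code object without calling the positional `types.CodeType` constructor. This is only an illustration, not how cloudpickle itself is implemented.

```python
import types


def scale(value, factor, /, offset=0):
    """`value` and `factor` are positional-only because they precede `/`."""
    return value * factor + offset


code = scale.__code__

# Python 3.8+ code objects record how many parameters are positional-only.
posonly = code.co_posonlyargcount

# Passing a positional-only parameter by keyword is rejected.
try:
    scale(value=2, factor=3)
    keyword_call_rejected = False
except TypeError:
    keyword_call_rejected = True

# code.replace() copies the code object by field name, so the caller does
# not depend on the argument order of the types.CodeType constructor.
renamed = code.replace(co_name="scale_copy")
scale_copy = types.FunctionType(
    renamed, globals(), "scale_copy", scale.__defaults__
)

print(posonly, keyword_call_rejected, scale_copy(2, 3, offset=1))
```

Copying by field name sidesteps the kind of breakage that appears when a constructor gains a new positional argument between Python versions.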
In particular, the argument newly added to types.CodeType was the problem (https://docs.python.org/3/whatsnew/3.8.html#changes-in-the-python-api).
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Manually tested. Note that the optional dependency PyArrow does not appear to support Python 3.8 yet; therefore, it was not tested. See "Details" below.
cd python
./run-tests --python-executables=python3.8