
[SPARK-29536][PYTHON] Upgrade cloudpickle to 1.1.1 to support Python 3.8 #26194

Closed
wants to merge 8 commits

Conversation

@HyukjinKwon (Member) commented Oct 21, 2019

What changes were proposed in this pull request?

Upgrade the cloudpickle inlined in PySpark to cloudpickle 1.1.1. See https://github.com/cloudpipe/cloudpickle/blob/v1.1.1/cloudpickle/cloudpickle.py

cloudpipe/cloudpickle#269 added Python 3.8 support (fixed as of 1.1.0). Using 1.2.2 seems to break PyPy 2 due to cloudpipe/cloudpickle#278, so this PR currently uses 1.1.1.

Once we drop Python 2 support, we can switch to the latest version.

Why are the changes needed?

Positional-only arguments were newly introduced in Python 3.8 (see https://docs.python.org/3/whatsnew/3.8.html#positional-only-parameters).
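
For example (a small illustrative snippet, not from the PR), parameters declared before a / in a signature become positional-only:

def divide(numerator, denominator, /):
    # numerator and denominator are positional-only (PEP 570, Python 3.8+)
    return numerator / denominator

divide(10, 2)                          # OK -> 5.0
# divide(numerator=10, denominator=2)  # TypeError: positional-only arguments
#                                      # passed as keyword arguments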

In particular, the newly added argument to types.CodeType was the problem (https://docs.python.org/3/whatsnew/3.8.html#changes-in-the-python-api):

> types.CodeType has a new parameter in the second position of the constructor (posonlyargcount) to support positional-only arguments defined in PEP 570. The first argument (argcount) now represents the total number of positional arguments (including positional-only arguments). The new replace() method of types.CodeType can be used to make the code future-proof.
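
To make the breakage concrete, here is a minimal sketch (an illustration, not the actual cloudpickle patch) of rebuilding a code object under both constructor signatures:

import sys
import types

def clone_code(co):
    # Illustrative sketch, not the actual cloudpickle patch. On Python 3.8+
    # the CodeType constructor takes posonlyargcount as its second argument,
    # so a 3.7-era positional call shifts every later argument by one slot
    # and typically fails with "TypeError: an integer is required (got type bytes)".
    if sys.version_info >= (3, 8):
        return co.replace()  # the future-proof path from the 3.8 release notes
    return types.CodeType(   # pre-3.8 signature: no posonlyargcount slot
        co.co_argcount, co.co_kwonlyargcount, co.co_nlocals, co.co_stacksize,
        co.co_flags, co.co_code, co.co_consts, co.co_names, co.co_varnames,
        co.co_filename, co.co_name, co.co_firstlineno, co.co_lnotab,
        co.co_freevars, co.co_cellvars)

print(clone_code(clone_code.__code__).co_name)  # prints: clone_code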

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually tested. Note that the optional dependency PyArrow does not appear to support Python 3.8 yet; therefore, it was not tested. See the details below.

cd python
./run-tests --python-executables=python3.8
Running PySpark tests. Output is in /Users/hyukjin.kwon/workspace/forked/spark/python/unit-tests.log
Will test against the following Python executables: ['python3.8']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Starting test(python3.8): pyspark.ml.tests.test_algorithms
Starting test(python3.8): pyspark.ml.tests.test_feature
Starting test(python3.8): pyspark.ml.tests.test_base
Starting test(python3.8): pyspark.ml.tests.test_evaluation
Finished test(python3.8): pyspark.ml.tests.test_base (12s)
Starting test(python3.8): pyspark.ml.tests.test_image
Finished test(python3.8): pyspark.ml.tests.test_evaluation (14s)
Starting test(python3.8): pyspark.ml.tests.test_linalg
Finished test(python3.8): pyspark.ml.tests.test_feature (23s)
Starting test(python3.8): pyspark.ml.tests.test_param
Finished test(python3.8): pyspark.ml.tests.test_image (22s)
Starting test(python3.8): pyspark.ml.tests.test_persistence
Finished test(python3.8): pyspark.ml.tests.test_param (25s)
Starting test(python3.8): pyspark.ml.tests.test_pipeline
Finished test(python3.8): pyspark.ml.tests.test_linalg (37s)
Starting test(python3.8): pyspark.ml.tests.test_stat
Finished test(python3.8): pyspark.ml.tests.test_pipeline (7s)
Starting test(python3.8): pyspark.ml.tests.test_training_summary
Finished test(python3.8): pyspark.ml.tests.test_stat (21s)
Starting test(python3.8): pyspark.ml.tests.test_tuning
Finished test(python3.8): pyspark.ml.tests.test_persistence (45s)
Starting test(python3.8): pyspark.ml.tests.test_wrapper
Finished test(python3.8): pyspark.ml.tests.test_algorithms (83s)
Starting test(python3.8): pyspark.mllib.tests.test_algorithms
Finished test(python3.8): pyspark.ml.tests.test_training_summary (32s)
Starting test(python3.8): pyspark.mllib.tests.test_feature
Finished test(python3.8): pyspark.ml.tests.test_wrapper (20s)
Starting test(python3.8): pyspark.mllib.tests.test_linalg
Finished test(python3.8): pyspark.mllib.tests.test_feature (32s)
Starting test(python3.8): pyspark.mllib.tests.test_stat
Finished test(python3.8): pyspark.mllib.tests.test_algorithms (70s)
Starting test(python3.8): pyspark.mllib.tests.test_streaming_algorithms
Finished test(python3.8): pyspark.mllib.tests.test_stat (37s)
Starting test(python3.8): pyspark.mllib.tests.test_util
Finished test(python3.8): pyspark.mllib.tests.test_linalg (70s)
Starting test(python3.8): pyspark.sql.tests.test_arrow
Finished test(python3.8): pyspark.sql.tests.test_arrow (1s) ... 53 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_catalog
Finished test(python3.8): pyspark.mllib.tests.test_util (15s)
Starting test(python3.8): pyspark.sql.tests.test_column
Finished test(python3.8): pyspark.sql.tests.test_catalog (24s)
Starting test(python3.8): pyspark.sql.tests.test_conf
Finished test(python3.8): pyspark.sql.tests.test_column (21s)
Starting test(python3.8): pyspark.sql.tests.test_context
Finished test(python3.8): pyspark.ml.tests.test_tuning (125s)
Starting test(python3.8): pyspark.sql.tests.test_dataframe
Finished test(python3.8): pyspark.sql.tests.test_conf (9s)
Starting test(python3.8): pyspark.sql.tests.test_datasources
Finished test(python3.8): pyspark.sql.tests.test_context (29s)
Starting test(python3.8): pyspark.sql.tests.test_functions
Finished test(python3.8): pyspark.sql.tests.test_datasources (32s)
Starting test(python3.8): pyspark.sql.tests.test_group
Finished test(python3.8): pyspark.sql.tests.test_dataframe (39s) ... 3 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_pandas_udf
Finished test(python3.8): pyspark.sql.tests.test_pandas_udf (1s) ... 6 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_pandas_udf_cogrouped_map
Finished test(python3.8): pyspark.sql.tests.test_pandas_udf_cogrouped_map (0s) ... 14 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_pandas_udf_grouped_agg
Finished test(python3.8): pyspark.sql.tests.test_pandas_udf_grouped_agg (1s) ... 15 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_pandas_udf_grouped_map
Finished test(python3.8): pyspark.sql.tests.test_pandas_udf_grouped_map (1s) ... 20 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_pandas_udf_scalar
Finished test(python3.8): pyspark.sql.tests.test_pandas_udf_scalar (1s) ... 49 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_pandas_udf_window
Finished test(python3.8): pyspark.sql.tests.test_pandas_udf_window (1s) ... 14 tests were skipped
Starting test(python3.8): pyspark.sql.tests.test_readwriter
Finished test(python3.8): pyspark.sql.tests.test_functions (29s)
Starting test(python3.8): pyspark.sql.tests.test_serde
Finished test(python3.8): pyspark.sql.tests.test_group (20s)
Starting test(python3.8): pyspark.sql.tests.test_session
Finished test(python3.8): pyspark.mllib.tests.test_streaming_algorithms (126s)
Starting test(python3.8): pyspark.sql.tests.test_streaming
Finished test(python3.8): pyspark.sql.tests.test_serde (25s)
Starting test(python3.8): pyspark.sql.tests.test_types
Finished test(python3.8): pyspark.sql.tests.test_readwriter (38s)
Starting test(python3.8): pyspark.sql.tests.test_udf
Finished test(python3.8): pyspark.sql.tests.test_session (32s)
Starting test(python3.8): pyspark.sql.tests.test_utils
Finished test(python3.8): pyspark.sql.tests.test_utils (17s)
Starting test(python3.8): pyspark.streaming.tests.test_context
Finished test(python3.8): pyspark.sql.tests.test_types (45s)
Starting test(python3.8): pyspark.streaming.tests.test_dstream
Finished test(python3.8): pyspark.sql.tests.test_udf (44s)
Starting test(python3.8): pyspark.streaming.tests.test_kinesis
Finished test(python3.8): pyspark.streaming.tests.test_kinesis (0s) ... 2 tests were skipped
Starting test(python3.8): pyspark.streaming.tests.test_listener
Finished test(python3.8): pyspark.streaming.tests.test_context (28s)
Starting test(python3.8): pyspark.tests.test_appsubmit
Finished test(python3.8): pyspark.sql.tests.test_streaming (60s)
Starting test(python3.8): pyspark.tests.test_broadcast
Finished test(python3.8): pyspark.streaming.tests.test_listener (11s)
Starting test(python3.8): pyspark.tests.test_conf
Finished test(python3.8): pyspark.tests.test_conf (17s)
Starting test(python3.8): pyspark.tests.test_context
Finished test(python3.8): pyspark.tests.test_broadcast (39s)
Starting test(python3.8): pyspark.tests.test_daemon
Finished test(python3.8): pyspark.tests.test_daemon (5s)
Starting test(python3.8): pyspark.tests.test_join
Finished test(python3.8): pyspark.tests.test_context (31s)
Starting test(python3.8): pyspark.tests.test_profiler
Finished test(python3.8): pyspark.tests.test_join (9s)
Starting test(python3.8): pyspark.tests.test_rdd
Finished test(python3.8): pyspark.tests.test_profiler (12s)
Starting test(python3.8): pyspark.tests.test_readwrite
Finished test(python3.8): pyspark.tests.test_readwrite (23s) ... 3 tests were skipped
Starting test(python3.8): pyspark.tests.test_serializers
Finished test(python3.8): pyspark.tests.test_appsubmit (94s)
Starting test(python3.8): pyspark.tests.test_shuffle
Finished test(python3.8): pyspark.streaming.tests.test_dstream (110s)
Starting test(python3.8): pyspark.tests.test_taskcontext
Finished test(python3.8): pyspark.tests.test_rdd (42s)
Starting test(python3.8): pyspark.tests.test_util
Finished test(python3.8): pyspark.tests.test_serializers (11s)
Starting test(python3.8): pyspark.tests.test_worker
Finished test(python3.8): pyspark.tests.test_shuffle (12s)
Starting test(python3.8): pyspark.accumulators
Finished test(python3.8): pyspark.tests.test_util (7s)
Starting test(python3.8): pyspark.broadcast
Finished test(python3.8): pyspark.accumulators (8s)
Starting test(python3.8): pyspark.conf
Finished test(python3.8): pyspark.broadcast (8s)
Starting test(python3.8): pyspark.context
Finished test(python3.8): pyspark.tests.test_worker (19s)
Starting test(python3.8): pyspark.ml.classification
Finished test(python3.8): pyspark.conf (4s)
Starting test(python3.8): pyspark.ml.clustering
Finished test(python3.8): pyspark.context (22s)
Starting test(python3.8): pyspark.ml.evaluation
Finished test(python3.8): pyspark.tests.test_taskcontext (49s)
Starting test(python3.8): pyspark.ml.feature
Finished test(python3.8): pyspark.ml.clustering (43s)
Starting test(python3.8): pyspark.ml.fpm
Finished test(python3.8): pyspark.ml.evaluation (27s)
Starting test(python3.8): pyspark.ml.image
Finished test(python3.8): pyspark.ml.image (8s)
Starting test(python3.8): pyspark.ml.linalg.__init__
Finished test(python3.8): pyspark.ml.linalg.__init__ (0s)
Starting test(python3.8): pyspark.ml.recommendation
Finished test(python3.8): pyspark.ml.classification (63s)
Starting test(python3.8): pyspark.ml.regression
Finished test(python3.8): pyspark.ml.fpm (23s)
Starting test(python3.8): pyspark.ml.stat
Finished test(python3.8): pyspark.ml.stat (30s)
Starting test(python3.8): pyspark.ml.tuning
Finished test(python3.8): pyspark.ml.regression (51s)
Starting test(python3.8): pyspark.mllib.classification
Finished test(python3.8): pyspark.ml.feature (93s)
Starting test(python3.8): pyspark.mllib.clustering
Finished test(python3.8): pyspark.ml.tuning (39s)
Starting test(python3.8): pyspark.mllib.evaluation
Finished test(python3.8): pyspark.mllib.classification (38s)
Starting test(python3.8): pyspark.mllib.feature
Finished test(python3.8): pyspark.mllib.evaluation (25s)
Starting test(python3.8): pyspark.mllib.fpm
Finished test(python3.8): pyspark.mllib.clustering (64s)
Starting test(python3.8): pyspark.mllib.linalg.__init__
Finished test(python3.8): pyspark.ml.recommendation (131s)
Starting test(python3.8): pyspark.mllib.linalg.distributed
Finished test(python3.8): pyspark.mllib.linalg.__init__ (0s)
Starting test(python3.8): pyspark.mllib.random
Finished test(python3.8): pyspark.mllib.feature (36s)
Starting test(python3.8): pyspark.mllib.recommendation
Finished test(python3.8): pyspark.mllib.fpm (31s)
Starting test(python3.8): pyspark.mllib.regression
Finished test(python3.8): pyspark.mllib.random (16s)
Starting test(python3.8): pyspark.mllib.stat.KernelDensity
Finished test(python3.8): pyspark.mllib.stat.KernelDensity (1s)
Starting test(python3.8): pyspark.mllib.stat._statistics
Finished test(python3.8): pyspark.mllib.stat._statistics (25s)
Starting test(python3.8): pyspark.mllib.tree
Finished test(python3.8): pyspark.mllib.regression (44s)
Starting test(python3.8): pyspark.mllib.util
Finished test(python3.8): pyspark.mllib.recommendation (49s)
Starting test(python3.8): pyspark.profiler
Finished test(python3.8): pyspark.mllib.linalg.distributed (53s)
Starting test(python3.8): pyspark.rdd
Finished test(python3.8): pyspark.profiler (14s)
Starting test(python3.8): pyspark.serializers
Finished test(python3.8): pyspark.mllib.tree (30s)
Starting test(python3.8): pyspark.shuffle
Finished test(python3.8): pyspark.shuffle (2s)
Starting test(python3.8): pyspark.sql.avro.functions
Finished test(python3.8): pyspark.mllib.util (30s)
Starting test(python3.8): pyspark.sql.catalog
Finished test(python3.8): pyspark.serializers (17s)
Starting test(python3.8): pyspark.sql.column
Finished test(python3.8): pyspark.rdd (31s)
Starting test(python3.8): pyspark.sql.conf
Finished test(python3.8): pyspark.sql.conf (7s)
Starting test(python3.8): pyspark.sql.context
Finished test(python3.8): pyspark.sql.avro.functions (19s)
Starting test(python3.8): pyspark.sql.dataframe
Finished test(python3.8): pyspark.sql.catalog (16s)
Starting test(python3.8): pyspark.sql.functions
Finished test(python3.8): pyspark.sql.column (27s)
Starting test(python3.8): pyspark.sql.group
Finished test(python3.8): pyspark.sql.context (26s)
Starting test(python3.8): pyspark.sql.readwriter
Finished test(python3.8): pyspark.sql.group (52s)
Starting test(python3.8): pyspark.sql.session
Finished test(python3.8): pyspark.sql.dataframe (73s)
Starting test(python3.8): pyspark.sql.streaming
Finished test(python3.8): pyspark.sql.functions (75s)
Starting test(python3.8): pyspark.sql.types
Finished test(python3.8): pyspark.sql.readwriter (57s)
Starting test(python3.8): pyspark.sql.udf
Finished test(python3.8): pyspark.sql.types (13s)
Starting test(python3.8): pyspark.sql.window
Finished test(python3.8): pyspark.sql.session (32s)
Starting test(python3.8): pyspark.streaming.util
Finished test(python3.8): pyspark.streaming.util (1s)
Starting test(python3.8): pyspark.util
Finished test(python3.8): pyspark.util (0s)
Finished test(python3.8): pyspark.sql.streaming (30s)
Finished test(python3.8): pyspark.sql.udf (27s)
Finished test(python3.8): pyspark.sql.window (22s)
Tests passed in 855 seconds

@HyukjinKwon (Member Author)

cc @BryanCutler, @viirya, @ueshin

@SparkQA commented Oct 21, 2019

Test build #112392 has finished for PR 26194 at commit e8a2acb.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • skeleton_class = type_constructor(name, bases, type_kwargs)
  • enum_class = metacls.__new__(metacls, name, bases, classdict)

@HyukjinKwon changed the title from "[SPARK-29536][PYTHON] Upgrade cloudpickle to 1.2.2 to support Python 3.8" to "[WIP][SPARK-29536][PYTHON] Upgrade cloudpickle to 1.2.2 to support Python 3.8" on Oct 21, 2019.
@HyukjinKwon (Member Author)

Hm... let me investigate the test failures further.

@srowen (Member) left a comment:

Seems OK pending tests

@BryanCutler (Member) left a comment:

Sounds like a good reason to upgrade. Would now be a good time to stop including the file and use the official package, or include it as a zip file in pyspark/lib?

@viirya (Member) left a comment:

Should be fine. I've tried this locally with the 3.8 beta before.

@HyukjinKwon (Member Author) commented Oct 22, 2019

@BryanCutler, yea, I think we should. I will take a look separately if you don't mind; it will need a fix here and there. One side concern is that it's pretty difficult to use it as the official package. We can try it with a zip, I suspect.

@HyukjinKwon (Member Author)

retest this please

@SparkQA commented Oct 22, 2019

Test build #112421 has finished for PR 26194 at commit e8a2acb.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • skeleton_class = type_constructor(name, bases, type_kwargs)
  • enum_class = metacls.__new__(metacls, name, bases, classdict)

@SparkQA commented Oct 22, 2019

Test build #112424 has finished for PR 26194 at commit 876fbff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 22, 2019

Test build #112429 has finished for PR 26194 at commit a897b85.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 22, 2019

Test build #112432 has finished for PR 26194 at commit fc7ffd3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 22, 2019

Test build #112438 has finished for PR 26194 at commit cbdf92c.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member Author) commented Oct 22, 2019

The test failure is caused by cloudpipe/cloudpickle#278. Let me stick to 1.1.1 for now since we haven't dropped Python 2 support yet.

@HyukjinKwon changed the title from "[WIP][SPARK-29536][PYTHON] Upgrade cloudpickle to 1.2.2 to support Python 3.8" to "[SPARK-29536][PYTHON] Upgrade cloudpickle to 1.1.1 to support Python 3.8" on Oct 22, 2019.
@SparkQA commented Oct 22, 2019

Test build #112441 has finished for PR 26194 at commit 847079d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 22, 2019

Test build #112443 has finished for PR 26194 at commit 9b2f117.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member Author)

The last commit just adds metadata, which does not affect pip check. I double-checked that the Python linter passes.

Merged to master.

Thanks, @srowen, @BryanCutler and @viirya

@jackhhh commented Jul 5, 2020

> Spark 3.0 will support. You can try 3.0 preview.

So, does it mean that I have to upgrade Spark and PySpark to 3.0 to use Python 3.8? I tried to run with Spark 2.3.2 and PySpark 3.0.0, but it failed.

@HyukjinKwon (Member Author)

Yup.

@dongjoon-hyun (Member)

I updated SPARK-29536 by adding 2.4.7 into Affected Versions.

@HyukjinKwon (Member Author)

Thanks @dongjoon-hyun.

@gliptak (Contributor) commented Nov 28, 2020

capitalone/datacompy#88

Ennosigaeon pushed a commit to Ennosigaeon/spark-jobserver that referenced this pull request (pushed repeatedly between Jan 20 and Jan 25, 2021), and bsikander pushed the same commit to spark-jobserver/spark-jobserver on Jan 26, 2021:

Python 2 reached EOL last year and should not be used anymore. This commit replaces all references to the "python" binary with the more explicit "python3" binary. If desired, the build can still be performed for Python 2 by setting the "PYTHON_EXECUTABLE" environment variable to an appropriate version. Additionally, Python wheels are the preferred way to distribute Python code (see https://packaging.python.org/discussions/wheel-vs-egg/). This commit additionally builds the job-server-python wheel.

Spark 2.4 does not support Python >= 3.8 (see apache/spark#26194), leading to failed test cases (TypeError: an integer is required (got type bytes)). If you encounter these issues, try to state a Python executable < 3.8 explicitly.
@Neeraj9697

@HyukjinKwon, I am using Python 3.9 and cloudpickle.py from https://github.com/cloudpipe/cloudpickle/blob/v1.1.1/cloudpickle/cloudpickle.py.

But I am getting the below error message while initializing PySpark on Windows:

Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec 7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "C:\BigDataLocalSetup\Spark\python\pyspark\shell.py", line 31, in <module>
    from pyspark import SparkConf
  File "C:\BigDataLocalSetup\Spark\python\lib\pyspark\pyspark\__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "C:\BigDataLocalSetup\Spark\python\lib\pyspark\pyspark\context.py", line 33, in <module>
    from pyspark.broadcast import Broadcast, BroadcastPickleRegistry
  File "C:\BigDataLocalSetup\Spark\python\lib\pyspark\pyspark\broadcast.py", line 25, in <module>
    from pyspark.cloudpickle import print_exec
ImportError: cannot import name 'print_exec' from 'pyspark.cloudpickle' (C:\BigDataLocalSetup\Spark\python\lib\pyspark\pyspark\cloudpickle.py)

@dongjoon-hyun (Member)

Hi, @Neeraj9697. The following is the result of Apache Spark 3.1.1 with Python 3.9.1 on macOS.

  • Which Spark Version are you using?
  • Are you reporting that PySpark doesn't work on Windows OS specifically?
$ bin/pyspark
Python 3.9.1 (default, Jan 19 2021, 12:49:23)
[Clang 12.0.0 (clang-1200.0.32.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
21/03/05 11:34:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/

Using Python version 3.9.1 (default, Jan 19 2021 12:49:23)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1614972879606).
SparkSession available as 'spark'.
>>>

@dongjoon-hyun (Member)

Could you file a JIRA issue with your specific information please, @Neeraj9697 ?

@dongjoon-hyun (Member) commented Mar 5, 2021

We officially documented, as quoted below, that some features (like Arrow) don't work on Python 3.9 due to their own issues. Python 3.9 is not yet tested thoroughly in the Apache Spark community.

> For Python 3.9, Arrow optimization and pandas UDFs might not work due to the supported Python versions in Apache Arrow. Please refer to the latest Python Compatibility page.

@HyukjinKwon (Member Author)

Judging from the error message, it looks like you have multiple PySpark installations locally and the paths got mixed up for some reason. pyspark.cloudpickle is a package now, but your error message says it is a module (as it was before this fix).
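
A quick way to check which copy the interpreter resolves (a minimal diagnostic sketch, not from the original thread):

import pyspark.cloudpickle as cp
# In Spark 3.x pyspark.cloudpickle is a package, so the path ends in
# cloudpickle\__init__.py; a path ending in cloudpickle.py means a stale
# pre-3.x module is being picked up from another installation on sys.path.
print(cp.__file__)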

@Neeraj9697
Thanks @HyukjinKwon and @dongjoon-hyun!
Using Spark 3.1.1, it's working fine now!

https://gist.github.com/Neeraj9697/f7aa1c6951bd3021eb48ccb919cbfd57
Please help me with this problem also.

@dongjoon-hyun (Member) commented Mar 7, 2021

Great! Thank you for the confirmation, @Neeraj9697.

For your new issue, please install python3 on your system.

java.io.IOException: Cannot run program "python3": CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(Unknown Source)
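
If a suitable interpreter is installed but not named python3, pointing PySpark at it explicitly should also work (a hedged sketch; the install path below is hypothetical, but PYSPARK_PYTHON is the standard environment variable PySpark consults for worker processes):

import os
# Set before creating the SparkSession so worker processes pick it up.
# The path is illustrative; use your actual interpreter location.
os.environ["PYSPARK_PYTHON"] = r"C:\Python39\python.exe"

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
print(spark.range(3).count())  # 3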

superstar305 added a commit to superstar305/spark_jobserver that referenced this pull request on Mar 19, 2023, with the same commit message as above.