[SPARK-41454][PYTHON] Support Python 3.11
### What changes were proposed in this pull request?

This PR aims to support Python 3.11.

### Why are the changes needed?

Python 3.11 is the newest major release of the Python programming language, with many new features and optimizations. Python 3.11.1 is the latest version.

- 2022-12-03 https://www.python.org/downloads/release/python-3111/

In addition, Spark is affected by one API removal: the optional `random` argument of `random.shuffle`, deprecated in 3.9 and removed in 3.11. Because this is handled conditionally, there is no regression on older Python versions.
- https://bugs.python.org/issue40465
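
As a minimal sketch of the conditional handling described above: the helper name `shuffle_dirs` is hypothetical and works on a copy, whereas the patch itself shuffles `dirs` in place inside `_get_local_dirs`.

```python
import os
import random
import sys


def shuffle_dirs(dirs):
    """Return a shuffled copy of dirs (hypothetical helper, not part of the patch)."""
    shuffled = list(dirs)
    if len(shuffled) > 1:
        if sys.version_info < (3, 11):
            # Older interpreters still accept the second `random` argument,
            # so the per-process seeded generator can be passed through.
            rnd = random.Random(os.getpid() + id(shuffled))
            random.shuffle(shuffled, rnd.random)
        else:
            # Python 3.11 removed the second argument; use the module-level RNG.
            random.shuffle(shuffled)
    return shuffled


result = shuffle_dirs(["/tmp/a", "/tmp/b", "/tmp/c"])
print(sorted(result) == ["/tmp/a", "/tmp/b", "/tmp/c"])
```

On Python 3.9 and 3.10 the first branch emits a `DeprecationWarning` but still runs. An alternative not used in this PR would be calling `rnd.shuffle(shuffled)` on the seeded `random.Random` instance, which should behave the same on all versions.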

### Does this PR introduce _any_ user-facing change?

No. Previously, Python 3.11 was not supported.

### How was this patch tested?

Manually ran the following. Note that this was tested without optional dependencies.
```
$ python/run-tests.py --python-executables python3.11
Will test against the following Python executables: ['python3.11']
Will test the following Python modules: ['pyspark-connect', 'pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-pandas', 'pyspark-pandas-slow', 'pyspark-resource', 'pyspark-sql', 'pyspark-streaming']
python3.11 python_implementation is CPython
python3.11 version is: Python 3.11.1
Starting test(python3.11): pyspark.ml.tests.test_evaluation (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/ff09022a-f3d3-413b-b15d-261c40d5b048/python3.11__pyspark.ml.tests.test_evaluation__wh9c4y5l.log)
...
Finished test(python3.11): pyspark.sql.streaming.readwriter (88s)
Tests passed in 1138 seconds

...
Skipped tests in pyspark.tests.test_worker with python3.11:
    test_memory_limit (pyspark.tests.test_worker.WorkerMemoryTest.test_memory_limit) ... skipped "Memory limit feature in Python worker is dependent on Python's 'resource' module on Linux; however, not found or not on Linux."
```

Closes #38987 from dongjoon-hyun/SPARK-41454.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun committed Dec 9, 2022
1 parent 3433f2a commit b5a9e1f
Showing 2 changed files with 7 additions and 3 deletions.
9 changes: 6 additions & 3 deletions python/pyspark/shuffle.py
@@ -78,9 +78,12 @@ def _get_local_dirs(sub):
     path = os.environ.get("SPARK_LOCAL_DIRS", "/tmp")
     dirs = path.split(",")
     if len(dirs) > 1:
-        # different order in different processes and instances
-        rnd = random.Random(os.getpid() + id(dirs))
-        random.shuffle(dirs, rnd.random)
+        if sys.version_info < (3, 11):
+            # different order in different processes and instances
+            rnd = random.Random(os.getpid() + id(dirs))
+            random.shuffle(dirs, rnd.random)
+        else:
+            random.shuffle(dirs)
     return [os.path.join(d, "python", str(os.getpid()), sub) for d in dirs]


1 change: 1 addition & 0 deletions python/setup.py
@@ -282,6 +282,7 @@ def run(self):
     'Programming Language :: Python :: 3.8',
     'Programming Language :: Python :: 3.9',
     'Programming Language :: Python :: 3.10',
+    'Programming Language :: Python :: 3.11',
     'Programming Language :: Python :: Implementation :: CPython',
     'Programming Language :: Python :: Implementation :: PyPy',
     'Typing :: Typed'],

4 comments on commit b5a9e1f

@Michal-Kolomanski

Is the date of the new PySpark release known? Handling Python 3.11 is a thing! I am looking forward to the release 👏

@bjornjorgensen
Contributor

@Michal-Kolomanski

Spark 3.4 release window

| Date | Event |
| --- | --- |
| January 15th 2023 | Code freeze. Release branch cut. |
| Late January 2023 | QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged. |
| February 2023 | Release candidates (RC), voting, etc. until final release passes |

https://spark.apache.org/versioning-policy.html

@dongjoon-hyun
Member Author

Yes, I made this PR as a part of Apache Spark 3.4 preparation, @Michal-Kolomanski and @bjornjorgensen .

@Michal-Kolomanski

Thank you @bjornjorgensen and @dongjoon-hyun for the fast response.
