misc(Python): Replace python 2 with python 3
Python 2 reached end of life last year and should no longer be used. This commit replaces all references to the "python" binary with the more explicit "python3" binary. If desired, the build can still be performed for Python 2 by setting the "PYTHON_EXECUTABLE" environment variable to an appropriate version. Additionally, Python wheels are the preferred way to distribute Python code (see https://packaging.python.org/discussions/wheel-vs-egg/), so this commit also builds the job-server-python wheel.

Spark-2.4 does not support Python >= 3.8 (see apache/spark#26194), leading to failed test cases (TypeError: an integer is required (got type bytes)). If you encounter these issues, explicitly specify a Python executable < 3.8.
Marc Zoeller committed Jan 25, 2021
1 parent 15f457d commit 5ac3fe1
Showing 13 changed files with 69 additions and 26 deletions.
4 changes: 2 additions & 2 deletions build.sbt
@@ -115,8 +115,8 @@ lazy val jobServerExtrasSettings = revolverSettings ++ Assembly.settings ++ publ
 lazy val jobServerApiSettings = Seq(libraryDependencies ++= sparkDeps ++ sparkExtraDeps)
 
 lazy val testPython = taskKey[Unit]("Launch a sub process to run the Python tests")
-lazy val buildPython = taskKey[Unit]("Build the python side of python support into an egg")
-lazy val buildPyExamples = taskKey[Unit]("Build the examples of python jobs into an egg")
+lazy val buildPython = taskKey[Unit]("Build the python side of python support into a wheel and egg")
+lazy val buildPyExamples = taskKey[Unit]("Build the examples of python jobs into a wheel and egg")
 
 lazy val jobServerPythonSettings = revolverSettings ++ Assembly.settings ++ publishSettings ++ Seq(
   libraryDependencies ++= sparkPythonDeps,
5 changes: 2 additions & 3 deletions ci/install-python-dependencies.sh
@@ -1,6 +1,5 @@
 #!/usr/bin/env bash
 set -e
-pip install --upgrade pip
-pip install --user pyhocon
+pip3 install --upgrade pip
+pip3 install --user pyhocon
-pip install --user pycodestyle
+pip3 install --user pycodestyle
22 changes: 20 additions & 2 deletions doc/python.md
@@ -10,6 +10,8 @@
 - [Running a job](#running-a-job)
 - [PythonSessionContext](#pythonsessioncontext)
 - [CustomContexts](#customcontexts)
+- [Python 2](#python-2)
+- [Troubleshooting](#troubleshooting)
 
 <!-- END doctoc generated TOC please keep comment here to allow auto update -->

@@ -45,10 +47,10 @@ A basic config supporting Python might look like:
   python {
     paths = [
       ${SPARK_HOME}/python,
-      "/home/user/spark-jobserver/job-server-extras/job-server-python/target/python/spark_jobserver_python-0.8.0-py2.7.egg"
+      "/home/user/spark-jobserver/job-server-extras/job-server-python/target/python/spark_jobserver_python-0.10.1-py3-none-any.whl"
     ]
 
-    # The default value in application.conf is "python"
+    # The default value in application.conf is "python3"
     executable = "python3"
   }
 }
@@ -266,3 +268,19 @@ The Python support can support arbitrary context types as long as they are based
 contexts of your custom type, your Python jobs which use this context must implement an additional method,
 `build_context(self, gateway, jvmContext, sparkConf)`, which returns the Python equivalent of the JVM Context object.
 For a simple example, see `CustomContextJob` in the `job-server-python` sub-module.
+
+
+## Python 2
+By default, spark jobserver builds all python dependencies, namely `sparkjobserver` and `sjs_python_examples`, for
+python 3. The packed binaries are also compatible with python 2. If needed, you can explicitly build all
+libraries for python 2 by setting the environment variable `PYTHON_EXECUTABLE` to a python 2 executable before
+packaging spark jobserver.
+
+
+## Troubleshooting
+
+### TypeError: an integer is required (got type bytes)
+
+Spark-2.4 does not support python >= 3.8 (see [here](https://github.com/apache/spark/pull/26194) for more information).
+If you encounter this issue, please verify that you provide a python executable < 3.8 in your
+[configuration file](../job-server/src/main/resources/application.conf#L210).
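The version bound in the troubleshooting entry can be checked in a script before launching jobs. A minimal POSIX-shell sketch (the helper name `spark24_ok` is hypothetical; the `< 3.8` bound comes from apache/spark#26194):

```shell
# Hypothetical helper: succeed only for interpreter versions Spark 2.4 can drive.
# Versions >= 3.8 trigger "TypeError: an integer is required (got type bytes)".
spark24_ok() {
    major=${1%%.*}
    minor=${1#*.}
    [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 8 ]; }
}

spark24_ok 3.7 && echo "3.7 is usable with Spark 2.4"
spark24_ok 3.8 || echo "3.8 is not; pick an older executable"
```

In practice the version string would come from running the configured executable with `--version`; the literal arguments above are placeholders.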
@@ -165,8 +165,8 @@ with BeforeAndAfterAll {
   it("should successfully run jobs using python3", WindowsIgnore) {
     val factory = new TestPythonSessionContextFactory()
     val p3Config = ConfigFactory.parseString(
-      """
-        |python.executable = "python3"
+      s"""
+        |python.executable = "${sys.env.getOrElse("PYTHON_EXECUTABLE", "python3")}"
       """.stripMargin).withFallback(config)
     context = factory.makeContext(sparkConf, p3Config, "test-create")
     runSessionTest(factory, context, p3Config)
@@ -104,7 +104,7 @@ object PythonSparkContextFactorySpec {
       | "${originalPythonPath.getOrElse("")}"
       |]
       |
-      |python.executable = "python"
+      |python.executable = "${sys.env.getOrElse("PYTHON_EXECUTABLE", "python3")}"
       |${JobserverConfig.IS_SPARK_SESSION_HIVE_ENABLED} = true
     """.replace("\\", "\\\\") // Windows-compatibility
       .stripMargin)
7 changes: 6 additions & 1 deletion job-server-python/src/python/build.cmd
@@ -3,8 +3,13 @@ set SJS_VERSION=%1
 setlocal
 cd %~dp0
 
-python %~dp0%2 build --build-base ../../target/python^
+set PYTHON_EXECUTABLE=%2
+
+%PYTHON_EXECUTABLE% %~dp0%3 build --build-base ../../target/python^
 egg_info --egg-base ../../target/python^
 bdist_egg --bdist-dir /tmp/bdist --dist-dir ../../target/python --skip-build
+
+%PYTHON_EXECUTABLE% %~dp0%3 build --build-base ../../target/python^
+bdist_wheel --bdist-dir /tmp/bdist --dist-dir ../../target/python --skip-build
 
 endlocal
7 changes: 6 additions & 1 deletion job-server-python/src/python/build.sh
@@ -1,5 +1,10 @@
 #!/usr/bin/env bash
 export SJS_VERSION=$1
-python $2 build --build-base ../../target/python \
+PYSPARK_PYTHON=$2
+
+$PYSPARK_PYTHON $3 build --build-base ../../target/python \
 egg_info --egg-base ../../target/python \
 bdist_egg --bdist-dir /tmp/bdist --dist-dir ../../target/python --skip-build
+
+$PYSPARK_PYTHON $3 build --build-base ../../target/python \
+bdist_wheel --bdist-dir /tmp/bdist --dist-dir ../../target/python --skip-build
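Note the changed calling convention of build.sh: the version is now `$1`, the Python executable `$2`, and the setup script `$3` (previously the setup script was `$2`). A sketch of the new argument handling, using placeholder values rather than a real build:

```shell
# Placeholder values; a real build passes the actual jobserver version
# and an interpreter installed on the build machine.
set -- 0.11.0 python3 setup.py

export SJS_VERSION=$1   # version, consumed by the setup scripts
PYSPARK_PYTHON=$2       # interpreter that will run the setup script
echo "SJS_VERSION=$SJS_VERSION interpreter=$PYSPARK_PYTHON script=$3"
# prints: SJS_VERSION=0.11.0 interpreter=python3 script=setup.py
```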
3 changes: 2 additions & 1 deletion job-server-python/src/python/run-tests.cmd
@@ -15,7 +15,8 @@ setlocal
 cd %~dp0
 
 set PYTHONPATH=%~dp0;%SPARK_HOME%\python\lib\pyspark.zip;%SPARK_HOME%\python\lib\py4j-0.9-src.zip;%PYTHONPATH%
-python test\apitests.py
+set PYSPARK_PYTHON=%1
+%PYSPARK_PYTHON% test\apitests.py
 set exitCode=%ERRORLEVEL%
 REM This sleep is here so that all of Spark's shutdown stdout if written before we exit,
 REM so that we return cleanly to the command prompt.
4 changes: 3 additions & 1 deletion job-server-python/src/python/run-tests.sh
@@ -1,5 +1,7 @@
 #!/usr/bin/env bash
-PYTHONPATH=.:$SPARK_HOME/python/lib/pyspark.zip:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH python test/apitests.py
+PYTHONPATH=.:$SPARK_HOME/python/lib/pyspark.zip:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH \
+PYSPARK_PYTHON=$1 \
+$1 test/apitests.py
 exitCode=$?
 #This sleep is here so that all of Spark's shutdown stdout if written before we exit,
 #so that we return cleanly to the command prompt.
@@ -152,10 +152,17 @@ class SubprocessSpec extends FunSpec with Matchers with BeforeAndAfter with Befo
   )
 
   private def setupPythonProcess(port: String, token: String): scala.sys.process.ProcessBuilder = {
+    // Spark-2.4 does not support python >= 3.8 (see https://github.com/apache/spark/pull/26194) leading to
+    // failed test cases (TypeError: an integer is required (got type bytes)). If you encounter these issues
+    // try to state a python executable < 3.8 explicitly.
+    // TODO: remove comment after migration to Spark-3.0
+    val pythonExecutable = sys.env.getOrElse("PYTHON_EXECUTABLE", "python3")
     Process(
-      Seq("python", "-m", "sparkjobserver.subprocess", port, token),
+      Seq(pythonExecutable, "-m", "sparkjobserver.subprocess", port, token),
       None,
-      "PYTHONPATH" -> pythonPath)
+      "PYTHONPATH" -> pythonPath,
+      "PYSPARK_PYTHON" -> pythonExecutable
+    )
   }
 
   describe("The python subprocess") {
3 changes: 2 additions & 1 deletion job-server/src/main/resources/application.conf
@@ -207,7 +207,8 @@ spark {
     #Any locations of python libraries to be included on the PYTHONPATH when running Python jobs
     paths = []
     #The shell command to run when launching the subprocess for python jobs
-    executable = "python"
+    #Please note: Spark2 currently only supports Python < 3.8.
+    executable = "python3"
   }
 
   # All the above sections have higher precedence because those properties are added directly to
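Deployments that cannot use the new default can pin the interpreter in their own configuration instead of editing the shipped file. A sketch of such an override block, following the `spark.python` structure above (the executable `python3.7` is an example value chosen to stay below the Spark 2 limit):

```
spark {
  python {
    # Example only: any interpreter < 3.8 on the PATH works with Spark 2.
    executable = "python3.7"
  }
}
```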
Expand Down
14 changes: 9 additions & 5 deletions notes/0.11.0.markdown
@@ -1,7 +1,11 @@
-#Scala #akka @ApacheSpark
+# Scala #akka @ApacheSpark
 
 ## Breaking Changes:
-The authentication management has been refactored. As a consequence, existing `shiro` configuration sections have to be
-adapted. Previously, `shiro` configuration had to be provided at root level without a prefix. Now, `shiro` configuration
-has to be embedded in an additional `access-control` block. An example configuration is available in the
-[sources](../job-server/src/main/resources/application.conf#L322).
+
+- The authentication management has been refactored. As a consequence, existing `shiro` configuration sections have to
+  be adapted. Previously, `shiro` configuration had to be provided at root level without a prefix. Now, `shiro`
+  configuration has to be embedded in an additional `access-control` block. An example configuration is available in the
+  [sources](../job-server/src/main/resources/application.conf#L322).
+- The default Python version is switched from Python 2 to Python 3 (namely, from `python` to `python3`). If you would
+  still like to use Python 2, adapt the Python executable in
+  the [application.conf](../job-server/src/main/resources/application.conf#L210).
9 changes: 5 additions & 4 deletions project/PythonTasks.scala
@@ -3,15 +3,16 @@ import java.io.File
 import scala.sys.process.Process
 
 object PythonTasks {
-  val ext : String = if(System.getProperty("os.name").indexOf("Win") >= 0) "cmd" else "sh"
+  val ext : String = if (System.getProperty("os.name").indexOf("Win") >= 0) "cmd" else "sh"
+  val pythonExecutable: String = sys.env.getOrElse("PYTHON_EXECUTABLE", "python3")
 
   def workingDirectory(baseDirectory: File): File =
     new File(baseDirectory.getAbsolutePath + Seq("src", "python")
       .mkString("/", "/", ""))
 
   def testPythonTask(baseDirectory: File): Unit = {
     val cwd = workingDirectory(baseDirectory)
-    val exitCode = Process(cwd.getAbsolutePath + "/run-tests." + ext, cwd).!
+    val exitCode = Process(Seq(cwd.getAbsolutePath + "/run-tests." + ext, pythonExecutable), cwd).!
     if(exitCode != 0) {
       sys.error(s"Running python tests received non-zero exit code $exitCode")
     }
@@ -20,15 +21,15 @@ object PythonTasks {
   def buildPythonTask(baseDirectory: File, version: String): Unit = {
     val cwd = workingDirectory(baseDirectory)
     val exitCode = Process(Seq(cwd.getAbsolutePath + "/build." + ext,
-      version, "setup.py"), cwd).!
+      version, pythonExecutable, "setup.py"), cwd).!
     if(exitCode != 0) {
       sys.error(s"Building python API received non-zero exit code $exitCode")
     }
   }
 
   def buildExamplesTask(baseDirectory: File, version: String): Unit = {
     val cwd = workingDirectory(baseDirectory)
-    val exitCode = Process(Seq(cwd.getAbsolutePath + "/build." + ext, version,
+    val exitCode = Process(Seq(cwd.getAbsolutePath + "/build." + ext, version, pythonExecutable,
       "setup-examples.py"), cwd).!
     if(exitCode != 0) {
       sys.error(s"Building python examples received non-zero exit code $exitCode")
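Every call site in this commit resolves the interpreter the same way: use `PYTHON_EXECUTABLE` if it is set, otherwise fall back to `python3`. The equivalent shell parameter expansion, which is how you would drive a Python 2 build from the command line (`python2` is only an example value; any executable on the PATH works):

```shell
# Default case: PYTHON_EXECUTABLE unset, fall back to python3.
unset PYTHON_EXECUTABLE
echo "interpreter: ${PYTHON_EXECUTABLE:-python3}"   # prints: interpreter: python3

# Override case: export the variable before packaging to build for
# another interpreter, e.g. Python 2.
PYTHON_EXECUTABLE=python2
echo "interpreter: ${PYTHON_EXECUTABLE:-python3}"   # prints: interpreter: python2
```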
