Modified Python Dockerfiles to allow for the submission of Python applications without the --jars parameter #464
base: branch-2.2-kubernetes
This should be handled by
I'd also like to track down the specific problem the existing shell command is hitting.
@mccheah If it helps, I was able to replicate the error by using
After playing around with this command, I got it to work by just adding a colon in front of the classpath:
I suppose wildcards aren't being expanded properly. I wonder if we can instead use
Yeah, that's the problem. Just tried again with
I'd like to know the rules of thumb for how Java treats these inputs and how Bash expands them. That said, we had a tough time understanding similar issues in, e.g., #444.
I'm not too sure myself. It might be a difference in how Bash and the JRE handle wildcards. Quoting the wildcard most likely delegates expansion to the JRE, and adding a colon forces the JRE to parse the provided classpath and then expand the parsed directories. Maybe @ash211 or @erikerlandson could provide some insight into why this is happening and/or what the best way to patch this would be (colons vs. asterisk)?
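Here is a minimal sketch for testing the hypothesis above. It assumes a Bash shell and uses a throwaway directory as a stand-in for ${SPARK_HOME}/jars. The directory and the count_args helper are hypothetical. The sketch counts how many arguments a command receives for each form of the classpath. It suggests that Bash pathname expansion, not the JRE, turns an unquoted jars/* into one argument per jar. Quoting leaves the literal * for the JVM to handle. So does a leading ':', which makes the glob pattern match nothing.

```bash
#!/usr/bin/env bash
# Sketch: shows that Bash, not the JVM, expands an unquoted ".../jars/*".
# The temporary directory is a hypothetical stand-in for ${SPARK_HOME}/jars.
set -u

spark_home=$(mktemp -d)
mkdir -p "$spark_home/jars"
touch "$spark_home/jars/a.jar" "$spark_home/jars/b.jar" "$spark_home/jars/c.jar"

# count_args (hypothetical) prints how many arguments it received,
# standing in for the argv that java would see.
count_args() { echo "$#"; }

cp_plain="$spark_home/jars/*"    # like SPARK_CLASSPATH="${SPARK_HOME}/jars/*"
cp_colon=":$spark_home/jars/*"   # like the colon-prefixed workaround

unquoted=$(count_args $cp_plain)   # pathname expansion: one word per jar
quoted=$(count_args "$cp_plain")   # no expansion: the literal '*' reaches the callee
colon=$(count_args $cp_colon)      # pattern starts with a ':' directory that does not exist, so it stays literal

echo "unquoted=$unquoted quoted=$quoted colon=$colon"
rm -rf "$spark_home"
```

Run with default shell options (no nullglob), this prints `unquoted=3 quoted=1 colon=1`. The colon form stays literal only as a side effect: no path beginning with ':' exists in the working directory. Nothing explicitly tells the shell to skip expansion.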
FYI, with #462 we should be able to see exactly what command is ultimately being executed by bash in the pod console output.
@erikerlandson I may be wrong, but I don't think
@erikerlandson @mccheah @sahilprasad any progress on this PR?
@ifilonenko @mccheah @erikerlandson I'm partial to just adding quotes. Cleaner and solves the submission errors.
@@ -38,7 +38,7 @@ ENV PYSPARK_PYTHON python
 ENV PYSPARK_DRIVER_PYTHON python
 ENV PYTHONPATH ${SPARK_HOME}/python/:${SPARK_HOME}/python/lib/py4j-0.10.4-src.zip:${PYTHONPATH}

-CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \
+CMD SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*" && \
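Here is a sketch of what the modified CMD line assigns. It assumes SPARK_HOME=/opt/spark for illustration. build_cp and /opt/extra/my.jar are hypothetical. The assignment is double-quoted, so no globbing happens here; the sketch only looks at the resulting value. With nothing pre-set, the value starts with ':', and splitting it on ':' yields an empty first entry. With a value supplied by a custom base image, that value is prepended to the jars wildcard. The sketch does not show how the JVM interprets the empty entry.

```bash
#!/usr/bin/env bash
# Sketch: the value produced by SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
# SPARK_HOME=/opt/spark is an assumed example value.
SPARK_HOME=/opt/spark

# build_cp (hypothetical) takes the pre-existing SPARK_CLASSPATH as $1
# and applies the same assignment as the modified Dockerfile line.
build_cp() {
  local SPARK_CLASSPATH="$1"
  SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
  printf '%s\n' "$SPARK_CLASSPATH"
}

unset_case=$(build_cp "")                    # nothing set by the base image
preset_case=$(build_cp "/opt/extra/my.jar")  # hypothetical jar from a custom spark-base image

# Split on ':' to list the individual classpath entries.
IFS=':' read -r -a unset_entries <<< "$unset_case"

echo "$unset_case"    # :/opt/spark/jars/*
echo "$preset_case"   # /opt/extra/my.jar:/opt/spark/jars/*
echo "${#unset_entries[@]} entries, first is '${unset_entries[0]}'"
```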
Maybe put {} around SPARK_CLASSPATH?
@ssuchter @erikerlandson @ifilonenko thoughts on my latest commit?
Seems like a fine way to do it to me, assuming the testing works.
Integration tests are broken - "
Hey all, I ran into this issue independently yesterday and tracked it down before I saw #409. I think there is a better solution than prepending a ":" to the $SPARK_CLASSPATH value, and that is to quote $SPARK_CLASSPATH when it is used as the value for the -cp flag in the various Dockerfiles under resource-managers/kubernetes/docker-minimal-bundle/src/main/docker (not just driver-py and executor-py; this is potentially a general problem). This is the standard way of preventing the shell from expanding a variable before passing it to a command.

So in the driver-py case, as an example, the last line of the Dockerfile just becomes:

${JAVA_HOME}/bin/java "${SPARK_DRIVER_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS $PYSPARK_PRIMARY $PYSPARK_FILES $SPARK_DRIVER_ARGS

That's it; no other change to SPARK_CLASSPATH is needed.

To recap, what's happening with the unmodified Dockerfile is that the wildcard is expanded into a space-separated list of jar paths after the -cp flag. Java takes the first path as the classpath and the second value as the target to execute. The rest of the line is effectively thrown away. (You can see this by rebuilding the image and inserting an "echo" of the java invocation before the actual command, like so):
...
if ! [ -z ${SPARK_MOUNTED_FILES_FROM_SECRET_DIR+x} ]; then cp -R "$SPARK_MOUNTED_FILES_FROM_SECRET_DIR/." .; fi && \
echo ${JAVA_HOME}/bin/java "${SPARK_DRIVER_JAVA_OPTS[@]}" -cp $SPARK_CLASSPATH -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS $PYSPARK_PRIMARY $PYSPARK_FILES $SPARK_DRIVER_ARGS && \
...

You can also verify this behavior easily by downloading the Spark distro and playing from a shell:

[machine spark-2.2.0-k8s-0.5.0-bin-2.7.3]$ SPARK_CONTEXT=jars/*
[machine spark-2.2.0-k8s-0.5.0-bin-2.7.3]$ echo $SPARK_CONTEXT
jars/activation-1.1.1.jar jars/antlr-2.7.7.jar jars/antlr4-runtime-4.5.3.jar jars/antlr-runtime-3.4.jar jars/aopalliance-1.0.jar jars/aopalliance-repackaged-2.4.0-b34.jar jars/apacheds-i18n-2.0.0-M15.jar jars/apacheds-kerberos-codec-2.0.0-M15.jar ...
[machine spark-2.2.0-k8s-0.5.0-bin-2.7.3]$ echo "$SPARK_CONTEXT"
jars/*

The trouble I see with prepending a ":" is that, yes, it happens to prevent the shell expansion because the shell no longer recognizes the value as a valid path. However, it is also undefined behavior for java: it literally seems to imply an empty path on the classpath list. This might confuse noobs, and I think it clouds the actual issue (and could perpetuate confusion in other code :) ). Why not defeat the wildcard expansion with the standard shell mechanism?

I have a PR with changes to the set of Dockerfiles that use -cp $SPARK_CLASSPATH in this way, but I thought I would comment on the original issue here first. I can push that PR if you like.

Best,
Trevor
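The recap above can be reproduced without rebuilding an image. This sketch assumes Bash. It replaces ${JAVA_HOME}/bin/java with a hypothetical fake_java function. That function only reports which argument lands in the -cp slot and which one would be picked up as the main class. com.example.Main and the temporary jars directory are placeholders. Unquoted, the first jar becomes the classpath and the second jar lands where the main class should be. Quoted, the literal jars/* reaches the launcher intact. The real JVM then treats a classpath entry ending in /* as the .jar files in that directory.

```bash
#!/usr/bin/env bash
# Sketch: unquoted vs quoted -cp, with a stub standing in for the real java launcher.
spark_home=$(mktemp -d)                # hypothetical stand-in for ${SPARK_HOME}
mkdir -p "$spark_home/jars"
touch "$spark_home/jars/a.jar" "$spark_home/jars/b.jar"
SPARK_CLASSPATH="$spark_home/jars/*"   # as set in the Dockerfile: a literal '*'

# fake_java (hypothetical) loosely mimics the launcher's argument handling:
# "-cp X" sets the classpath, other "-..." options are skipped, and the
# first remaining argument is reported as the main class.
fake_java() {
  local cp=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -cp) cp="$2"; shift 2 ;;
      -*)  shift ;;
      *)   break ;;
    esac
  done
  printf 'cp=%s main=%s\n' "$cp" "${1-}"
}

unquoted_out=$(fake_java -cp $SPARK_CLASSPATH -Xms1g com.example.Main)
quoted_out=$(fake_java -cp "$SPARK_CLASSPATH" -Xms1g com.example.Main)

echo "$unquoted_out"   # cp=<tmp>/jars/a.jar main=<tmp>/jars/b.jar
echo "$quoted_out"     # cp=<tmp>/jars/* main=com.example.Main
rm -rf "$spark_home"
```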
I've been conferring with @tmckayus on this, and the evidence suggests we should just quote the classpath instead of prepending a colon. Either way, we ought to get this nit resolved and close out the issue. I'm in favor of pushing the alternate PR for comparison.
Okay, for comparison the other PR is #541 |
…lications without the --jars parameter. Fixes #409.
Not sure why the singular path provided to -cp does not work, but this fixes it. This may also provide extensibility, in that a SPARK_CLASSPATH could be provided in a custom-built spark-base image without requiring modification of the individual driver and executor images.

cc @erikerlandson @ifilonenko @mccheah