Allow specifying non-local files to spark-submit (Python files and R files) #530
base: branch-2.2-kubernetes
Conversation
… files) when isKubernetes is set (apache-spark-on-k8s#527)
Good catch, thank you for this. I seem to have missed this in my PRs. This LGTM seeing as CI is passing.
…uster mode for spark-submit
@felixcheung not sure, tbh. The intent seems to be that …
Thanks, I guess we should test this. Is there a way to call out what should be tested?
rerun integration tests please
PythonRunner.formatPaths(resolvedPyFiles).mkString(",")
} else {
-  // Ignoring formatting python path in yarn and mesos cluster mode, these two modes
+  // Ignoring formatting python path in yarn, mesos, and kubernetes cluster mode, these two modes
This line is too long (beyond 100 characters), so it will fail scalastyle.
Fixed in a621c5f
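A wrapped version that stays under scalastyle's 100-character limit might look like the following (a sketch only; the actual wording in a621c5f may differ):

```scala
// Ignoring formatting python path in yarn, mesos and kubernetes cluster
// mode, as these modes support dealing with remote python files themselves.
```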
This looks like a CI/build system error, unrelated to the changes, but I am not able to fully interpret it.
rerun integration test please
Any more comments on this, or objections to merging it?
ok to merge when tests pass.
Refers to issue #527. This allows the use of non-local Python files and R files, as well as `--py-files`, with spark-submit. Previously, the client would deny any non-local URI types when submitting a Python job, even though the Kubernetes Spark initcontainer would be able to fetch them (for example, `gs://` URIs when the GCS connector is present in the initcontainer image). Changing the validation to support this when isKubernetes is set allows Python jobs to use non-local URIs successfully. Only the client (`spark-submit`) requires this change; existing initcontainer images work fine.

What changes were proposed in this pull request?
Adding `&& !isKubernetesCluster` to the Python files check in core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L328, and also to the R files check.
As suggested by @liyinan926 in #527 (comment)
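For context, here is a sketch of the validation being relaxed. It is reconstructed from the Spark 2.2-era checks in SparkSubmit.scala, so the exact lines and messages may differ; the `&& !isKubernetesCluster` clauses are the additions this PR describes:

```scala
// Require python files to be local so they can be added to the PYTHONPATH,
// except on cluster managers whose cluster mode can fetch remote files
// itself (YARN, Mesos, and now Kubernetes via its init-container).
// Sketch only: the surrounding code in SparkSubmit.scala may differ.
if (args.isPython && !isYarnCluster && !isMesosCluster && !isKubernetesCluster) {
  if (Utils.nonLocalPaths(args.primaryResource).nonEmpty) {
    printErrorAndExit(s"Only local python files are supported: ${args.primaryResource}")
  }
  val nonLocalPyFiles = Utils.nonLocalPaths(args.pyFiles).mkString(",")
  if (nonLocalPyFiles.nonEmpty) {
    printErrorAndExit(s"Only local additional python files are supported: $nonLocalPyFiles")
  }
}

// The R files check gets the analogous exemption:
if (args.isR && !isYarnCluster && !isKubernetesCluster) {
  if (Utils.nonLocalPaths(args.primaryResource).nonEmpty) {
    printErrorAndExit(s"Only local R files are supported: ${args.primaryResource}")
  }
}
```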
How was this patch tested?
Command:
Before the change, the job did not start, and the error was:
After the change, I ran `./dev/make-distribution.sh --pip --tgz -Pmesos -Pyarn -Pkinesis-asl -Phive -Phive-thriftserver -Pkubernetes -Phadoop-2.7 -Dhadoop.version=2.7.3` locally on my macOS dev machine, then ran its `spark-submit`, and I was able to submit my Python job successfully and obtain results via the logs.