Skip to content
This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

Allow specifying non-local files to spark-submit (python files, and R files) #530

Open
wants to merge 4 commits into
base: branch-2.2-kubernetes
Choose a base branch
from

Conversation

paulreimer
Copy link

@paulreimer paulreimer commented Oct 18, 2017

Refers to issue #527, this allows the use of python files, R and also using --py-files, when using spark-submit. Previously, the client would deny any non-local URI types when submitting a python job, even though the kubernetes spark initcontainer would be able to fulfill them (for example, gs:// URIs when the GCS connector is present in the initcontainer image).

Changing the validation to support this when isKubernetes is set, allows python jobs to use non-local URIs successfully. Only the client (spark-submit) requires this change, existing initcontainer images work fine.

What changes were proposed in this pull request?

adding && !isKubernetesCluster to core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L328
And also for the R files check.

As suggested by @liyinan926 in #527 (comment)

How was this patch tested?

Command:

./bin/spark-submit --deploy-mode cluster --master k8s://http://127.0.0.1:8001 --kubernetes-namespace spark --conf spark.kubernetes.driver.docker.image=<custom> --conf spark.kubernetes.executor.docker.image=<custom> --conf spark.kubernetes.initcontainer.docker.image=<custom> --conf spark.executor.instances=3 --conf spark.app.name=spark-pi gs://spark-resource-staging/pi.py 10

Before the change, the error was, and the job did not start:

Error: Only local python files are supported: gs://spark-resource-staging/pi.py
Run with --help for usage help or --verbose for debug output

After the change, I ran ./dev/make-distribution.sh --pip --tgz -Pmesos -Pyarn -Pkinesis-asl -Phive -Phive-thriftserver -Pkubernetes -Phadoop-2.7 -Dhadoop.version=2.7.3 locally on my macOS dev machine, and then ran it's spark-submit, and I was able to submit my python job successfully and obtain results via the logs.

@ifilonenko
Copy link
Member

Good catch, thank you for this. I seem to have missed this in my PRs. This LGTM seeing as CLI is passing.

Copy link

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulreimer
Copy link
Author

@felixcheung not sure, tbh. The intent seems to be that isKubernetesCluster should also support that behaviour (not formatting python path since remote file strings are supported), so I've added that in ecfa6f2

@felixcheung
Copy link

felixcheung commented Oct 19, 2017 via email

@ifilonenko
Copy link
Member

rerun integration tests please

PythonRunner.formatPaths(resolvedPyFiles).mkString(",")
} else {
// Ignoring formatting python path in yarn and mesos cluster mode, these two modes
// Ignoring formatting python path in yarn, mesos, and kubernetes cluster mode, these two modes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line is too long. beyond 100 characters so it will fail scalastyle

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a621c5f

@paulreimer
Copy link
Author

This looks like a CI/build system error, unrelated to the changes, but I am not able to fully interpret it.

@foxish
Copy link
Member

foxish commented Nov 2, 2017

rerun integration test please

@liyinan926
Copy link
Member

Any more comments on this and objection merging this?

@foxish
Copy link
Member

foxish commented Dec 6, 2017

ok to merge when tests pass.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants