[SPARK-23146][K8S] Support client mode. #21748
Conversation
Client mode works more or less identically to cluster mode. However, in client mode, the Spark Context needs to be manually bootstrapped with certain properties which would have otherwise been set up by spark-submit in cluster mode. Specifically:
- The user must provide a pod name for the driver. This implies that all drivers in client mode must be running inside a pod. This pod is primarily used to create the owner reference graph so that executors are not orphaned if the driver pod is deleted.
- The user must provide a host (spark.driver.host) and port (spark.driver.port) that the executors can connect to. When using spark-submit in cluster mode, spark-submit generates the headless service automatically; in client mode, the user is responsible for setting up their own connectivity.
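For illustration, a minimal sketch of how a driver running inside a pod might bootstrap these properties in client mode. The master URL, service DNS name, port, image, and pod name below are placeholders, and the exact config keys should be checked against the docs changed in this PR:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of a client-mode driver that is itself running in a pod.
// All concrete values here are placeholders, not values from this PR.
val conf = new SparkConf()
  .setAppName("client-mode-example")
  .setMaster("k8s://https://kubernetes.default.svc")
  // Connectivity that spark-submit would otherwise provide via a headless service:
  .set("spark.driver.host", "my-spark-driver-svc.default.svc.cluster.local")
  .set("spark.driver.port", "7078")
  // Name of the pod the driver runs in, used to build the owner reference graph:
  .set("spark.kubernetes.driver.pod.name", "my-driver-pod")
  .set("spark.executor.instances", "2")
  .set("spark.kubernetes.container.image", "my-registry/spark:latest")

val sc = new SparkContext(conf)
```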
TODO - finish verifying the integration test, and docs.
test this please
Test build #92869 has finished for PR 21748 at commit
retest this please
Kubernetes integration test starting
Test build #92873 has finished for PR 21748 at commit
Kubernetes integration test status failure
Test build #92875 has finished for PR 21748 at commit
Test build #92878 has finished for PR 21748 at commit
Test build #92879 has finished for PR 21748 at commit
Test build #92880 has finished for PR 21748 at commit
Test build #92881 has finished for PR 21748 at commit
Test build #92882 has finished for PR 21748 at commit
retest this please
Test build #92884 has finished for PR 21748 at commit
Test build #92887 has finished for PR 21748 at commit
I'm going to kill the ubuntu build and reboot the worker. I'll retrigger when it's back.
docs/running-on-kubernetes.md
Outdated
and your spark driver's port to `spark.driver.port`.
driver pod to be routable from the executors by a stable hostname. When deploying your headless service, ensure that
the service's label selector will only match the driver pod and no other pods; it is recommended to assign your driver
pod a sufficiently unique label and to use that label in the node selector of the headless service. Specify the driver's
s/node selector/label selector/.
docs/running-on-kubernetes.md
Outdated
server fails for any reason, these pods will remain in the cluster. The executor processes should exit when they cannot
reach the driver, so the executor pods should not consume resources in the cluster after your application exits.
The driver will look for a pod with the given name in the namespace specified by `spark.kubernetes.namespace`, and
all executor pods will have their owner reference field set to point to that pod. Be careful to avoid setting the
s/all executor pods will have their owner reference field set to point to that pod/an OwnerReference pointing to that pod will be added to each of the executor pods/.
docs/running-on-kubernetes.md
Outdated
The driver will look for a pod with the given name in the namespace specified by `spark.kubernetes.namespace`, and
all executor pods will have their owner reference field set to point to that pod. Be careful to avoid setting the
owner reference to a pod that is not actually that driver pod, or else the executors may be terminated prematurely when
the wrong pod is terminated.
s/terminated/deleted/.
docs/running-on-kubernetes.md
Outdated
actually running in a pod, keep in mind that the executor pods may not be deleted from the cluster when the application
exits. The Spark scheduler attempts to delete these pods, but if the network request to the API server fails for any
reason, these pods will remain in the cluster. The executor processes should exit when they cannot reach the driver, so
the executor pods should not consume resources in the cluster after your application exits.
s/should not consume resources/should not consume compute resources (cpus and memory)/.
docs/running-on-kubernetes.md
Outdated
reach the driver, so the executor pods should not consume resources in the cluster after your application exits.
The driver will look for a pod with the given name in the namespace specified by `spark.kubernetes.namespace`, and
all executor pods will have their owner reference field set to point to that pod. Be careful to avoid setting the
owner reference to a pod that is not actually that driver pod, or else the executors may be terminated prematurely when
s/owner reference/OwnerReference/ for consistency.
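For readers unfamiliar with owner references, here is a rough sketch, using the fabric8 client that the Kubernetes backend builds on, of attaching the driver pod as owner of an executor pod. The helper name is hypothetical and this is not code from this PR:

```scala
import io.fabric8.kubernetes.api.model.{OwnerReferenceBuilder, Pod, PodBuilder}

// driverPod is assumed to be the pod found by the driver pod name lookup described above.
def withDriverOwnerReference(driverPod: Pod, executorPod: Pod): Pod = {
  val driverOwnerRef = new OwnerReferenceBuilder()
    .withName(driverPod.getMetadata.getName)
    .withUid(driverPod.getMetadata.getUid)
    .withApiVersion(driverPod.getApiVersion)
    .withKind(driverPod.getKind)
    .withController(true)
    .build()
  // Kubernetes garbage-collects the executor pod once its owner (the driver pod) is deleted,
  // which is why pointing the OwnerReference at the wrong pod can kill executors prematurely.
  new PodBuilder(executorPod)
    .editMetadata()
      .addToOwnerReferences(driverOwnerRef)
    .endMetadata()
    .build()
}
```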
docs/running-on-kubernetes.md
Outdated
If your application is not running inside a pod, or if `spark.driver.pod.name` is not set when your application is
actually running in a pod, keep in mind that the executor pods may not be deleted from the cluster when the application
exits. The Spark scheduler attempts to delete these pods, but if the network request to the API server fails for any
reason, these pods will remain in the cluster. The executor processes should exit when they cannot reach the driver, so
s/these pods will remain in the cluster/these pods may not get deleted properly/. There's a pod-specific GC that deletes terminated pods based on a cluster-wide capacity (by default 12500 pods). It sorts those pods by creation timestamp before deleting them. But this is unpredictable.
Kubernetes integration test starting
Kubernetes integration test status success
@liyinan926 did some of my own edits on top of your suggestions for docs wording on the latest patch.
Test build #93358 has finished for PR 21748 at commit
test this please
Test build #93359 has finished for PR 21748 at commit
Anyone know what's happening with this:
Test build #93361 has finished for PR 21748 at commit
Never mind, think it's recovering now.
LGTM for the docs updates.
Test build #93360 has finished for PR 21748 at commit
Test build #93357 has finished for PR 21748 at commit
Merging in a few hours if no additional comments are raised.
LGTM! minor goodness for documentation/example IMO would be great to have later
driver pod to be routable from the executors by a stable hostname. When deploying your headless service, ensure that
the service's label selector will only match the driver pod and no other pods; it is recommended to assign your driver
pod a sufficiently unique label and to use that label in the label selector of the headless service. Specify the driver's
hostname via `spark.driver.host` and your spark driver's port to `spark.driver.port`.
@mccheah as for your comment #21748 (comment), this manual setup is OK, right?
There is some level of complexity here - perhaps a quick follow-up with some sample templates/kubectl commands would be helpful
Yeah manual setup is fine for now. Think additional docs around how to do all this can be a separate PR.
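As a rough sample of what such a setup might look like, a headless service built with the fabric8 client; the service name, label, and port are placeholders, and this is not part of the PR:

```scala
import scala.collection.JavaConverters._
import io.fabric8.kubernetes.api.model.{Service, ServiceBuilder}

// Placeholder label that should match only the driver pod, plus the driver's RPC port.
val driverSelector = Map("spark-driver-selector" -> "my-unique-driver")
val driverPort = 7078

val driverService: Service = new ServiceBuilder()
  .withNewMetadata()
    .withName("my-spark-driver-svc")
  .endMetadata()
  .withNewSpec()
    .withClusterIP("None") // headless: DNS resolves directly to the selected driver pod
    .withSelector(driverSelector.asJava)
    .addNewPort()
      .withName("driver-rpc-port")
      .withPort(driverPort)
      .withNewTargetPort(driverPort)
    .endPort()
  .endSpec()
  .build()

// The driver would then set spark.driver.host to this service's DNS name
// and spark.driver.port to driverPort.
```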
actually running in a pod, keep in mind that the executor pods may not be properly deleted from the cluster when the
application exits. The Spark scheduler attempts to delete these pods, but if the network request to the API server fails
for any reason, these pods will remain in the cluster. The executor processes should exit when they cannot reach the
driver, so the executor pods should not consume compute resources (cpu and memory) in the cluster after your application
executor processes should exit when they cannot reach the driver
What's the timeout value? Is it configurable?
Unclear, it triggers in the `onDisconnected` event, so I think there's a persistent socket connection whose drop causes the exit. So it should more or less be instantaneous.
Some(new File(Config.KUBERNETES_SERVICE_ACCOUNT_CA_CRT_PATH)))
} else {
(KUBERNETES_AUTH_CLIENT_MODE_PREFIX,
masterURL.substring("k8s://".length()),
I thought there's some function for parsing the k8s master url?
We can make such a helper function; currently this logic is done here and in KubernetesClientApplication.
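For illustration, such a helper might look roughly like this; a sketch only, and the default-to-https behavior is an assumption rather than what this PR implements:

```scala
// Strip the k8s:// prefix and normalize the API server URL.
def parseK8sMasterUrl(rawMasterUrl: String): String = {
  require(rawMasterUrl.startsWith("k8s://"),
    s"Kubernetes master URL must start with k8s://, got: $rawMasterUrl")
  val withoutPrefix = rawMasterUrl.substring("k8s://".length)
  // Assume https when no scheme is given, mirroring how spark-submit treats the master URL.
  if (withoutPrefix.startsWith("http://") || withoutPrefix.startsWith("https://")) {
    withoutPrefix
  } else {
    s"https://$withoutPrefix"
  }
}
```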
Ok after the next build passes I'm going to merge immediately. Thanks for the review.
Kubernetes integration test starting
Test build #93551 has finished for PR 21748 at commit
Kubernetes integration test status success
@mccheah the integration tests did not include the
.withLabels(labels.asJava)
.endMetadata()
.withNewSpec()
.withServiceAccountName("default")
@mccheah if people use spark-rbac.yaml this will fail. It fails for me. Shouldn't be hardcoded.
Error: "User "system:serviceaccount:spark:default" cannot get pods in the namespace "spark"."
+1
Yup we can fix this
is there a JIRA?
I created one: https://issues.apache.org/jira/browse/SPARK-24963
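A possible direction for that fix, sketched with the fabric8 builder used in the test; the property name and values below are assumptions, not necessarily what SPARK-24963 ultimately does:

```scala
import scala.collection.JavaConverters._
import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}

// Placeholder values for illustration.
val driverPodName = "spark-client-mode-test-driver"
val labels = Map("spark-app-selector" -> "client-mode-test")

// Read the service account from a system property instead of hardcoding "default".
val driverServiceAccount: String =
  sys.props.getOrElse("spark.kubernetes.test.serviceAccountName", "default")

val driverPod: Pod = new PodBuilder()
  .withNewMetadata()
    .withName(driverPodName)
    .withLabels(labels.asJava)
  .endMetadata()
  .withNewSpec()
    .withServiceAccountName(driverServiceAccount)
  .endSpec()
  .build()
```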
What changes were proposed in this pull request?
Support client mode for the Kubernetes scheduler.
Client mode works more or less identically to cluster mode. However, in client mode, the Spark Context needs to be manually bootstrapped with certain properties which would have otherwise been set up by spark-submit in cluster mode. Specifically:
- The user must provide a pod name for the driver. This implies that all drivers in client mode must be running inside a pod. This pod is primarily used to create the owner reference graph so that executors are not orphaned if the driver pod is deleted.
- The user must provide a host (spark.driver.host) and port (spark.driver.port) that the executors can connect to. When using spark-submit in cluster mode, spark-submit generates the headless service automatically; in client mode, the user is responsible for setting up their own connectivity.
We also change the authentication configuration prefixes for client mode.
How was this patch tested?
Adding an integration test to exercise client mode support.