[SPARK-36061][K8S] Add support for PodGroup #34456
Test build #144808 has finished for PR 34456 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #145065 has finished for PR 34456 at commit
I moved the PodGroup-related API to a more suitable place, as an extension of k8s-client: fabric8io/kubernetes-client#3580.
<type>test-jar</type>
</dependency>

<dependency>
Volcano support in k8s-cli will be released in kubernetes-client v5.11:
fabric8io/kubernetes-client#3580
TODO: need to bump the kubernetes-client version to the latest once it is published.
Line 207 in 7b50cf0
<kubernetes-client.version>5.10.1</kubernetes-client.version>
The new dependencies are available since 5.11.0 - e.g. https://search.maven.org/artifact/io.fabric8/volcano-model-v1beta1
Latest available is 5.12.1 - https://search.maven.org/artifact/io.fabric8/volcano-model-v1beta1/5.12.1/bundle
   * Pod creation.
   */
  def getAdditionalKubernetesResources(): Seq[HasMetadata] = Seq.empty
This was submitted as a separate PR, #34599 (the 1st and 2nd commits).
To follow this PR more easily, you can look at only the 3rd commit; that is, this PR only adds the pod feature step.
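As a rough illustration of how a feature step could use the hook quoted in this thread, here is a minimal, dependency-free sketch. The `HasMetadata`, `PodGroup`, and `FeatureStep` types are hypothetical stand-ins for illustration only (the real types live in Spark and the fabric8 client), and the resource values are example assumptions, not taken from the PR's code.

```scala
object FeatureStepSketch {
  // Hypothetical stand-ins; not the Spark or fabric8 types.
  trait HasMetadata { def name: String }
  final case class PodGroup(name: String, minCpu: String, minMemory: String)
    extends HasMetadata

  trait FeatureStep {
    // Same shape as the hook quoted in the thread: no extra resources by default.
    def getAdditionalKubernetesResources(): Seq[HasMetadata] = Seq.empty
  }

  // A step that contributes one PodGroup-like resource.
  // The "4" / "8G" values are example inputs only.
  class PodGroupFeatureStep(resourceNamePrefix: String) extends FeatureStep {
    override def getAdditionalKubernetesResources(): Seq[HasMetadata] =
      Seq(PodGroup(s"$resourceNamePrefix-podgroup", "4", "8G"))
  }
}
```

Steps that do not need extra resources simply inherit the empty default, so only the PodGroup step has to override the hook.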
Codecov Report
@@ Coverage Diff @@
## master #34456 +/- ##
==========================================
- Coverage 90.15% 82.17% -7.99%
==========================================
Files 290 251 -39
Lines 62515 56994 -5521
Branches 9104 9281 +177
==========================================
- Hits 56362 46833 -9529
- Misses 4784 8947 +4163
+ Partials 1369 1214 -155
Test build #145675 has finished for PR 34456 at commit
Kubernetes integration test unable to build dist. exiting with code: 1
dongjoon-hyun
left a comment
I'm closing this PR because the artifact doesn't exist yet. Please reopen it when you are ready.
[error] sbt.librarymanagement.ResolveException: Error downloading io.fabric8:volcano-model-v1beta1:5.10.1
[error] Not found
[error] Not found
[error] not found: https://maven-central.storage-download.googleapis.com/maven2/io/fabric8/volcano-model-v1beta1/5.10.1/volcano-model-v1beta1-5.10.1.pom
[error] not found: https://repo1.maven.org/maven2/io/fabric8/volcano-model-v1beta1/5.10.1/volcano-model-v1beta1-5.10.1.pom
[error] not found: /home/jenkins/sparkivy/per-executor-caches/11/.m2/repository/io/fabric8/volcano-model-v1beta1/5.10.1/volcano-model-v1beta1-5.10.1.pom
[error] not found: /home/jenkins/sparkivy/per-executor-caches/11/.ivy2/localio.fabric8/volcano-model-v1beta1/5.10.1/ivys/ivy.xml
[error] Error downloading io.fabric8:volcano-client:5.10.1
[error] Not found
[error] Not found
[error] not found: https://maven-central.storage-download.googleapis.com/maven2/io/fabric8/volcano-client/5.10.1/volcano-client-5.10.1.pom
[error] not found: https://repo1.maven.org/maven2/io/fabric8/volcano-client/5.10.1/volcano-client-5.10.1.pom
[error] not found: /home/jenkins/sparkivy/per-executor-caches/11/.m2/repository/io/fabric8/volcano-client/5.10.1/volcano-client-5.10.1.pom
[error] not found: /home/jenkins/sparkivy/per-executor-caches/11/.ivy2/localio.fabric8/volcano-client/5.10.1/ivys/ivy.xml
martin-g
left a comment
I see that it is/will be possible to set up Driver and/or Executor configurations with their own PodGroups.
What about a PodGroup that reserves resources (pods) for all actors (driver + executors)?
What changes were proposed in this pull request?
PodGroup is a group of strongly associated pods, mainly used in batch scheduling. It is a Custom Resource Definition (CRD) type in Kubernetes, and the PodGroup concept was approved by the Kubernetes community in KEP-583 Coscheduling.
This patch adds PodGroup support for Kubernetes:
- Adds a configuration to enable PodGroup, spark.kubernetes.enablePodGroup, and two configurations (spark.kubernetes.podgroup.min.[cpu|memory]) that help users specify the minimum CPU and minimum memory for a PodGroup.
- When the scheduler is volcano, the PodGroup is created in Volcano automatically with the minResource requirement. If the available resources in the cluster cannot satisfy the requirement, no pod in the PodGroup will be scheduled.
- Sets the scheduling.k8s.io/group-name key on the pods, with the value s"${kubernetesConf.resourceNamePrefix}-podgroup".
For example, a user can request a group of pods with 4 CPUs and 8G of memory as the minimum requirement. Volcano schedules these pods only if the cluster meets that minimum (4 CPUs, 8G memory); otherwise, no pod in the PodGroup is scheduled.
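The configuration keys, the group-name pattern, and the all-or-nothing rule described above can be sketched roughly as follows. This is a dependency-free illustration, not the PR's implementation: the configuration keys and the name pattern come from the description, while the example values, the `Resources` type, and the `canSchedule` helper are assumptions introduced here.

```scala
object PodGroupSketch {
  // Keys as named in the PR description; the values are example inputs.
  val conf: Map[String, String] = Map(
    "spark.kubernetes.enablePodGroup" -> "true",
    "spark.kubernetes.podgroup.min.cpu" -> "4",
    "spark.kubernetes.podgroup.min.memory" -> "8G"
  )

  // Group name pattern quoted in the description: "<resourceNamePrefix>-podgroup".
  def podGroupName(resourceNamePrefix: String): String =
    s"$resourceNamePrefix-podgroup"

  // Hypothetical, simplified resource quantities (memory counted in whole G).
  final case class Resources(cpu: Int, memoryG: Int)

  // All-or-nothing rule: the group is schedulable only if the cluster
  // can cover the whole minimum requirement.
  def canSchedule(available: Resources, min: Resources): Boolean =
    available.cpu >= min.cpu && available.memoryG >= min.memoryG
}
```

With a minimum of 4 CPUs and 8G of memory, a cluster with only 2 free CPUs fails the check, so under this model none of the group's pods would be scheduled.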
Why are the changes needed?
Provides a way to request minimum resources before jobs are scheduled.
Does this PR introduce any user-facing change?
Yes, it adds PodGroup-related configurations.
How was this patch tested?