[LIVY-702]: Submit Spark apps to Kubernetes #249

Open: jahstreet wants to merge 17 commits into base: master

@jahstreet
Contributor

jahstreet commented Oct 27, 2019

What changes were proposed in this pull request?

Jira

This PR is one of a series that splits the base PR #167 into multiple PRs to ease and speed up review and merging.

This PR proposes a way to submit Spark apps to a Kubernetes cluster. Points covered:

  • Submit batch sessions
  • Submit interactive sessions
  • Monitor sessions, collect logs and diagnostics information
  • Restore sessions monitoring after restarts
  • GC created Kubernetes resources
  • Restrict the set of allowed Kubernetes namespaces

How was this patch tested?

Unit tests.

Manual testing with Kubernetes on Docker Desktop for Mac v2.1.0.1.
Environment - Helm charts:

nginx-ingress:
  controller:
    service:
      loadBalancerIP: 127.0.0.1 # my-cluster.example.com IP address (from /etc/hosts)
      loadBalancerSourceRanges: []
cluster-autoscaler:
  enabled: false
oauth2-proxy:
  enabled: false
livy:
  image:
    pullPolicy: Never
    tag: 0.7.0-incubating-spark_2.4.3_2.11-hadoop_3.2.0-dev
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
      kubernetes.io/tls-acme: "true"
      nginx.ingress.kubernetes.io/rewrite-target: /$1
    path: /livy/?(.*)
    hosts:
    - my-cluster.example.com
    tls:
    - secretName: spark-cluster-tls
      hosts:
      - my-cluster.example.com
  persistence:
    enabled: true
  env:
    LIVY_LIVY_UI_BASE1PATH: {value: "/livy"}
    LIVY_SPARK_KUBERNETES_CONTAINER_IMAGE_PULL1POLICY: {value: "Never"}
    LIVY_SPARK_KUBERNETES_CONTAINER_IMAGE: {value: "sasnouskikh/livy-spark:0.7.0-incubating-spark_2.4.3_2.11-hadoop_3.2.0-dev"}
    LIVY_LIVY_SERVER_SESSION_STATE0RETAIN_SEC: {value: "300s"}
    LIVY_LIVY_SERVER_KUBERNETES_ALLOWED1NAMESPACES: {value: "default,test"}
historyserver:
  enabled: false
jupyterhub:
  enabled: true
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
      kubernetes.io/tls-acme: "true"
    hosts:
    - my-cluster.example.com
    pathSuffix: ''
    tls:
    - secretName: spark-cluster-tls
      hosts:
      - my-cluster.example.com
  hub:
    baseUrl: /jupyterhub
    publicURL: "https://my-cluster.example.com"
    activeServerLimit: 10
    # $> openssl rand -hex 32
    cookieSecret: 41b85e5f50222b1542cc3b38a51f4d744864acca5e94eeb78c6e8c19d89eb433
    pdb:
      enabled: true
      minAvailable: 0
  proxy:
    # $> openssl rand -hex 32
    secretToken: cc52356e9a19a50861b22e08c92c40b8ebe617192f77edb355b9bf4b74b055de
    pdb:
      enabled: true
      minAvailable: 0
  cull:
    enabled: false
    timeout: 300
    every: 60
  • Interactive sessions - Jupyter notebook on JupyterHub with Sparkmagic
  • Batch sessions - SparkPi:
curl -k -H 'Content-Type: application/json' -X POST \
  -d '{
        "name": "SparkPi-01",
        "className": "org.apache.spark.examples.SparkPi",
        "numExecutors": 2,
        "file": "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar",
        "args": ["10000"],
        "conf": {
            "spark.kubernetes.namespace": "<namespace>"
        }
      }' "https://my-cluster.example.com/livy/batches"

jahstreet changed the title from "[LIVY-588]: Submit Spark apps to Kubernetes" to "[LIVY-702]: Submit Spark apps to Kubernetes" on Oct 27, 2019
@codecov-io

codecov-io commented Oct 27, 2019

Codecov Report

Merging #249 into master will decrease coverage by 1.52%.
The diff coverage is 34.53%.

@@             Coverage Diff              @@
##             master     #249      +/-   ##
============================================
- Coverage     68.19%   66.66%   -1.53%     
- Complexity      964      982      +18     
============================================
  Files           104      105       +1     
  Lines          5952     6252     +300     
  Branches        900      955      +55     
============================================
+ Hits           4059     4168     +109     
- Misses         1314     1483     +169     
- Partials        579      601      +22     
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...ain/java/org/apache/livy/rsc/driver/RSCDriver.java | 79.33% <0.00%> (-0.67%) | 45.00 <0.00> (ø) |
| ...e/livy/server/interactive/InteractiveSession.scala | 69.76% <0.00%> (-0.41%) | 51.00 <0.00> (ø) |
| ...rc/main/scala/org/apache/livy/utils/SparkApp.scala | 45.23% <5.55%> (-30.77%) | 1.00 <0.00> (ø) |
| ...main/scala/org/apache/livy/server/LivyServer.scala | 33.03% <20.00%> (+<0.01%) | 11.00 <1.00> (ø) |
| ...ala/org/apache/livy/utils/SparkKubernetesApp.scala | 32.42% <32.42%> (ø) | 14.00 <14.00> (?) |
| rsc/src/main/java/org/apache/livy/rsc/RSCConf.java | 88.18% <100.00%> (+0.33%) | 9.00 <1.00> (+1.00) |
| ...rver/src/main/scala/org/apache/livy/LivyConf.scala | 96.42% <100.00%> (+0.29%) | 23.00 <2.00> (+2.00) |
| .../scala/org/apache/livy/sessions/SessionState.scala | 61.11% <0.00%> (ø) | 2.00% <0.00%> (ø%) |
... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ee7fdfc...e087b39.

private var sessionLeakageCheckInterval: Long = _
private val leakedAppTags = new java.util.concurrent.ConcurrentHashMap[String, Long]()

private val leakedAppsGCThread = new Thread() {
Contributor

Do we need a GC thread here? RSCDriver will shut itself down if no client connects for a while. Please check this code.

Contributor Author

jahstreet commented Oct 30, 2019

Actually, this GC thread collects leaked apps in cases where the driver cannot be discovered after its submission (usually when there are not enough resources in the cluster, or because of some accidental error); look here.
As for the RSCDriver shutdown, it comes into play only if the Spark driver has been launched, and it works only for interactive sessions.
You may also be thinking of the Livy GC that collects expired session states, but it can be, and usually will be, configured with a longer timeout, and it serves another purpose.
Does that make it clearer?
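
The sweep described in this reply can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `FoundApp`, `LeakedAppSweep.sweep`, the `kill` callback and `timeoutMs` are hypothetical names, and the cluster lookup is replaced by a plain `Seq` passed in by the caller.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for an application discovered in the cluster.
final case class FoundApp(tag: String)

object LeakedAppSweep {
  // tag -> time (ms) at which the app was declared leaked
  val leakedAppTags = new ConcurrentHashMap[String, Long]()

  /** One sweep: kill leaked apps that have shown up, forget tags older than `timeoutMs`. */
  def sweep(found: Seq[FoundApp], now: Long, timeoutMs: Long)(kill: FoundApp => Boolean): Unit = {
    val byTag = found.groupBy(_.tag)
    val entries = leakedAppTags.entrySet().iterator()
    while (entries.hasNext) {
      val entry = entries.next()
      byTag.get(entry.getKey) match {
        case Some(apps) =>
          // Try to kill every app with this tag; drop the tag only if all kills succeeded.
          val allKilled = apps.map(kill).forall(identity)
          if (allKilled) entries.remove()
        case None =>
          // Not visible in the cluster yet: give up once the tag is older than the timeout.
          if (now - entry.getValue > timeoutMs) entries.remove()
      }
    }
  }
}
```

A tag whose kill attempt fails stays in the map, so the next sweep retries it.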

kubernetesDiagnostics = ArrayBuffer(e.getMessage)
changeState(SparkApp.State.FAILED)
} finally {
listener.foreach(_.infoChanged(AppInfo(sparkUiUrl = None)))
Contributor

How is the user expected to access the driver UI? Without setting up the ingress and surfacing that URL, it may not be very useful. The original patch handled this, and I think it should be part of the basic requirements.

Contributor Author

jahstreet commented Oct 31, 2019

Suggested change
- listener.foreach(_.infoChanged(AppInfo(sparkUiUrl = None)))

Suggestion: it shouldn't be unset, because it hasn't been set.

As a first iteration, this PR provides a way for Livy to submit and track Spark apps, as well as to integrate interactive sessions with notebooks (or whatever). To access the Spark UI, the user still has to arrange access on their own for now. The easiest way, I guess, is to use kubectl port-forward ... manually.

There could be multiple points of view on how best to split the base PR, or whether it even makes sense to do so, unfortunately... But it would be nice to let it go at some point. I propose creating small PRs, each representing a single aspect of the whole project. We can merge them into bigger ones at any time, but splitting them out is a bit more painful. In the meantime, you can take a look at the following one: #252.

Or we can always roll back to the original PR and just refactor it into an acceptable state.

WDYT?

@jahstreet
Contributor Author

@mgaido91 Could you take a look?

@jahstreet
Contributor Author

@mgaido91 ping.

@mgaido91
Contributor

@jahstreet Honestly, I am not the best guy to take a look at this. I will review this PR in a few hours, but it would be great to also have feedback from other people who are more familiar with this part of Livy. cc @vanzin @jerryshao

@jahstreet
Contributor Author

> @jahstreet Honestly, I am not the best guy to take a look at this. I will review this PR in a few hours, but it would be great to also have feedback from other people who are more familiar with this part of Livy. cc @vanzin @jerryshao

Ah, I see. Will try to ping them. Thanks anyway.

@@ -399,7 +399,13 @@ class InteractiveSession(
 app = mockApp.orElse {
   val driverProcess = client.flatMap { c => Option(c.getDriverProcess) }
     .map(new LineBufferedProcess(_, livyConf.getInt(LivyConf.SPARK_LOGS_SIZE)))
-  driverProcess.map { _ => SparkApp.create(appTag, appId, driverProcess, livyConf, Some(this)) }
+  driverProcess.map(_ => SparkApp.create(appTag, appId, driverProcess, livyConf, Some(this)))
+    .orElse {
Contributor

nit: on the line above

Contributor Author

Sorry, I didn't get the idea. What is a nit?

Contributor

It is short for nitpicking.

I meant to put .orElse { immediately after the ) on the line above. It is just a style thing.

Contributor Author

Exceeds line length then

Contributor

driverProcess.map(
  _ => SparkApp.create(appTag, appId, driverProcess, livyConf, Some(this))).orElse {

Contributor Author

Ahh, ok!

if (kubernetesNamespaces.nonEmpty && !kubernetesNamespaces.contains(targetNamespace)) {
throw new IllegalArgumentException(
s"Requested namespace $targetNamespace doesn't match the configured: " +
s"${kubernetesNamespaces.mkString(", ")}")
Contributor

Suggested change
- s"${kubernetesNamespaces.mkString(", ")}")
+ kubernetesNamespaces.mkString(", "))
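
For readers following the thread, the namespace allow-list check in this hunk can be sketched in isolation. This is an illustration only: `NamespaceGuard`, `validate` and `parseAllowed` are hypothetical names, and parsing the allow-list from a comma-separated string such as `"default,test"` is an assumption based on the Helm values in the PR description.

```scala
object NamespaceGuard {
  /** Parses a comma-separated setting such as "default,test" into a set of names. */
  def parseAllowed(raw: String): Set[String] =
    raw.split(",").map(_.trim).filter(_.nonEmpty).toSet

  /**
   * Throws if an allow-list is configured and `target` is not on it.
   * An empty allow-list permits any namespace.
   */
  def validate(allowed: Set[String], target: String): Unit = {
    if (allowed.nonEmpty && !allowed.contains(target)) {
      throw new IllegalArgumentException(
        s"Requested namespace $target doesn't match the configured: " +
          allowed.mkString(", "))
    }
  }
}
```

With these definitions, `NamespaceGuard.validate(NamespaceGuard.parseAllowed("default,test"), "prod")` throws, while an empty allow-list accepts any target namespace.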

apps.get(leakedApp.getKey) match {
case Some(seq) =>
seq.foreach(app =>
if (withRetry(kubernetesClient.killApplication(app))) {
Contributor

What if this returns false? At least a warning?

Contributor Author

Nice catch!

// kill the app if found it or remove it if exceeding a threshold
val leakedApps = leakedAppTags.entrySet().iterator()
val now = System.currentTimeMillis()
val apps = withRetry(kubernetesClient.getApplications()).groupBy(_.getApplicationTag)
Contributor

I don't see exception handling here... an exception here destroys the thread, so leakage removal no longer works after an exception?

Contributor Author

Nice catch!
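
One way to address the concern raised here is to catch failures inside the loop body, so that one bad sweep is reported and the loop carries on. The sketch below is an assumption-laden illustration rather than the fix that landed in the PR: `ResilientLoop`, `run`, `warn` and `step` are hypothetical names, and a bounded `for` loop stands in for the thread's endless loop so the example terminates.

```scala
import scala.util.control.NonFatal

object ResilientLoop {
  /**
   * Runs `step` once per iteration. A step that throws is reported through
   * `warn` and skipped, so later iterations still run.
   * Returns the number of steps that completed without throwing.
   */
  def run(iterations: Int, warn: String => Unit)(step: Int => Unit): Int = {
    var completed = 0
    for (i <- 0 until iterations) {
      try {
        step(i)
        completed += 1
      } catch {
        // NonFatal does not match InterruptedException or fatal VM errors,
        // so a thread running this loop can still be interrupted and stopped.
        case NonFatal(e) => warn(s"Sweep $i failed: ${e.getMessage}")
      }
    }
    completed
  }
}
```

The same `warn` hook could also be used for the earlier remark about a kill attempt that returns false.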

}

private[utils] def mapKubernetesState(
kubernetesAppState: String,
Contributor

nit: another indent

Contributor Author

jahstreet commented Nov 11, 2019

And here. Could you please explain what you mean?

Contributor

Add another 2 spaces of indent here, like:

Suggested change
-  kubernetesAppState: String,
+    kubernetesAppState: String,

SparkApp.State.FAILED
}
}

Contributor

nit: remove this empty line

}

class SparkKubernetesApp private[utils](
appTag: String,
Contributor

nit: indent once more

process.map(_.errorLines).getOrElse(ArrayBuffer.empty[String]))) ++
("\nKubernetes Diagnostics: " +: kubernetesDiagnostics)

override def kill(): Unit =
Contributor

please add { and } around methods everywhere, even when not necessary. They help readability.

if (deadline.isOverdue) {
process.foreach(_.destroy())
leakedAppTags.put(appTag, System.currentTimeMillis())
throw new IllegalStateException(s"No Kubernetes application is found with tag" +
Contributor

Suggested change
- throw new IllegalStateException(s"No Kubernetes application is found with tag" +
+ throw new IllegalStateException("No Kubernetes application is found with tag" +

@jahstreet
Contributor Author

jahstreet commented Nov 14, 2019

Build failure due to Travis:

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received

@jerryshao @vanzin could you take a look please? Your review would be really helpful to let that PR go.

jahstreet requested a review from mgaido91 on December 5, 2019 09:44
jahstreet force-pushed the kubernetes-support-initial branch from 825741b to b54eee9 on December 5, 2019 09:46
@jahstreet
Contributor Author

@yiheng @arunmahadevan @mgaido91
Anything else from your side?

jahstreet force-pushed the kubernetes-support-initial branch from e3740ae to 865aa24 on December 5, 2019 09:53
@jahstreet
Contributor Author

Rebased

jahstreet force-pushed the kubernetes-support-initial branch from 865aa24 to 4a6501c on January 14, 2020 11:46
@jahstreet
Contributor Author

Rebased to master.

Member

ajbozarth left a comment

I saw your email and took some time to check out your incremental changes. With my focus on the UI and Confs side of Livy, I've only reviewed the code changes to existing files (I'll leave reviews of SparkKubernetesApp.scala and SparkKubernetesAppSpec.scala to those with better knowledge and more time). Overall your conf related changes look good, only a small note on an added if block.

@@ -402,6 +402,9 @@ class InteractiveSession(

if (livyConf.isRunningOnYarn() || driverProcess.isDefined) {
Some(SparkApp.create(appTag, appId, driverProcess, livyConf, Some(this)))
} else if (livyConf.isRunningOnKubernetes()) {
// Create SparkKubernetesApp anyway to recover app monitoring on Livy server restart
Some(SparkApp.create(appTag, appId, driverProcess, livyConf, Some(this)))
Member

If this is just the same line as 404, why is it in its own else if block? Wouldn't it make more sense to add || livyConf.isRunningOnKubernetes() to the if on line 403?

Contributor Author

jahstreet commented Jan 15, 2020

Ahh, nice catch, agree. EDIT: resolved
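
The merged condition suggested in this thread can be sketched as a single predicate. This is only an illustration: `AppMonitoring.shouldCreateApp` and its boolean parameters are hypothetical stand-ins for `livyConf.isRunningOnYarn()`, `livyConf.isRunningOnKubernetes()` and `driverProcess.isDefined`.

```scala
object AppMonitoring {
  /**
   * True when a SparkApp monitor should be created for the session.
   * On YARN and on Kubernetes it is created even without a local driver process,
   * which is what allows monitoring to be recovered after a Livy server restart.
   */
  def shouldCreateApp(onYarn: Boolean, onKubernetes: Boolean, hasDriverProcess: Boolean): Boolean =
    onYarn || onKubernetes || hasDriverProcess
}
```

Folding the Kubernetes case into the same condition keeps the `SparkApp.create` call in one place instead of repeating it in a second branch.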

@ghost

ghost commented Feb 10, 2020

can we merge this? :)

@jahstreet
Contributor Author

> can we merge this? :)

I would also love to! I'm open to suggestions on how to get closer to it.

@SarnathK

Is there a timeline when this will get integrated with Livy? This would help us run Jupyter on Spark on Kubernetes. Any ETA will be very helpful! Thanks!

@jahstreet
Contributor Author

jahstreet commented Mar 28, 2020

Hi @SarnathK, I've tried to contact the community multiple times via the mailing lists, with no luck in pushing this forward.
I'm tracking the activity around this work and have a list of patches on top of it in the backlog. I'm also always ready to provide full support for setting up Livy on Kubernetes.
If you don't mind, I could add you to the thread so you could share your use cases with the community and draw more attention to this patch. Would that be OK?

@ajbozarth
Member

@jerryshao do you have bandwidth to review this, I've done a partial review above, but need another pair of eyes.

@jerryshao
Contributor

I can try to review this, but I'm not an expert in k8s and may not fully understand the pros and cons of the implementation.

@cometta

cometta commented Sep 27, 2022

Is there any estimated time frame for merging this ticket?

@anistal

anistal commented Jan 4, 2023

I agree with all the comments: a lot of effort that should be at least considered

Comment on lines +155 to +160
public boolean isRunningOnKubernetes() {
return Optional.ofNullable(get("livy.spark.master"))
.filter(s -> s.startsWith("k8s"))
.isPresent();
}

This function may always return false, since "livy.spark.master" is not available through RSCConf.

Contributor

idzikovsky commented Mar 15, 2023

Yeah, seems like that's what I've faced in #249 (comment)

However, it seems like it's not affecting functionality, as this function is used while setting RPC_SERVER_ADDRESS here:
https://github.com/apache/incubator-livy/pull/249/files/b87c0cebb65ce7f34e6b4b6b738095be6254cf69#diff-43114318c4b009c2404f7eb326a84c184fb1501a3237c49a771df851d0f6f328R172-R178

And the value of RPC_SERVER_ADDRESS is not used anyway since Livy 0.7 because of things I've explained in #388.
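
To illustrate the concern in this thread, here is a minimal sketch that assumes the configuration is a plain `Map` (the real `RSCConf` is a different class, and `MasterCheck` is a hypothetical name). When the key is absent from the configuration being queried, the check is false no matter which cluster manager is actually in use:

```scala
object MasterCheck {
  /**
   * True when the "livy.spark.master" entry is present and starts with "k8s",
   * mirroring the Optional-based Java check quoted above.
   */
  def isRunningOnKubernetes(conf: Map[String, String]): Boolean =
    conf.get("livy.spark.master").exists(_.startsWith("k8s"))
}
```

For example, `MasterCheck.isRunningOnKubernetes(Map.empty)` is false even for a session that was submitted to a `k8s://` master, which is the situation the reviewers describe.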

@askhatri
Contributor

I have validated this fix in the new git branch. I found that the fix is working as expected. The detailed steps used during the validation are documented at README.md. Should we consider merging the fix?

@lmccay
Contributor

lmccay commented Mar 28, 2023

@askhatri - I would say that you can merge this given your review and testing.

@gyogal
Contributor

gyogal commented Mar 29, 2023

Thank you @lmccay! This LGTM as well and based on @askhatri 's testing, if there are no objections, I think we should go ahead and merge this. We can address the remaining issues in separate tickets.

@jahstreet
Contributor Author

Happy to see things moving in this PR 🎉. Thanks a lot, folks, for putting effort into reviewing and testing the change.

@askhatri can I help you with anything to test the work?

I'd love to spend my time finalizing the activity here, and then I have something to contribute on top of it.

@askhatri
Contributor

> Happy to see things moving in this PR 🎉. Thanks a lot, folks, for putting effort into reviewing and testing the change.
>
> @askhatri can I help you with anything to test the work?
>
> I'd love to spend my time finalizing the activity here, and then I have something to contribute on top of it.

Thank you, @jahstreet for offering your help in testing.

During my initial testing, I found that the code is working as expected. Only one observation is that we might need to upgrade spark-on-kubernetes-helm to support Helm Chart 3.x and Kubernetes latest version 1.24 or higher.

@jahstreet
Contributor Author

jahstreet commented Apr 3, 2023

> > Happy to see things moving in this PR 🎉. Thanks a lot, folks, for putting effort into reviewing and testing the change.
> > @askhatri can I help you with anything to test the work?
> > I'd love to spend my time finalizing the activity here, and then I have something to contribute on top of it.
>
> Thank you, @jahstreet for offering your help in testing.
>
> During my initial testing, I found that the code is working as expected. Only one observation is that we might need to upgrade spark-on-kubernetes-helm to support Helm Chart 3.x and Kubernetes latest version 1.24 or higher.

Looking into it, not 100% sure we can get latest K8s version, given latest Spark 3.3.2 works with Fabric8 client 5.12.2 which aims for K8s <= 1.23.13 (as per compatibility matrix). Will share my findings ...

UPD: seems the latest Spark 3.4.0 already bumped the K8s client version. We are getting there folks ...

@askhatri
Contributor

askhatri commented Apr 3, 2023

> > > Happy to see things moving in this PR 🎉. Thanks a lot, folks, for putting effort into reviewing and testing the change.
> > > @askhatri can I help you with anything to test the work?
> > > I'd love to spend my time finalizing the activity here, and then I have something to contribute on top of it.
>
> > Thank you, @jahstreet for offering your help in testing.
> > During my initial testing, I found that the code is working as expected. Only one observation is that we might need to upgrade spark-on-kubernetes-helm to support Helm Chart 3.x and Kubernetes latest version 1.24 or higher.
>
> Looking into it, not 100% sure we can get latest K8s version, given latest Spark 3.3.2 works with Fabric8 client 5.12.2 which aims for K8s <= 1.23.13 (as per compatibility matrix). Will share my findings ...

Thank you @jahstreet

@satishdalli

satishdalli commented May 9, 2023

Absolutely great work @jahstreet. We are waiting for a PR with support for Helm 3 and k8s 1.24+. Currently, we are using Livy with consistent resources. We want to try this in AKS, EKS, and on-prem k8s clusters as a serverless Livy.

@lmccay
Contributor

lmccay commented May 9, 2023

Hey Folks - I noticed this JIRA is listed for 0.8.0 and would like to get a sense for how far out this may be.
I'm trying to burn down the release blockers so that we can tackle the process of the first release since the reboot.
If I don't hear otherwise, I will move it to 0.9.0 and we can discuss pulling it back in on the JIRA - if need be.

@jahstreet
Contributor Author

Hi @lmccay , thank you for keeping eye on it.

I think this PR is already battle tested and good to go. There is already a chain of work done on top of it by me and other people who left feedback on this chunk of work. If we make it a part of master I'm 100% sure I won't be the only one pushing Livy project to the world of K8s. Besides that, seeing the progress after the years of waiting would boost the motivation to continue contributions... So, including the upgrade of dependencies to support latest K8s and Spark versions, I propose to tackle those in separate PRs. How do you feel about merging it to 0.8 and is there anything formal we should do to make it happen?

(resolving the merge conflicts in the meantime)

@idzikovsky
Contributor

Yeah. We've been using Livy on K8s with fixes from this PR and PR #252 for nearly 2 years now, in different configurations, including different Spark versions from 2.4.4 up to 3.3.1 (however, some fixes were made in Livy itself to make it compatible with Spark 3.3).
Everything seems to be fine so far, except for the setup with Istio.
To resolve those issues, we applied the fix from #388 and reverted the thing which I described in #249 (comment) and #249 (comment).

But in general everything works fantastic.
Thank you @jahstreet

@lmccay
Contributor

lmccay commented May 12, 2023

@jahstreet and @idzikovsky - if we merge this before branching from 0.8.0 and it doesn't cause any issues due to other work not being there yet, then I would have no problem with doing so. I'm personally not in a position to +1 the merge. Would someone who has tested it, and/or tested with it in place but not necessarily exercised, be able to +1 it as a review?

@jahstreet
Contributor Author

@lmccay I think we need to summon a maintainer with K8s expertise, is that something you expect? Do you know any name(s) we can put here and follow-up on that together?

@lmouhib

lmouhib commented Sep 20, 2023

@jahstreet The implementation is great. I have a couple of remarks/questions on the way it handles the logs and the ingress. The current implementation is opinionated toward nginx (for the Spark UI) and Loki. While both are widely adopted, how can someone with another ingress controller use their own (without modifying your code)? I understand that providing native UI integration for the logs might be a bit more challenging; in that case, maybe offer the possibility of using a sidecar for log shipping?

For Livy, there is a block that looks up the hostname, which does not play well with the containers/k8s world. Would it be possible to make it more k8s-native and work with a service? This would mean we might need to pass a svc as an ENV variable, but at least we would not address the pod directly by its IP address.

@ozsoyler

Hi. Is it possible to merge this into apache-incubator-livy officially in the new version "0.9.0"? I guess if it is merged, we will be able to use Livy on Kubernetes for launching our awesome Spark jobs! Thanks.

@askhatri
Contributor

askhatri commented Jun 7, 2024

Adding Kubernetes support to Apache Livy is a valuable enhancement for the next release. Should we consider merging this into the master branch?

@lmccay
Contributor

lmccay commented Jun 7, 2024

@askhatri - yes, I think we should push for an active approval on the merge of this. We can likely mark this as an important contribution for the next release. Let's move this forward now.

@jesinity

jesinity commented Jun 7, 2024

I've been following this issue since forever, as it would have been highly beneficial something like 2 companies ago.
The work has been done, but it was never merged. Is there maybe some passive obstruction against it? It looks like it.

@idzikovsky
Contributor

@jesinity there is no passive obstruction against it here.
The problem is that there are no active Livy maintainers who are willing to review this and/or familiar enough with K8s.

@devstein

devstein commented Jun 7, 2024

To help with the K8s review/perspective, I know there are teams that have been using this branch in production for years (cc @jpugliesi).

@askhatri
Contributor

I have created a new PR, #451. The update includes a newer version of the Kubernetes client and adds code to display the Spark UI. CC: @jahstreet

gyogal pushed a commit that referenced this pull request Jul 10, 2024
This pull request (PR) is the foundational PR for adding Kubernetes support in Apache Livy, originally found here (#249). This update includes a newer version of the Kubernetes client and adds code to display the Spark UI.

## Summary of the Proposed Changes

This PR introduces a method to submit Spark applications to a Kubernetes cluster. The key points covered include:

 * Submitting batch sessions
 * Submitting interactive sessions
 * Monitoring sessions, collecting logs, and gathering diagnostic information
 * Restoring session monitoring after restarts
 * Garbage collection (GC) of created Kubernetes resources

JIRA link: https://issues.apache.org/jira/browse/LIVY-702

## How was this patch tested?

 * Unit Tests: The patch has been verified through comprehensive unit tests.
 * Manual Testing: Conducted manual testing using Kubernetes on Docker Desktop.
    *  Environment: Helm charts.

For detailed instructions on testing using Helm charts, please refer to the documentation available at https://github.com/askhatri/livycluster

Co-authored-by: Asif Khatri <asif.khatri@cloudera.com>
Co-authored-by: Alex Sasnouskikh <jahstreetlove@gmail.com>
jimenefe pushed a commit to onedot-data/incubator-livy that referenced this pull request Oct 15, 2024