Spark submit in operator fails #1277
Comments
I've been trying to get the spec:

```yaml
sparkConfigMap: log4j-props
```

to work, generating the ConfigMap with a kustomize generator:

```yaml
configMapGenerator:
  - files:
      - config/log4j.properties
    name: log4j-props
generatorOptions:
  disableNameSuffixHash: true
```

But I can't get that to work either.
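A minimal SparkApplication sketch showing where `sparkConfigMap` sits in the v1beta2 spec (the metadata, image, and main class below are illustrative placeholders, not from the reporter's setup):

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-log4j-example        # illustrative name
spec:
  type: Scala
  mode: cluster
  image: spark:3.1.1               # illustrative image
  mainClass: org.apache.spark.examples.SparkPi                              # illustrative
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar  # illustrative
  sparkConfigMap: log4j-props      # the ConfigMap generated by kustomize above
```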
I found that for k8s 1.19.1, the kubernetes-client has to be version >= 4.13.1 (see the compatibility matrix). Looking at the dependencies in the 1.3 branch for Spark, I see that kubernetes-client 4.12.0 is used. So it seems that Spark does not yet support k8s 1.19; it would be great if someone could verify this.
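As a quick sanity check, the bundled client version can be read off the jar name in `$SPARK_HOME/jars` and compared against the required minimum. A small sketch (the jar name and the 4.13.1 minimum are taken from the comment above; the helper functions are illustrative, not part of Spark):

```python
import re

def parse_version(s):
    """Split a dotted version string into a tuple of ints, e.g. '4.12.0' -> (4, 12, 0)."""
    return tuple(int(p) for p in s.split("."))

def client_version_from_jar(jar_name):
    """Extract the version from a jar filename like 'kubernetes-client-4.12.0.jar'."""
    m = re.match(r"kubernetes-client-(\d+(?:\.\d+)*)\.jar$", jar_name)
    return m.group(1) if m else None

bundled = client_version_from_jar("kubernetes-client-4.12.0.jar")
required = "4.13.1"
print(bundled, parse_version(bundled) >= parse_version(required))
# Prints: 4.12.0 False  -> the bundled client is older than the required minimum
```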
Seems like an updated dependency version was just added to master:
The issue remains even after testing with Spark built from master. I have debug logs set up as well for further details:
So the issue is related to fabric8io/kubernetes-client#2212 (comment). In order to make it work, we had to add the following to the spark-operator, driver & executor:

```yaml
env:
  - name: HTTP2_DISABLE  # https://github.com/fabric8io/kubernetes-client/issues/2212#issuecomment-628551315
    value: "true"
```
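On the driver/executor side, in a v1beta2 SparkApplication this can be expressed with the operator's `envVars` map; a sketch (exact field support depends on the operator version):

```yaml
spec:
  driver:
    envVars:
      HTTP2_DISABLE: "true"
  executor:
    envVars:
      HTTP2_DISABLE: "true"
```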
fabric8io/kubernetes-client#3176 (comment) is a good write-up of the root cause. In short, fabric8's kubernetes-client cannot communicate with a Kubernetes API server where the weak TLS cipher TLS_RSA_WITH_AES_256_GCM_SHA384 has been disabled. Disabling HTTP/2 is a workaround.
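As a side note, that cipher is known to OpenSSL as AES256-GCM-SHA384. A small Python sketch (not from this thread) can check whether the local TLS stack still offers it at all:

```python
import ssl

# Build a default client-side TLS context and collect the cipher names it offers.
ctx = ssl.create_default_context()
offered = {c["name"] for c in ctx.get_ciphers()}

# TLS_RSA_WITH_AES_256_GCM_SHA384 is known to OpenSSL as AES256-GCM-SHA384.
print("AES256-GCM-SHA384 offered locally:", "AES256-GCM-SHA384" in offered)
```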
@LeonardAukea could you try to run with `KUBERNETES_TLS_VERSIONS=TLSv1.2,TLSv1.3`?
Thanks @slachiewicz, setting `KUBERNETES_TLS_VERSIONS=TLSv1.2,TLSv1.3` also worked.
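For anyone wondering where to set it: per the earlier HTTP2_DISABLE comment, the variable goes on the operator Deployment as well as the driver/executor pods. The Deployment side looks roughly like this (a sketch; the container name is illustrative):

```yaml
spec:
  template:
    spec:
      containers:
        - name: spark-operator   # illustrative container name
          env:
            - name: KUBERNETES_TLS_VERSIONS
              value: "TLSv1.2,TLSv1.3"
```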
@slachiewicz @nnringit I am facing the same error when submitting a Spark app to Kubernetes. Could you please tell me where I should change or add `KUBERNETES_TLS_VERSIONS=TLSv1.2,TLSv1.3`?
Hi, I tried both of the options, but neither resolves the issue. Can someone suggest what I am missing?
@LeonardAukea @DoniyorTuremuratov @slachiewicz I am also facing the same issue with the latest spark-operator. I tried setting both variables, to no effect. For what it's worth, it might be related to the fact that spark-operator is still built with kubernetes-client 4.12.0, which really only provides full support up to Kubernetes 1.18, with minimal support up to 1.22 and no support for 1.23+ (see the compatibility matrix). The latest Kubernetes version is 1.26, and Spark 3.3.0 even supports kubernetes-client 5.12.2. Is there a way to at least make the spark-operator use kubernetes-client 5.12.2, and try with that to see if it fixes the issue? Below is my error for visibility:
@JunaidChaudry I'm stuck exactly where you are. Did you get a solution to this problem?
@LeonardAukea Can you specify how one would add an env var to the Spark Operator? I've added the HTTP2_DISABLE var to the driver and executor config, but it has had no effect. How did you add it to the operator itself?
@harshal-zetaris did you enable webhooks? I had to enable webhooks and configure the webhook port.
I had the webhooks enabled, but didn't have the port configured. This solved it: #1708 (comment)
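For reference, with the spark-operator Helm chart the webhook settings mentioned above map to values along these lines (a sketch; the key names and the 443 port choice are assumptions based on the linked comment, not verified here):

```yaml
# values.yaml fragment for the spark-operator Helm chart (sketch)
webhook:
  enable: true
  port: 443   # assumption: expose the webhook on 443 instead of the chart default
```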
Wow, that worked @JunaidChaudry! However, I'm confused as to why. I spun up a whole new EKS cluster just in March this year and have been using it as our official QA cluster; deployments there are still going as smooth as butter. I suddenly started running into precisely this problem after I spun up another cluster a couple of days back, while deployments on the old cluster still work fine. I read through the conversation in your linked issue, and indeed a new version of the node AMI was released on May 1, after which this issue started manifesting. Thank you so much for your help.
I am in the exact same boat as you. It has something to do with the AWS AMI update that was received in late March. I had multiple EKS clusters, with the webhook working out of the box on all of them... UNTIL I restarted my EKS nodes and they started running with the newer AWS AMI version.
@JunaidChaudry @harshal-zetaris @satyamsah, any luck with a fix for the SocketTimeoutException/K8SClientException?
@JunaidChaudry hi, I hit the same error: "Operation: [create] for kind: [Pod] with name: [null] in namespace: [spark-operator] failed". I didn't use helm to install the operator; instead I pulled the operator image and loaded it onto our container platform. I'm not sure whether I've enabled the webhook. Do you have any idea? Thanks.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it. |
Hi all, I seem to be having some issues getting a Spark application up and running, hitting errors like this:
I have Istio on the cluster, hence I also tried the following settings, to no avail:
So somehow it seems the application is not able to communicate with the Kubernetes API. The default-editor ServiceAccount has the following rules:
I also added an AuthorizationPolicy to allow traffic for the webhook & operator:
If anyone has seen this before or has any valuable pointers, that would be much appreciated.
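An AuthorizationPolicy along these lines is what is meant above (a sketch; the name, namespace, and label selector are illustrative, not from the reporter's cluster):

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-spark-operator        # illustrative name
  namespace: spark-operator         # illustrative namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: spark-operator   # illustrative label
  rules:
    - {}   # an empty rule matches everything: allow all traffic to the workload
```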
k8s: 1.19
version: "v1beta2-1.2.3-3.1.1"
chart: 1.1.3
istio: 1.19
This PROTOCOL_ERROR might also be a pointer towards the underlying issue.