[x] I have searched the open/closed issues and my issue is not listed.
I'm trying to install spark-operator on k8s (v1.28) and am running into issues.
The controller pod starts, but the webhook pod is failing:
NAME READY STATUS RESTARTS AGE
pod/spark-operator-controller-688c7c9955-tkdpf 1/1 Running 0 3m15s
pod/spark-operator-webhook-567bd94f66-tg567 0/1 Error 5 (94s ago) 3m15s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/spark-operator-webhook-svc ClusterIP 10.108.242.219 <none> 443/TCP 3m15s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/spark-operator-controller 1/1 1 1 3m15s
deployment.apps/spark-operator-webhook 0/1 1 0 3m15s
NAME DESIRED CURRENT READY AGE
replicaset.apps/spark-operator-controller-688c7c9955 1 1 1 3m15s
replicaset.apps/spark-operator-webhook-567bd94f66 1 1 0 3m15s
Logs from webhook pod -
(base) Karans-MacBook-Pro:~ karanalang$ kc logs -f pod/spark-operator-webhook-567bd94f66-tg567 -n so350
++ id -u
+ uid=185
++ id -g
+ gid=185
+ set +e
++ getent passwd 185
+ uidentry=spark:x:185:185::/home/spark:/bin/sh
+ set -e
+ [[ -z spark:x:185:185::/home/spark:/bin/sh ]]
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator webhook start --zap-log-level=info --namespaces=default --webhook-secret-name=spark-operator-webhook-certs --webhook-secret-namespace=so350 --webhook-svc-name=spark-operator-webhook-svc --webhook-svc-namespace=so350 --webhook-port=443 --mutating-webhook-name=spark-operator-webhook --validating-webhook-name=spark-operator-webhook --enable-metrics=true --metrics-bind-address=:8080 --metrics-endpoint=/metrics --metrics-prefix= --metrics-labels=app_type --leader-election=true --leader-election-lock-name=spark-operator-webhook-lock --leader-election-lock-namespace=so350
Spark Operator Version: 2.0.2+HEAD+unknown
Build Date: 2024-10-11T01:46:23+00:00
Git Commit ID:
Git Tree State: clean
Go Version: go1.23.1
Compiler: gc
Platform: linux/amd64
2024-11-21T20:56:37.838Z INFO webhook/start.go:244 Syncing webhook secret {"name": "spark-operator-webhook-certs", "namespace": "so350"}
2024-11-21T20:56:37.936Z INFO webhook/start.go:258 Writing certificates {"path": "/etc/k8s-webhook-server/serving-certs", "certificate name": "tls.crt", "key name": "tls.key"}
2024-11-21T20:56:38.036Z INFO controller-runtime.builder builder/webhook.go:158 Registering a mutating webhook {"GVK": "sparkoperator.k8s.io/v1beta2, Kind=SparkApplication", "path": "/mutate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-11-21T20:56:38.036Z INFO controller-runtime.webhook webhook/server.go:183 Registering webhook {"path": "/mutate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-11-21T20:56:38.036Z INFO controller-runtime.builder builder/webhook.go:189 Registering a validating webhook {"GVK": "sparkoperator.k8s.io/v1beta2, Kind=SparkApplication", "path": "/validate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-11-21T20:56:38.036Z INFO controller-runtime.webhook webhook/server.go:183 Registering webhook {"path": "/validate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-11-21T20:56:38.036Z INFO controller-runtime.builder builder/webhook.go:158 Registering a mutating webhook {"GVK": "sparkoperator.k8s.io/v1beta2, Kind=ScheduledSparkApplication", "path": "/mutate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-11-21T20:56:38.037Z INFO controller-runtime.webhook webhook/server.go:183 Registering webhook {"path": "/mutate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-11-21T20:56:38.037Z INFO controller-runtime.builder builder/webhook.go:189 Registering a validating webhook {"GVK": "sparkoperator.k8s.io/v1beta2, Kind=ScheduledSparkApplication", "path": "/validate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-11-21T20:56:38.037Z INFO controller-runtime.webhook webhook/server.go:183 Registering webhook {"path": "/validate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-11-21T20:56:38.037Z INFO controller-runtime.builder builder/webhook.go:158 Registering a mutating webhook {"GVK": "/v1, Kind=Pod", "path": "/mutate--v1-pod"}
2024-11-21T20:56:38.037Z INFO controller-runtime.webhook webhook/server.go:183 Registering webhook {"path": "/mutate--v1-pod"}
2024-11-21T20:56:38.037Z INFO controller-runtime.builder builder/webhook.go:204 skip registering a validating webhook, object does not implement admission.Validator or WithValidator wasn't called {"GVK": "/v1, Kind=Pod"}
2024-11-21T20:56:38.037Z INFO webhook/start.go:320 Starting manager
2024-11-21T20:56:38.038Z INFO controller-runtime.metrics server/server.go:205 Starting metrics server
2024-11-21T20:56:38.038Z INFO controller-runtime.metrics server/server.go:244 Serving metrics server {"bindAddress": ":8080", "secure": false}
2024-11-21T20:56:38.039Z INFO manager/server.go:50 starting server {"kind": "health probe", "addr": "[::]:8081"}
2024-11-21T20:56:38.039Z INFO controller-runtime.webhook webhook/server.go:191 Starting webhook server
2024-11-21T20:56:38.039Z INFO webhook/start.go:358 disabling http/2
2024-11-21T20:56:38.039Z INFO controller-runtime.certwatcher certwatcher/certwatcher.go:161 Updated current TLS certificate
2024-11-21T20:56:38.040Z INFO controller-runtime.certwatcher certwatcher/certwatcher.go:115 Starting certificate watcher
2024-11-21T20:56:38.040Z INFO manager/internal.go:534 Stopping and waiting for non leader election runnables
2024-11-21T20:56:38.040Z INFO manager/internal.go:538 Stopping and waiting for leader election runnables
2024-11-21T20:56:38.040Z INFO manager/internal.go:546 Stopping and waiting for caches
2024-11-21T20:56:38.040Z INFO manager/internal.go:550 Stopping and waiting for webhooks
2024-11-21T20:56:38.040Z INFO manager/internal.go:553 Stopping and waiting for HTTP servers
I1121 20:56:38.040581 10 leaderelection.go:250] attempting to acquire leader lease so350/spark-operator-webhook-lock...
2024-11-21T20:56:38.041Z INFO manager/server.go:43 shutting down server {"kind": "health probe", "addr": "[::]:8081"}
2024-11-21T20:56:38.041Z INFO controller-runtime.metrics server/server.go:251 Shutting down metrics server with timeout of 1 minute
2024-11-21T20:56:38.041Z INFO manager/internal.go:557 Wait completed, proceeding to shutdown the manager
E1121 20:56:38.041688 10 leaderelection.go:332] error retrieving resource lock so350/spark-operator-webhook-lock: Get "https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/so350/leases/spark-operator-webhook-lock": context canceled
2024-11-21T20:56:38.041Z ERROR webhook/start.go:322 Failed to start manager {"error": "listen tcp :443: bind: permission denied"}
github.com/kubeflow/spark-operator/cmd/operator/webhook.start
/workspace/cmd/operator/webhook/start.go:322
github.com/kubeflow/spark-operator/cmd/operator/webhook.NewStartCommand.func2
/workspace/cmd/operator/webhook/start.go:128
github.com/spf13/cobra.(*Command).execute
/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:989
github.com/spf13/cobra.(*Command).ExecuteC
/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117
github.com/spf13/cobra.(*Command).Execute
/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041
main.main
/workspace/cmd/main.go:27
runtime.main
/usr/local/go/src/runtime/proc.go:272
Please note: I had previously installed v2.0.0-rc.0 and it worked fine. I am only running into issues with v2.0.2.
Please help with this.
Thanks!
Reproduction Code
No response
Expected behavior
No response
Actual behavior
No response
Environment & Versions
Kubernetes Version: 1.28
Spark Operator Version: 2.0.2
Apache Spark Version: 3.5
Additional context
No response
Impacted by this bug?
Give it a 👍. We prioritize the issues with the most 👍.
@karanalang Please use a non-privileged webhook port (the default is 9443) if possible. Otherwise you will need to run as root or modify the security context, because we have removed all capabilities to harden the container. That is why binding to port 443 fails with "listen tcp :443: bind: permission denied".
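For illustration, a minimal Helm values sketch of the first option. It assumes the chart exposes the webhook port as webhook.port (the key name is an assumption inferred from the --webhook-port flag in the logs above, so check it against the values.yaml of your chart version):

```yaml
# values.yaml (sketch; key name assumed, verify against your chart version)
webhook:
  # Ports below 1024 are privileged, so a non-root container with all
  # capabilities dropped cannot bind to 443. 9443 is the default mentioned above.
  port: 9443
```

With a value like this the webhook process would no longer try to listen on 443, so no root user or extra capability should be needed.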
Worth noting: I think you want webhook.securityContext rather than webhook.containerSecurityContext. I was able to run successfully on Kind with your Helm values once I changed that.
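As a rough sketch of the second option (keeping port 443), the override would sit under webhook.securityContext. The field values below are assumptions, not the values from the original report, and they follow the standard Kubernetes container securityContext schema. Running as root weakens the hardening described above, so treat this as a last resort:

```yaml
# values.yaml (sketch; values are illustrative assumptions)
webhook:
  port: 443
  # Note the key: securityContext, not containerSecurityContext.
  securityContext:
    runAsUser: 0          # assumption: run as root to bind a privileged port
    runAsNonRoot: false
    capabilities:
      drop:
        - ALL
      add:
        - NET_BIND_SERVICE  # capability that permits binding ports below 1024
```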