Error while running JobSet in different environments #357
Comments
I don't think we support building JobSet on M1/M2 at the moment. See #237.
But I also have issues when running it in kind from the v0.3.0 release; see my last section (kind - release v0.3.0).
Yeah, we don't support building it, and we don't have a published ARM image for JobSet. It would be a good contribution if you are interested in looking into it. I'll see if I can reproduce your issue.
Still, it looks strange, as I can successfully run v0.3.0 in kind and also build it successfully locally, but I always get issues with certificates in the internal cert-controller.
Yeah, I can see this now too. Something seems wrong with kind, as I can't actually get the JobSet controller to start. I also used kind 0.2.0 and had issues installing Kueue as well.
I noticed that on amd64 I was also seeing issues on main. I thought it was related to the latest version of kind, so I opened a PR to update it. I can see that some jobs fail with this error (the deployment is not present, and the control-plane logs show a failure to create the webhook). The e2e tests sometimes pass, but doing the install manually seemed to have issues. I am going to try installing it on a live cluster today to see if it's an issue with kind.
Do you still have this problem? I think #362 is the fix.
I can confirm now that it works on a MacBook Pro M1 (Sonoma 14.2.1) with kind v0.20.0. I tried version v0.3.1. Thanks to everybody involved in fixing this!
I tried a couple of approaches to run JobSet and all of them failed with various errors.
Context
Approaches
Local - make run
kind - config with internalcert
First I run make install, which successfully installs the JobSet CRDs. After that I run kubectl apply --server-side -f config/default, but it fails to create the JobSet Controller Pod:
It fails to start because the Mutating & Validating Webhook Configuration objects are not patched, and the certificate does not get generated.
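One way to confirm the "webhook configurations are not patched" symptom is to inspect each webhook's clientConfig.caBundle field. The helper below is a minimal, hypothetical sketch (it is not part of JobSet): it assumes a Mutating- or ValidatingWebhookConfiguration has been dumped as JSON, e.g. with kubectl get validatingwebhookconfiguration <name> -o json, and lists the webhooks whose caBundle is missing or empty. The webhook names in the example are illustrative only.

```python
import json


def webhooks_missing_ca_bundle(webhook_config: dict) -> list[str]:
    """Return names of webhooks whose clientConfig.caBundle is missing or empty.

    `webhook_config` is assumed to be an admissionregistration.k8s.io/v1
    Mutating- or ValidatingWebhookConfiguration object parsed from JSON.
    """
    missing = []
    for webhook in webhook_config.get("webhooks", []):
        # An unpatched webhook has no caBundle (or an empty one) in clientConfig.
        ca_bundle = webhook.get("clientConfig", {}).get("caBundle", "")
        if not ca_bundle:
            missing.append(webhook.get("name", "<unnamed>"))
    return missing


if __name__ == "__main__":
    # Hypothetical object; the names below are made up for illustration.
    sample = json.loads("""
    {
      "kind": "ValidatingWebhookConfiguration",
      "webhooks": [
        {"name": "example-webhook-a", "clientConfig": {"service": {"name": "example-svc"}}},
        {"name": "example-webhook-b", "clientConfig": {"caBundle": "Zm9v"}}
      ]
    }
    """)
    print(webhooks_missing_ca_bundle(sample))  # ['example-webhook-a']
```

An empty result means every webhook in that object has some CA bundle set; it does not verify that the bundle matches the serving certificate.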
kind - config with certmanager
If I switch cert management to cert-manager, I get slightly better results, but it still doesn't start.
It fails because the cert-controller is looking for a secret under a different name.
The secret name for the cert-controller, jobset-webhook-server-cert, is hardcoded here: https://github.com/kubernetes-sigs/jobset/blob/main/pkg/util/cert/cert.go#L26
kind - release v0.3.0
If I follow the guide at https://github.com/kubernetes-sigs/jobset/blob/main/docs/setup/install.md#install-a-released-version, the objects get created, but the Deployment has 0/1 available replicas, and describing the ReplicaSet I get:
It looks like the Pod Webhooks get created first, and they refuse creation of the Operator Pod because the Webhooks aren't properly configured, i.e. the caBundle isn't patched. And if we examine the manifests.yaml from the v0.3.0 release, we can see that they don't contain the cert-manager.io/inject-ca-from annotation from cert-manager, so nothing injects the CA, and the Operator Pod cannot start to patch it itself, as the Pod Webhooks get created first:
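To check whether a set of manifests relies on cert-manager's CA injector, one can look for the cert-manager.io/inject-ca-from annotation on the webhook configuration objects; by cert-manager's convention its value is <namespace>/<certificate-name>. The helper below is a hypothetical sketch, not JobSet or cert-manager code: it assumes the manifests have already been parsed into a list of Python dicts (for example with a YAML library, or from kubectl ... -o json output) and reports which webhook configurations lack the annotation. All object names in the example are illustrative.

```python
WEBHOOK_KINDS = {"MutatingWebhookConfiguration", "ValidatingWebhookConfiguration"}
INJECT_CA_ANNOTATION = "cert-manager.io/inject-ca-from"


def webhooks_without_ca_injection(objects: list[dict]) -> list[str]:
    """Return 'Kind/name' for webhook configurations lacking the inject-ca-from annotation."""
    result = []
    for obj in objects:
        if obj.get("kind") not in WEBHOOK_KINDS:
            continue  # only webhook configurations are relevant here
        metadata = obj.get("metadata", {})
        # `annotations` may be absent or explicitly null in parsed manifests.
        annotations = metadata.get("annotations") or {}
        if INJECT_CA_ANNOTATION not in annotations:
            result.append(f"{obj['kind']}/{metadata.get('name', '<unnamed>')}")
    return result


if __name__ == "__main__":
    # Hypothetical objects; names and namespace are made up, not taken from the release.
    manifests = [
        {"kind": "Deployment", "metadata": {"name": "example-controller"}},
        {"kind": "MutatingWebhookConfiguration",
         "metadata": {"name": "example-mutating"}},
        {"kind": "ValidatingWebhookConfiguration",
         "metadata": {"name": "example-validating",
                      "annotations": {INJECT_CA_ANNOTATION: "example-ns/example-cert"}}},
    ]
    print(webhooks_without_ca_injection(manifests))  # ['MutatingWebhookConfiguration/example-mutating']
```

The annotation only has an effect when cert-manager's cainjector is running and the referenced Certificate exists; this check covers the manifest side only.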