This repository has been archived by the owner on Sep 2, 2022. It is now read-only.

Flink Session Cluster Installation is failing #464

Open
sumchak1 opened this issue Jul 16, 2021 · 9 comments

Comments

@sumchak1

Based on #356, we tried all the steps mentioned there, but the Flink session cluster installation still fails.

First we tried the steps below, and they didn't help:

       kubectl get job cert-job -n flink-operator-system -oyaml > cert-job.yaml
       kubectl delete job cert-job -n flink-operator-system
       kubectl apply -f cert-job.yaml

Then we tried editing the ConfigMap to change the default certificate expiry (in days), and that didn't help either:
| openssl x509 -req -CA ca.crt -CAkey ca.key -CAcreateserial -out ${tmpdir}/server-cert.pem

change to:

| openssl x509 -days 3650 -req -CA ca.crt -CAkey ca.key -CAcreateserial -out ${tmpdir}/server-cert.pem

k delete -f config-map-up1.yaml -n flink-operator-system
configmap "cert-configmap" deleted
 
 
k apply -f config-map-up1.yaml -n flink-operator-system
configmap/cert-configmap created
 
 
 
kubectl get pods -n flink-operator-system
NAME                                                 READY   STATUS    RESTARTS   AGE
flink-operator-controller-manager-848b69b444-8v9l5   2/2     Running   0          43m
 
 
 
k apply -f cert-job-1.yaml -n flink-operator-system
job.batch/cert-job created
 
 
kubectl get pods -n flink-operator-system
NAME                                                 READY   STATUS      RESTARTS   AGE
cert-job-lgxzt                                       0/1     Completed   0          7s
flink-operator-controller-manager-848b69b444-8v9l5   2/2     Running     0          44m
 
 
 kubectl apply -f config/samples/flinkoperator_v1beta1_flinksessioncluster.yaml
Error from server (InternalError): error when creating "config/samples/flinkoperator_v1beta1_flinksessioncluster.yaml": Internal error occurred: failed calling webhook "mflinkcluster.flinkoperator.k8s.io": Post "https://flink-operator-webhook-service.flink-operator-system.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s": x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0
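
For readers hitting the same x509 error: it says the webhook's serving certificate relies on the Common Name field and carries no Subject Alternative Name (SAN). The following is a minimal sketch, not the operator's actual cert-job script, of signing a certificate that includes a SAN using the same `openssl x509 -req` style of invocation quoted above. The throwaway CA, the file names and the `san.ext` extension file are assumptions made for illustration; only the service DNS name is taken from the error message.

```shell
#!/bin/sh
set -eu

# Service DNS name as it appears in the webhook error message.
SVC="flink-operator-webhook-service.flink-operator-system.svc"
tmpdir="$(mktemp -d)"

# Throwaway CA for this sketch (assumption: a real setup already has ca.crt/ca.key).
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
  -subj "/CN=sketch-ca" \
  -keyout "$tmpdir/ca.key" -out "$tmpdir/ca.crt" 2>/dev/null

# Server key and certificate signing request.
openssl req -newkey rsa:2048 -nodes \
  -subj "/CN=$SVC" \
  -keyout "$tmpdir/server-key.pem" -out "$tmpdir/server.csr" 2>/dev/null

# Extension file that adds the SAN; without it the signed cert has a CN only.
printf 'subjectAltName=DNS:%s\n' "$SVC" > "$tmpdir/san.ext"

# Sign the CSR, passing the SAN through -extfile.
openssl x509 -req -days 3650 -in "$tmpdir/server.csr" \
  -CA "$tmpdir/ca.crt" -CAkey "$tmpdir/ca.key" -CAcreateserial \
  -extfile "$tmpdir/san.ext" \
  -out "$tmpdir/server-cert.pem" 2>/dev/null

# Show the SAN section of the resulting certificate.
openssl x509 -in "$tmpdir/server-cert.pem" -noout -text \
  | grep -A1 "Subject Alternative Name"
```

Changing only `-days`, as tried above, extends the validity period but does not add a SAN, which may be why that attempt made no difference.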
@yan234280533

This looks like this issue:

#390

You can give it a try.

@sumchak1
Author

We were able to solve the issue by creating a SAN certificate for the webhook. However, after we created the FlinkCluster custom resource, it didn't create any pods, services, etc. There are not even any events shown for this custom resource. Please see below:

$ kubectl apply -f flinkoperator_v1beta1_flinksessioncluster.yaml
flinkcluster.flinkoperator.k8s.io/flinksessioncluster-sample created
[sumit@sumit flink]$ 
[sumit@sumit flink]$ kubectl get pods 
NAME                                           READY   STATUS             RESTARTS   AGE
ddpstreamappdevtest-b5684688b-wz55p            0/1     ImagePullBackOff   0          12d
default-sparkoperator-667fff9765-g6trh         1/1     Running            0          12d
hive-1578926540-metastore-6dd4f78f9b-hkgns     1/1     Running            0          12d
hive-1578926540-server2-68d8685996-p5lf7       1/1     Running            0          12d
influxdbd2b0d-58bcdd89fb-xmv5m                 1/1     Running            0          12d
ingress-checker-1626517800-jzn4l               0/1     Completed          0          2d4h
ingress-checker-1626604200-476v6               0/1     Completed          0          28h
ingress-checker-1626690600-mfdzt               0/1     Completed          0          4h27m
logdna-agent-4zbnv                             1/1     Running            2          153d
logdna-agent-65r77                             1/1     Running            14         153d
logdna-agent-6ld8l                             1/1     Running            3          153d
logdna-agent-8g2ch                             1/1     Running            10         153d
logdna-agent-n22dh                             1/1     Running            6          153d
logdna-agent-wdj54                             1/1     Running            5          153d
overprovisioning-6d695dd44c-bvlk6              0/1     Pending            0          9d
overprovisioning-6d695dd44c-cz5j7              1/1     Running            0          12d
overprovisioning-6d695dd44c-j2dt6              1/1     Running            0          5d4h
overprovisioning-6d695dd44c-nxjv5              1/1     Running            0          12d
overprovisioning-6d695dd44c-p5jf9              1/1     Running            0          6d21h
overprovisioning-6d695dd44c-qhsgq              0/1     Pending            0          5d4h
overprovisioning-autoscaler-587ff88c66-5wcd7   1/1     Running            0          12d
privingressapp-8595c5f87d-xjv9c                1/1     Running            0          12d
[sumit@sumit flink]$ kubectl get FlinkCluster
NAME                         AGE
flinksessioncluster-sample   32s
[sumit@sumit flink]$ kubectl describe FlinkCluster flinksessioncluster-sample 
Name:         flinksessioncluster-sample
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"flinkoperator.k8s.io/v1beta1","kind":"FlinkCluster","metadata":{"annotations":{},"name":"flinksessioncluster-sample","names...
API Version:  flinkoperator.k8s.io/v1beta1
Kind:         FlinkCluster
Metadata:
  Creation Timestamp:  2021-07-19T14:57:00Z
  Generation:          1
  Managed Fields:
    API Version:  flinkoperator.k8s.io/v1beta1
    Fields Type:  FieldsV1
    Fields V 1:
      F : Metadata:
        F : Annotations:
          .:
          F : Kubectl . Kubernetes . Io / Last - Applied - Configuration:
      F : Spec:
        .:
        F : Env Vars:
        F : Flink Properties:
          .:
          F : Taskmanager . Number Of Task Slots:
        F : Image:
          .:
          F : Name:
          F : Pull Policy:
        F : Job Manager:
          .:
          F : Access Scope:
          F : Ports:
            .:
            F : Ui:
          F : Resources:
            .:
            F : Limits:
              .:
              F : Cpu:
              F : Memory:
          F : Security Context:
            .:
            F : Run As Group:
            F : Run As User:
        F : Task Manager:
          .:
          F : Replicas:
          F : Resources:
            .:
            F : Limits:
              .:
              F : Cpu:
              F : Memory:
          F : Sidecars:
          F : Volume Mounts:
          F : Volumes:
    Manager:         kubectl
    Operation:       Update
    Time:            2021-07-19T14:56:59Z
  Resource Version:  173782236
  Self Link:         /apis/flinkoperator.k8s.io/v1beta1/namespaces/default/flinkclusters/flinksessioncluster-sample
  UID:               951f98d1-5943-44a1-ba19-f399a1d643ba
Spec:
  Env Vars:
    Name:   FOO
    Value:  bar
  Flink Properties:
    Taskmanager . Number Of Task Slots:  1
  Image:
    Name:         flink:1.8.2
    Pull Policy:  Always
  Job Manager:
    Access Scope:           Cluster
    Memory Off Heap Min:    600M
    Memory Off Heap Ratio:  25
    Ports:
      Blob:    6124
      Query:   6125
      Rpc:     6123
      Ui:      8081
    Replicas:  1
    Resources:
      Limits:
        Cpu:     200m
        Memory:  1Gi
    Security Context:
      Run As Group:    9999
      Run As User:     9999
  Recreate On Update:  true
  Task Manager:
    Memory Off Heap Min:    600M
    Memory Off Heap Ratio:  25
    Ports:
      Data:    6121
      Query:   6125
      Rpc:     6122
    Replicas:  1
    Resources:
      Limits:
        Cpu:     200m
        Memory:  1Gi
    Sidecars:
      Command:
        sleep
        10000
      Image:  alpine
      Name:   sidecar
      Resources:
    Volume Mounts:
      Mount Path:  /cache
      Name:        cache-volume
    Volumes:
      Empty Dir:
      Name:  cache-volume
Events:      <none>

@sumchak1
Author

@yan234280533 is there any update on this?

@toniiiik

@sumchak1 I had the same problem. Check whether the manager pod has any restarts or OOMKilled events. I increased the memory requests and limits, and after the manager started with the new resource config the cluster came up as expected. When I run kubectl top pod -n flink, the manager consumes a bit more than 30Mi of memory, so the default values do not work.
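
The check and the fix described above could be sketched as two small shell helpers. This is only an illustration: the namespace and deployment name are taken from the output posted elsewhere in this thread (the comment above uses a namespace called flink), and the container name and memory sizes passed to the second helper are hypothetical values to be replaced with your own.

```shell
#!/bin/sh
# Assumed names, based on the kubectl output in this thread; adjust as needed.
NS="flink-operator-system"
DEPLOY="flink-operator-controller-manager"

# List restart counts and the last termination reason (for example OOMKilled)
# for the containers of every pod in the operator namespace.
show_restarts() {
  kubectl get pods -n "$NS" -o custom-columns='POD:.metadata.name,RESTARTS:.status.containerStatuses[*].restartCount,LAST_REASON:.status.containerStatuses[*].lastState.terminated.reason'
}

# Raise the memory request and limit on one container of the manager deployment.
# $1 = container name, $2 = memory request, $3 = memory limit
raise_memory() {
  kubectl set resources deployment "$DEPLOY" -n "$NS" \
    --containers="$1" --requests="memory=$2" --limits="memory=$3"
}
```

A possible usage would be `show_restarts` followed by `raise_memory flink-operator 64Mi 128Mi`, where the container name `flink-operator` and both sizes are assumptions; the real container names can be read from the deployment spec.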

@sumchak1
Author

@toniiiik @yan234280533, I checked the manager logs and updated the role binding based on the error, but I can still see some errors in my manager pod. Can you please help me identify what the actual issue is? Also, the Events section of the FlinkCluster description shows the session cluster status as Creating.

$ kubectl get all -n flink-operator-system
NAME                                                     READY   STATUS      RESTARTS   AGE
pod/cert-job-q5hvp                                       0/1     Completed   0          6m31s
pod/flink-operator-controller-manager-848b69b444-j88t2   2/2     Running     0          5m51s

NAME                                                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/flink-operator-controller-manager-metrics-service   ClusterIP   172.21.16.17    <none>        8443/TCP   7d15h
service/flink-operator-webhook-service                      ClusterIP   172.21.43.122   <none>        443/TCP    7d15h

NAME                                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/flink-operator-controller-manager   1/1     1            1           7d15h

NAME                                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/flink-operator-controller-manager-848b69b444   1         1         1       7d15h

NAME                 COMPLETIONS   DURATION   AGE
job.batch/cert-job   1/1           5s         6m33s
[sumit@sumit flink]$ 
[sumit@sumit flink]$ 
[sumit@sumit flink]$ kubectl top pod -n flink-operator-system
NAME                                                 CPU(cores)   MEMORY(bytes)   
flink-operator-controller-manager-848b69b444-j88t2   2m           25Mi   

         
[sumit@sumit flink]$ kubectl logs -n flink-operator-system -l app=flink-operator --all-containers
I0722 06:07:33.426511       1 main.go:209] Generating self signed cert as no cert is provided
I0722 06:07:33.570410       1 main.go:242] Listening securely on 0.0.0.0:8443
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
	/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:90


[sumit@sumit flink]$ kubectl logs flink-operator-controller-manager-848b69b444-j88t2 -n flink-operator-system --all-containers
I0722 06:07:33.426511       1 main.go:209] Generating self signed cert as no cert is provided
I0722 06:07:33.570410       1 main.go:242] Listening securely on 0.0.0.0:8443
W0722 06:07:33.929936       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0722 06:07:35.158574       1 request.go:621] Throttling request took 1.048583353s, request: GET:https://172.21.0.1:443/apis/policy/v1beta1?timeout=32s
2021-07-22T06:07:35.209Z	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": "127.0.0.1:8080"}
2021-07-22T06:07:35.209Z	INFO	controller-runtime.builder	Registering a mutating webhook	{"GVK": "flinkoperator.k8s.io/v1beta1, Kind=FlinkCluster", "path": "/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster"}
2021-07-22T06:07:35.209Z	INFO	controller-runtime.webhook	registering webhook	{"path": "/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster"}
2021-07-22T06:07:35.210Z	INFO	controller-runtime.builder	Registering a validating webhook	{"GVK": "flinkoperator.k8s.io/v1beta1, Kind=FlinkCluster", "path": "/validate-flinkoperator-k8s-io-v1beta1-flinkcluster"}
2021-07-22T06:07:35.210Z	INFO	controller-runtime.webhook	registering webhook	{"path": "/validate-flinkoperator-k8s-io-v1beta1-flinkcluster"}
2021-07-22T06:07:35.210Z	INFO	setup	Starting manager
2021-07-22T06:07:35.228Z	INFO	controller-runtime.manager	starting metrics server	{"path": "/metrics"}
2021-07-22T06:07:35.228Z	INFO	controller-runtime.webhook.webhooks	starting webhook server
2021-07-22T06:07:35.269Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
2021-07-22T06:07:35.269Z	INFO	controller-runtime.controller	Starting EventSource	{"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:35.269Z	INFO	controller-runtime.webhook	serving webhook server	{"host": "", "port": 443}
2021-07-22T06:07:35.269Z	INFO	controller-runtime.certwatcher	Starting certificate watcher
2021-07-22T06:07:35.408Z	INFO	controller-runtime.controller	Starting EventSource	{"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:36.089Z	INFO	controller-runtime.controller	Starting EventSource	{"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:36.190Z	INFO	controller-runtime.controller	Starting EventSource	{"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:36.291Z	INFO	controller-runtime.controller	Starting EventSource	{"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:36.591Z	INFO	controller-runtime.controller	Starting Controller	{"controller": "flinkcluster"}
2021-07-22T06:07:36.591Z	INFO	controller-runtime.controller	Starting workers	{"controller": "flinkcluster", "worker count": 1}
2021-07-22T06:07:36.591Z	INFO	controllers.FlinkCluster	============================================================	{"cluster": "default/flinksessioncluster-sample"}
2021-07-22T06:07:36.591Z	INFO	controllers.FlinkCluster	---------- 1. Observe the current state ----------	{"cluster": "default/flinksessioncluster-sample"}
2021-07-22T06:07:36.591Z	INFO	controllers.FlinkCluster	Observed cluster	{"cluster": "default/flinksessioncluster-sample", "cluster": {"kind":"FlinkCluster","apiVersion":"flinkoperator.k8s.io/v1beta1","metadata":{"name":"flinksessioncluster-sample","namespace":"default","selfLink":"/apis/flinkoperator.k8s.io/v1beta1/namespaces/default/flinkclusters/flinksessioncluster-sample","uid":"951f98d1-5943-44a1-ba19-f399a1d643ba","resourceVersion":"173782236","generation":1,"creationTimestamp":"2021-07-19T14:57:00Z","annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"flinkoperator.k8s.io/v1beta1\",\"kind\":\"FlinkCluster\",\"metadata\":{\"annotations\":{},\"name\":\"flinksessioncluster-sample\",\"namespace\":\"default\"},\"spec\":{\"envVars\":[{\"name\":\"FOO\",\"value\":\"bar\"}],\"flinkProperties\":{\"taskmanager.numberOfTaskSlots\":\"1\"},\"image\":{\"name\":\"flink:1.8.2\",\"pullPolicy\":\"Always\"},\"jobManager\":{\"accessScope\":\"Cluster\",\"ports\":{\"ui\":8081},\"resources\":{\"limits\":{\"cpu\":\"200m\",\"memory\":\"1024Mi\"}},\"securityContext\":{\"runAsGroup\":9999,\"runAsUser\":9999}},\"taskManager\":{\"replicas\":1,\"resources\":{\"limits\":{\"cpu\":\"200m\",\"memory\":\"1024Mi\"}},\"sidecars\":[{\"command\":[\"sleep\",\"10000\"],\"image\":\"alpine\",\"name\":\"sidecar\"}],\"volumeMounts\":[{\"mountPath\":\"/cache\",\"name\":\"cache-volume\"}],\"volumes\":[{\"emptyDir\":{},\"name\":\"cache-volume\"}]}}}\n"},"managedFields":[{"manager":"kubectl","operation":"Update","apiVersion":"flinkoperator.k8s.io/v1beta1","time":"2021-07-19T14:56:59Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:envVars":{},"f:flinkProperties":{".":{},"f:taskmanager.numberOfTaskSlots":{}},"f:image":{".":{},"f:name":{},"f:pullPolicy":{}},"f:jobManager":{".":{},"f:accessScope":{},"f:ports":{".":{},"f:ui":{}},"f:resources":{".":{},"f:limits":{".":{},"f:cpu":{
},"f:memory":{}}},"f:securityContext":{".":{},"f:runAsGroup":{},"f:runAsUser":{}}},"f:taskManager":{".":{},"f:replicas":{},"f:resources":{".":{},"f:limits":{".":{},"f:cpu":{},"f:memory":{}}},"f:sidecars":{},"f:volumeMounts":{},"f:volumes":{}}}}}]},"spec":{"image":{"name":"flink:1.8.2","pullPolicy":"Always"},"jobManager":{"replicas":1,"accessScope":"Cluster","ports":{"rpc":6123,"blob":6124,"query":6125,"ui":8081},"resources":{"limits":{"cpu":"200m","memory":"1Gi"}},"memoryOffHeapRatio":25,"memoryOffHeapMin":"600M","securityContext":{"runAsUser":9999,"runAsGroup":9999}},"taskManager":{"replicas":1,"ports":{"data":6121,"rpc":6122,"query":6125},"resources":{"limits":{"cpu":"200m","memory":"1Gi"}},"memoryOffHeapRatio":25,"memoryOffHeapMin":"600M","volumes":[{"name":"cache-volume","emptyDir":{}}],"volumeMounts":[{"name":"cache-volume","mountPath":"/cache"}],"sidecars":[{"name":"sidecar","image":"alpine","command":["sleep","10000"],"resources":{}}]},"envVars":[{"name":"FOO","value":"bar"}],"flinkProperties":{"taskmanager.numberOfTaskSlots":"1"},"recreateOnUpdate":true},"status":{"state":"","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}}}}}
2021-07-22T06:07:36.829Z	INFO	controllers.FlinkCluster	Observed controllerRevisions	{"cluster": "default/flinksessioncluster-sample", "controllerRevisions": "[{name: flinksessioncluster-sample-84fdb95d89, revision: 1},]"}
2021-07-22T06:07:37.568Z	INFO	controllers.FlinkCluster	Observed configMap	{"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.568Z	INFO	controllers.FlinkCluster	Deployment not found	{"cluster": "default/flinksessioncluster-sample", "component": "JobManager"}
2021-07-22T06:07:37.568Z	INFO	controllers.FlinkCluster	Observed JobManager StatefulSet	{"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.568Z	INFO	controllers.FlinkCluster	Observed JobManager service	{"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.708Z	INFO	controllers.FlinkCluster	Observed JobManager ingress	{"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.708Z	INFO	controllers.FlinkCluster	Deployment not found	{"cluster": "default/flinksessioncluster-sample", "component": "TaskManager"}
2021-07-22T06:07:37.708Z	INFO	controllers.FlinkCluster	Observed TaskManager StatefulSet	{"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.710Z	INFO	controllers.FlinkCluster	---------- 2. Update cluster status ----------	{"cluster": "default/flinksessioncluster-sample"}
2021-07-22T06:07:37.711Z	INFO	controllers.FlinkCluster	Cluster state changed	{"cluster": "default/flinksessioncluster-sample", "current": "", "new": "Creating"}
2021-07-22T06:07:37.728Z	INFO	controllers.FlinkCluster	FlinkCluster revision status changed	{"cluster": "default/flinksessioncluster-sample", "current": "currentRevision: , nextRevision: , collisionCount: <nil>", "new": "currentRevision: flinksessioncluster-sample-84fdb95d89-1, nextRevision: flinksessioncluster-sample-84fdb95d89-1, collisionCount: <nil>"}
2021-07-22T06:07:37.728Z	INFO	controllers.FlinkCluster	Status changed	{"cluster": "default/flinksessioncluster-sample", "old": {"state":"","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}}}, "new": {"state":"Creating","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}},"currentRevision":"flinksessioncluster-sample-84fdb95d89-1","nextRevision":"flinksessioncluster-sample-84fdb95d89-1"}}
2021-07-22T06:07:37.729Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"FlinkCluster","namespace":"default","name":"flinksessioncluster-sample","uid":"951f98d1-5943-44a1-ba19-f399a1d643ba","apiVersion":"flinkoperator.k8s.io/v1beta1","resourceVersion":"173782236"}, "reason": "StatusUpdate", "message": "Cluster status: Creating"}
2021-07-22T06:07:37.788Z	ERROR	controllers.FlinkCluster	Failed to update cluster status	{"cluster": "default/flinksessioncluster-sample", "error": "FlinkCluster.flinkoperator.k8s.io \"flinksessioncluster-sample\" is invalid: [status.components.jobManagerDeployment: Required value, status.components.taskManagerDeployment: Required value]"}
github.com/go-logr/zapr.(*zapLogger).Error
	/root/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
github.com/googlecloudplatform/flink-operator/controllers.(*FlinkClusterHandler).reconcile
	/workspace/controllers/flinkcluster_controller.go:162
github.com/googlecloudplatform/flink-operator/controllers.(*FlinkClusterReconciler).Reconcile
	/workspace/controllers/flinkcluster_controller.go:82
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
	/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:90
2021-07-22T06:07:37.788Z	ERROR	controller-runtime.controller	Reconciler error	{"controller": "flinkcluster", "request": "default/flinksessioncluster-sample", "error": "FlinkCluster.flinkoperator.k8s.io \"flinksessioncluster-sample\" is invalid: [status.components.jobManagerDeployment: Required value, status.components.taskManagerDeployment: Required value]"}
github.com/go-logr/zapr.(*zapLogger).Error
	/root/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
	/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:90

$ kubectl describe flinkcluster flinksessioncluster-sample -n flink-operator-system
Name:         flinksessioncluster-sample
Namespace:    flink-operator-system
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"flinkoperator.k8s.io/v1beta1","kind":"FlinkCluster","metadata":{"annotations":{},"name":"flinksessioncluster-sample","names...
API Version:  flinkoperator.k8s.io/v1beta1
Kind:         FlinkCluster
Metadata:
  Creation Timestamp:  2021-07-22T06:09:20Z
  Generation:          1
  Managed Fields:
    API Version:  flinkoperator.k8s.io/v1beta1
    Fields Type:  FieldsV1
    Fields V 1:
      F : Metadata:
        F : Annotations:
          .:
          F : Kubectl . Kubernetes . Io / Last - Applied - Configuration:
      F : Spec:
        .:
        F : Env Vars:
        F : Flink Properties:
          .:
          F : Taskmanager . Number Of Task Slots:
        F : Image:
          .:
          F : Name:
          F : Pull Policy:
        F : Job Manager:
          .:
          F : Access Scope:
          F : Ports:
            .:
            F : Ui:
          F : Resources:
            .:
            F : Limits:
              .:
              F : Cpu:
              F : Memory:
          F : Security Context:
            .:
            F : Run As Group:
            F : Run As User:
        F : Task Manager:
          .:
          F : Replicas:
          F : Resources:
            .:
            F : Limits:
              .:
              F : Cpu:
              F : Memory:
          F : Sidecars:
          F : Volume Mounts:
          F : Volumes:
    Manager:         kubectl
    Operation:       Update
    Time:            2021-07-22T06:09:20Z
  Resource Version:  175069579
  Self Link:         /apis/flinkoperator.k8s.io/v1beta1/namespaces/flink-operator-system/flinkclusters/flinksessioncluster-sample
  UID:               18bd7904-a582-4433-a731-23d37813b1fd
Spec:
  Env Vars:
    Name:   FOO
    Value:  bar
  Flink Properties:
    Taskmanager . Number Of Task Slots:  1
  Image:
    Name:         flink:1.8.2
    Pull Policy:  Always
  Job Manager:
    Access Scope:           Cluster
    Memory Off Heap Min:    600M
    Memory Off Heap Ratio:  25
    Ports:
      Blob:    6124
      Query:   6125
      Rpc:     6123
      Ui:      8081
    Replicas:  1
    Resources:
      Limits:
        Cpu:     200m
        Memory:  1Gi
    Security Context:
      Run As Group:    9999
      Run As User:     9999
  Recreate On Update:  true
  Task Manager:
    Memory Off Heap Min:    600M
    Memory Off Heap Ratio:  25
    Ports:
      Data:    6121
      Query:   6125
      Rpc:     6122
    Replicas:  1
    Resources:
      Limits:
        Cpu:     200m
        Memory:  1Gi
    Sidecars:
      Command:
        sleep
        10000
      Image:  alpine
      Name:   sidecar
      Resources:
    Volume Mounts:
      Mount Path:  /cache
      Name:        cache-volume
    Volumes:
      Empty Dir:
      Name:  cache-volume
Events:
  Type    Reason        Age                   From           Message
  ----    ------        ----                  ----           -------
  Normal  StatusUpdate  48s (x16 over 3m39s)  FlinkOperator  Cluster status: Creating

@sumchak1
Author

sumchak1 commented Jul 22, 2021

I am using https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/config/samples/flinkoperator_v1beta1_flinksessioncluster.yaml to create the flinksessioncluster, but it says the taskmanager and jobmanager deployments are not found.

state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}}}}}
2021-07-22T06:09:29.162Z	INFO	controllers.FlinkCluster	Observed controllerRevisions	{"cluster": "flink-operator-system/flinksessioncluster-sample", "controllerRevisions": "[{name: flinksessioncluster-sample-84fdb95d89, revision: 1},]"}
2021-07-22T06:09:29.162Z	INFO	controllers.FlinkCluster	Observed configMap	{"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.162Z	INFO	controllers.FlinkCluster	Deployment not found	{"cluster": "flink-operator-system/flinksessioncluster-sample", "component": "JobManager"}
2021-07-22T06:09:29.162Z	INFO	controllers.FlinkCluster	Observed JobManager StatefulSet	{"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.162Z	INFO	controllers.FlinkCluster	Observed JobManager service	{"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.162Z	INFO	controllers.FlinkCluster	Observed JobManager ingress	{"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.162Z	INFO	controllers.FlinkCluster	Deployment not found	{"cluster": "flink-operator-system/flinksessioncluster-sample", "component": "TaskManager"}
2021-07-22T06:09:29.163Z	INFO	controllers.FlinkCluster	Observed TaskManager StatefulSet	{"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.164Z	INFO	controllers.FlinkCluster	---------- 2. Update cluster status ----------	{"cluster": "flink-operator-system/flinksessioncluster-sample"}
2021-07-22T06:09:29.164Z	INFO	controllers.FlinkCluster	Cluster state changed	{"cluster": "flink-operator-system/flinksessioncluster-sample", "current": "", "new": "Creating"}
2021-07-22T06:09:29.164Z	INFO	controllers.FlinkCluster	FlinkCluster revision status changed	{"cluster": "flink-operator-system/flinksessioncluster-sample", "current": "currentRevision: , nextRevision: , collisionCount: <nil>", "new": "currentRevision: flinksessioncluster-sample-84fdb95d89-1, nextRevision: flinksessioncluster-sample-84fdb95d89-1, collisionCount: <nil>"}
2021-07-22T06:09:29.164Z	INFO	controllers.FlinkCluster	Status changed	{"cluster": "flink-operator-system/flinksessioncluster-sample", "old": {"state":"","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}}}, "new": {"state":"Creating","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}},"currentRevision":"flinksessioncluster-sample-84fdb95d89-1","nextRevision":"flinksessioncluster-sample-84fdb95d89-1"}}
2021-07-22T06:09:29.165Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"FlinkCluster","namespace":"flink-operator-system","name":"flinksessioncluster-sample","uid":"18bd7904-a582-4433-a731-23d37813b1fd","apiVersion":"flinkoperator.k8s.io/v1beta1","resourceVersion":"175069579"}, "reason": "StatusUpdate", "message": "Cluster status: Creating"}
2021-07-22T06:09:29.197Z	ERROR	controllers.FlinkCluster	Failed to update cluster status	{"cluster": "flink-operator-system/flinksessioncluster-sample", "error": "FlinkCluster.flinkoperator.k8s.io \"flinksessioncluster-sample\" is invalid: [status.components.jobManagerDeployment: Required value, status.components.taskManagerDeployment: Required value]"}
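
The excerpt above ends in the same validation error as the earlier full log. Since the manager log is long, a small filter can help isolate which status fields the API server rejected. The helper below is a sketch; its name `required_fields` is made up for this example, and it only assumes that the rejected fields appear in ERROR lines in the form `<field>: Required value`, as they do in the logs posted here.

```shell
#!/bin/sh
# Read operator log text on stdin and print each distinct
# "<field>: Required value" complaint found on ERROR lines, once.
required_fields() {
  grep 'ERROR' | grep -o '[A-Za-z.]*: Required value' | sort -u
}
```

It could be fed from the live log, for example `kubectl logs <manager-pod> -n flink-operator-system --all-containers | required_fields`. For the logs in this thread it would list `status.components.jobManagerDeployment` and `status.components.taskManagerDeployment`, while the status objects the controller logs contain `jobManagerStatefulSet` and `taskManagerStatefulSet` instead. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the installed CRD and the operator image come from different versions.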

@mishra157

@yan234280533 is there any update on this?

@sv3ndk

sv3ndk commented Aug 11, 2021

Hi @mishra157, I think the community is moving to the https://github.com/spotify/flink-on-k8s-operator/ fork now (see this discussion: spotify/flink-on-k8s-operator#82). You will probably have a better chance with that version and, if the bug is still present, you can report the issue over there.
