
Che operator timing out when creating CheCluster in OpenShift #20487

Closed
jawnsy opened this issue Sep 18, 2021 · 6 comments
Labels
area/che-operator: Issues and PRs related to Eclipse Che Kubernetes Operator
kind/bug: Outline of a bug - must adhere to the bug report template.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
severity/P2: Has a minor but important impact to the usage or development of the system.

Comments


jawnsy commented Sep 18, 2021

Describe the bug

After creating a CheCluster, the controller seems to hang indefinitely. The che-operator container reports the following in its logs:

time="2021-09-18T12:15:10Z" level=info msg="Running exec for 'create Keycloak DB, user, privileges' in the pod 'postgres-7f797d9448-vrx8m'"
time="2021-09-18T12:15:10Z" level=error msg="Error running exec: Internal error occurred: failed calling webhook \"validate-exec.devworkspace-controller.svc\": Post \"https://devworkspace-webhookserver.devworkspace-controller.svc:443/validate?timeout=10s\": service \"devworkspace-webhookserver\" not found, command: [/bin/bash -c OUT=$(psql postgres -tAc \"SELECT 1 FROM pg_roles WHERE rolname='keycloak'\"); if [ $OUT -eq 1 ]; then echo \"DB exists\"; exit 0; fi && psql -c \"CREATE USER keycloak WITH PASSWORD 'oEuaiDBmcqxM'\" && psql -c \"CREATE DATABASE keycloak\" && psql -c \"GRANT ALL PRIVILEGES ON DATABASE keycloak TO keycloak\" && psql -c \"ALTER USER ${POSTGRESQL_USER} WITH SUPERUSER\"]"
time="2021-09-18T12:15:10Z" level=error msg="Internal error occurred: failed calling webhook \"validate-exec.devworkspace-controller.svc\": Post \"https://devworkspace-webhookserver.devworkspace-controller.svc:443/validate?timeout=10s\": service \"devworkspace-webhookserver\" not found"
{"level":"error","ts":1631967310.996203,"logger":"controller","msg":"Reconciler error","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster","name":"eclipse-che","namespace":"eclipse-che","error":"Internal error occurred: failed calling webhook \"validate-exec.devworkspace-controller.svc\": Post \"https://devworkspace-webhookserver.devworkspace-controller.svc:443/validate?timeout=10s\": service \"devworkspace-webhookserver\" not found","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/che-operator/vendor/github.com/go-logr/zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/che-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:246\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/che-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller...
$ oc version
Client Version: 4.7.0-0.okd-2021-06-19-191547
Server Version: 4.7.0-0.okd-2021-08-22-163618
Kubernetes Version: v1.20.0-1093+4593a24e8fd58d-dirty
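
The failing call references a validating webhook backed by a Service in the devworkspace-controller namespace. As a diagnostic sketch (assuming cluster-admin access and the default devworkspace-controller namespace), one way to check whether that Service and the webhook configurations that reference it exist:

$ oc get service devworkspace-webhookserver -n devworkspace-controller
$ oc get validatingwebhookconfigurations | grep devworkspace
$ oc get mutatingwebhookconfigurations | grep devworkspace

If the webhook configurations are present but the Service is missing, pod exec calls that match the webhook rules will likely fail with the "service not found" error shown in the log above.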

Che version

7.36@latest

Steps to reproduce

  1. Install the Che Operator from OperatorHub (version 7.36.1)
  2. Create a CheCluster with default settings

Expected behavior

Following the installation instructions, creating a CheCluster with default settings should result in a working installation.

Runtime

OpenShift

Screenshots

No response

Installation method

OperatorHub

Environment

Linux

Eclipse Che Logs

chectl-logs.tar.gz

Additional context

No response

@jawnsy jawnsy added the kind/bug Outline of a bug - must adhere to the bug report template. label Sep 18, 2021
@che-bot che-bot added the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label Sep 18, 2021

tolusha commented Sep 20, 2021

Possible duplicate: #19243
@jawnsy Did you enable devWorkspace in the CheCluster?

/cc @sleshchenko


jawnsy commented Sep 20, 2021

@tolusha Thanks for the quick reply! I was following the instructions in the documentation, which mention using the default settings, so I didn't enable DevWorkspace. I'm pretty new to Che, so I apologize if this was documented somewhere else.

I did try again with the DevWorkspace operator and image puller enabled, but the installation still never completes.

Here are my CheCluster settings:

apiVersion: org.eclipse.che/v1
kind: CheCluster
metadata:
  name: eclipse-che
  namespace: eclipse-che
spec:
  auth:
    identityProviderURL: ''
    identityProviderRealm: ''
    oAuthSecret: ''
    identityProviderPassword: ''
    oAuthClientName: ''
    initialOpenShiftOAuthUser: true
    identityProviderClientId: ''
    identityProviderAdminUserName: ''
    externalIdentityProvider: false
    openShiftoAuth: true
  database:
    chePostgresDb: ''
    chePostgresHostName: ''
    chePostgresPassword: ''
    chePostgresPort: ''
    chePostgresUser: ''
    externalDb: false
  devWorkspace:
    enable: true
  imagePuller:
    enable: true
  metrics:
    enable: true
  server:
    proxyURL: ''
    cheClusterRoles: ''
    proxyPassword: ''
    nonProxyHosts: ''
    proxyPort: ''
    tlsSupport: true
    allowUserDefinedWorkspaceNamespaces: false
    serverTrustStoreConfigMapName: ''
    proxyUser: ''
    cheWorkspaceClusterRole: ''
    workspaceNamespaceDefault: <username>-che
    serverExposureStrategy: ''
    gitSelfSignedCert: false
    cheFlavor: ''
  storage:
    postgresPVCStorageClassName: ''
    preCreateSubPaths: true
    pvcClaimSize: 10Gi
    pvcStrategy: common
    workspacePVCStorageClassName: ''

After enabling devWorkspace, the che-operator pod instead goes into CrashLoopBackOff:

      state:
        waiting:
          reason: CrashLoopBackOff
          message: >-
            back-off 5m0s restarting failed container=che-operator
            pod=che-operator-77f76d4cdb-sv9lt_eclipse-che(b7dcbe8a-12bf-428a-81ea-f7d331c3d649)

The logs for that pod don't show much:

{"level":"info","ts":1632141423.2957177,"msg":"Binary info ","Go version":"go1.15.14"}
{"level":"info","ts":1632141423.2957582,"msg":"Binary info ","OS":"linux","Arch":"amd64"}
{"level":"info","ts":1632141423.295763,"msg":"Address ","Metrics":":60000"}
{"level":"info","ts":1632141423.2957666,"msg":"Address ","Probe":":6789"}
{"level":"info","ts":1632141423.2957697,"msg":"Operator is running on ","Infrastructure":"OpenShift v4.x"}
I0920 12:37:04.348632       1 request.go:655] Throttling request took 1.046983251s, request: GET:https://172.30.0.1:443/apis/scheduling.k8s.io/v1?timeout=32s
{"level":"info","ts":1632141427.4579651,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":60000"}
time="2021-09-20T12:37:11Z" level=info msg="Use 'terminationGracePeriodSeconds' 20 sec. from operator deployment."
{"level":"info","ts":1632141431.6351597,"logger":"setup","msg":"starting manager"}
time="2021-09-20T12:37:11Z" level=info msg="Set up process signal handler"
I0920 12:37:11.635625       1 leaderelection.go:243] attempting to acquire leader lease eclipse-che/e79b08a4.org.eclipse.che...
{"level":"info","ts":1632141431.6356351,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
I0920 12:37:27.969148       1 leaderelection.go:253] successfully acquired lease eclipse-che/e79b08a4.org.eclipse.che
{"level":"info","ts":1632141447.9693384,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster","source":"kind source: /, Kind="}
{"level":"info","ts":1632141447.9693425,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster","source":"kind source: /, Kind="}
{"level":"info","ts":1632141447.9694028,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheClusterBackup","controller":"checlusterbackup-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1632141447.9693723,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheClusterRestore","controller":"checlusterrestore-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1632141448.0699866,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster","source":"kind source: /, Kind="}
{"level":"info","ts":1632141448.0701144,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster","source":"kind source: /, Kind="}
{"level":"info","ts":1632141448.0701604,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheClusterRestore","controller":"checlusterrestore-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1632141448.0702782,"logger":"controller","msg":"Starting Controller","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheClusterRestore","controller":"checlusterrestore-controller"}
{"level":"info","ts":1632141448.0702825,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster","source":"kind source: /, Kind="}
{"level":"info","ts":1632141448.0704453,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheClusterBackup","controller":"checlusterbackup-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1632141448.070481,"logger":"controller","msg":"Starting Controller","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheClusterBackup","controller":"checlusterbackup-controller"}
{"level":"info","ts":1632141448.1704342,"logger":"controller","msg":"Starting workers","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheClusterRestore","controller":"checlusterrestore-controller","worker count":1}
{"level":"info","ts":1632141448.1705747,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster","source":"kind source: /, Kind="}
{"level":"info","ts":1632141448.1708722,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster","source":"kind source: /, Kind="}
{"level":"info","ts":1632141448.271135,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster","source":"kind source: /, Kind="}
{"level":"info","ts":1632141450.8724709,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster","source":"kind source: /, Kind="}
{"level":"info","ts":1632141451.0732284,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster","source":"kind source: /, Kind="}
{"level":"info","ts":1632141451.5738478,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster","source":"kind source: /, Kind="}
{"level":"info","ts":1632141451.676314,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster","source":"kind source: /, Kind="}
{"level":"info","ts":1632141453.246946,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"org.eclipse.che","reconcilerKind":"CheCluster","controller":"checluster","source":"kind source: /, Kind="}

@svor svor added status/info-needed More information is needed before the issue can move into the “analyzing” state for engineering. area/che-operator Issues and PRs related to Eclipse Che Kubernetes Operator severity/P2 Has a minor but important impact to the usage or development of the system. and removed status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Sep 20, 2021

tolusha commented Sep 20, 2021

Which channel did you use to install eclipse-che from? I think it was the tech-preview channel.

[screenshot: channel]
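
For reference, the installed channel can be read directly from the Subscription (a sketch assuming the Subscription is named eclipse-che in the eclipse-che namespace, as with the OperatorHub install):

$ oc get subscriptions.operators.coreos.com eclipse-che -n eclipse-che -o jsonpath='{.spec.channel}{"\n"}'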


jawnsy commented Sep 20, 2021

I believe I installed from the stable channel; here's my Subscription:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: eclipse-che
  namespace: eclipse-che
  labels:
    operators.coreos.com/eclipse-che.eclipse-che: ''
spec:
  channel: stable
  installPlanApproval: Automatic
  name: eclipse-che
  source: community-operators
  sourceNamespace: openshift-marketplace
  startingCSV: eclipse-che.v7.36.1
status:
  installplan:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-9j75x
    uuid: 458b5123-90ff-4a87-8709-be12cea4d3dd
  lastUpdated: '2021-09-20T12:04:19Z'
  installedCSV: eclipse-che.v7.36.1
  currentCSV: eclipse-che.v7.36.1
  installPlanRef:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-9j75x
    namespace: eclipse-che
    resourceVersion: '56385216'
    uid: 458b5123-90ff-4a87-8709-be12cea4d3dd
  state: AtLatestKnown
  catalogHealth:
    - catalogSourceRef:
        apiVersion: operators.coreos.com/v1alpha1
        kind: CatalogSource
        name: community-operators
        namespace: openshift-marketplace
        resourceVersion: '56378334'
        uid: 52a13556-2e10-41fe-8aac-cba654585fbf
      healthy: true
      lastUpdated: '2021-09-20T12:04:05Z'
  conditions:
    - lastTransitionTime: '2021-09-20T12:04:05Z'
      message: all available catalogsources are healthy
      reason: AllCatalogSourcesHealthy
      status: 'False'
      type: CatalogSourcesUnhealthy
  installPlanGeneration: 2

@tolusha tolusha removed the status/info-needed More information is needed before the issue can move into the “analyzing” state for engineering. label Sep 21, 2021
@tolusha
Copy link
Contributor

tolusha commented Sep 21, 2021

I wasn't able to reproduce the issue.
It is probably a matter of the sequence of actions.
So, I recommend the following (a sketch of both steps is shown below):

  1. Remove DevWorkspace using the make uninstall command from the DevWorkspace Operator repository
  2. Redeploy Eclipse Che with spec.devWorkspace.enable: false (if you use the stable channel)
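
A minimal sketch of those two steps (assuming the DevWorkspace Operator was deployed from a clone of its repository, and that the CheCluster is named eclipse-che in the eclipse-che namespace, as in the YAML above):

# 1. Uninstall DevWorkspace from a clone of the DevWorkspace Operator repository
$ git clone https://github.com/devfile/devworkspace-operator.git
$ cd devworkspace-operator
$ make uninstall

# 2. Disable DevWorkspace on the existing CheCluster so the operator redeploys without it
$ oc patch checluster eclipse-che -n eclipse-che --type merge -p '{"spec":{"devWorkspace":{"enable":false}}}'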


che-bot commented Mar 20, 2022

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

@che-bot che-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 20, 2022
@che-bot che-bot closed this as completed Mar 27, 2022