devworkspace failed to progress past phase 'Starting' for longer than timeout (5m) #21380

Closed
Divine1 opened this issue May 8, 2022 · 14 comments
Labels
kind/question Questions that haven't been identified as being feature requests or bugs. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. severity/P2 Has a minor but important impact to the usage or development of the system.

Comments

Divine1 commented May 8, 2022

Summary

I'm receiving the error below while the workspace is being created. Please let me know how to increase the timeout.
devworkspace failed to progress past phase 'Starting' for longer than timeout (5m)

I'm using chectl version chectl/0.0.20220422-next.08c2079 linux-x64 node-v16.13.2.

So far I have increased the timeout for the following properties:
chectl server:start --k8spoddownloadimagetimeout=1000000 --k8spoderrorrechecktimeout=1000000 --k8spodreadytimeout=1000000 --k8spodwaittimeout=1000000

[screenshot]

Also, I want to view the logs that are recorded while the workspace is starting. Please let me know how to view those as well.

I'm running the Eclipse Che instance in a minikube cluster created with minikube start --driver=docker --memory=15000 --cpus=4 --ports="30670,30700,30710,30720,80:80,443:443,31728:31728"

I deployed Eclipse Che using chectl server:deploy --platform minikube --installer=operator --debug

Relevant information

No response

@Divine1 Divine1 added the kind/question Questions that haven't been identified as being feature requests or bugs. label May 8, 2022
@che-bot che-bot added the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label May 8, 2022
Divine1 (Author) commented May 8, 2022

When I use the devfile.yaml below to create a new workspace, I get the error shown in the screenshot, but I can't tell whether ImagePullBackOff is the real issue or whether something else is going on.

I ran docker pull selenium/standalone-chrome:4.1.2-20220217 on the same machine, and the command completed successfully.

I'm not sure how to view the additional logs recorded for this scenario. Please let me know how to view them.

[screenshot]

Below is the devfile.yaml I used:

schemaVersion: 2.1.0
metadata:
  name: cbfsel-repo
projects:
  - name: cbfsel-project
    git:
      checkoutFrom:
        revision: master
      remotes:
        origin: https://gitlab.myorganization.com/myusername/CBF_SEL.git
components:
  - container:
      image: 'quay.io/devfile/universal-developer-image:ubi8-0e189d9'
      memoryLimit: 2G
      endpoints:
        - exposure: none
          name: debug
          protocol: tcp
          targetPort: 5005
        - exposure: public
          name: 8080-http
          protocol: http
          targetPort: 8080
    name: javacontainer
  - container:
      image: 'selenium/standalone-chrome:4.1.2-20220217'
      memoryLimit: 2G
      endpoints:
        - exposure: public
          name: 4444-tcp
          protocol: tcp
          targetPort: 4444
        - exposure: public
          name: 5900-tcp
          protocol: tcp
          targetPort: 5900
        - exposure: public
          name: 7900-http
          protocol: http
          targetPort: 7900
          secure: true
    name: seleniumcontainer
commands:
  - exec:
      commandLine: mvn clean package -DskipTests
      component: javacontainer
      group:
        isDefault: true
        kind: build
      label: 'build project using maven'
      workingDir: '${PROJECT_SOURCE}'
    id: mvnpackage

tolusha (Contributor) commented May 9, 2022

@Divine1
Set the following field in the CheCluster CR to increase the timeout:

spec:
  server:
    customCheProperties:
      CHE_INFRA_KUBERNETES_WORKSPACE__START__TIMEOUT__MIN: '10'
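
For an existing installation, the same field could be set with a merge patch instead of editing the CR by hand. This is only a sketch: the function name set_che_start_timeout is hypothetical, the CR name and namespace are arguments you supply, and the property is written with the double-underscore spelling used later in this thread. Note that later comments report the property did not change the 5m DevWorkspace timeout.

```shell
#!/bin/sh
# Hypothetical helper: merge-patch the start-timeout property into a CheCluster CR.
# $1 = CheCluster name, $2 = namespace, $3 = timeout in minutes.
# Assumption: the CR still exposes spec.server.customCheProperties, as in the
# snippet above; newer CRs in this thread use a different field path.
set_che_start_timeout() {
  patch=$(printf '{"spec":{"server":{"customCheProperties":{"CHE_INFRA_KUBERNETES_WORKSPACE__START__TIMEOUT__MIN":"%s"}}}}' "$3")
  kubectl patch checluster "$1" -n "$2" --type merge -p "$patch"
}
```

For example, set_che_start_timeout eclipse-che eclipse-che 10 would request a 10 minute timeout on a CR named eclipse-che in the eclipse-che namespace.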

Divine1 (Author) commented May 9, 2022

@tolusha thank you for the details.

I was able to fix the Selenium container issue posted above. I monitored the workspace Deployment, ReplicaSet, and pod creation using kubectl describe ... and was able to identify the issue, which I have highlighted in the screenshot below.

The issue was caused by limited access to the Selenium Docker repository. I switched to a local repository and put its image path in devfile.yaml, which fixed it.
[screenshot]

Thanks a lot.
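
The kind of inspection described in this comment can be sketched roughly as follows. These are not the exact commands used in the thread: the function name inspect_workspace_start is hypothetical, the namespace is whatever holds your workspace, and the sketch assumes kubectl points at the cluster and the DevWorkspace CRD is installed.

```shell
#!/bin/sh
# Hypothetical helper: print the usual places where a stuck workspace start
# leaves traces. $1 = namespace that holds the workspace (an assumption;
# in this thread it would follow the <username>-che template).
inspect_workspace_start() {
  ns="$1"
  # DevWorkspace objects in the namespace, with their current status.
  kubectl get devworkspace -n "$ns"
  # Deployments, ReplicaSets and pods created for the workspace.
  kubectl get deployment,replicaset,pod -n "$ns"
  # Events sorted by time; image pull and scheduling failures show up here.
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
  # Details for every pod in the namespace, including container states.
  kubectl describe pod -n "$ns"
}
```

Calling inspect_workspace_start with the workspace namespace prints all four views in sequence; an ImagePullBackOff would normally be visible in the events and in the pod description.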

@Divine1 Divine1 closed this as completed May 9, 2022
@Divine1 Divine1 reopened this Jul 8, 2022
Divine1 (Author) commented Jul 8, 2022

@tolusha
I'm still receiving the error below even after adding CHE_INFRA_KUBERNETES_WORKSPACE__START__TIMEOUT__MIN: "15".

The time limit did not increase to 15; it still shows 5.
[screenshot]

Below is my CheCluster configuration:

apiVersion: org.eclipse.che/v2
kind: CheCluster
metadata:
  name: eclipse-che
spec:
  components:
    cheServer:
      extraProperties:
        CHE_INFRA_KUBERNETES_WORKSPACE__START__TIMEOUT__MIN: "15"
    database:
      externalDb: true
      postgresHostName: sc2-10-186-67-195.eng.vmware.com
      postgresPort: "5432"
  networking:
    auth:
      identityProviderURL: https://dexvmware.com
      oAuthClientName: eclipse-che
      oAuthSecret: sdsdsdsdsd

Output of kubectl get checluster eclipse-che -n eclipse-che -o yaml:

apiVersion: org.eclipse.che/v2
kind: CheCluster
metadata:
  creationTimestamp: "2022-07-07T11:59:46Z"
  finalizers:
  - checluster.che.eclipse.org
  - cheGateway.clusterpermissions.finalizers.che.eclipse.org
  - cheWorkspaces.clusterpermissions.finalizers.che.eclipse.org
  - namespaces-editor.permissions.finalizers.che.eclipse.org
  - devWorkspace.permissions.finalizers.che.eclipse.org
  - dashboard.clusterpermissions.finalizers.che.eclipse.org
  generation: 2
  name: eclipse-che
  namespace: eclipse-che
  resourceVersion: "9606800"
  uid: a022c561-9a89-4b23-aa2f-d5cb2a58222c
spec:
  components:
    cheServer:
      debug: true
      extraProperties:
        CHE_INFRA_KUBERNETES_WORKSPACE__START__TIMEOUT__MIN: "15"
      logLevel: INFO
    dashboard: {}
    database:
      credentialsSecretName: postgres-credentials
      externalDb: true
      postgresDb: dbche
      postgresHostName: sc2-10-186-67-195.eng.vmware.com
      postgresPort: "5432"
      pvc:
        claimSize: 1Gi
    devWorkspace: {}
    devfileRegistry: {}
    imagePuller:
      enable: false
      spec: {}
    metrics:
      enable: true
    pluginRegistry: {}
  containerRegistry: {}
  devEnvironments:
    defaultNamespace:
      template: <username>-che
    storage:
      pvcStrategy: common
  networking:
    auth:
      gateway:
        configLabels:
          app: che
          component: che-gateway-config
      identityProviderURL: https://dex-dchelladurai-chejune15.calatrava.vmware.com
      oAuthClientName: eclipse-che
      oAuthSecret: ZXhhbXBsZS1hcHAtc2VjcmV0
    domain: eclipseche-dchelladurai-chejune15.calatrava.vmware.com
    tlsSecretName: che-tls
status:
  chePhase: Active
  cheURL: https://eclipseche-dchelladurai-chejune15.calatrava.vmware.com
  cheVersion: next
  devfileRegistryURL: https://eclipseche-dchelladurai-chejune15.calatrava.vmware.com/devfile-registry
  gatewayPhase: Established
  pluginRegistryURL: https://eclipseche-dchelladurai-chejune15.calatrava.vmware.com/plugin-registry/v3
  workspaceBaseDomain: eclipseche-dchelladurai-chejune15.calatrava.vmware.com

tolusha (Contributor) commented Jul 8, 2022

@amisevsk

CHE_INFRA_KUBERNETES_WORKSPACE__START__TIMEOUT__MIN
Do we have an alternative configuration for this in the DevWorkspace world?

tolusha (Contributor) commented Jul 8, 2022

@Divine1
I think the timeout happens because of image pulls.
Could you consider using the Kubernetes Image Puller for faster workspace startup?
https://github.com/che-incubator/kubernetes-image-puller-operator

You can enable it in the CheCluster CR:

spec:
  components:
    imagePuller:
      enable: true

Divine1 (Author) commented Jul 8, 2022

@tolusha
Thank you for this detail.
Could you please let me know how this will help in my scenario? I have a couple of Docker images in my organization's internal Docker registry.

I found the details below at that link. It says kubernetes-image-puller-operator will pull a list of images. Which images are these?
[screenshot]

tolusha (Contributor) commented Jul 8, 2022

You can configure the list of images you would like to prepull.

spec:
  components:
    imagePuller:
      enable: true
      spec:
        images: <name_1>=<image_1>;<name_2>=<image_2>
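
As an illustration of the name=image;name=image format shown above, the value could be assembled like this. It is a sketch only: the helper name puller_images and the names udi and selenium are made up for the example, and the two images are simply the ones from the devfile earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: join "name=image" arguments with ';' to build the
# string for spec.components.imagePuller.spec.images.
puller_images() {
  out=""
  for pair in "$@"; do
    # Prepend the separator only when something is already collected.
    out="${out:+$out;}$pair"
  done
  printf '%s\n' "$out"
}

# The names left of '=' are illustrative; the images come from the devfile above.
puller_images \
  "udi=quay.io/devfile/universal-developer-image:ubi8-0e189d9" \
  "selenium=selenium/standalone-chrome:4.1.2-20220217"
```

The printed line can be pasted as the images value. If the images live in an internal registry, the image part of each pair would be the internal path instead.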

Divine1 (Author) commented Jul 8, 2022

@tolusha I have already installed Eclipse Che in my k8s cluster.

If I edit and update my CheCluster in the eclipse-che namespace, will that be sufficient?
kubectl get checluster -n eclipse-che

tolusha (Contributor) commented Jul 8, 2022

@Divine1
That's a good question. As far as I remember, if you enable the image puller via the CheCluster CR, che-operator will try to deploy it via OLM.
To deploy it on a k8s cluster without OLM, please follow the instructions at https://github.com/che-incubator/kubernetes-image-puller-operator#installing-the-operator

By the way, thanks to @dmytro-ndp, there is a way to configure the workspace startup timeout:
devfile/devworkspace-operator#605

amisevsk (Contributor) commented Jul 8, 2022

To configure the start timeout in DevWorkspace Operator, you can use the .config.workspace.progressTimeout field in the DevWorkspaceOperatorConfig:

apiVersion: controller.devfile.io/v1alpha1
kind: DevWorkspaceOperatorConfig
metadata:
  name: devworkspace-operator-config
  namespace: <devworkspace/che installation namespace>
config:
  workspace:
    progressTimeout: 15m # This would set it to 15 minutes, for example

If a DevWorkspaceOperatorConfig named devworkspace-operator-config already exists, you'll likely want to leave other fields in the .config unchanged.
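
One way to produce and apply that object from a shell is sketched below. The function name render_dwoc is hypothetical, and the namespace and timeout are arguments you supply; the manifest itself is the one from this comment.

```shell
#!/bin/sh
# Hypothetical helper: print a DevWorkspaceOperatorConfig manifest.
# $1 = installation namespace, $2 = progress timeout (for example 15m).
render_dwoc() {
  cat <<EOF
apiVersion: controller.devfile.io/v1alpha1
kind: DevWorkspaceOperatorConfig
metadata:
  name: devworkspace-operator-config
  namespace: $1
config:
  workspace:
    progressTimeout: $2
EOF
}
```

For example, render_dwoc NAMESPACE 15m | kubectl apply -f - would create the object when none exists yet. If one already exists, a merge patch limited to config.workspace.progressTimeout is probably the safer route, since it leaves the other fields alone.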

@amisevsk amisevsk added severity/P2 Has a minor but important impact to the usage or development of the system. and removed status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Jul 8, 2022
Divine1 (Author) commented Jul 9, 2022

@amisevsk

DevWorkspaceOperatorConfig does not exist in either the eclipse-che namespace or the devworkspace-controller namespace.

In which namespace should I create the DevWorkspaceOperatorConfig resource?

tolusha (Contributor) commented Jul 10, 2022

@Divine1
Please try the devworkspace-controller namespace.
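
To check which value is actually stored after creating the config, something like the following could be used. It is a sketch: the function name show_progress_timeout is hypothetical, and it assumes the object is named devworkspace-operator-config as in the earlier comment.

```shell
#!/bin/sh
# Hypothetical helper: print the progressTimeout stored in the
# DevWorkspaceOperatorConfig. $1 = namespace; defaults to the
# devworkspace-controller namespace suggested above.
show_progress_timeout() {
  kubectl get devworkspaceoperatorconfig devworkspace-operator-config \
    -n "${1:-devworkspace-controller}" \
    -o jsonpath='{.config.workspace.progressTimeout}'
}
```

An empty result or a NotFound error would suggest the object is missing or was created in a different namespace.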

che-bot (Contributor) commented Jan 6, 2023

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

@che-bot che-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2023
@che-bot che-bot closed this as completed Jan 13, 2023

4 participants