Che workspaces are not idled #15900

Closed
4 tasks done
alexeykazakov opened this issue Jan 31, 2020 · 9 comments
Labels
area/che-operator Issues and PRs related to Eclipse Che Kubernetes Operator area/security kind/bug Outline of a bug - must adhere to the bug report template. severity/P1 Has a major impact to usage or development of the system.

Comments

@alexeykazakov

Describe the bug

I upgraded my Che operator to 7.7.1 and now I see that Che workspace pods are not deleted after the idle timeout.

Che version

Che Operator 7.7.1

Runtime

  • Openshift 4.2.16

Installation method

  • che-operator 7.7.1

Environment

  • Cloud
    • Amazon

Steps to reproduce

  1. Start a workspace.

  2. Close the browser. Do nothing for 30+ minutes.

  3. Go to the dashboard. The workspace is shown as stopped there:
    [screenshot: Che dashboard]

  4. oc get pods // <- workspace pods are still running!
    [screenshot: OpenShift console]

  5. Try to start the workspace in the dashboard. It fails with the error:
    [screenshot: Che dashboard]

  6. oc get pods // no workspace pods anymore!
    [screenshot: OpenShift console]

  7. Try to start the workspace in the dashboard again. It works this time!
    Repeat from step 2. Everything is reproducible again.
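
The check in steps 4 and 6 could be scripted roughly as follows. This is only a sketch: it assumes the output of `oc get pods -o json` has already been parsed into a dict, and that workspace pods carry the `che.workspace_id` label (that label appears in the server log's selectors for routes, services, and deployments, but its presence on pods is an assumption). The pod names in the sample are hypothetical.

```python
def running_workspace_pods(pod_list, workspace_id, label="che.workspace_id"):
    """Return names of pods labelled with the workspace id whose phase is Running."""
    names = []
    for pod in pod_list.get("items", []):
        labels = pod.get("metadata", {}).get("labels", {})
        phase = pod.get("status", {}).get("phase")
        if labels.get(label) == workspace_id and phase == "Running":
            names.append(pod["metadata"]["name"])
    return names


# Hypothetical, trimmed-down stand-in for parsed `oc get pods -o json` output.
sample_pods = {
    "items": [
        {
            "metadata": {
                "name": "workspace-pod-a",  # hypothetical pod name
                "labels": {"che.workspace_id": "workspace3d4zr7teeh2cd0dy"},
            },
            "status": {"phase": "Running"},
        },
        {
            "metadata": {"name": "unrelated-pod", "labels": {}},
            "status": {"phase": "Running"},
        },
    ]
}

leftover = running_workspace_pods(sample_pods, "workspace3d4zr7teeh2cd0dy")
print(leftover)  # ['workspace-pod-a']
```

A non-empty result while the dashboard shows the workspace as stopped corresponds to the state described in step 4.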

In the Che server logs I see a permission error, which seems to be the cause of the issue. The activity-checker doesn't have permissions:

2020-01-30 23:47:51,521[aceSharedPool-1]  [ERROR] [o.e.c.a.w.s.WorkspaceRuntimes 966]   - Error occurred during stopping of runtime 'workspace3d4zr7teeh2cd0dy:null:f0caa6cf-8a56-49ce-951b-2095f1f531ad' by user 'activity-checker'. Error: Error(s) occurs while cleaning up the namespace. Failure executing: GET at: https://172.30.0.1/apis/route.openshift.io/v1/namespaces/alexeykazakov-code/routes?labelSelector=che.workspace_id%3Dworkspace3d4zr7teeh2cd0dy. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. routes.route.openshift.io is forbidden: User "system:serviceaccount:toolchain-che:che" cannot list resource "routes" in API group "route.openshift.io" in the namespace "alexeykazakov-code". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/alexeykazakov-code/services?labelSelector=che.workspace_id%3Dworkspace3d4zr7teeh2cd0dy. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. services is forbidden: User "system:serviceaccount:toolchain-che:che" cannot list resource "services" in API group "" in the namespace "alexeykazakov-code". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/apis/apps/v1/namespaces/alexeykazakov-code/deployments?labelSelector=che.workspace_id%3Dworkspace3d4zr7teeh2cd0dy. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. deployments.apps is forbidden: User "system:serviceaccount:toolchain-che:che" cannot list resource "deployments" in API group "apps" in the namespace "alexeykazakov-code". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. 
Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/alexeykazakov-code/secrets?labelSelector=che.workspace_id%3Dworkspace3d4zr7teeh2cd0dy. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. secrets is forbidden: User "system:serviceaccount:toolchain-che:che" cannot list resource "secrets" in API group "" in the namespace "alexeykazakov-code". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/alexeykazakov-code/configmaps?labelSelector=che.workspace_id%3Dworkspace3d4zr7teeh2cd0dy. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. configmaps is forbidden: User "system:serviceaccount:toolchain-che:che" cannot list resource "configmaps" in API group "" in the namespace "alexeykazakov-code". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password.
org.eclipse.che.api.workspace.server.spi.InfrastructureException: Error(s) occurs while cleaning up the namespace. Failure executing: GET at: https://172.30.0.1/apis/route.openshift.io/v1/namespaces/alexeykazakov-code/routes?labelSelector=che.workspace_id%3Dworkspace3d4zr7teeh2cd0dy. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. routes.route.openshift.io is forbidden: User "system:serviceaccount:toolchain-che:che" cannot list resource "routes" in API group "route.openshift.io" in the namespace "alexeykazakov-code". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/alexeykazakov-code/services?labelSelector=che.workspace_id%3Dworkspace3d4zr7teeh2cd0dy. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. services is forbidden: User "system:serviceaccount:toolchain-che:che" cannot list resource "services" in API group "" in the namespace "alexeykazakov-code". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/apis/apps/v1/namespaces/alexeykazakov-code/deployments?labelSelector=che.workspace_id%3Dworkspace3d4zr7teeh2cd0dy. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. deployments.apps is forbidden: User "system:serviceaccount:toolchain-che:che" cannot list resource "deployments" in API group "apps" in the namespace "alexeykazakov-code". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/alexeykazakov-code/secrets?labelSelector=che.workspace_id%3Dworkspace3d4zr7teeh2cd0dy. 
Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. secrets is forbidden: User "system:serviceaccount:toolchain-che:che" cannot list resource "secrets" in API group "" in the namespace "alexeykazakov-code". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password. Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/alexeykazakov-code/configmaps?labelSelector=che.workspace_id%3Dworkspace3d4zr7teeh2cd0dy. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. configmaps is forbidden: User "system:serviceaccount:toolchain-che:che" cannot list resource "configmaps" in API group "" in the namespace "alexeykazakov-code". The error may be caused by an expired token or changed password. Update Che server deployment with a new token or password.
	at org.eclipse.che.workspace.infrastructure.kubernetes.namespace.KubernetesNamespace.doRemove(KubernetesNamespace.java:270)
	at org.eclipse.che.workspace.infrastructure.openshift.project.OpenShiftProject.cleanUp(OpenShiftProject.java:155)
	at org.eclipse.che.workspace.infrastructure.kubernetes.KubernetesInternalRuntime.internalStop(KubernetesInternalRuntime.java:573)
	at org.eclipse.che.api.workspace.server.spi.InternalRuntime.stop(InternalRuntime.java:177)
	at org.eclipse.che.api.workspace.server.WorkspaceRuntimes$StopRuntimeTask.run(WorkspaceRuntimes.java:936)
	at org.eclipse.che.commons.lang.concurrent.CopyThreadLocalRunnable.run(CopyThreadLocalRunnable.java:38)
	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
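
Since the log above repeats the same "forbidden" sentence many times, a small script can condense it into the distinct permissions that are missing. The following is a minimal sketch, not part of Che: the regular expression is written against the message wording as it appears in this log, and the function name is hypothetical.

```python
import re

# Matches the "cannot <verb> resource ..." sentence as worded in the log above.
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)" in the namespace "(?P<namespace>[^"]+)"'
)


def summarize_forbidden(log_text):
    """Return sorted, de-duplicated (user, verb, resource, group, namespace) tuples."""
    found = {
        (m["user"], m["verb"], m["resource"], m["group"], m["namespace"])
        for m in FORBIDDEN_RE.finditer(log_text)
    }
    return sorted(found)


# Two sentences copied from the log above.
sample = (
    'routes.route.openshift.io is forbidden: User "system:serviceaccount:toolchain-che:che" '
    'cannot list resource "routes" in API group "route.openshift.io" '
    'in the namespace "alexeykazakov-code". '
    'services is forbidden: User "system:serviceaccount:toolchain-che:che" '
    'cannot list resource "services" in API group "" '
    'in the namespace "alexeykazakov-code".'
)

for user, verb, resource, group, namespace in summarize_forbidden(sample):
    # An empty API group is the core group.
    print(f"{user} cannot {verb} {resource} (group={group or 'core'}) in {namespace}")
```

Run against the full log, this would list the `che` service account of the `toolchain-che` namespace as unable to list routes, services, deployments, secrets, and configmaps in `alexeykazakov-code`.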

Initially we installed Che 7.5 or so, I think a couple of months ago. Then we kept upgrading whenever a new version became available. I first noticed the workspace idling issue in 7.7.0 (workspaces didn't idle at all, even in the dashboard), and now, after the 7.7.1 upgrade, I am facing this issue.

@alexeykazakov alexeykazakov added the kind/bug Outline of a bug - must adhere to the bug report template. label Jan 31, 2020
@alexeykazakov
Author

cc: @ibuziuk

@che-bot che-bot added the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label Jan 31, 2020
@skabashnyuk
Contributor

@alexeykazakov could you also provide your CheCluster.yaml and/or the command you used to install Che?

@ibuziuk ibuziuk added severity/P1 Has a major impact to usage or development of the system. team/platform area/che-operator Issues and PRs related to Eclipse Che Kubernetes Operator and removed severity/P1 Has a major impact to usage or development of the system. status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Feb 3, 2020
@skabashnyuk
Contributor

I think that if @alexeykazakov's setup uses OpenShift OAuth, this might be the same case as #15906 (comment)

@alexeykazakov
Author

Yes, we are using OpenShift OAuth:

apiVersion: org.eclipse.che/v1
kind: CheCluster
spec:
  auth:
    identityProviderURL: 'https://keycloak-toolchain-che.apps.member.crt-stage.com'
    identityProviderRealm: che
    updateAdminPassword: false
    identityProviderPostgresPassword: ...
    oAuthSecret: ...
    identityProviderPassword: ...
    oAuthClientName: eclipse-che-openshift-identity-provider-lmgapz
    identityProviderClientId: che-public
    identityProviderAdminUserName: admin
    externalIdentityProvider: false
    openShiftoAuth: true
  database:
    chePostgresDb: dbche
    chePostgresHostName: postgres
    chePostgresPassword: ...
    chePostgresPort: '5432'
    chePostgresUser: pgche
    externalDb: false
  k8s: {}
  metrics:
    enable: false
  server:
    cheLogLevel: INFO
    customCheProperties:
      CHE_INFRA_KUBERNETES_NAMESPACE_DEFAULT: <username>-code
    externalDevfileRegistry: false
    cheHost: che-toolchain-che.apps.member.crt-stage.com
    selfSignedCert: false
    cheDebug: 'false'
    tlsSupport: true
    allowUserDefinedWorkspaceNamespaces: false
    externalPluginRegistry: false
    cheFlavor: che
  storage:
    preCreateSubPaths: true
    pvcClaimSize: 1Gi
    pvcStrategy: per-workspace
status:
  devfileRegistryURL: 'https://devfile-registry-toolchain-che.apps.member.crt-stage.com'
  keycloakProvisioned: true
  cheClusterRunning: Available
  cheURL: 'https://che-toolchain-che.apps.member.crt-stage.com'
  openShiftoAuthProvisioned: true
  dbProvisioned: true
  cheVersion: 7.7.1
  keycloakURL: 'https://keycloak-toolchain-che.apps.member.crt-stage.com'
  pluginRegistryURL: 'https://plugin-registry-toolchain-che.apps.member.crt-stage.com/v3'
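
To connect this configuration with the log: the `CHE_INFRA_KUBERNETES_NAMESPACE_DEFAULT` value `<username>-code` is a template, and the namespace in the error messages, `alexeykazakov-code`, is what it yields for this user. A minimal sketch of that substitution follows; the function is hypothetical and is not Che's actual implementation.

```python
def resolve_namespace(template, username):
    """Fill the <username> placeholder of a namespace template (illustrative only)."""
    return template.replace("<username>", username)


namespace = resolve_namespace("<username>-code", "alexeykazakov")
print(namespace)  # alexeykazakov-code
```

Workspaces therefore live in a per-user namespace, separate from `toolchain-che`, where the `che` service account named in the log is defined.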

@l0rd
Contributor

l0rd commented Feb 25, 2020

Assigning to hosted-che team after internal discussion (the original code was written ~2 years ago by hosted che team).

@ibuziuk
Member

ibuziuk commented Feb 25, 2020

@alexeykazakov could you confirm that the issue is not reproducible from the very beginning, but only after ~24h? Our current assumption is that once the OpenShift token expires (24h by default), idling stops working. This is the main reason the issue was not spotted by QA (we simply do not test long-running deployments).
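
If that assumption holds, the timing could be illustrated as below. This is a sketch under stated assumptions only: the 24-hour lifetime comes from the remark above, while the function name and the timestamps are hypothetical and not taken from Che or OpenShift code.

```python
from datetime import datetime, timedelta, timezone

# Assumed default lifetime, taken from the "24h by default" remark above.
TOKEN_LIFETIME = timedelta(hours=24)


def token_expired(issued_at, now, lifetime=TOKEN_LIFETIME):
    """Return True once `lifetime` has elapsed since the token was issued."""
    return now - issued_at >= lifetime


# Hypothetical issue time, for illustration only.
issued = datetime(2020, 1, 30, 0, 0, tzinfo=timezone.utc)

print(token_expired(issued, issued + timedelta(minutes=40)))  # False
print(token_expired(issued, issued + timedelta(hours=25)))    # True
```

Under this hypothesis, an idle timeout that fires 40 minutes after login would still find a valid token, whereas one that fires a day later would not.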

@alexeykazakov
Author

@ibuziuk where do you store the token?
I just upgraded Che Operator to 7.9.0, so there was a new Che server deployment, and the bug is still reproducible. But the postgres pod is old.

@l0rd
Contributor

l0rd commented Feb 25, 2020

@alexeykazakov we should store the tokens in memory. They are not persisted.
@ibuziuk I had a chat with @rhopp this morning and idling is tested but with OAuth disabled.

@nickboldt nickboldt mentioned this issue Feb 25, 2020
24 tasks
@nickboldt nickboldt mentioned this issue Mar 4, 2020
27 tasks
@ibuziuk ibuziuk added this to the Backlog - Hosted Che milestone Mar 5, 2020
@ibuziuk
Member

ibuziuk commented Apr 16, 2020

@alexeykazakov I will close this in favor of #15906
It should be ready in 7.12.0

@ibuziuk ibuziuk closed this as completed Apr 16, 2020
5 participants