Support for CustomResource openshift & kubernetes devfile components is not working #22137

Closed
cgruver opened this issue Apr 7, 2023 · 17 comments
Labels
area/devworkspace-operator kind/bug Outline of a bug - must adhere to the bug report template. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. severity/P1 Has a major impact to usage or development of the system.

Comments

@cgruver

cgruver commented Apr 7, 2023

Describe the bug

I am attempting to create a devfile which will deploy a Kafka cluster and Kafka topics in the workspace along with the other workspace components.

Following the documentation at https://devfile.io/docs/2.2.0/adding-a-kubernetes-or-openshift-component yields unsuccessful results.

This feature appears to have been enabled by: devfile/devworkspace-operator#961

However, variations on a devfile to implement it have failed.

Che version

7.63@latest

Steps to reproduce

Example Devfile #1:

With this devfile, the workspace silently excludes all of the included components... No errors are obvious.

schemaVersion: 2.2.0
attributes:
  controller.devfile.io/storage-type: per-workspace
metadata:
  name: che-test-workspace
components:
- name: dev-tools
  container: 
    image: image-registry.openshift-image-registry.svc:5000/eclipse-che-images/quarkus:latest
    memoryRequest: 1Gi
    memoryLimit: 6Gi
    cpuRequest: 500m
    cpuLimit: 2000m
    mountSources: true
    sourceMapping: /projects
    args:
      - '-f'
      - /dev/null
    command:
      - tail
    env:
    - name: SHELL
      value: "/bin/zsh"
    volumeMounts:
    - name: m2
      path: /home/user/.m2
- name: ubi
  container:
    args:
      - '-f'
      - /dev/null
    command:
      - tail
    image: registry.access.redhat.com/ubi9/ubi-minimal
    memoryLimit: 64M
    mountSources: true
    sourceMapping: /projects
- volume:
    size: 4Gi
  name: projects
- volume:
    size: 2Gi
  name: m2
- name: kafka-cluster
  openshift:
    deployByDefault: true
    inlined: |
      apiVersion: kafka.strimzi.io/v1beta2
      kind: Kafka
      metadata:
        name: che-demo
        labels:
          app: che-demo
      spec:
        kafka:
          config:
            offsets.topic.replication.factor: 1
            transaction.state.log.replication.factor: 1
            transaction.state.log.min.isr: 1
            inter.broker.protocol.version: '3.4'
          version: 3.4.0
          storage:
            size: 1Gi
            deleteClaim: true
            type: persistent-claim
          replicas: 1
          listeners:
            - name: plain
              port: 9092
              type: internal
              tls: false
            - name: tls
              port: 9093
              type: internal
              tls: true
        entityOperator:
          topicOperator: {}
          userOperator: {}
        zookeeper:
          storage:
            deleteClaim: true
            size: 1Gi
            type: persistent-claim
          replicas: 1
- name: kafka-topic
  openshift:
    deployByDefault: true
    inlined: |
      apiVersion: kafka.strimzi.io/v1beta2
      kind: KafkaTopic
      metadata:
        name: che-demo
        labels:
          strimzi.io/cluster: che-demo
      spec:
        config:
          retention.ms: 604800000
          segment.bytes: 1073741824
        partitions: 10
        replicas: 1
        topicName: che-demo
commands:
- exec:
    commandLine: "cp /home/user/.kube/config /projects/config"
    component: dev-tools
    group:
      kind: run
    label: Copy Kubeconfig
    workingDir: '/'
  id: copy-kubeconfig

Example Devfile #2:

With this devfile, the workspace deploys with the correct container components, but there is no obvious way to run the apply commands. Further, the apply commands cannot be created with the deploy group as that group does not appear to be implemented. Note: you have to remove the group entries from the apply commands for this example to not throw an error.

schemaVersion: 2.2.0
attributes:
  controller.devfile.io/storage-type: per-workspace
metadata:
  name: che-test-workspace
components:
- name: dev-tools
  container: 
    image: image-registry.openshift-image-registry.svc:5000/eclipse-che-images/quarkus:latest
    memoryRequest: 1Gi
    memoryLimit: 6Gi
    cpuRequest: 500m
    cpuLimit: 2000m
    mountSources: true
    sourceMapping: /projects
    args:
      - '-f'
      - /dev/null
    command:
      - tail
    env:
    - name: SHELL
      value: "/bin/zsh"
    volumeMounts:
    - name: m2
      path: /home/user/.m2
- name: ubi
  container:
    args:
      - '-f'
      - /dev/null
    command:
      - tail
    image: registry.access.redhat.com/ubi9/ubi-minimal
    memoryLimit: 64M
    mountSources: true
    sourceMapping: /projects
- volume:
    size: 4Gi
  name: projects
- volume:
    size: 2Gi
  name: m2
- name: kafka-cluster
  openshift:
    inlined: |
      apiVersion: kafka.strimzi.io/v1beta2
      kind: Kafka
      metadata:
        name: che-demo
        labels:
          app: che-demo
      spec:
        kafka:
          config:
            offsets.topic.replication.factor: 1
            transaction.state.log.replication.factor: 1
            transaction.state.log.min.isr: 1
            inter.broker.protocol.version: '3.4'
          version: 3.4.0
          storage:
            size: 1Gi
            deleteClaim: true
            type: persistent-claim
          replicas: 1
          listeners:
            - name: plain
              port: 9092
              type: internal
              tls: false
            - name: tls
              port: 9093
              type: internal
              tls: true
        entityOperator:
          topicOperator: {}
          userOperator: {}
        zookeeper:
          storage:
            deleteClaim: true
            size: 1Gi
            type: persistent-claim
          replicas: 1
- name: kafka-topic
  openshift:
    inlined: |
      apiVersion: kafka.strimzi.io/v1beta2
      kind: KafkaTopic
      metadata:
        name: che-demo
        labels:
          strimzi.io/cluster: che-demo
      spec:
        config:
          retention.ms: 604800000
          segment.bytes: 1073741824
        partitions: 10
        replicas: 1
        topicName: che-demo
commands:
- exec:
    commandLine: "cp /home/user/.kube/config /projects/config"
    component: dev-tools
    group:
      kind: run
    label: Copy Kubeconfig
    workingDir: '/'
  id: copy-kubeconfig
- apply:
    component: kafka-cluster
    group:
      kind: deploy
    label: deploy-kafka-cluster
  id: kafka-cluster
- apply:
    component: kafka-topic
    group:
      kind: deploy
    label: kafka-topic
  id: kafka-topic
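
As a sketch of what the note above Example Devfile #2 describes, the commands section with the group entries removed from the two apply commands would look roughly like this. Everything else in the devfile is unchanged, and this only shows the variant that loads without an error; it does not make the apply commands runnable from the editor.

```yaml
# Commands section of Example Devfile #2 with the deploy group
# removed from the apply commands. The exec command keeps its group.
commands:
- exec:
    commandLine: "cp /home/user/.kube/config /projects/config"
    component: dev-tools
    group:
      kind: run
    label: Copy Kubeconfig
    workingDir: '/'
  id: copy-kubeconfig
- apply:
    component: kafka-cluster
    label: deploy-kafka-cluster
  id: kafka-cluster
- apply:
    component: kafka-topic
    label: kafka-topic
  id: kafka-topic
```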

Expected behavior

Workspace deployed with Kafka cluster and Topic

Runtime

OpenShift

Screenshots

No response

Installation method

OperatorHub

Environment

macOS

Eclipse Che Logs

No response

Additional context

The Strimzi Operator is installed with cluster scope.

@cgruver cgruver added the kind/bug Outline of a bug - must adhere to the bug report template. label Apr 7, 2023
@che-bot che-bot added the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label Apr 7, 2023
@l0rd l0rd added severity/P1 Has a major impact to usage or development of the system. area/devworkspace-operator and removed status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Apr 11, 2023
@l0rd
Contributor

l0rd commented Apr 11, 2023

@amisevsk can you please have a look

@amisevsk
Contributor

I'm looking into the first example (with deployByDefault: true). For the second example, I believe it's expected that the editor will provide some way of applying the resources, e.g. via oc apply, as it's an interactive action and not something we can do with the DevWorkspace Operator. Perhaps we need an issue for supporting this in editors?

@amisevsk
Contributor

amisevsk commented Apr 11, 2023

For the first Devfile sample, the dashboard hits a 403 error when attempting to patch the DevWorkspace, but ignores it and does not show it to the user (created issue: #22145)

Attempting to manually apply the same patch as the dashboard gives a more useful message:

Error from server (devworkspace controller serviceaccount does not have permissions 
to manage kind Kafka defined in component kafka-cluster -- an administrator needs 
to grant the devworkspace operator permissions ('*') kafka.strimzi.io/v1beta1, 
Kind=Kafka to use this DevWorkspace): admission webhook "mutate.devworkspace-controller.svc" 
denied the request: devworkspace controller serviceaccount does not have permissions to 
manage kind Kafka defined in component kafka-cluster -- an administrator needs to grant
the devworkspace operator permissions ('*') kafka.strimzi.io/v1beta1, Kind=Kafka to use 
this DevWorkspace

The basic explanation here is that in order to allow the DevWorkspace Operator to manage CRs from this operator, it needs to be granted * permissions on that CR via a clusterrole/clusterrolebinding. This is currently necessary because the operator has to create, update, patch, list, watch, etc. the resources in question. We might be able to improve this in a future release (to scope the required permissions down somewhat). I've created issue devfile/devworkspace-operator#1083 to track this.

There may also be an issue with the dashboard (as DWO verifies the user applying the patch has permissions to get/create/update/delete the CR in question) but I cannot verify this at the moment.

@cgruver For this issue specifically, could you try again after creating the following resources on the cluster?

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: devworkspace-controller-admin-kafka
rules:
- apiGroups:
  - kafka.strimzi.io
  resources:
  - kafkas
  - kafkatopics
  verbs:
  - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: devworkspace-controller-admin-kafka
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: devworkspace-controller-admin-kafka
subjects:
- kind: ServiceAccount
  name: devworkspace-controller-serviceaccount
  namespace: dw # Or wherever the DevWorkspace Operator is installed

@cgruver
Author

cgruver commented Apr 11, 2023

@amisevsk Yes, I'll apply that RBAC and update here.

@amisevsk
Contributor

Testing (very) briefly on OpenShift, I suspect the RBAC will not fix the issue as the user is not granted admin permissions in their namespace:

❯ oc auth can-i create kafkas -n user1-che
no

@amisevsk
Contributor

Note: I've updated the clusterrole/clusterrolebinding in the comment above -- I had the incorrect API group for the clusterrole.

Tested on OpenShift with a cluster-admin user and a regular user:

  • As cluster-admin, workspace creation succeeds. However, workspace start ultimately fails, with message

    Error provisioning workspace Kubernetes components: could not process component kafka-cluster: no kind "Kafka" is registered for version "kafka.strimzi.io/v1beta1" in scheme "pkg/runtime/scheme.go:100"

    suggesting that DWO cannot manage CRs no matter what we do at the moment (it doesn't know how to serialize/deserialize them, which makes sense).

  • As a regular, non-cluster-admin user, I continue to get the original issue, except the 403 forbidden is now for user permissions:

    {"statusCode":403,"error":"Forbidden","message":"Unable to patch devworkspace: admission webhook "mutate.devworkspace-controller.svc" denied the request: user user1 does not have permissions to 'get' objects of kind kafka.strimzi.io/v1beta1, Kind=Kafka defined in component kafka-cluster"}
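
The second 403 comes from the webhook checking the user's own permissions on the Kafka kinds. A namespaced Role and RoleBinding along the following lines might satisfy that check. This is an untested sketch: the verbs are the get/create/update/delete set mentioned earlier in the thread, `kafka-workspace-user` is a hypothetical name, and `user1` / `user1-che` are the user and namespace from the test above. It would not help with the cluster-admin failure in the first bullet, which happens after the permission checks pass.

```yaml
# Hypothetical Role granting one user access to the Strimzi kinds
# in their own workspace namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kafka-workspace-user   # hypothetical name
  namespace: user1-che         # namespace from the test above
rules:
- apiGroups:
  - kafka.strimzi.io
  resources:
  - kafkas
  - kafkatopics
  verbs:
  - get
  - create
  - update
  - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kafka-workspace-user   # hypothetical name
  namespace: user1-che
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kafka-workspace-user
subjects:
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: user1                  # user from the test above
```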

@cgruver
Author

cgruver commented Apr 12, 2023

@amisevsk I had the wrong API version in the CRs. I failed to notice that it had recently updated to v1beta2 so your first error above is legit. That API version does not exist.

@cgruver
Author

cgruver commented Apr 12, 2023

Never mind. I get the same error after the correction:

Error provisioning workspace Kubernetes components: could not process component kafka-cluster: no kind "Kafka" is registered for version "kafka.strimzi.io/v1beta2" in scheme "pkg/runtime/scheme.go:100"
Workspace stopped due to error

@cgruver
Author

cgruver commented Apr 12, 2023

If I grant my user edit permissions to the Che provisioned namespace, then I can successfully create the Kafka resources manually within the workspace.

I'd prefer not to do that, though, because I would like the resources to be managed by the workspace, i.e. shut down and/or removed when the workspace stops or is deleted.
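
For reference, granting a user edit permissions on the workspace namespace can be sketched as a RoleBinding to the built-in `edit` ClusterRole. The binding name, user, and namespace below are placeholders rather than values from this cluster, and whether `edit` covers the Strimzi kinds depends on how the Strimzi installation aggregates its roles.

```yaml
# Sketch: bind the built-in "edit" ClusterRole to one user,
# scoped to a single workspace namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: workspace-user-edit   # hypothetical name
  namespace: user1-che        # placeholder: the Che-provisioned namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: user1                 # placeholder: the workspace owner
```

Resources created by hand under this binding are not owned by the DevWorkspace, so they are not cleaned up when the workspace stops or is deleted.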

@amisevsk
Contributor

Yeah, the v1beta1 issue was a red herring, the real problem is that DWO doesn't know how to transmit Kafka CRs to the API server. We might need additional handling for custom resources, as this is an issue that will impact any CR on the cluster, not just Kafka.

I think our hands may be tied within the operator here, at least for the time being. I'll try to look into it more when I have some time.

The second flow (devfile no. 2) is still something that should be supported via the editor, though.

@cgruver
Author

cgruver commented May 2, 2023

Update:

I validated that the creation of OpenShift resources works if it's something that the service account has permission to create:

- openshift:
    deployByDefault: true
    inlined: |
      kind: ConfigMap
      apiVersion: v1
      metadata:
        name: test-config
      data:
        test: "value"
  name: config-map

@l0rd
Contributor

l0rd commented May 17, 2023

@amisevsk can you suggest a new title / description for the issue on the DW side please?

@amisevsk amisevsk changed the title Support for devfile 2.2.X spec for openshift & kubernetes objects is not working Support for CustomResource openshift & kubernetes devfile components is not working May 17, 2023
@amisevsk
Contributor

I've updated the title to more precisely define the issue (custom resources are not supported in devfile components). Currently, the problem is that within the controller, we require the golang specs for custom resource objects in order to apply them and cache them within the reconcile loop.

However, standard Kubernetes objects should be supported. @cgruver let me know if this is accurate.

@cgruver
Author

cgruver commented May 17, 2023

@amisevsk That is accurate

@cgruver
Author

cgruver commented Sep 25, 2023

@amisevsk @l0rd

Is this still a backlog item? Or do we need some dependent work to enable CRDs in Dev Spaces?

@amisevsk
Contributor

amisevsk commented Oct 2, 2023

@cgruver We're at an impasse on this issue; Go-based Kubernetes operators can only manage resources they "understand" (which basically means the Go structs are included in the project at build time). As a result, supporting arbitrary CR kinds in the operator is not possible using the standard controller-runtime library -- it doesn't know how to compare them, how to apply them, etc.

We may be able to ultimately find a solution that works in general, but it would likely require an entirely different way of dealing with these components. It's technically on the backlog, but near the bottom.

@che-bot
Contributor

che-bot commented Mar 30, 2024

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

@che-bot che-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 30, 2024
@che-bot che-bot closed this as completed Apr 6, 2024

4 participants