Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Testing in Progress] Create and provide EFS CSI driver policy to blueprints addons #726

Closed
wants to merge 1 commit into from

Conversation

rrrkharse
Copy link
Contributor

Which issue is resolved by this Pull Request:
Resolves #717

Description of your changes:
The default IAM policy created for the EFS CSI driver service account is does not contain all the required permissions mentioned here.

The Terraform install was updated to create a policy using the above URL and pass that policy as an input to the Blueprints Addons module, in which the EFS CSI driver module would use the policy as an input when creating the IAM role for the EFS CSI driver service account.

Testing:

Tested both manual deployment and modified auto setup script cases.

Auto setup script diff:

ubuntu@ip-172-31-21-105:~/tf-efs/kubeflow-manifests$ git diff tests
diff --git a/tests/e2e/utils/auto-efs-setup.py b/tests/e2e/utils/auto-efs-setup.py
index 13866535..af331100 100755
--- a/tests/e2e/utils/auto-efs-setup.py
+++ b/tests/e2e/utils/auto-efs-setup.py
@@ -14,8 +14,8 @@ def main():

     verify_prerequisites()

-    setup_iam_authorization()
-    setup_efs_driver()
+    # setup_iam_authorization()
+    # setup_efs_driver()
     setup_efs_file_system()
     setup_efs_provisioning()

Verified notebook and volumes were created:
Screen Shot 2023-05-03 at 5 22 15 PM

Screen Shot 2023-05-03 at 5 23 10 PM

PVC yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: 'yes'
    pv.kubernetes.io/bound-by-controller: 'yes'
    volume.beta.kubernetes.io/storage-provisioner: efs.csi.aws.com
    volume.kubernetes.io/selected-node: ip-10-0-107-16.us-west-2.compute.internal
    volume.kubernetes.io/storage-provisioner: efs.csi.aws.com
  creationTimestamp: '2023-05-03T23:07:43+00:00'
  finalizers:
    - kubernetes.io/pvc-protection
  managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          'f:accessModes': {}
          'f:resources':
            'f:requests':
              .: {}
              'f:storage': {}
          'f:storageClassName': {}
          'f:volumeMode': {}
      manager: OpenAPI-Generator
      operation: Update
      time: '2023-05-03T23:07:43+00:00'
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:volume.kubernetes.io/selected-node': {}
      manager: kube-scheduler
      operation: Update
      time: '2023-05-03T23:07:44+00:00'
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:pv.kubernetes.io/bind-completed': {}
            'f:pv.kubernetes.io/bound-by-controller': {}
            'f:volume.beta.kubernetes.io/storage-provisioner': {}
            'f:volume.kubernetes.io/storage-provisioner': {}
        'f:spec':
          'f:volumeName': {}
      manager: kube-controller-manager
      operation: Update
      time: '2023-05-03T23:07:45+00:00'
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          'f:accessModes': {}
          'f:capacity':
            .: {}
            'f:storage': {}
          'f:phase': {}
      manager: kube-controller-manager
      operation: Update
      subresource: status
      time: '2023-05-03T23:07:45+00:00'
  name: tf-efs-auto-script-volume
  namespace: kubeflow-user-example-com
  resourceVersion: '126178'
  uid: e09f5782-61bb-49d7-bf67-d975f2e15870
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: efs-sc
  volumeMode: Filesystem
  volumeName: pvc-e09f5782-61bb-49d7-bf67-d975f2e15870
status:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  phase: Bound

Storage class configuration:

ubuntu@ip-172-31-21-105:~/tf-efs/kubeflow-manifests$ kubectl get sc -o yaml
apiVersion: v1
items:
- allowVolumeExpansion: true
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"efs-sc"},"mountOptions":["tls"],"parameters":{"directoryPerms":"700","fileSystemId":"fs-07bf42ce7f10f6b20","gid":"100","provisioningMode":"efs-ap","uid":"1000"},"provisioner":"efs.csi.aws.com","reclaimPolicy":"Delete","volumeBindingMode":"WaitForFirstConsumer"}
    creationTimestamp: "2023-05-03T23:03:45Z"
    name: efs-sc
    resourceVersion: "123612"
    uid: 426dd9fa-0014-43d6-83fb-85b682b5fb63
  mountOptions:
  - tls
  parameters:
    directoryPerms: "700"
    fileSystemId: fs-07bf42ce7f10f6b20
    gid: "100"
    provisioningMode: efs-ap
    uid: "1000"
  provisioner: efs.csi.aws.com
  reclaimPolicy: Delete
  volumeBindingMode: WaitForFirstConsumer
- apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"gp2"},"parameters":{"fsType":"ext4","type":"gp2"},"provisioner":"kubernetes.io/aws-ebs","volumeBindingMode":"WaitForFirstConsumer"}
      storageclass.kubernetes.io/is-default-class: "false"
    creationTimestamp: "2023-05-03T19:43:30Z"
    name: gp2
    resourceVersion: "84473"
    uid: 9ea8c1ec-5030-467d-895d-4df6cbc9d4d0
  parameters:
    fsType: ext4
    type: gp2
  provisioner: kubernetes.io/aws-ebs
  reclaimPolicy: Delete
  volumeBindingMode: WaitForFirstConsumer
kind: List
metadata:
  resourceVersion: ""

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@AlexandreBrown
Copy link
Contributor

AlexandreBrown commented May 4, 2023

Thank you @rrrkharse for jumping on this!
If the only changes to the auto script are commenting these two lines :

# setup_iam_authorization()
# setup_efs_driver()

Then maybe we could introduce a flag to the auto script (is_terraform_deployment) or something similar (false by default) that we could pass as true if using terraform as deployment option?

if not is_terraform_deployment:
    setup_iam_authorization()
    setup_efs_driver()

Also, is it normal that no service account is created when using terraform? How is the policy being used exactly when we use terraform and comment those two lines?

@surajkota
Copy link
Contributor

surajkota commented May 4, 2023

Shouldnt we fix the upstream instead? should be quick

https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/eed7c6c66376b2d144f00a038271ab53eaa3fcbc/modules/kubernetes-addons/aws-efs-csi-driver/data.tf

To confirm, its failing because of the TagResource permissions?

@surajkota
Copy link
Contributor

surajkota commented May 4, 2023

re:auto script, We can introduce generic flag like --skip-driver-installation instead of terraform-deployment for anyone who has installed driver already if those are the only changes required

@AlexandreBrown
Copy link
Contributor

@surajkota By curiosity, how is the policy being used if there is no service account created? (setup_iam_authorization is commented)

@surajkota
Copy link
Contributor

Service account and IAM Role is created for the controller, are you asking where the code for that is in terraform?

@AlexandreBrown
Copy link
Contributor

@surajkota Ok so the EFS policy created by the terraform deployment is being used by the controler SA? Just looking to understand what replaces the IAM setup of the autoscript when using terraform.

@surajkota
Copy link
Contributor

Its implemented in upstream in eks-blueprints. EFS driver module uses helm add on module, which uses IRSA module

IRSA configuration for EFS CSI driver is defined here - https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/kubernetes-addons/aws-efs-csi-driver/main.tf#L24-L50

@surajkota
Copy link
Contributor

surajkota commented May 5, 2023

auto script might also be depending on eksctl related tags for getting cluster related info like subnets etc. that might need to be changed to using describe_cluster instead or some other generic way. This issue remains with all auto scripts hence we recommended using manual steps for now

@AlexandreBrown
Copy link
Contributor

@surajkota I see, are you sure that's an issue for the EFS script? I don't see any usage of a tag to query any information.

I see describe_cluster being used.

@surajkota
Copy link
Contributor

surajkota commented May 5, 2023

I just meant to give a headsup for potential issues. looks like that's in rds-s3 script(works only with specific cdk or eksctl tags on subnet), efs script was updated in https://github.com/awslabs/kubeflow-manifests/pull/185/files

efs script does check for eksctl being installed on the system, I think we should remove those tools verification since its part of prerequisites and is duplicate code across multiple scripts.

Ideally we would have terraform module for this in future which would make it easy to change file system attributes post creation and address clean up via terraform with no additional code

@AlexandreBrown
Copy link
Contributor

AlexandreBrown commented May 5, 2023

@surajkota great, yes I was aware for rds-s3 thank you.
I think once this PR is merged maybe the doc should be updated to remove the manual steps recommendation or at least reword it so it doesn't sound like auto script won't work for terraform.

I agree with you, ideally in the future EFS should be part of the terraform deployment so that it's easy to cleanup.

I created an issue to keep track of this feature request ( #713 )

@rrrkharse
Copy link
Contributor Author

#731

@rrrkharse rrrkharse closed this May 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EFS does not work for terraform deployment when using built-in EFS driver install
3 participants