EFS does not work for terraform deployment when using built-in EFS driver install #717

Closed
AlexandreBrown opened this issue May 2, 2023 · 4 comments · Fixed by #731
Labels
bug Something isn't working work in progress Has been assigned and is in progress

Comments

@AlexandreBrown
Contributor

Describe the bug
If we follow the docs and use the manual steps (in my case, I modified the auto EFS script to only create the file system and the StorageClass), EFS creation succeeds, but when I create a test notebook the volume stays in the Pending state indefinitely.

Events:
  Type    Reason                Age               From                         Message
  ----    ------                ----              ----                         -------
  Normal  WaitForFirstConsumer  72s               persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  ExternalProvisioning  2s (x7 over 70s)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "efs.csi.aws.com" or manually created by system administrator

Steps To Reproduce
Deploy EFS using the auto setup script, trimmed to the equivalent of the manual steps for a Terraform deployment:

def main():
    header()

    verify_prerequisites()

    setup_efs_file_system()
    setup_efs_provisioning()

    footer()

Environment

  • Kubernetes version 1.25
  • Using EKS (yes/no), if so version? Yes, 1.25
  • Kubeflow version v1.7.0
  • AWS build number v1.7.0-aws-b1.0.0
  • AWS service targeted (S3, RDS, etc.) cognito-rds-s3

Screenshots
image

@AlexandreBrown AlexandreBrown added the bug Something isn't working label May 2, 2023
@ananth102
Contributor

Hi AlexandreBrown, does your OIDC provider have the alpha.eksctl.io/cluster-name tag? Is there anything interesting in the EFS CSI driver logs or in CloudTrail (related to EFS)? We also recommend following the manual steps for terraform.
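
For reference, the tag check could be scripted roughly as follows. This is only a sketch: the helper names (`oidc_provider_arn`, `has_cluster_tag`, `check_cluster`) are hypothetical and not part of this repo, it assumes the standard `aws` partition and configured boto3 credentials, and it only reads the first page of tags.

```python
def oidc_provider_arn(account_id: str, issuer_url: str) -> str:
    """Build the IAM OIDC provider ARN from the EKS OIDC issuer URL."""
    # The provider ARN embeds the issuer URL without its scheme.
    host_and_path = issuer_url.split("://", 1)[-1]
    return f"arn:aws:iam::{account_id}:oidc-provider/{host_and_path}"


def has_cluster_tag(tags: list, cluster_name: str) -> bool:
    """Return True if the tags include alpha.eksctl.io/cluster-name=<cluster_name>."""
    return any(
        t.get("Key") == "alpha.eksctl.io/cluster-name"
        and t.get("Value") == cluster_name
        for t in tags
    )


def check_cluster(cluster_name: str, region: str) -> bool:
    """Hypothetical wrapper that looks up the issuer and the provider tags."""
    import boto3  # imported lazily so the helpers above work without boto3

    eks = boto3.client("eks", region_name=region)
    cluster = eks.describe_cluster(name=cluster_name)["cluster"]
    issuer = cluster["identity"]["oidc"]["issuer"]
    account_id = boto3.client("sts").get_caller_identity()["Account"]
    resp = boto3.client("iam").list_open_id_connect_provider_tags(
        OpenIDConnectProviderArn=oidc_provider_arn(account_id, issuer)
    )
    return has_cluster_tag(resp["Tags"], cluster_name)
```

Calling `check_cluster("<cluster name>", "<region>")` should return `True` when the tag is present with the cluster name as its value.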

@AlexandreBrown
Contributor Author

AlexandreBrown commented May 2, 2023

@ryansteakley My OIDC provider (created via terraform deployment I suppose) has the following tag :
image
I added the tag :
image

But it did not change anything :

  Events:
  Type    Reason                Age                From                         Message
  ----    ------                ----               ----                         -------
  Normal  WaitForFirstConsumer  41s                persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  ExternalProvisioning  13s (x3 over 39s)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "efs.csi.aws.com" or manually created by system administrator

Regarding "We also recommend following the manual steps for terraform":

I modified the auto script to keep only the parts that create the file system (the steps that match the manual steps).
I'm not sure why that would not work.

import argparse
import boto3
import subprocess
import string
import random
import yaml
from shutil import which
from time import sleep


def main():
    header()

    verify_prerequisites()

    setup_efs_file_system()
    setup_efs_provisioning()

    footer()

...

@AlexandreBrown
Contributor Author

AlexandreBrown commented May 3, 2023

@ryansteakley As I understand it, the doc says we have to skip the entire step 1 (so both 1.1 and 1.2), since the note appears directly below step 1.
Is this correct, or did it mean to say skip only 1.1?
image

@AlexandreBrown
Contributor Author

AlexandreBrown commented May 3, 2023

@ryansteakley After further testing it looks like the only way I could get EFS to work was to use the auto script (no manual steps and no skipping of the CSI driver install).
Maybe the driver installed by terraform is not being used or detected? It works with the auto script (untouched from the repo) but it does not work when I do all the steps but the driver install.
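
A quick way to test the "not detected" guess would be to list the CSIDriver objects registered in the cluster and look for `efs.csi.aws.com`, the provisioner named in the events above. The sketch below is an assumption-laden first check, not a full diagnosis: the function names are made up for illustration, and a registered CSIDriver object does not prove that the controller pods are running or have the right IAM permissions.

```python
import json
import subprocess


def registered_csi_drivers(csidrivers_json: str) -> list:
    """Extract driver names from `kubectl get csidrivers -o json` output."""
    doc = json.loads(csidrivers_json)
    return [item["metadata"]["name"] for item in doc.get("items", [])]


def efs_driver_registered(csidrivers_json: str, name: str = "efs.csi.aws.com") -> bool:
    """Return True if a CSIDriver object with the given name exists."""
    return name in registered_csi_drivers(csidrivers_json)


def fetch_csidrivers_json() -> str:
    """Hypothetical helper: shell out to kubectl (must be on PATH and configured)."""
    result = subprocess.run(
        ["kubectl", "get", "csidrivers", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

With cluster access, `efs_driver_registered(fetch_csidrivers_json())` reports whether the driver object exists; if it does, the driver's controller logs would be the next place to look.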

The following worked (snippet of my dockerfile):

RUN OIDC_ID=$(aws eks describe-cluster --name $CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | cut -d "/" -f5) \
    && AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text) \
    && aws iam tag-open-id-connect-provider \
        --open-id-connect-provider-arn "arn:aws:iam::$AWS_ACCOUNT_ID:oidc-provider/oidc.eks.$CLUSTER_REGION.amazonaws.com/id/$OIDC_ID" \
        --tags Key="alpha.eksctl.io/cluster-name",Value="${CLUSTER_NAME}" \
    && python utils/auto-efs-setup.py \
        --region $CLUSTER_REGION \
        --cluster $CLUSTER_NAME \
        --efs_file_system_name $EFS_FILE_SYSTEM_NAME \
        --efs_security_group_name $EFS_SECURITY_GROUP_NAME \
        --efs_throughput_mode elastic \
    && kubectl patch storageclass gp2 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}' \
    && kubectl patch storageclass efs-sc -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'  
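
If the default StorageClass switch is done from Python rather than the shell, building the patch with `json.dumps` avoids the nested quoting in the two `kubectl patch` lines above. This is a minimal sketch; `default_class_patch` and `patch_command` are hypothetical names, and the class names `gp2` and `efs-sc` are taken from the snippet above.

```python
import json

DEFAULT_CLASS_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_class_patch(is_default: bool) -> str:
    """Return the JSON patch body that sets or clears the default-class annotation."""
    # Annotation values must be strings, so the boolean becomes "true" or "false".
    value = "true" if is_default else "false"
    return json.dumps({"metadata": {"annotations": {DEFAULT_CLASS_ANNOTATION: value}}})


def patch_command(storage_class: str, is_default: bool) -> list:
    """Build the kubectl argument list for patching one StorageClass."""
    return [
        "kubectl", "patch", "storageclass", storage_class,
        "-p", default_class_patch(is_default),
    ]
```

For example, `subprocess.run(patch_command("gp2", False), check=True)` followed by `subprocess.run(patch_command("efs-sc", True), check=True)` would mirror the two shell commands.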

@AlexandreBrown AlexandreBrown changed the title EFS does not work for terraform deployment EFS does not work for terraform deployment when skipping driver install May 3, 2023
@AlexandreBrown AlexandreBrown changed the title EFS does not work for terraform deployment when skipping driver install EFS does not work for terraform deployment when using built-in EFS driver install May 4, 2023
@rrrkharse rrrkharse added the work in progress Has been assigned and is in progress label May 8, 2023