Add documentation about creating Pipeline Profiles (awslabs#700)

**Which issue is resolved by this Pull Request:**
Resolves #

**Description of your changes:**
Add information about how to create Profiles that use IRSA and have correct s3 bucket access for Pipelines.

**Testing:**
- [ ] Unit tests pass
- [ ] e2e tests pass
- Details about new tests (If this PR adds a new feature)
- Details about any manual tests performed

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
jsitu777 · Jun 27, 2023 · 6c8dfa0
1 parent b1d5e1d
commit 6c8dfa0
Showing 7 changed files with 157 additions and 10 deletions.
diff --git a/website/content/en/docs/deployment/cognito-rds-s3/guide-terraform.md b/website/content/en/docs/deployment/cognito-rds-s3/guide-terraform.md
@@ -144,6 +144,12 @@ Run the following command:
 make deploy
 ```
 
+## Creating Profiles
+A default profile named `kubeflow-user-example-com` for the email `user@example.com` is configured with this deployment. If you are using IRSA as your `PIPELINE_S3_CREDENTIAL_OPTION`, any additional profiles that you create must also be configured with IRSA and S3 bucket access. Follow the [pipeline profiles]({{< ref "/docs/deployment/create-profiles-with-iam-role.md" >}}) guide for instructions on how to create additional profiles.
+
+If you are not using IRSA, you can create a profile by specifying only the user's email address.
+
+
 ## Connect to your Kubeflow dashboard
 
 1. Head over to your user pool in the Cognito console and create a user with email `user@example.com` in `Users and groups`. 

diff --git a/website/content/en/docs/deployment/cognito-rds-s3/guide.md b/website/content/en/docs/deployment/cognito-rds-s3/guide.md
@@ -34,9 +34,16 @@ Enable culling for notebooks by following the [instructions]({{< ref "/docs/depl
 2. Deploy Kubeflow.
 
     1. Export your pipeline-s3-credential-option
-    ```bash 
-    export PIPELINE_S3_CREDENTIAL_OPTION=<IRSA/STATIC>
-    ```
+    {{< tabpane persistLang=false >}}
+{{< tab header="IRSA" lang="bash" >}}
+# Pipeline S3 Credential Option to configure
+export PIPELINE_S3_CREDENTIAL_OPTION="irsa"
+{{< /tab >}}
+{{< tab header="IAM User" lang="bash" >}}
+# Pipeline S3 Credential Option to configure
+export PIPELINE_S3_CREDENTIAL_OPTION="static"
+{{< /tab >}}
+   {{< /tabpane >}}
 
     1. Install Kubeflow using the following command:
 
@@ -56,6 +63,11 @@ make deploy-kubeflow INSTALLATION_OPTION=helm DEPLOYMENT_OPTION=cognito-rds-s3 P
     1. Create a profile for the user from the user pool
     1. Connect to the central dashboard
 
+## Creating Profiles
+A default profile named `kubeflow-user-example-com` for the email `user@example.com` is configured with this deployment. If you are using IRSA as your `PIPELINE_S3_CREDENTIAL_OPTION`, any additional profiles that you create must also be configured with IRSA and S3 bucket access. Follow the [pipeline profiles]({{< ref "/docs/deployment/create-profiles-with-iam-role.md" >}}) guide for instructions on how to create additional profiles.
+
+If you are not using IRSA, you can create a profile by specifying only the user's email address.
+
 ## Uninstall Kubeflow
 > Note: Delete all the resources you might have created in your profile namespaces before running these steps.
 1. Run the following commands to delete the profiles, ingress and corresponding ingress managed load balancer
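
A minimal sketch, not part of this commit, of how the `PIPELINE_S3_CREDENTIAL_OPTION` value decides which profile-creation path applies. The function name `profile_setup_hint` and its messages are hypothetical; only the two option values (`irsa`, `static`) come from the guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): given a value of
# PIPELINE_S3_CREDENTIAL_OPTION, print which profile-creation path applies.
profile_setup_hint() {
  case "$1" in
    irsa)   echo "irsa: annotate default-editor with an IAM role that can access the S3 bucket" ;;
    static) echo "static: create the profile with the user's email address only" ;;
    *)      echo "unknown option: $1" >&2; return 1 ;;
  esac
}

# Falls back to "irsa" for this example only when the variable is unset.
PIPELINE_S3_CREDENTIAL_OPTION="${PIPELINE_S3_CREDENTIAL_OPTION:-irsa}"
profile_setup_hint "$PIPELINE_S3_CREDENTIAL_OPTION"
```

Rejecting unknown values explicitly, rather than treating them as `static`, is a design choice of this sketch so that a typo in the variable surfaces early.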

diff --git a/website/content/en/docs/deployment/cognito/manifest/guide.md b/website/content/en/docs/deployment/cognito/manifest/guide.md
@@ -204,8 +204,8 @@ make deploy-kubeflow INSTALLATION_OPTION=helm DEPLOYMENT_OPTION=cognito
     Before connecting to the dashboard:
 
     * Go to the Cognito console and create some users in `Users and groups`. These are the users who will log in to the central dashboard.
-        ![cognito-user-pool-created](https://raw.githubusercontent.com/awslabs/kubeflow-manifests/main/website/content/en/docs/images/cognito/cognito-user-pool-created.png)
-
+        - Create a user with email address `user@example.com`. This user and email address come preconfigured and have a Profile created by default.
+    ![cognito-user-pool-created](https://raw.githubusercontent.com/awslabs/kubeflow-manifests/main/website/content/en/docs/images/cognito/cognito-user-pool-created.png)
     * Create a Profile for a user by following the steps in the [Manual Profile Creation](https://www.kubeflow.org/docs/components/multi-tenancy/getting-started/#manual-profile-creation). 
     The following is a Profile example for reference:
          ```bash

diff --git a/website/content/en/docs/deployment/create-profiles-with-iam-role.md b/website/content/en/docs/deployment/create-profiles-with-iam-role.md
@@ -0,0 +1,120 @@
++++
+title = "Create Profiles with IAM role"
+description = "Use AWS IAM roles for service accounts with Kubeflow Profiles"
+weight = 70
++++
+
+In a multi-tenant Kubeflow installation, the pods created by pipeline workflows and the pipelines frontend services run in a user profile namespace. The service account (`default-editor`) used for these pods needs permissions on the S3 bucket that Pipelines uses to read and write artifacts. When using IRSA (IAM roles for service accounts) as your `PIPELINE_S3_CREDENTIAL_OPTION`, any additional profiles created as part of a multi-user deployment, besides the preconfigured `kubeflow-user-example-com`, must be configured with permissions to the S3 bucket using IRSA.
+
+The `default-editor` service account must be annotated with an IAM role that has sufficient permissions to access your S3 bucket to run your pipelines. The steps below configure a profile with an IAM role that has restricted access to a specific S3 bucket, using the `AwsIamForServiceAccount` plugin for Profiles. To learn more about the `AwsIamForServiceAccount` plugin for Profiles, read the [Profiles component guide]({{< ref "/docs/component-guides/profiles.md" >}}).
+
+> Note: If you choose to run your pipeline with a service account other than the default (`default-editor`), you must annotate that service account with an IAM role that has sufficient S3 permissions.
+
+## Create a Profile
+
+After installing Kubeflow on AWS with one of the available [deployment options]({{< ref "/docs/deployment" >}}), you can configure Kubeflow Profiles with the following steps:
+
+1. Define the following environment variables:
+
+   The `S3_BUCKET` that is exported should be the same bucket that is used by Kubeflow Pipelines.
+   ```bash
+   # Your cluster name
+   export CLUSTER_NAME=
+   # Your cluster region
+   export CLUSTER_REGION=
+   # The S3 Bucket that is used by Kubeflow Pipelines
+   export S3_BUCKET=
+   # Your AWS Account ID
+   export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
+   # Name of the profile to create
+   export PROFILE_NAME=
+   ```
+2. Retrieve the OIDC provider URL.
+
+   ```bash
+   aws --region $CLUSTER_REGION eks update-kubeconfig --name $CLUSTER_NAME
+
+   # cut -c9- drops the leading "https://" (8 characters) from the issuer URL
+   export OIDC_URL=$(aws eks describe-cluster --region $CLUSTER_REGION --name $CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | cut -c9-)
+   ```
+
+3. Create an IAM trust policy to authorize federated requests from the OIDC provider.
+
+   ```bash
+
+   cat <<EOF > trust.json
+   {
+   "Version": "2012-10-17",
+   "Statement": [
+       {
+       "Effect": "Allow",
+       "Principal": {
+           "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_URL}"
+       },
+       "Action": "sts:AssumeRoleWithWebIdentity",
+       "Condition": {
+           "StringEquals": {
+           "${OIDC_URL}:aud": "sts.amazonaws.com",
+           "${OIDC_URL}:sub": "system:serviceaccount:${PROFILE_NAME}:default-editor"
+           }
+       }
+       }
+   ]
+   }
+   EOF
+   ```
+
+4. [Create an IAM policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html) with access to the S3 bucket where pipeline artifacts will be stored. The following policy grants full access to the S3 bucket; you can scope it down to read, write, and `GetBucketLocation` permissions.
+    ```bash
+    # Unquoted heredoc delimiter so that ${S3_BUCKET} is expanded by the shell
+    cat <<EOF > s3_policy.json
+    {
+        "Version": "2012-10-17",
+        "Statement": [
+            {
+                "Effect": "Allow",
+                "Action": "s3:*",
+                "Resource": [
+                    "arn:aws:s3:::${S3_BUCKET}",
+                    "arn:aws:s3:::${S3_BUCKET}/*"
+                ]
+            }
+        ]
+    }
+    EOF
+    ```
+5. [Create an IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html) for the Profile using the scoped policy from the previous step.
+
+   ```bash
+   aws iam create-role --role-name $PROFILE_NAME-$CLUSTER_NAME-role --assume-role-policy-document file://trust.json
+
+   aws --region $CLUSTER_REGION iam put-role-policy --role-name $PROFILE_NAME-$CLUSTER_NAME-role --policy-name kf-$PROFILE_NAME-pipeline-s3 --policy-document file://s3_policy.json
+   ```
+
+6. Create a user in your configured auth provider (e.g. Cognito or Dex).
+
+   Export the user as an environment variable. 
+
+   ```bash
+   export PROFILE_USER=""
+   ```
+
+7. Create a Profile using the `PROFILE_NAME`.
+
+> Note: `annotateOnly` is set to `true`, which means the Profile Controller will not mutate your IAM role or policy.
+   ```bash
+   cat <<EOF > profile_iam.yaml
+   apiVersion: kubeflow.org/v1
+   kind: Profile
+   metadata:
+     name: ${PROFILE_NAME}
+   spec:
+     owner:
+       kind: User
+       name: ${PROFILE_USER}
+     plugins:
+     - kind: AwsIamForServiceAccount
+       spec:
+         awsIamRole: $(aws iam get-role --role-name $PROFILE_NAME-$CLUSTER_NAME-role --output text --query 'Role.Arn')
+         annotateOnly: true
+   EOF
+
+   kubectl apply -f profile_iam.yaml
+   ```
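
A minimal offline sketch, not part of this commit, of how steps 2 to 4 render their two policy documents. All values are hypothetical placeholders, no AWS calls are made, and the files go to a scratch directory. It assumes the `sub` condition should name the namespace of the profile being created, and it fails if any `${...}` placeholder is left unexpanded.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder values, for illustration only.
AWS_ACCOUNT_ID="111122223333"
S3_BUCKET="example-kfp-artifacts"
PROFILE_NAME="example-profile"
OIDC_ISSUER="https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE0123456789"

# Same transformation as step 2: drop the first 8 characters ("https://").
OIDC_URL="$(printf '%s\n' "$OIDC_ISSUER" | cut -c9-)"

# Scratch directory so the example does not overwrite real policy files.
WORKDIR="$(mktemp -d)"

# Unquoted EOF delimiter: the shell expands ${...} while writing the file.
cat <<EOF > "$WORKDIR/trust.json"
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_URL}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_URL}:aud": "sts.amazonaws.com",
          "${OIDC_URL}:sub": "system:serviceaccount:${PROFILE_NAME}:default-editor"
        }
      }
    }
  ]
}
EOF

cat <<EOF > "$WORKDIR/s3_policy.json"
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::${S3_BUCKET}",
        "arn:aws:s3:::${S3_BUCKET}/*"
      ]
    }
  ]
}
EOF

# Fail if a literal "${" survived in either rendered file.
if grep -qF '${' "$WORKDIR/trust.json" "$WORKDIR/s3_policy.json"; then
  echo "unexpanded variable found in rendered policy" >&2
  exit 1
fi
echo "rendered policies in $WORKDIR"
```

Inspecting the rendered files before calling `aws iam create-role` is a cheap way to catch quoting mistakes, since a single-quoted string would leave `${S3_BUCKET}` in the policy verbatim.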
diff --git a/website/content/en/docs/deployment/rds-s3/guide-terraform.md b/website/content/en/docs/deployment/rds-s3/guide-terraform.md
@@ -119,6 +119,11 @@ Run the following command:
 make deploy
 ```
 
+## Creating Profiles
+A default profile named `kubeflow-user-example-com` for the email `user@example.com` is configured with this deployment. If you are using IRSA as your `PIPELINE_S3_CREDENTIAL_OPTION`, any additional profiles that you create must also be configured with IRSA and S3 bucket access. Follow the [pipeline profiles]({{< ref "/docs/deployment/create-profiles-with-iam-role.md" >}}) guide for instructions on how to create additional profiles.
+
+If you are not using IRSA, you can create a profile by specifying only the user's email address.
+
 ## Connect to your Kubeflow dashboard
 
 For information on connecting to your Kubeflow dashboard depending on your deployment environment, see [Port-forward (Terraform deployment)]({{< ref "../connect-kubeflow-dashboard/#port-forward-terraform-deployment" >}}). Then, [log into the Kubeflow UI]({{< ref "../connect-kubeflow-dashboard/#log-into-the-kubeflow-ui" >}}).

diff --git a/website/content/en/docs/deployment/rds-s3/guide.md b/website/content/en/docs/deployment/rds-s3/guide.md
@@ -403,7 +403,6 @@ yq e '.s3.minioServiceRegion = env(CLUSTER_REGION)' -i charts/apps/kubeflow-pipe
 ### (Optional) Configure Culling for Notebooks
 Enable culling for notebooks by following the [instructions]({{< ref "/docs/deployment/configure-notebook-culling.md#" >}}) in configure culling for notebooks guide. 
 
-
 ## 3.0 Build Manifests and install Kubeflow
 
 Once you have the resources ready, you can deploy the Kubeflow manifests for one of the following deployment options:
@@ -458,9 +457,14 @@ Once everything is installed successfully, you can access the Kubeflow Central D
 
 You can now start experimenting and running your end-to-end ML workflows with Kubeflow!
 
-## 4.0 Verify the installation
+## 4.0 Creating Profiles
+A default profile named `kubeflow-user-example-com` for the email `user@example.com` is configured with this deployment. If you are using IRSA as your `PIPELINE_S3_CREDENTIAL_OPTION`, any additional profiles that you create must also be configured with IRSA and S3 bucket access. Follow the [pipeline profiles]({{< ref "/docs/deployment/create-profiles-with-iam-role.md" >}}) guide for instructions on how to create additional profiles.
+
+If you are not using IRSA, you can create a profile by specifying only the user's email address.
+
+## 5.0 Verify the installation
 
-### 4.1 Verify RDS
+### 5.1 Verify RDS
 
 1. Connect to your RDS instance from a pod within the cluster with the following command:
 ```bash
@@ -536,7 +540,7 @@ mysql> use kubeflow; show tables;
 mysql> select * from observation_logs;
 ```
 
-### 4.2 Verify S3
+### 5.2 Verify S3
 
 1. Access the Kubeflow Central Dashboard [by logging in to your cluster]({{< ref "/docs/deployment/vanilla/guide.md#connect-to-your-kubeflow-cluster" >}}) and navigate to Kubeflow Pipelines (under Pipelines).
 
@@ -546,7 +550,7 @@ mysql> select * from observation_logs;
 
 4. Verify that the bucket is not empty and was populated by the outputs of the experiment.
 
-## 5.0 Uninstall Kubeflow
+## 6.0 Uninstall Kubeflow
 
 Run the following command to uninstall your Kubeflow deployment:
 

diff --git a/website/content/en/docs/images/cognito/cognito-user-pool-created.png b/website/content/en/docs/images/cognito/cognito-user-pool-created.png