Setup autodeploy for GCP blueprints #5
Comments
* Fix some bugs in the blueprints that cropped up while setting up continuous auto-deployments using the blueprints (GoogleCloudPlatform#5). Fix some bugs in the documentation.
* Fix bugs in the management config for the per-namespace components of CNRM. The namespaces of the role bindings weren't correct, so the cnrm manager pod ended up not having the appropriate permissions.
* The scoped namespace of the cnrm manager statefulset also needs to be set to the managed project, not the host project.
* Update the Makefile to point at kubeflow/manifests master to pull in the cert-manager changes.
* Add check_domain_length to validate the length of the hostname / KF deployment name so that we don't end up exceeding the certificate limits (a sketch follows below).
* Check in the blueprint manifests. Clean up for PR.
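For illustration, here is a minimal sketch of the kind of hostname-length check that check_domain_length performs. The 64-character bound and the `*.endpoints.<project>.cloud.goog` hostname pattern are assumptions made for this sketch, not taken from the blueprint code.

```python
# Hypothetical sketch of a hostname-length check like check_domain_length.
# The 64-character bound below matches the usual TLS certificate common-name
# limit; the exact limit enforced by the blueprint is an assumption here.

MAX_CERT_HOSTNAME_LENGTH = 64  # assumed certificate common-name limit


def check_domain_length(name: str, project: str) -> None:
    """Fail fast if the Kubeflow hostname would exceed the certificate limit."""
    # Assumed hostname pattern for a GCP Kubeflow deployment.
    hostname = f"{name}.endpoints.{project}.cloud.goog"
    if len(hostname) > MAX_CERT_HOSTNAME_LENGTH:
        raise ValueError(
            f"Hostname {hostname} is {len(hostname)} characters; it must not "
            f"exceed {MAX_CERT_HOSTNAME_LENGTH} or certificate provisioning "
            "will fail. Choose a shorter deployment name.")


if __name__ == "__main__":
    check_domain_length("my-kubeflow", "my-gcp-project")
```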
Auto-deploy is running on the cluster; I made the service account.
* This Tekton pipeline will eventually be used to continually deploy a fresh instance from the blueprint for CI.
* Reorganize how we define reusable Tekton tasks:
  * Tekton tasks are currently defined in tekton/templates.
  * I reorganized the Tekton tasks into kustomize packages.
  * I did this because I want to make it easier to hydrate the tasks for different installs (e.g. different namespaces). For example, for auto-deployment we will use the namespace auto-deploy, but in other settings we might use a different namespace.
* Start setting up an ACM repo in acm-repo:
  * This will eventually be used to sync our Tekton tasks automatically to our cluster.
  * The idea is to have a single ACM repo to manage all of our CI/CD clusters; a single ACM repo can manage multiple clusters.
  * We could use ACM cluster selectors to select which target this applies to, so we could eventually reuse this same repo for label-sync configs but only sync label-sync to the cluster where label-sync runs.
  * Start putting hydrated Tekton pipelines here.
  * ACM isn't actually installed on our cluster yet, so we aren't actually syncing the resources yet; right now we are still applying them manually.
* Update the management cluster to work for auto-deployment:
  * Our management cluster needs to grant the kf-ci-v1-user@ GSA permissions to create CNRM resources so we can deploy Kubeflow.
  * We do this by adding a K8s RoleBinding that binds that GSA to the cnrm-admin ClusterRole in the namespace kubeflow-ci-deployment (see the sketch after this list).
* To support GCP blueprints I had to update the test worker image:
  * Install anthoscli, kpt, and istioctl.
  * Install a newer version of yq (i.e. the yq that is a Go binary and not a wrapper around jq).

Related to: GoogleCloudPlatform/kubeflow-distribution#5
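As a rough illustration of the RoleBinding described above, the snippet below creates one with the Kubernetes Python client. The GSA email suffix and the binding name are placeholders, not the actual manifest from the change.

```python
from kubernetes import client, config

# Placeholder GSA email; the actual project suffix isn't given in the issue.
GSA = "kf-ci-v1-user@kubeflow-ci.iam.gserviceaccount.com"

# RoleBinding granting the GSA the cnrm-admin ClusterRole, scoped to the
# kubeflow-ci-deployment namespace as described in the commit message.
role_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {
        "name": "kf-ci-v1-user-cnrm-admin",  # illustrative name
        "namespace": "kubeflow-ci-deployment",
    },
    "roleRef": {
        "apiGroup": "rbac.authorization.k8s.io",
        "kind": "ClusterRole",
        "name": "cnrm-admin",
    },
    "subjects": [
        {"apiGroup": "rbac.authorization.k8s.io", "kind": "User", "name": GSA},
    ],
}

if __name__ == "__main__":
    # Assumes a kubeconfig already pointing at the management cluster.
    config.load_kube_config()
    client.RbacAuthorizationV1Api().create_namespaced_role_binding(
        namespace="kubeflow-ci-deployment", body=role_binding)
```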
* cnrm_clients.py is a quick hack to create a wrapper to make it easier to work with CNRM custom resources. Related to: GoogleCloudPlatform/kubeflow-distribution#5 (auto-deployments of blueprints).
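The following is an illustrative sketch of the kind of thin wrapper cnrm_clients.py provides over CNRM custom resources; the class and method names here are hypothetical, not the actual cnrm_clients.py API.

```python
from kubernetes import client, config


class CnrmClient:
    """Minimal helper for listing/deleting CNRM custom resources in a namespace."""

    # CNRM API group for GKE resources (e.g. ContainerCluster); other resource
    # kinds live in other cnrm.cloud.google.com groups.
    GROUP = "container.cnrm.cloud.google.com"
    VERSION = "v1beta1"

    def __init__(self, namespace: str):
        self.namespace = namespace
        self.api = client.CustomObjectsApi()

    def list(self, plural: str):
        """Return CNRM objects of the given plural kind, e.g. 'containerclusters'."""
        resp = self.api.list_namespaced_custom_object(
            self.GROUP, self.VERSION, self.namespace, plural)
        return resp.get("items", [])

    def delete(self, plural: str, name: str):
        """Delete a single CNRM object; CNRM then deletes the backing GCP resource."""
        return self.api.delete_namespaced_custom_object(
            self.GROUP, self.VERSION, self.namespace, plural, name)


if __name__ == "__main__":
    config.load_kube_config()
    cnrm = CnrmClient(namespace="kubeflow-ci-deployment")
    for cluster in cnrm.list("containerclusters"):
        print(cluster["metadata"]["name"])
```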
* We want to auto-deploy the GCP blueprint (GoogleCloudPlatform/kubeflow-distribution#5).
* We need to add logic and K8s resources to clean up the blueprints so we don't run out of GCP quota.
* Create cleanup_blueprints.py to clean up auto-deployed blueprints (see the sketch after this list).
  * Don't put this code in cleanup_ci.py because we want to be able to use fire and possibly python3 (not sure the code in cleanup_ci is python3 compatible).
* Create a CLI create_context.py to create K8s config contexts. This will be used to get credentials to talk to the cleanup cluster when running on K8s.
* Create a Tekton task to run the cleanup script. This is intended as a replacement for our existing K8s job (kubeflow#654). There are a couple of reasons to start using Tekton: i) we are already using Tekton as part of the auto-deploy infrastructure; ii) we can leverage Tekton to handle git checkouts; iii) Tekton makes it easy to add additional steps to do things like create the context.
* This is a partial solution. This PR contains a Tekton pipeline that only runs cleanup for the blueprints.
  * To do all cleanup using Tekton we just need to add a step or Task to run the existing cleanup-ci script. The only issue I foresee is that the Tekton pipeline runs in the kf-ci-v1 cluster and will need to be granted access to the kubeflow-testing cluster so we can clean up Argo workflows in that cluster.
* To run the Tekton pipeline regularly we create a cronjob that runs kubectl apply.
* cnrm_clients.py is a quick hack to create a wrapper to make it easier to work with CNRM custom resources.
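A hedged sketch of the fire-based CLI pattern that a script like cleanup_blueprints.py could follow; the label selector, TTL, and method names below are assumptions for illustration, not the actual script.

```python
import datetime

import fire
from kubernetes import client, config


class Cleanup:
    """Delete auto-deployed blueprint namespaces older than a TTL."""

    def blueprints(self, context: str, max_age_hours: int = 8, dry_run: bool = True):
        # The context is expected to have been created by something like
        # create_context.py so we have credentials for the cleanup cluster.
        config.load_kube_config(context=context)
        core = client.CoreV1Api()
        cutoff = (datetime.datetime.now(datetime.timezone.utc)
                  - datetime.timedelta(hours=max_age_hours))
        # Assumed convention: auto-deployed blueprints are labeled auto-deploy=true.
        namespaces = core.list_namespace(label_selector="auto-deploy=true")
        for ns in namespaces.items:
            if ns.metadata.creation_timestamp < cutoff:
                print(f"Deleting expired blueprint namespace {ns.metadata.name}")
                if not dry_run:
                    core.delete_namespace(ns.metadata.name)


if __name__ == "__main__":
    fire.Fire(Cleanup)
```

Invoked, for example, as `python cleanup_blueprints.py blueprints --context=<cleanup-context> --dry_run=False`.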
* Create a blueprint reconciler to auto-deploy and reconcile blueprints (see the sketch after this list).
  * The reconciler decides whether we need to deploy a new blueprint and, if so, creates a Tekton PipelineRun to deploy Kubeflow.
* Here are some differences in how we deploy blueprints vs. kfctl deployments:
  * We use Tekton PipelineRuns, as opposed to K8s Jobs, to do the deployment.
  * We no longer use deployments.yaml to describe the group of deployments. Instead we just create a PipelineRun.yaml, and that provides all the information the reconciler needs, e.g. the branch to watch for changes.
* Update the flask app to provide information about blueprints.
  * Include a link to the Tekton dashboard showing the PipelineRun that deployed Kubeflow.
* Define a Pipeline to deploy Kubeflow so we don't have to inline the spec in the PipelineRun.
* Remove Dockerfile.skaffold; we can use skaffold auto-sync in developer mode.
* Add a column in the webserver to redirect to the Tekton dashboard for the PipelineRun that deployed it.
* GoogleCloudPlatform/kubeflow-distribution#5 Set up auto-deploy for GCP blueprints.
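A minimal sketch of how a reconciler might submit a Tekton PipelineRun to deploy a blueprint, assuming the auto-deploy namespace mentioned earlier and a Pipeline named deploy-gcp-blueprint; these names and parameters are illustrative, not taken from the reconciler code.

```python
import uuid

from kubernetes import client, config


def deploy_blueprint(commit: str, namespace: str = "auto-deploy") -> dict:
    """Create a PipelineRun that deploys Kubeflow from the given blueprint commit."""
    run = {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "PipelineRun",
        "metadata": {
            # Unique name so repeated deployments don't collide.
            "name": f"deploy-blueprint-{uuid.uuid4().hex[:8]}",
            "namespace": namespace,
            "labels": {"blueprint-commit": commit},
        },
        "spec": {
            "pipelineRef": {"name": "deploy-gcp-blueprint"},  # assumed Pipeline name
            "params": [{"name": "commit", "value": commit}],
        },
    }
    api = client.CustomObjectsApi()
    return api.create_namespaced_custom_object(
        "tekton.dev", "v1beta1", namespace, "pipelineruns", run)


if __name__ == "__main__":
    config.load_kube_config()
    created = deploy_blueprint(commit="HEAD")
    print(created["metadata"]["name"])
```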
This was working, so I'm closing this issue.
We should set up the auto-deploy infrastructure to auto-deploy from blueprints.
This way we ensure that our GCP blueprint is up to date and working.