Create a management cluster for GCP blueprints #644

Closed
jlewi opened this issue Apr 28, 2020 · 9 comments · Fixed by #645

Comments

@jlewi
Contributor

jlewi commented Apr 28, 2020

We should create a management cluster to run and deploy GCP blueprints as well as other test infrastructure.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label          Probability
kind/feature   0.99


@jlewi
Contributor Author

jlewi commented Apr 28, 2020

I created the cluster:

  • cluster: kf-ci-management
  • project: kubeflow-ci
  • location: us-central1 (regional cluster)

We should check in the configs before closing this issue.
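Later comments address this cluster as `--context=kf-ci-management`. A minimal sketch of how such a kubectl context could be set up from the cluster details above. Renaming the context is an assumption (the thread does not say how the context was created), and the `run` helper and `setup_context` function are hypothetical names; the helper only prints commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
set -eu

# Values taken from the comment above.
CLUSTER=kf-ci-management
PROJECT=kubeflow-ci
REGION=us-central1

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_context() {
  # Fetch credentials for the regional cluster.
  run gcloud container clusters get-credentials "$CLUSTER" \
    --region="$REGION" --project="$PROJECT"
  # gcloud names the context gke_<project>_<location>_<cluster>; rename it
  # (assumption) so that --context=kf-ci-management resolves.
  run kubectl config rename-context \
    "gke_${PROJECT}_${REGION}_${CLUSTER}" "$CLUSTER"
}

setup_context
```

Running the script as-is only prints the two commands, which makes it easy to review them before executing with `DRY_RUN=0`.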

jlewi pushed a commit to jlewi/testing that referenced this issue Apr 28, 2020
* The management cluster is a cluster running Cloud Config Connector, which
  can be used to create GCP resources.

* This PR checks in the config for cluster kf-ci-management.
  We also set up a namespace to administer resources in project
  kubeflow-ci-deployment.

Fix kubeflow#644
k8s-ci-robot pushed a commit that referenced this issue Apr 29, 2020
…645)

* The management cluster is a cluster running Cloud Config Connector, which
  can be used to create GCP resources.

* This PR checks in the config for cluster kf-ci-management.
  We also set up a namespace to administer resources in project
  kubeflow-ci-deployment.

Fix #644
@jlewi
Contributor Author

jlewi commented May 5, 2020

There were a couple of bugs in the permission setup for CNRM that will be fixed in a subsequent PR.

Also, one of the problems we ran into was that the service account we used with CNRM lives in project kubeflow-ci-deployment and ended up getting GC'd.

To fix this, we can switch to using the kubeflow-testing service account.

Also, per #650, I put the project "kubeflow-ci-deployment" into a subfolder. We can use the folder to grant "kubeflow-testing" permission on project "kubeflow-ci-deployment" so that we don't have to worry about it being GC'd.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label          Probability
area/engprod   0.78


@jlewi jlewi removed the kind/feature label May 5, 2020
jlewi pushed a commit to jlewi/testing that referenced this issue May 5, 2020
* Create a simple script to deploy Kubeflow using the GCP blueprint.
  This is basically just a wrapper around make commands.

  * This is the first step in setting up auto deployments of the GCP
    blueprint for CI purposes.

* Fix some bugs in the management cluster that popped up while testing
  the blueprint.

  * Fix CNRM install for the kubeflow-ci-deployment namespace.
    CNRM wasn't properly configured to administer that namespace.
    The appropriate role bindings weren't being created in the correct
    namespaces and the statefulset was using the host project and not the
    managed project.

  * See kubeflow#644 for reference on the management cluster setup.

  * We should use the kubeflow-testing@kubeflow-ci service account
    and not a service account owned by project kubeflow-ci-deployment,
    as the latter is being GC'd by our cleanup CI scripts, which breaks
    the management cluster.

  * Also per kubeflow#644, permissions are now set at the folder level to
    prevent the permissions from being GC'd.
k8s-ci-robot pushed a commit that referenced this issue May 5, 2020
)

* Create a simple script to deploy Kubeflow using the GCP blueprint.
  This is basically just a wrapper around make commands.

  * This is the first step in setting up auto deployments of the GCP
    blueprint for CI purposes.

* Fix some bugs in the management cluster that popped up while testing
  the blueprint.

  * Fix CNRM install for the kubeflow-ci-deployment namespace.
    CNRM wasn't properly configured to administer that namespace.
    The appropriate role bindings weren't being created in the correct
    namespaces and the statefulset was using the host project and not the
    managed project.

  * See #644 for reference on the management cluster setup.

  * We should use the kubeflow-testing@kubeflow-ci service account
    and not a service account owned by project kubeflow-ci-deployment,
    as the latter is being GC'd by our cleanup CI scripts, which breaks
    the management cluster.

  * Also per #644, permissions are now set at the folder level to
    prevent the permissions from being GC'd.
jlewi pushed a commit to jlewi/testing that referenced this issue May 15, 2020
* We want to install ACM on the kf-ci-management cluster in project
  kubeflow-ci so that we can start using GitOps to manage CI infrastructure.

* Related to kubeflow#644

* Remove status from the cleanup CI job, since it breaks ACM sync.

* Add a cluster selector to ACM so that we only install the auto-deploy
  namespace on the appropriate cluster.

* Add an annotation to all auto-deploy tasks so we only synchronize them to the appropriate cluster.
@jlewi
Contributor Author

jlewi commented May 15, 2020

Related to GoogleCloudPlatform/kubeflow-distribution#13: I don't think we want to run CNRM in namespace mode, as it makes it difficult for the management cluster to administer multiple projects.

Uninstall the namespace-specific components.

cd test-infra/management/instance
kubectl --context=kf-ci-management delete -f cnrm-install-kubeflow-ci-deployment/

Create a new service account to administer CI projects.

gcloud --project=kubeflow-ci iam service-accounts create ci-projects-manager

@jlewi
Contributor Author

jlewi commented May 15, 2020

Made ci-projects-manager@kubeflow-ci.iam.gserviceaccount.com an owner of the folder ci-projects.
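The thread does not record how the owner role was granted. One way it could be done from the command line is sketched below. The numeric folder ID is not given anywhere in this issue, so `FOLDER_ID` is a placeholder that must be supplied; the `run` helper and `grant_folder_owner` are hypothetical names, and commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
set -eu

# Service account named in the comment above.
MEMBER="serviceAccount:ci-projects-manager@kubeflow-ci.iam.gserviceaccount.com"
# Placeholder (assumption): the numeric ID of folder ci-projects is not stated
# in this thread and has to be provided by the caller.
FOLDER_ID="${FOLDER_ID:-FOLDER_ID_PLACEHOLDER}"

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

grant_folder_owner() {
  # $1 = numeric folder ID
  run gcloud resource-manager folders add-iam-policy-binding "$1" \
    --member="$MEMBER" --role="roles/owner"
}

grant_folder_owner "$FOLDER_ID"
```

Because the binding sits on the folder rather than on an individual project, it is not affected when resources inside a project are cleaned up.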

@jlewi
Contributor Author

jlewi commented May 15, 2020

Grant workload identity.

gcloud --project=kubeflow-ci iam service-accounts add-iam-policy-binding \
ci-projects-manager@kubeflow-ci.iam.gserviceaccount.com \
--member="serviceAccount:kubeflow-ci.svc.id.goog[cnrm-system/cnrm-controller-manager]" \
--role="roles/iam.workloadIdentityUser"
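The `--member` value above follows the GKE workload identity format `serviceAccount:<project>.svc.id.goog[<namespace>/<Kubernetes service account>]`. Workload identity also needs the Kubernetes service account to be annotated with the Google service account it impersonates. The sketch below shows both halves; the CNRM install manifests may already carry this annotation, so the `kubectl annotate` step is an assumption rather than something this thread confirms. `run`, `wi_member`, and `annotate_ksa` are hypothetical helper names.

```shell
#!/bin/sh
set -eu

GSA="ci-projects-manager@kubeflow-ci.iam.gserviceaccount.com"

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build the workload identity member string.
# $1 = project hosting the cluster, $2 = namespace, $3 = Kubernetes service account
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# Point the Kubernetes service account at the Google service account
# (assumption: only needed if the manifests do not already set it).
annotate_ksa() {
  run kubectl --context=kf-ci-management --namespace=cnrm-system \
    annotate serviceaccount cnrm-controller-manager \
    "iam.gke.io/gcp-service-account=${GSA}" --overwrite
}

wi_member kubeflow-ci cnrm-system cnrm-controller-manager
annotate_ksa
```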

@jlewi
Contributor Author

jlewi commented May 15, 2020

Delete the old version of CNRM.

kustomize build ../../upstream/management/cnrm-install/install-system/ | kubectl --context=kf-ci-management delete -f -

@jlewi
Contributor Author

jlewi commented May 15, 2020

Install CNRM 1.9 in workload identity mode.

kubectl --context=kf-ci-management apply -f crds.yaml 
kubectl --context=kf-ci-management apply -f 0-cnrm-system.yaml 
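After applying the manifests, it can help to confirm that the CNRM pods came up. A minimal sketch, assuming the components run in the `cnrm-system` namespace (the namespace used in the workload identity binding earlier in this thread). The `run` helper and `wait_for_cnrm` are hypothetical names, and the 300s timeout is an arbitrary choice.

```shell
#!/bin/sh
set -eu

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

wait_for_cnrm() {
  # $1 = optional timeout (defaults to 300s, an arbitrary value)
  run kubectl --context=kf-ci-management wait --namespace=cnrm-system \
    --for=condition=Ready pod --all --timeout="${1:-300s}"
}

wait_for_cnrm
```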

k8s-ci-robot pushed a commit that referenced this issue May 20, 2020
* Install ACM on the Kubeflow CI management cluster.

* We want to install ACM on the kf-ci-management cluster in project
  kubeflow-ci so that we can start using GitOps to manage CI infrastructure.

* Related to #644

* Remove status from the cleanup CI job, since it breaks ACM sync.

* Add a cluster selector to ACM so that we only install the auto-deploy
  namespace on the appropriate cluster.

* Add an annotation to all auto-deploy tasks so we only synchronize them to the appropriate cluster.

* The configsync directory for the management cluster should be located in the management directory.

* Create namespace issue-label-bot-dev in the kf-ci-management
  cluster. This namespace will be used to administer the issue-label-bot-dev
  project.

* Fix annotations.

* Add Tekton to our ACM repo.

  * We will eventually want to use ACM to manage our Tekton installs.
  * On our CI management cluster we currently don't need Tekton. However,
    nomos is giving us sync errors because the Tekton CRDs don't exist.

* The CI management cluster shouldn't use KCC in namespace mode.

* Using namespace mode is annoying when creating new projects
  because we need to set up a new service account for every project.

* It is much simpler to create a single service account with permission
  to administer a folder.
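With a single cluster-wide service account, Config Connector can be told which project a namespace manages through the `cnrm.cloud.google.com/project-id` namespace annotation. The sketch below shows that mapping for the issue-label-bot-dev namespace mentioned above. It uses imperative kubectl commands purely for illustration; in this repository the namespace is presumably declared in the ACM repo instead. `run` and `map_namespace_to_project` are hypothetical names.

```shell
#!/bin/sh
set -eu

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

map_namespace_to_project() {
  # $1 = namespace, $2 = GCP project that resources in the namespace belong to
  run kubectl --context=kf-ci-management create namespace "$1"
  run kubectl --context=kf-ci-management annotate namespace "$1" \
    "cnrm.cloud.google.com/project-id=$2"
}

map_namespace_to_project issue-label-bot-dev issue-label-bot-dev
```

Adding another project then only requires a new annotated namespace, not a new service account, as long as the project lives in a folder the shared service account can administer.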