
empower KSA used in the promoter's prow jobs to use Workload Identity #655

Merged (4 commits) on Mar 11, 2020

Conversation

@listx (Contributor) commented Mar 10, 2020

This gives the "k8s-prow-builds.svc.id.goog[test-pods/k8s-artifacts-prod]"
service account powers to act as the
"k8s-infra-gcr-promoter@k8s-artifacts-prod.iam.gserviceaccount.com" for
purposes of Workload Identity. This way, the Prow jobs for the promoter (which live in the k8s-prow-builds GKE project, and run in the test-pods namespace with the k8s-artifacts-prod KSA)
won't have to have service account keys injected into them.

/cc @thockin @fejta

More context: kubernetes-sigs/promo-tools#180 (comment)

This gives the "k8s-prow-builds.svc.id.googtest-pods/k8s-artifacts-prod"
Prow account powers to act as the
"k8s-infra-gcr-promoter@k8s-artifacts-prod.iam.gserviceaccount.com" for
purposes of Workload Identity. This way, the Prow jobs for the promoter
won't have to have service account keys injected into them.
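
For reference, a hedged sketch of the binding this PR establishes, using gcloud's add-iam-policy-binding with the concrete names from the description above (the bracketed member form is the corrected one; see the review below):

# Allow the KSA test-pods/k8s-artifacts-prod in the k8s-prow-builds GKE
# project to impersonate the promoter's GCP service account:
gcloud iam service-accounts add-iam-policy-binding \
  --project "k8s-artifacts-prod" \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:k8s-prow-builds.svc.id.goog[test-pods/k8s-artifacts-prod]" \
  "k8s-infra-gcr-promoter@k8s-artifacts-prod.iam.gserviceaccount.com"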
@k8s-ci-robot k8s-ci-robot requested review from fejta and thockin March 10, 2020 22:34
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. wg/k8s-infra labels Mar 10, 2020
@fejta (Contributor) left a comment


LGTM

  1. typo

  2. some wording nits:

  • There are service accounts for a kubernetes cluster and GCP service accounts.
    • There is no such thing as a prow service account (prow just creates pods)
    • Any pod using the kubernetes service account will authenticate as the specified GCP account (not just prow's pods)
  • The binding only authenticates the identity; it doesn't control authorization
    • I.e., the gcloud command authorizes a (cluster, namespace, name) kubernetes service account identity to authenticate as the specified GCP service account to the rest of GCP
    • Whether or not the GCP SA has access to GCR depends on the normal IAM rights for that GCP SA (see the sketch just after this list)
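
To make that authentication/authorization split concrete, a hedged sketch: the workloadIdentityUser binding only lets the KSA authenticate as the GCP SA, while GCR access is a separate grant on the registry's backing GCS bucket (the bucket name follows the usual artifacts.PROJECT.appspot.com convention; the role shown is illustrative):

# Authorization is separate from the WI binding: grant the GCP SA
# rights on GCR's backing bucket.
gsutil iam ch \
  "serviceAccount:k8s-infra-gcr-promoter@k8s-artifacts-prod.iam.gserviceaccount.com:objectAdmin" \
  "gs://artifacts.k8s-artifacts-prod.appspot.com"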

@@ -220,4 +220,9 @@ empower_group_to_admin_artifact_auditor \
empower_artifact_auditor "${PROD_PROJECT}"
empower_artifact_auditor_invoker "${PROD_PROJECT}"

# Special case: empower Prow account to have access to GCR for Workload Identity.
empower_prowacct_for_workload_identity \
Contributor:

IMO this should be more like:

  • empower_pods_to_identify_as_gcp_service_account
  • bind_k8s_service_account_to_act_as_gcp_service_account

return 1
fi

prow_acct="$1"
Contributor:

I'd consider separating these variables out into their parts so it's clearer. I'd recommend this:

gke_project=$1
namespace=$2
name=$3
gcp_acct=$4
gcp_project=$5

gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:${gke_project}.svc.id.goog[${namespace}/${name}]" \
  "$gcp_acct" --project "$gcp_project"

Member:

I don't think we need to go quite this far.

I would position this function as empower_ksa_to_svcacct(project, k8s_svcacct, gcp_svcacct)
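
A minimal sketch of that shape, assuming (as settled later in this review) that the KSA argument is passed fully formed:

# project:     GCP project that owns the GCP service account
# k8s_svcacct: fully formed WI member, e.g. "gke-proj.svc.id.goog[ns/name]"
# gcp_svcacct: email of the GCP service account to impersonate
function empower_ksa_to_svcacct () {
    if [ $# != 3 ]; then
        echo "empower_ksa_to_svcacct(project, k8s_svcacct, gcp_svcacct) requires 3 arguments" >&2
        return 1
    fi
    local project="$1"
    local k8s_svcacct="$2"
    local gcp_svcacct="$3"

    gcloud iam service-accounts add-iam-policy-binding \
        --project "${project}" \
        --role roles/iam.workloadIdentityUser \
        --member "serviceAccount:${k8s_svcacct}" \
        "${gcp_svcacct}"
}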

@@ -220,4 +220,9 @@ empower_group_to_admin_artifact_auditor \
empower_artifact_auditor "${PROD_PROJECT}"
empower_artifact_auditor_invoker "${PROD_PROJECT}"

# Special case: empower Prow account to have access to GCR for Workload Identity.
empower_prowacct_for_workload_identity \
"k8s-prow-builds.svc.id.googtest-pods/k8s-artifacts-prod" \
Contributor:

k8s-prow-builds.svc.id.goog[test-pods/k8s-artifacts-prod] (you need the [])

@@ -149,3 +149,22 @@ function empower_svcacct_to_admin_gcr () {

empower_svcacct_to_admin_gcs_bucket "${group}" "${bucket}"
}

Member:

First note - this does not belong in this file. This file is just for very low-level GCR-centric primitives. Put this back in lib.sh until that can be broken up more.

prow_acct="$1"
gcp_acct="$2"

gcloud iam service-accounts add-iam-policy-binding \
Member:

you need --project, don't you?

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 11, 2020
@listx (Contributor, Author) commented Mar 11, 2020

@fejta Thanks for the wording clarifications; hopefully I captured the intent more accurately. Please take another look!

@dims (Member) commented Mar 11, 2020

/assign @thockin @justaugustus

@thockin (Member) commented Mar 11, 2020

@fejta This says that "test-pods/k8s-artifacts-prod" is the namespace/sa, right? That implies that other pods run in the same namespace ("test-pods"), is that right? If so, can you compare this to running in a dedicated namespace that doesn't risk cross-pod leaks of the service account?

infra/gcp/lib.sh Outdated

gcloud iam service-accounts add-iam-policy-binding \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${gke_project}.svc.id.goog[${k8s_namespace}/${gcp_project}]" \
Member:

Rather than construct the IDNS this way, I'd rather we pass it in fully formed. This ensures that, should that format ever change or extend, we don't need to change this function or duplicate it or anything.

Also I think the assumption that there's one SA per project is dubious - why couldn't we have 2 GSAs in k8s-artifacts-prod that were each used by different prow jobs? This conflates the name of the KSA in the GKE cluster with the name of the project the GSA exists in. It's fine to name them the same, but don't assume they are ALWAYS named the same.

IOW, change args 2 and 3 into a single string, with the documented example "${gke_project}.svc.id.goog[${k8s_namespace}/${k8s_svcacct}]"
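
Under that convention the call site might look like this (a sketch; it assumes PROD_PROJECT expands to k8s-artifacts-prod, as elsewhere in this script):

empower_ksa_to_svcacct \
    "${PROD_PROJECT}" \
    "k8s-prow-builds.svc.id.goog[test-pods/k8s-artifacts-prod]" \
    "k8s-infra-gcr-promoter@${PROD_PROJECT}.iam.gserviceaccount.com"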

Contributor:

An existing example where the service account name in the GKE cluster (k8s-kops-test) does not match the GCP project name (kubernetes-jenkins-pull).

An existing example where we are using multiple GCP service accounts (resultstore, pusher, deployer) in the same project (k8s-prow).

Contributor (Author):

Also I think the assumption that there's one SA per project is dubious - why couldn't we have 2 GSAs in k8s-artifacts-prod that were each used by different prow jobs? This conflates the name of the KSA in the GKE cluster with the name of the project the GSA exists in. It's fine to name them the same, but don't assume they are ALWAYS named the same.

As I was writing the code, I felt that this assumption was on shaky ground. Thanks for the guidance!

infra/gcp/lib.sh Outdated
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${gke_project}.svc.id.goog[${k8s_namespace}/${gcp_project}]" \
"${gcp_svcacct}" \
--project="${project}"
Contributor:

gcp_project (i.e., --project here should reference "${gcp_project}", not "${project}")


@fejta (Contributor) commented Mar 11, 2020

This says that "test-pods/k8s-artifacts-prod" is the namespace/sa right? That implies that other pods run is the same namespace ("test-pods"), is that right?

Yes and yes (although we could use unit tests to prevent jobs from being configured to use this SA).

We already do this mitigation for trusted jobs (which should have excluded these jobs from running there, but we updated the unit test to whitelist them).

If so, can you compare this to running in a dedicated namespace that doesn't risk cross-pod leaks of the service account?

Can we use the new #wg-k8s-infra scripting to schedule these pods in a community-owned cluster somewhere? It is easy to configure prow to schedule pods in any conformant k8s cluster.

Yes, my request has been from the start (and continues to be) that we run these jobs outside of prow's trusted context, ideally in their own GKE cluster.

The trusted context is for jobs which prow fully trusts, which should not include these jobs: the core prow oncall/maintainer team does not own/review/maintain the code behind these jobs.

These belong in a different trust domain, which means in a different context than the prow control plane, ideally their own cluster.

In particular, pass in the serviceAccount argument as a single string.
@listx (Contributor, Author) commented Mar 11, 2020

Updated... PTAL, esp. how I expanded the comments.

@listx listx changed the title empower prow build acct to use Workload Identity empower KSA used in the promoter's prow jobs to use Workload Identity Mar 11, 2020
@thockin (Member) commented Mar 11, 2020

@fejta yes and no. In theory yes. In practice the apiserver IP is firewalled off and so we'd have to think about how to actually do it.

@fejta (Contributor) left a comment


/lgtm
/approve
/hold

LGTM from my end, /hold cancel when you're ready for it to merge.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 11, 2020
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 11, 2020
@fejta (Contributor) commented Mar 11, 2020

In practice the apiserver IP is firewalled off

Interesting. Which API server? How are things deployed/upgraded/etc when kubectl cannot talk to it?

@thockin (Member) commented Mar 11, 2020 via email

@thockin (Member) left a comment


Does this mean I can revoke the JSON key we previously issued to this SA?

/lgtm
/approve

@k8s-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fejta, listx, thockin


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 11, 2020
@listx (Contributor, Author) commented Mar 11, 2020

I think we have to actuate this first, then un-revert kubernetes/test-infra#16463 and see if the ci-k8sio-cip job (or post-k8sio-cip job) still works. Once all the jobs have migrated to WI, then the JSON keys can be revoked.

I think we need to set aside a day to test all of the above; maybe next week? Busy today and most likely tomorrow with kubernetes-sigs/promo-tools#191

@thockin (Member) commented Mar 11, 2020

file a followup bug? I know we have other things to finish (auditor -> alerts :)

@thockin thockin removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 11, 2020
@k8s-ci-robot k8s-ci-robot merged commit 9e3204a into kubernetes:master Mar 11, 2020
@listx (Contributor, Author) commented Mar 11, 2020

file a followup bug? I know we have other things to finish (auditor -> alerts :)

kubernetes-sigs/promo-tools#180

@thockin (Member) commented Mar 11, 2020

applied

listx pushed a commit to listx/test-infra that referenced this pull request Mar 20, 2020
Ever since kubernetes/k8s.io#655, this job should already run with an
authenticated GCP SA (before the job's code even runs). This means there is
no need to use the keyfile to authenticate again.

We still keep the `-use-service-account` flag for now because the
promoter currently still tries to extract OAUTH tokens as the GCP
account at the very beginning of its run. It's a nice sanity check.

Once we confirm that the OAUTH tokens still get populated in the business
logic of the job, we won't need this flag any more: the GCRs that the
promoter deals with today are all public (the promoter was designed to
handle private repos as well), so the very first "read GCR" API calls do
not need OAUTH tokens just to read the repos. That takes care of the GCR
read API calls.

As for the GCR write calls, the gcrane.Copy() method that performs the
GCR writes does its own authentication dance, but that is separate and
automatic as long as the GCP account is authenticated before that
function call.

This is why the `-use-service-account` flag is marked for future
deprecation in this commit.
listx pushed a commit to listx/test-infra that referenced this pull request Mar 24, 2020