empower KSA used in the promoter's prow jobs to use Workload Identity #655
Conversation
This gives the "k8s-prow-builds.svc.id.googtest-pods/k8s-artifacts-prod" Prow account powers to act as the "k8s-infra-gcr-promoter@k8s-artifacts-prod.iam.gserviceaccount.com" for purposes of Workload Identity. This way, the Prow jobs for the promoter won't have to have service account keys injected into them.
LGTM
-
typo
-
some wording nits:
- There are service accounts for a kubernetes cluster and GCP service accounts.
- There is no such thing as a prow service account (prow just creates pods)
- Any pod using the kubernetes service account will authenticate as the specified GCP account (not just prow's pods)
- The binding just authenticates the identity; it doesn't control authorization
- i.e., the gcloud command authorizes a (cluster, namespace, name) kubernetes service account identity to authenticate as the specified GCP service account to the rest of GCP
- Whether or not the GCP SA has access to GCR depends on the normal IAM rights for that GCP SA.
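To make the authN/authZ split above concrete, here is a minimal sketch of the two separate steps: the Workload Identity binding that lets a (cluster, namespace, name) KSA identity authenticate as the GSA, and an ordinary IAM grant that actually authorizes the GSA to write to the GCS bucket backing GCR. Function and variable names here are hypothetical, not from this repo.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; names are illustrative, not from the actual scripts.

# Step 1 (authentication): allow the KSA identity to authenticate as the GSA.
bind_ksa_to_gsa() {
    local gke_project="$1" namespace="$2" ksa_name="$3" gsa="$4" gcp_project="$5"
    gcloud iam service-accounts add-iam-policy-binding \
        --project "${gcp_project}" \
        --role roles/iam.workloadIdentityUser \
        --member "serviceAccount:${gke_project}.svc.id.goog[${namespace}/${ksa_name}]" \
        "${gsa}"
}

# Step 2 (authorization): whether the GSA can touch GCR is a separate,
# ordinary IAM grant on the GCS bucket backing the registry.
grant_gsa_gcr_write() {
    local gsa="$1" bucket="$2"
    gsutil iam ch "serviceAccount:${gsa}:objectAdmin" "gs://${bucket}"
}
```

Dropping either step breaks the flow: without step 1 the pod cannot obtain tokens for the GSA at all; without step 2 it authenticates fine but gets permission-denied from GCR.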
infra/gcp/ensure-prod-storage.sh
Outdated
@@ -220,4 +220,9 @@ empower_group_to_admin_artifact_auditor \
empower_artifact_auditor "${PROD_PROJECT}"
empower_artifact_auditor_invoker "${PROD_PROJECT}"

# Special case: empower Prow account to have access to GCR for Workload Identity.
empower_prowacct_for_workload_identity \
IMO this should be more like:
empower_pods_to_identify_as_gcp_service_account
bind_k8s_service_account_to_act_as_gcp_service_account
infra/gcp/lib_gcr.sh
Outdated
    return 1
fi

prow_acct="$1"
I'd consider separating out these variables into their parts so it's more clear. I'd recommend this:
gke_project=$1
namespace=$2
name=$3
gcp_acct=$4
gcp_project=$5
gcloud iam service-accounts add-iam-policy-binding \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${gke_project}.svc.id.goog[${namespace}/${name}]" \
"$gcp_acct" --project "$gcp_project"
I don't think we need to go quite this far.
I would position this function as empower_ksa_to_svcacct(project, k8s_svcacct, gcp_svcacct)
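A minimal sketch of a helper with that shape, assuming the KSA argument is passed as an opaque, fully-formed identity string; the argument-validation style loosely mimics the repo's other helpers, but the code here is illustrative only, not the merged implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of empower_ksa_to_svcacct(project, k8s_svcacct, gcp_svcacct).
empower_ksa_to_svcacct() {
    if [ $# != 3 ] || [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
        echo "empower_ksa_to_svcacct(project, k8s_svcacct, gcp_svcacct) requires 3 arguments" >&2
        return 1
    fi
    local project="$1"      # GCP project that owns the GCP service account
    local k8s_svcacct="$2"  # fully-formed KSA identity, e.g. PROJ.svc.id.goog[ns/name]
    local gcp_svcacct="$3"  # GCP service account email

    gcloud iam service-accounts add-iam-policy-binding \
        --project "${project}" \
        --role roles/iam.workloadIdentityUser \
        --member "serviceAccount:${k8s_svcacct}" \
        "${gcp_svcacct}"
}
```

Because the function never parses the KSA string, a future change to the `.svc.id.goog` identity format would only touch the call sites.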
infra/gcp/ensure-prod-storage.sh
Outdated
@@ -220,4 +220,9 @@ empower_group_to_admin_artifact_auditor \
empower_artifact_auditor "${PROD_PROJECT}"
empower_artifact_auditor_invoker "${PROD_PROJECT}"

# Special case: empower Prow account to have access to GCR for Workload Identity.
empower_prowacct_for_workload_identity \
    "k8s-prow-builds.svc.id.googtest-pods/k8s-artifacts-prod" \
k8s-prow-builds.svc.id.goog[test-pods/k8s-artifacts-prod]
(you need the [])
infra/gcp/lib_gcr.sh
Outdated
@@ -149,3 +149,22 @@ function empower_svcacct_to_admin_gcr () {

empower_svcacct_to_admin_gcs_bucket "${group}" "${bucket}"
}
First note - this does not belong in this file. This file is just for very low-level GCR-centric primitives. Put this back in lib.sh until that can be broken up more.
infra/gcp/lib_gcr.sh
Outdated
prow_acct="$1"
gcp_acct="$2"

gcloud iam service-accounts add-iam-policy-binding \
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need --project, don't you?
@fejta Thanks for the wording clarifications --- hopefully I captured the intent more accurately. But please take another look!
/assign @thockin @justaugustus
@fejta This says that "test-pods/k8s-artifacts-prod" is the namespace/sa, right? That implies that other pods run in the same namespace ("test-pods"), is that right? If so, can you compare this to running in a dedicated namespace that doesn't risk cross-pod leaks of the service account?
infra/gcp/lib.sh
Outdated

gcloud iam service-accounts add-iam-policy-binding \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:${gke_project}.svc.id.goog[${k8s_namespace}/${gcp_project}]" \
Rather than construct the IDNS this way, I'd rather we pass it in fully formed. This ensures that, should that format ever change or extend, we don't need to change this function or duplicate it or anything.
Also I think the assumption that there's one SA per project is dubious - why couldn't we have 2 GSAs in k8s-artifacts-prod that were each used by different prow jobs? This conflates the name of the KSA in the GKE cluster with the name of the project the GSA exists in. It's fine to name them the same, but don't assume they are ALWAYS named the same.
IOW, change args 2 and 3 into a single string, with the documented example "${gke_project}.svc.id.goog[${k8s_namespace}/${k8s_svcacct}]"
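If the caller wants help forming that documented string once, a tiny call-site helper could build it, keeping the empowering function itself format-agnostic. This helper is hypothetical, shown only to pin down the documented example format:

```shell
#!/usr/bin/env bash
# Hypothetical call-site helper: builds the fully-formed Workload Identity
# KSA identity string "${gke_project}.svc.id.goog[${k8s_namespace}/${k8s_svcacct}]"
# so the function that does the IAM binding can treat it as opaque.
ksa_identity() {
    local gke_project="$1" k8s_namespace="$2" k8s_svcacct="$3"
    echo "${gke_project}.svc.id.goog[${k8s_namespace}/${k8s_svcacct}]"
}
```

Note the third component is the KSA *name*, which may or may not match any GCP project name.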
An existing example where the service account name in the GKE cluster (k8s-kops-test) does not match the GCP project name (kubernetes-jenkins-pull).
An existing example where we are using multiple GCP service accounts (resultstore, pusher, deployer) in the same project (k8s-prow).
> Also I think the assumption that there's one SA per project is dubious - why couldn't we have 2 GSAs in k8s-artifacts-prod that were each used by different prow jobs? This conflates the name of the KSA in the GKE cluster with the name of the project the GSA exists in. It's fine to name them the same, but don't assume they are ALWAYS named the same.
As I was writing the code, I felt that this assumption was on shaky ground. Thanks for the guidance!
infra/gcp/lib.sh
Outdated
--role roles/iam.workloadIdentityUser \ | ||
--member "serviceAccount:${gke_project}.svc.id.goog[${k8s_namespace}/${gcp_project}]" \ | ||
"${gcp_svcacct}" \ | ||
--project="${project}" |
gcp_project
> An existing example where the service account name in the GKE cluster (k8s-kops-test) does not match the GCP project name (kubernetes-jenkins-pull).
> An existing example where we are using multiple GCP service accounts (resultstore, pusher, deployer) in the same project (k8s-prow).
Yes and yes (although we could use unit tests to mitigate against configuring jobs to use this SA). We already do this mitigation for trusted jobs (which should have excluded these jobs from running there, but we updated the unit test to whitelist them).

> Can we use the new #wg-k8s-infra scripting to schedule these pods in a community owned cluster somewhere? It is easy to configure prow to schedule pods in any conformant k8s cluster.

Yes, my request has been from the start (and continues to be) that we run these jobs outside of prow's trusted context, ideally in their own GKE cluster. The trusted context is for jobs which prow fully trusts, which should not include these jobs: the core prow oncall/maintainer team does not own/review/maintain the code behind these jobs. These belong in a different trust domain, which means in a different context than the prow control plane, ideally their own cluster.
In particular, pass in the serviceAccount argument as a single string.
Updated... PTAL, esp. how I expanded the comments.
@fejta yes and no. In theory yes. In practice the apiserver IP is firewalled off and so we'd have to think about how to actually do it.
/lgtm
/approve
/hold
LGTM from my end, /hold cancel when you're ready for it to merge.
Interesting. Which API server? How are things deployed/upgraded/etc when kubectl cannot talk to it?
Cloud shell is how we manage it, trying to follow "most secure" practices.
Does this mean I can revoke the JSON key we previously issued this SA?
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: fejta, listx, thockin
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
I think we have to actuate this first, then un-revert kubernetes/test-infra#16463 and see if the ci-k8sio-cip job (or post-k8sio-cip job) still works. Once all the jobs have migrated to WI, then the JSON keys can be revoked. I think we need to set aside a day to test all of the above; maybe next week? Busy today and most likely tomorrow with kubernetes-sigs/promo-tools#191
file a followup bug? I know we have other things to finish (auditor -> alerts :)
applied
Ever since kubernetes/k8s.io#655, this job should already run with an authenticated GCP SA (before the job's code even runs). This means there is no need to use the keyfile to authenticate (again), as it is redundant.

We still keep the `-use-service-account` flag for now because the promoter currently still tries to extract OAUTH tokens as the GCP account at the very beginning of its run. It's a nice sanity check. Once we confirm that the OAUTH tokens do still get populated in the business logic of the job, we don't need to keep this flag any more, because the GCRs that the promoter deals with today are all public (the promoter was designed to deal with private repos), and so the very first "read GCR" API calls do not need the OAUTH tokens for just reading the repos.

That takes care of the GCR read API calls. As for the GCR write calls, the gcrane.Copy() method that performs the GCR writes does its own authentication dance, but that is separate and automatic as long as the GCP account is authenticated before that function call. This is why the `-use-service-account` flag is marked for future deprecation in this commit.
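Before revoking the JSON keys, one way to confirm the job really authenticates via Workload Identity is to ask the metadata server, from inside the pod, which service account it resolves to. The wrapper below is a hypothetical sanity check (not part of the promoter); the endpoint and header are the standard GCE/GKE metadata-server ones.

```shell
#!/usr/bin/env bash
# Hypothetical in-pod sanity check: query the metadata server for the email of
# the GCP service account this pod authenticates as. Under Workload Identity
# this should print the bound GSA (e.g. the k8s-infra-gcr-promoter account)
# rather than the node's default service account.
pod_gcp_identity() {
    curl -s -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"
}
```

If this prints the promoter GSA, token minting works end to end and the injected keyfile is indeed redundant.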
This is a follow-up to kubernetes/k8s.io#655 and kubernetes#16883.
This gives the "k8s-prow-builds.svc.id.goog[test-pods/k8s-artifacts-prod]"
service account powers to act as the
"k8s-infra-gcr-promoter@k8s-artifacts-prod.iam.gserviceaccount.com" for
purposes of Workload Identity. This way, the Prow jobs for the promoter
(which live in the k8s-prow-builds GKE project, and run in the test-pods
namespace with the k8s-artifacts-prod KSA) won't have to have service
account keys injected into them.

/cc @thockin @fejta

More context: kubernetes-sigs/promo-tools#180 (comment)