Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

audit followup: k8s-staging-* projects should not have misc services enabled #1675

Closed
spiffxp opened this issue Feb 17, 2021 · 19 comments
Closed
Assignees
Labels
area/audit Audit of project resources, audit followup issues, code in audit/ area/release-eng Issues or PRs related to the Release Engineering subproject kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/release Categorizes an issue or PR as relevant to SIG Release.
Milestone

Comments

@spiffxp
Copy link
Member

spiffxp commented Feb 17, 2021

ref: cf5f5e3 and #1534 (comment)

Staging projects run GCB, they shouldn't need compute, bigquery, iam, monitoring, oslogin, etc...

We should consider explicitly disabling:

  • compute
  • dns
  • bigquery
  • etc.

We could go bigger and explicitly disable anything not intended to be enabled.

How/when to do this? I see a few options:

  • A) run this script prior to an audit (but now the job running audit needs a lot more permissions)
  • B) make this part of ensure-staging-storage.sh (so humans will cleanup when they run the script)
  • C) run ensure-staging-storage.sh periodically (means setting up an SA to do this, and deciding we're ready to trust CI runs of untested bash)

I prefer option B, and then option C. And not "going big" until we've gained confidence that we're not going to foot-gun ourselves.

Would also be curious what a terraform or crossplane approach to "ensure only these services are enabled and no others" looks like.

/wg k8s-infra
/sig release
/area release-eng
/priority important-soon
/milestone v1.21

@k8s-ci-robot k8s-ci-robot added wg/k8s-infra sig/release Categorizes an issue or PR as relevant to SIG Release. labels Feb 17, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Feb 17, 2021
@k8s-ci-robot k8s-ci-robot added area/release-eng Issues or PRs related to the Release Engineering subproject priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Feb 17, 2021
@spiffxp spiffxp changed the title audit followup: k8s-staging-* projects should not have compute enabled audit followup: k8s-staging-* projects should not have misc services enabled Feb 17, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Feb 18, 2021

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 18, 2021
@ameukam ameukam added the area/audit Audit of project resources, audit followup issues, code in audit/ label Mar 25, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Apr 6, 2021

/assign
#1859 implements option B (make this part of ensure-staging-storage.sh (so humans will cleanup when they run the script)), but I've gated service disablement behind an obnoxiously long environment variable, because the current implementation needs to use gcloud service disable --force

@spiffxp
Copy link
Member Author

spiffxp commented Apr 6, 2021

gathered output related to extra services from #1859 (comment) into https://gist.github.com/spiffxp/1cbf779d7dc1c025a445b91909f55bf7

# curl -s https://gist.githubusercontent.com/spiffxp/1cbf779d7dc1c025a445b91909f55bf7/raw/49971cd6699c4dace4f47fcba8c068c212e0ad2e/k8s-staging-service-disable-plan.yaml | yq -y 'with_entries(.value |= .to_disable)'
k8s-staging-artifact-promoter:
  - compute.googleapis.com
  - oslogin.googleapis.com
k8s-staging-cip-test:
  - bigquery.googleapis.com
  - bigquerystorage.googleapis.com
  - cloudapis.googleapis.com
  - clouddebugger.googleapis.com
  - cloudtrace.googleapis.com
  - compute.googleapis.com
  - datastore.googleapis.com
  - monitoring.googleapis.com
  - oslogin.googleapis.com
  - servicemanagement.googleapis.com
  - serviceusage.googleapis.com
  - sql-component.googleapis.com
k8s-staging-cluster-api:
  - bigquery.googleapis.com
  - bigquerystorage.googleapis.com
  - cloudapis.googleapis.com
  - clouddebugger.googleapis.com
  - cloudtrace.googleapis.com
  - compute.googleapis.com
  - datastore.googleapis.com
  - monitoring.googleapis.com
  - oslogin.googleapis.com
  - servicemanagement.googleapis.com
  - serviceusage.googleapis.com
  - sql-component.googleapis.com
k8s-staging-cluster-api-aws:
  - compute.googleapis.com
  - oslogin.googleapis.com
k8s-staging-capi-openstack:
  - compute.googleapis.com
  - oslogin.googleapis.com
k8s-staging-capi-docker:
  - compute.googleapis.com
  - dns.googleapis.com
  - oslogin.googleapis.com
k8s-staging-coredns:
  - bigquery.googleapis.com
  - bigquerystorage.googleapis.com
  - cloudapis.googleapis.com
  - clouddebugger.googleapis.com
  - cloudtrace.googleapis.com
  - compute.googleapis.com
  - datastore.googleapis.com
  - monitoring.googleapis.com
  - oslogin.googleapis.com
  - servicemanagement.googleapis.com
  - serviceusage.googleapis.com
  - sql-component.googleapis.com
k8s-staging-csi:
  - bigquery.googleapis.com
  - bigquerystorage.googleapis.com
  - cloudapis.googleapis.com
  - clouddebugger.googleapis.com
  - cloudtrace.googleapis.com
  - compute.googleapis.com
  - datastore.googleapis.com
  - iam.googleapis.com
  - iamcredentials.googleapis.com
  - monitoring.googleapis.com
  - oslogin.googleapis.com
  - servicemanagement.googleapis.com
  - serviceusage.googleapis.com
  - sql-component.googleapis.com
k8s-staging-ingress-nginx:
  - monitoring.googleapis.com
  - stackdriver.googleapis.com
k8s-staging-kops:
  - bigquery.googleapis.com
  - bigquerystorage.googleapis.com
  - cloudapis.googleapis.com
  - clouddebugger.googleapis.com
  - cloudtrace.googleapis.com
  - datastore.googleapis.com
  - monitoring.googleapis.com
  - servicemanagement.googleapis.com
  - serviceusage.googleapis.com
  - sql-component.googleapis.com

@spiffxp
Copy link
Member Author

spiffxp commented Apr 6, 2021

OK, the above services have been disabled. See #1859 (comment) for the command I ran. I updated https://gist.github.com/spiffxp/1cbf779d7dc1c025a445b91909f55bf7 with log output

@spiffxp
Copy link
Member Author

spiffxp commented Apr 6, 2021

Since I just changed a bunch of staging project permissions I want to see if I can e-mail folks to give them a heads-up #1878

@spiffxp
Copy link
Member Author

spiffxp commented Apr 8, 2021

I cannot, not without being a member of the groups.

@dims
Copy link
Member

dims commented Apr 8, 2021

ah fun!!

@spiffxp
Copy link
Member Author

spiffxp commented Apr 15, 2021

/milestone v1.22

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.21, v1.22 Apr 15, 2021
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 14, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Jul 15, 2021

/remove-lifecycle stale
Still relevant

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 15, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Aug 4, 2021

/milestone v1.23
/remove-priority soon
/priority backlog

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Aug 4, 2021
@k8s-ci-robot
Copy link
Contributor

@spiffxp: Those labels are not set on the issue: priority/soon

In response to this:

/milestone v1.23
/remove-priority soon
/priority backlog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.22, v1.23 Aug 4, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Sep 2, 2021

/remove-priority important-soon

@k8s-ci-robot k8s-ci-robot removed the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 2, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Sep 9, 2021

While provisioning a fresh staging project I took the opportunity to look at what extra services were enabled as dependencies. ref: #2688

@spiffxp
Copy link
Member Author

spiffxp commented Sep 9, 2021

Re-running ensure-staging-storage.sh for all projects to get a fresh look at what differs from expected:

# TODO: this is expected as a special case for this project, how to encode this?
k8s-staging-cluster-api-gcp:
  - compute.googleapis.com
  - oslogin.googleapis.com
# TODO: not so sure about this one
k8s-staging-sig-storage:
  - compute.googleapis.com
  - oslogin.googleapis.com

Most everything has containeranalysis.google.com which needs to be explicitly disabled

@spiffxp
Copy link
Member Author

spiffxp commented Sep 24, 2021

#2813 adds the ability to specify additional special case services for cluster-api-gcp and sig-storage and turns on pruning of unexpected services. Once that merges I'll run for all staging projects and then call this done

@spiffxp
Copy link
Member Author

spiffxp commented Sep 24, 2021

Finished a full run of ensure-staging-storage.sh

This legitimately caught and disabled services that were enabled when I browed the UI for k8s-staging-test-infra:

Configuring staging project: k8s-staging-test-infra
  Ensuring project exists: k8s-staging-test-infra
  Ensuring k8s-infra-staging-test-infra@kubernetes.io are project viewers
  Ensuring necessary enabled services staging project: k8s-staging-test-infra
  plan to enable/disable the following services
  to_enable: []
  to_disable:
    - monitoring.googleapis.com
    - stackdriver.googleapis.com
  Operation "operations/acf.p17-958928310150-ffe33fbb-7121-43d7-8ad5-dc9182cc25b4" finished successfully.
  @@ -12,10 +12,8 @@ enabled:
     - cloudkms.googleapis.com
     - containerregistry.googleapis.com
     - logging.googleapis.com
  -  - monitoring.googleapis.com
     - pubsub.googleapis.com
     - secretmanager.googleapis.com
  -  - stackdriver.googleapis.com
     - storage-api.googleapis.com
     - storage-component.googleapis.com
   expected:
  @@ -28,6 +26,4 @@ expected:
     - storage-api.googleapis.com
     - storage-component.googleapis.com
   to_enable: []
  -to_disable:
  -  - monitoring.googleapis.com
  -  - stackdriver.googleapis.com

@spiffxp
Copy link
Member Author

spiffxp commented Sep 24, 2021

/close

@k8s-ci-robot
Copy link
Contributor

@spiffxp: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/audit Audit of project resources, audit followup issues, code in audit/ area/release-eng Issues or PRs related to the Release Engineering subproject kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/release Categorizes an issue or PR as relevant to SIG Release.
Projects
None yet
Development

No branches or pull requests

5 participants