Sunset for k8s.gcr.io repository #4872

Closed
dims opened this issue Mar 4, 2023 · 47 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@dims (Member) commented Mar 4, 2023

Here are the community blogs and announcements so far around k8s.gcr.io

However, we are finding that the numbers don't add up: we will exhaust the GCP cloud credits that make up our budget well before Dec 31, 2023. So we need to do something more drastic than just the freeze. Please see the thread in #sig-k8s-infra:
https://kubernetes.slack.com/archives/CCK68P2Q2/p1677793138667629?thread_ts=1677709804.935919&cid=CCK68P2Q2

We will need to start by enumerating some of the images that carry the biggest cost (storage + network) and removing them from k8s.gcr.io right away (possibly by the freeze date, April 3rd). Some data is in the thread, but we will need to revisit the logs, come up with a clear set of images based on some criteria, and announce their deletion as well. Note that this specific set of images will still be available in the new registry, registry.k8s.io, so folks will have to fix their Kubernetes manifests / Helm charts etc. as we mentioned in the 3 URLs above.
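For a plain YAML manifest, the fix can be as small as rewriting the registry host. A minimal sketch, assuming GNU or BSD sed and a hypothetical file name (Helm charts would need their values updated instead):

```bash
# Rewrite the old registry host in place, keeping a backup of the original.
# deployment.yaml is a placeholder for whatever manifest references the image.
sed -i.bak 's#k8s.gcr.io#registry.k8s.io#g' deployment.yaml
```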

Thoughts on a deadline for deletion of k8s.gcr.io:
Since the freeze is on April 3rd, 2023 (10 days before 1.27 is released) and we expect to send comms out at KubeCon EU (April 18–21), how about we put the marker at the end of June? (That gets us six months of cost savings.)

Risk: we will end up interrupting clusters that are working right now. Given the traffic patterns, a bunch of these will be in AWS, but this is very likely to affect anyone who has an older working cluster that they haven't touched in a while.

What I have enumerated above is just the beginning of the discussion. Please feel free to add your thoughts below, so we can then draft a KEP around it.

@dims (Member, Author) commented Mar 4, 2023

cc @kubernetes/sig-architecture-leads @kubernetes/sig-release-leads

@enj (Member) commented Mar 4, 2023

@dims can we start by having brownouts of the old registry (they should start immediately)?

@sftim (Contributor) commented Mar 5, 2023

Let's aim to very clearly communicate a recommended approach (e.g., mirror the images that you depend on, or use a pull-through cache, or...) and consider the lead time on those comms when we pick a date.
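As one hedged illustration of the "mirror what you depend on" option, using crane from go-containerregistry (the destination registry below is hypothetical):

```bash
# Copy an upstream image into a registry you control, so your clusters
# no longer depend on the upstream host at pull time.
crane copy registry.k8s.io/pause:3.9 registry.example.com/mirror/pause:3.9
```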

The comms plan does not have to be perfect, it just has to be good enough.

@dims (Member, Author) commented Mar 5, 2023

@sftim agree. Recommended approach, so far:

@dims (Member, Author) commented Mar 5, 2023

@dims can we start by having brownouts of the old registry (they should start immediately)?

@enj yep, agree. The brownout we had in mind was as Arnaud mentioned here:
https://kubernetes.slack.com/archives/CCK68P2Q2/p1677793564552829?thread_ts=1677709804.935919&cid=CCK68P2Q2

@enj (Member) commented Mar 5, 2023

@enj yep, agree. The brownout we had in mind was as Arnaud mentioned here: https://kubernetes.slack.com/archives/CCK68P2Q2/p1677793564552829?thread_ts=1677709804.935919&cid=CCK68P2Q2

@dims I suppose deleting images is one form of brownout... I was thinking more that we could have the old registry return 429 errors every day at noon for a few hours. The transient service disruption will get people's attention.

@dims (Member, Author) commented Mar 5, 2023

@enj k8s.gcr.io is GCR-based and has only a few folks left to take care of it. Last year some helpful folks tried to set up automatic redirects from k8s.gcr.io to registry.k8s.io for a small portion of traffic and ran into snags, so we can't do much over there other than delete images.

Details are in this thread: https://kubernetes.slack.com/archives/CCK68P2Q2/p1666725317568709

@enj (Member) commented Mar 5, 2023

@dims makes sense. One suggestion that may also not be implementable would be to temporarily delete and then recreate image tags to cause pull failures (another form of brownout).

@dims (Member, Author) commented Mar 5, 2023

Year to date GCP Billing data, please see here:
GCP_Billing_Report-year-to-date.pdf

($682,683.81 year-to-date / 62 days from Jan 1 to Mar 4) × 365 = $4,019,025.65 per year (our budget in credits is ~$3M).
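For reference, the projection is a straight-line annualization of the year-to-date spend; checking the arithmetic in a shell:

```bash
# (YTD spend / days elapsed) * 365, using the 62-day window above
echo "scale=6; 682683.81 / 62 * 365" | bc
# => 4019025.655445  (vs. roughly $3M in credits)
```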

@sftim (Contributor) commented Mar 6, 2023 (edited)

One option we have is to actually delete some images - and then optionally reinstate them per #4872 (comment). A 429 is subject to Google's say-so, but deleting an image is something we can Just Do™, so long as the comms are in place to explain why.

@dims (Member, Author) commented Mar 6, 2023

@sftim yes, we will have a limited set of images that we will delete ASAP (and will NOT reinstate)! @hh and folks are coming up with the high-traffic / costly image list as the first step. Our comms will depend on what's in that list.

@dims (Member, Author) commented Mar 6, 2023

xref: #4738

@dims (Member, Author) commented Mar 6, 2023

An energetic discussion with @thockin here: https://kubernetes.slack.com/archives/CCK68P2Q2/p1678118252030639

  • Weighing risk of project shutdown vs k8s.gcr.io shutdown
  • Input of GCR customers in what happens here
  • Risks of breaking old versions of k8s clusters that seem to contribute highly to the cost
  • What is the right amount of push needed for us to influence change
  • Will we ever be able to get rid of this old repository?
  • Are we able to say as a project that folks who run their clusters in production should have their own repositories?

@BenTheElder (Member) commented:

I think we can do broad brownouts ahead of any final sunset by toggling the access controls on the 3 backing GCR instances. To make the images publicly readable, we set the backing GCS bucket to grant read permission to allUsers; we could probably invert that and put it back on a schedule, gradually increasing the period of total non-availability.
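A rough sketch of what that toggle might look like with gsutil; the bucket name below is a placeholder, not the project's actual backing bucket:

```bash
# Brownout: revoke anonymous read on the GCS bucket backing a GCR instance
gsutil iam ch -d allUsers:objectViewer gs://artifacts.example-project.appspot.com

# End of brownout window: restore public read access
gsutil iam ch allUsers:objectViewer gs://artifacts.example-project.appspot.com
```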

Doing this is a big deal, and I'm not sure what the time frame should be. We know that users are very slow to migrate, and that doing this will disrupt their base "cloud-native" infrastructure. (E.g., I saw some recent data that Kubernetes 1.11, from 2018, is still reasonably popular!)

@dims (Member, Author) commented Mar 7, 2023

Some data from @justinsb:

[two charts: registry traffic data]

@dims (Member, Author) commented Mar 7, 2023

Some good discussion with @TheFoxAtWork here: https://cloud-native.slack.com/archives/CSCPTLTPE/p1678219030800149 in the #tag-chairs channel on CNCF Slack

This will likely break a lot of clusters and organizations, but it is certainly a good wake-up call to the world that even open source has its costs. I know this is drastic, but we've broken the internet before; this one at least is better coordinated, with plenty of advance warning. We can't reach everyone personally, so we do our best with the time and energy we have available to us as open source volunteers and community members. Side note: eliminating older versions and forcing upgrades is a huge global security uplift.

I would also recommend (though this is likely already done) to work with the Ambassadors, Marketing Team, and other Foundations.

@TheFoxAtWork commented:

@dims I want to confirm what I'm looking at from the chart (I understand there is a new one in the works): can you confirm that each colored bar is who/what is primarily requesting the images? If so, has AWS/Amazon been engaged to redirect requests they field to registry.k8s.io? Have we done this with other cloud providers? (I know I'm late to the party, trying to understand what has already been completed.)

@chris-short (Contributor) commented:

@dims @rothgar and I are engaging folks on the AWS side.

@dims (Member, Author) commented Mar 8, 2023

@TheFoxAtWork yep, there has been a bunch of back and forth.

@chris-short (Contributor) commented:

Has anyone pinged Microsoft? I don't know where Azure stands at the moment.

@dims (Member, Author) commented Mar 8, 2023

A single-line kubectl command to find images from the old registry (see the sketch after this list):

A Kyverno and Gatekeeper policy to help folks!

A kubectl/krew plugin:
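The exact command isn't reproduced above, but a hedged sketch along those lines (it only inspects regular containers; initContainers would need a similar query):

```bash
# List every image in use across the cluster and count the ones
# still being pulled from the old registry
kubectl get pods --all-namespaces \
  -o jsonpath="{.items[*].spec.containers[*].image}" \
  | tr -s '[[:space:]]' '\n' | sort | uniq -c | grep 'k8s.gcr.io'
```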

@dims (Member, Author) commented Mar 8, 2023

FAQ(s) we are getting asked:

@TheFoxAtWork commented:

Attempted to pull a lot of the details from this ticket into a single LinkedIn post for sharing, in case it helps: https://www.linkedin.com/posts/themoxiefox_action-required-update-references-from-activity-7039245748525256704-IrES

@dims (Member, Author) commented Mar 8, 2023

Some good news from @BenTheElder here - https://kubernetes.slack.com/archives/CCK68P2Q2/p1678299674725429

[screenshot]

@chris-short (Contributor) commented:

AWS just posted a bulletin on its Stack Overflow collective: https://stackoverflow.com/collectives/aws/bulletins/75676424/important-kubernetes-registry-changes

@chris-short (Contributor) commented:

I chatted with @jeremyrickard at Microsoft. They are all over this.

@dims (Member, Author) commented Mar 9, 2023

Question: when the new k8s.gcr.io->registry.k8s.io redirection takes effect, what is likely to fail?

  • folks using old versions of kubelet/docker/containerd are likely to see problems; newer kubelet/containerd have better back-off and retry in place, so they will fare better
  • folks with non-typical network configurations needing explicit whitelisting of URL(s) etc. are likely to get hit by the HTTP redirection (to the new S3 buckets)

@thockin (Member) commented Mar 9, 2023

Touching on the topic of network-level firewalls or other things causing impact:

This is fairly easily tested: run a pod which uses a "registry.k8s.io" image in your cluster(s). If it is able to pull that image, you're almost certainly OK. If not, debug now, before the redirect goes live (next week, we hope).
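A minimal version of that test, assuming a cluster you can create pods in (the image tag is just an example):

```bash
# Launch a throwaway pod from the new registry and wait for it to come up
kubectl run registry-test --image=registry.k8s.io/pause:3.9 --restart=Never
kubectl wait --for=condition=Ready pod/registry-test --timeout=120s

# Clean up
kubectl delete pod registry-test
```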

@recollir commented Mar 9, 2023

How will the redirect work? Just at the DNS level? I have tried this locally myself, but containerd/Docker, obviously and for the right reasons, complains about the certificate mismatch between k8s.gcr.io and registry.k8s.io. I solved it by downloading the ca.crt and installing it locally for containerd/Docker.

@tuapuikia commented:

Some good news from @BenTheElder here - https://kubernetes.slack.com/archives/CCK68P2Q2/p1678299674725429

[screenshot]

Do we have enough bandwidth on registry.k8s.io?

@BenTheElder (Member) commented:

How will the redirect work? Just on DNS level? I have tried this locally myself, but containerd/Docker, obviously and for the right reasons, complains about certificate mismatch between k8s.gcr.io and registry.k8s.io. I solved it then by downloading the ca.crt and installing it locally for containerd/Docker.

HTTP 3XX redirect, not DNS. No cert changes.

You can test by taking any image you would pull and substituting registry.k8s.io instead of k8s.gcr.io. All images in k8s.gcr.io are in registry.k8s.io.
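For example (the tag is illustrative):

```bash
# If your manifests normally pull this...
docker pull k8s.gcr.io/kube-apiserver:v1.26.2
# ...the identical image is available under the new host:
docker pull registry.k8s.io/kube-apiserver:v1.26.2
```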

The only difference between doing this test and the redirect will be your client reaching k8s.gcr.io first and then following the redirect, but presumably k8s.gcr.io was already reachable for you if you're switching, and all production-grade registry clients follow HTTP redirects.

The same existing GCR endpoint will serve the redirect instead of the usual response. Existing GCR image pulls already involve redirects to backing storage, just not redirects to registry.k8s.io.

Do we have enough bandwidth on registry.k8s.io?

We should have more than enough capacity on https://registry.k8s.io; we've looked at traffic levels for k8s.gcr.io and planned accordingly.
We aren't hitting bandwidth limits on GCR either; the problem is the impractical cost of serving ever-increasing cross-cloud bandwidth.

registry.k8s.io gives us the ability to securely offload bandwidth-intensive image-layer serving to additional hosts.
We're doing that on GCP (Artifact Registry, Cloud Run) and now AWS (S3) thanks to additional funding from Amazon, and we will be serving substantially less expensive egress traffic. In the future this might include additional hosts / sponsors (https://registry.k8s.io#stability).

Just serving AWS traffic (which is the majority) from region-local AWS storage should bring us back within our budgets.

We have a lot more context in the docs (https://registry.k8s.io) and in this talk: https://www.youtube.com/watch?v=9CdzisDQkjE

@recollir commented Mar 9, 2023

@BenTheElder 👍

@dims (Member, Author) commented Mar 9, 2023

Experiment results from the k8s.gcr.io->registry.k8s.io redirect trial last October:
https://kubernetes.slack.com/archives/CCK68P2Q2/p1666725317568709

@dims (Member, Author) commented Mar 10, 2023

xref: kubernetes/website#39887

@dims (Member, Author) commented Mar 10, 2023

this text may get dropped from the blog post being drafted for the automatic redirects, so I am saving it here:

Technical Details

The new registry.k8s.io is a secure blob redirector that allows the Kubernetes project to direct traffic, based on request IP, to the best possible blob storage for the user. If a user makes a request from an AWS region network and pulls a Kubernetes container image, for example, that user will be automatically redirected to pull the image layers from the closest S3 bucket. For the current decision tree, refer to the request-handling documentation [2]. To be clear, the new registry.k8s.io implementation allows the upstream project to host registries on more clouds in the future, not just GCP and AWS, which will increase stability, reduce cost, and accelerate both downloads and deployments. Please do not rely on the internal implementation details of the new image registry, as these can be changed without notice.

Please note that the upstream Kubernetes teams are working to provide additional communication, and that how long the old registry remains available is still being discussed.

[1]: https://kubernetes.io/blog/2023/02/06/k8s-gcr-io-freeze-announcement/
[2]: https://github.com/kubernetes/registry.k8s.io/blob/main/cmd/archeio/docs/request-handling.md
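A hedged way to observe the redirect behavior from a shell; the image path is just an example, and the Location target is exactly the kind of internal detail the paragraph above says not to rely on:

```bash
# Ask for a manifest without downloading it; expect an HTTP 3XX response
# whose Location header points at a backing store chosen for your IP.
curl -sI "https://registry.k8s.io/v2/pause/manifests/3.9" \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json"
```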

@afbjorklund commented Mar 10, 2023

The first step for minikube will be to start adding --image-repository=registry.k8s.io to the old kubeadm commands.

Probably add it to all kubeadm versions before 1.25.0; it shouldn't hurt anything if it is already the default registry...

The second step is to retag all the older preloads with the new registry, so they work air-gapped (though the preload is a rather small download anyway).

Some mirrors might still use a "k8s.gcr.io" subdirectory, which is fine, so this change is only for the default registry.
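For instance (the Kubernetes version is pinned only as an illustration):

```bash
# Point an older cluster at the new registry explicitly
minikube start --kubernetes-version=v1.24.0 --image-repository=registry.k8s.io
```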


The main issue is that the people pulling those older Kubernetes releases also tend to use older versions of minikube.

Alternatively, we could invalidate old caches and have people pull "new" versions of the same images, but under a different name:

~/.minikube/cache/images/amd64 : k8s.gcr.io/pause_3.6 -> registry.k8s.io/pause_3.6

That would be somewhat counter-productive, so we are instead trying to "upgrade" those old caches in place (by re-tagging the images).
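Conceptually, the in-place upgrade is just a re-tag; an illustrative sketch with docker (minikube's cache stores image archives on disk, so its real implementation differs):

```bash
# Give an already-present image its new name without re-downloading anything
docker tag k8s.gcr.io/pause:3.6 registry.k8s.io/pause:3.6
```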

@BenTheElder (Member) commented:

kubeadm had the default changed in patch releases back to 1.23 (older releases were not accepting any patches), when we published https://kubernetes.io/blog/2022/11/28/registry-k8s-io-faster-cheaper-ga/

@dims (Member, Author) commented Mar 10, 2023

So on March 20 we'll be turning on redirects from k8s.gcr.io to registry.k8s.io for almost everyone; details here:
https://kubernetes.io/blog/2023/03/10/image-registry-redirect/

So the next question will be: how many folks are still using the underlying content of k8s.gcr.io in other ways?

  • directly using the underlying storage (us.artifacts / eu.artifacts / asia.artifacts?)
  • folks using k8s.gcr.io may still be getting pointed back to the underlying storage

Then we'll have to watch how much savings we get over time. Assuming about a week of rollout starting March 20, we'll get some concrete data a week or so after that (let's say Monday, April 3rd, given we have a sawtooth pattern of usage over the week, with lows on Saturday and Sunday).

@k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jun 9, 2023
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jul 9, 2023
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot (Contributor) commented:

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Jan 19, 2024
@sftim (Contributor) commented Jan 19, 2024

/reopen

@sftim (Contributor) commented Jan 19, 2024

We did this
/close

@k8s-ci-robot k8s-ci-robot reopened this Jan 19, 2024
@k8s-ci-robot (Contributor) commented:

@sftim: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot (Contributor) commented:

@sftim: Closing this issue.

In response to this:

We did this
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sftim (Contributor) commented Jan 19, 2024

(but feel free to reopen if needed)
