Age Out Old Registry Content #144
Jeefy lizard brain policy idea: Non-Prod images: Age out after 9 months
I am broadly in support of this. I don't think it's a reasonable ask for the project to host all releases for all time. Right now I'm leaning towards EOL-9 (3 years), but would want some data before making any decision. EDIT: Even without data I think we should remove the google-containers images...oof
/sig testing
Some additional points:
Yeah. That also just hasn't happened due to lack of policy. We did the flip to k8s.gcr.io on July 24th, 2020. kubernetes/release#270 (comment)
While I'm sure sig-testing is happy to support this effort, I'd suggest that the policy is a combination of k8s-infra (what is k8s-infra willing to fund resources for, generally) and release (particularly around the promo tools' support for this and the Kubernetes release timeframe). /sig k8s-infra
/committee steering
Technical aside: when we serve layer redirects, we then get an option to serve a
One additional complication for people to consider: we more commonly host our own base images at this point, so building old commits from source will become more challenging (but not necessarily impossible*) if we age out those images. * Non-reproducible builds may be an issue, e.g. the "debian-base" image.
Posted this on the linked issue, but maybe it's better here: the cost reduction would primarily be because we would break people and encourage them to upgrade (?). I think other OSS projects follow a similar strategy, e.g. the "normal" Debian APT repos don't work with old releases, but there is a public archive. I don't know the actual strategy for when Debian distros move to the archive. For Kubernetes, if we support 4 versions (?), we probably want to keep at least 5 versions "live" so that people aren't forced to upgrade all their EOL clusters at once, but we probably want to keep no more than 8 versions "live" so that people are nudged to upgrade. So I come up with a range of 5-8 releases if we wanted to split off old releases, and I can imagine a case for any of those values.
This brings up a very good point. We have to keep in mind that you can't skip Kubernetes versions when upgrading; otherwise you'll go against the skew policy and eventually break the cluster. Let's say that we remove images for everything up to 1.22, but someone is using 1.20. They don't have a way to upgrade their cluster to a newer version, because they would have to start by upgrading to 1.21, which wouldn't exist any longer. This is a very problematic scenario because the only way out is more-or-less to start from scratch, and that's unacceptable for many. We need to be a bit generous here. I agree with @justinsb that we should target 5-8 releases. I'd probably go closer to 8.
These are good points. I think we should probably be implementing policy in terms of time though, both because that's more manageable to implement and because we have many images that are not part of Kubernetes releases. If we consider the lifespan of releases but put it in terms of time, we could say e.g. "3 years after publication", which would be 5-8 releases (since releases happen roughly every four months and are supported for one year).
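As a tiny sketch of what a purely time-based rule like the "3 years after publication" example above could look like (the retention duration and publication date below are just illustrations taken from the discussion, not a decided policy):

```go
package main

import (
	"fmt"
	"time"
)

// agedOut reports whether an artifact published at 'published' falls outside
// a retention window measured from its publication date.
func agedOut(published, now time.Time, retention time.Duration) bool {
	return now.Sub(published) > retention
}

func main() {
	retention := 3 * 365 * 24 * time.Hour // ~3 years, per the example above
	// Illustrative publication date only.
	published := time.Date(2020, time.July, 24, 0, 0, 0, 0, time.UTC)
	fmt.Println(agedOut(published, time.Now(), retention))
}
```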
I think we should set a more aggressive timeline going forward, say starting with 1.27 we'll host artifacts for 1 year after they hit end of support. I'm not sure how we best handle older things, but versions like 1.21 are still being heavily used. If we said "5" releases, that would still fall into that window pretty soon. If we have to choose between the health of the project overall (CI infra, etc.) and impacting people on those older versions, I think we unfortunately have to choose the health of the project :( Can we provide some extraordinary mechanism for people to pull tarballs of older stuff, plus some instructions on how to put those into a registry, like some sort of cold storage for folks?
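As a rough illustration of the cold-storage/tarball idea above, here is a minimal Go sketch using the go-containerregistry `crane` package; the image reference, tarball filename, and mirror registry are hypothetical examples, not an agreed-upon mechanism:

```go
package main

import (
	"log"

	"github.com/google/go-containerregistry/pkg/crane"
)

func main() {
	// Hypothetical example image; any registry.k8s.io reference would work the same way.
	// Note: this pulls a single-platform image; multi-arch indexes need extra handling.
	src := "registry.k8s.io/pause:3.9"

	// Pull the image and archive it to a local tarball ("cold storage").
	img, err := crane.Pull(src)
	if err != nil {
		log.Fatalf("pull: %v", err)
	}
	if err := crane.Save(img, src, "pause-3.9.tar"); err != nil {
		log.Fatalf("save: %v", err)
	}

	// Later: load the tarball and push it into a private mirror registry
	// (registry.example.com is a placeholder).
	restored, err := crane.Load("pause-3.9.tar")
	if err != nil {
		log.Fatalf("load: %v", err)
	}
	if err := crane.Push(restored, "registry.example.com/mirror/pause:3.9"); err != nil {
		log.Fatalf("push: %v", err)
	}
}
```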
💯 to
What if we keep the latest patch release for each minor? For example, 1 year after reaching the EOL date, we keep images for the latest patch release, but delete all other images. Eventually, we can remove the latest patch release, let's say, 3-4 years after the EOL date. That will reduce storage and (hopefully) bandwidth costs, but at the same time, it shouldn't break any clusters or workflows.
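To make the "keep only the latest patch per minor" idea above concrete, here is a minimal sketch that assumes tags follow the usual vMAJOR.MINOR.PATCH scheme; the tag list is illustrative only:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// latestPatchPerMinor returns, for each "vMAJOR.MINOR" series, the tag with the
// highest patch number; every other tag would be a candidate for deletion.
func latestPatchPerMinor(tags []string) map[string]string {
	keep := make(map[string]string) // e.g. "v1.21" -> "v1.21.14"
	best := make(map[string]int)
	for _, tag := range tags {
		parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
		if len(parts) != 3 {
			continue // skip tags that are not plain vX.Y.Z
		}
		patch, err := strconv.Atoi(parts[2])
		if err != nil {
			continue
		}
		minor := "v" + parts[0] + "." + parts[1]
		if cur, ok := best[minor]; !ok || patch > cur {
			best[minor] = patch
			keep[minor] = tag
		}
	}
	return keep
}

func main() {
	// Illustrative tag list, not real registry contents.
	tags := []string{"v1.20.15", "v1.20.7", "v1.21.14", "v1.21.0", "v1.22.17"}
	fmt.Println(latestPatchPerMinor(tags))
}
```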
The source code will remain at https://github.com/kubernetes/kubernetes/releases; it's not that people who need older releases can't build them themselves.
Technical capabilities aside, the ideal set would be, IMHO:
If we had the ability, I expect there are probably lots of other images for older releases, from subprojects etc., which could be purged much more aggressively.
Thanks, all, for the input and suggestions!
That's not sufficient, because the base images typically cannot just be re-built from source, and the source code references base images. Those base images are a snapshot of various packages at a point in time. We also sometimes run binary builds in containers (see kubernetes/kubernetes), where the build-environment image is also an important point-in-time snapshot for reproducing a source build.

We've been discussing this further in various forums, and on further review I think the cost angle is going to be nearly irrelevant once we have traffic off k8s.gcr.io and onto registry.k8s.io. Storage costs are not large enough to matter: image storage de-dupes pretty well, it's compressed, and in real numbers we're actually looking at less than 2 TB currently even after ~8 years. 2 TB costs something like $50/mo to store in GCS standard tier, for example.

Bandwidth costs matter, deeply ... but only for content people actually use, and they're going to be a lot more manageable on registry.k8s.io (e.g. >50% of requests to k8s.gcr.io came from AWS, and we're now serving that content out of AWS ... no egress).

For the "tooling needs to iterate all of this" angle, I've been working on a new tool that needs to scan all the images over the past few days, and it's actually quite doable to optimize this even as the set of images grows. AR provides a cheap API to list all digests in a repo. I think we can teach the promoter tools to avoid repeatedly fetching content-addressable data we've already processed.

I think disappearing base images in particular is going to cause more harm than benefit. We're also really ramping up having users migrate, so I think we are missing the window on "let's get a policy in place before people use it", and we've forced users to adapt to a new registry twice in the past few years, so I think we can just introduce a policy later if we wind up needing it.

@mrbobbytables' outreach to subprojects and other CNCF projects to migrate, and other pending cost optimization efforts, are probably a better use of time at the moment.
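For the digest-listing / skip-already-processed point above, here is a rough sketch of what that could look like using go-containerregistry's GCR/AR-style listing extension; the repository name and the in-memory "processed" set are illustrative assumptions, not how the promoter tools actually work:

```go
package main

import (
	"fmt"
	"log"

	"github.com/google/go-containerregistry/pkg/name"
	"github.com/google/go-containerregistry/pkg/v1/google"
)

func main() {
	// Hypothetical repository; real tooling would iterate many repos.
	repo, err := name.NewRepository("us-central1-docker.pkg.dev/example-project/images/pause")
	if err != nil {
		log.Fatal(err)
	}

	// A single call returns every manifest digest in the repo, with tags and timestamps.
	tags, err := google.List(repo, google.WithAuthFromKeychain(google.Keychain))
	if err != nil {
		log.Fatal(err)
	}

	// Stand-in for a persisted record of digests the tooling has already
	// synchronized/signed; content-addressable digests never change, so
	// anything in this set can be skipped without re-fetching it.
	processed := map[string]bool{}

	for digest, info := range tags.Manifests {
		if processed[digest] {
			continue
		}
		fmt.Printf("new manifest %s (tags: %v, uploaded: %s)\n", digest, info.Tags, info.Uploaded)
		// ... fetch, verify, sign, then record:
		processed[digest] = true
	}
}
```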
There are other issues tracking things like potentially sunsetting k8s.gcr.io; those are in the general k8s-infra repo at https://github.com/kubernetes/k8s.io
We migrated forward all existing images since the beginning of hosting Kubernetes images, from gcr.io/google-containers to k8s.gcr.io to registry.k8s.io.
We should consider establishing an intended retention policy for images hosted by registry.k8s.io, and communicating that early.
Not retaining all images indefinitely could help the container-image-promoter avoid dealing with an indefinitely growing set of images to synchronize and sign, and may also have minor hosting cost benefits.
Even if we set a very lengthy period, we should probably consider not hosting content indefinitely.