Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backfill images from google-containers #623

Merged
merged 6 commits into from
Mar 4, 2020
Merged

Conversation

listx
Copy link
Contributor

@listx listx commented Mar 2, 2020

This is part of
https://github.com/kubernetes/k8s.io/blob/master/k8s.gcr.io/Vanity-Domain-Flip.md,
in preparation for the domain flip. Backfilling all images to the
top-level (root) directory ensures that the transition for existing
images post-flip will be painless.

Holding for now because I'm curious to see what the presubmit checks have to say.

/hold
/cc @thockin @justaugustus @tpepper

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. wg/k8s-infra labels Mar 2, 2020
@listx
Copy link
Contributor Author

listx commented Mar 2, 2020

I'm surprised the pull-k8sio-cip check finished so quickly (only took just over a minute).

The dry run finished without an error (I downloaded the logs and ignore-case-grepped for "fail" and "error" and found nothing), so, I think it's good for merge.

@listx
Copy link
Contributor Author

listx commented Mar 2, 2020

As we get new images into google-containers, we'll have to update this manifest list.

I assume legacy-backfill is a good enough name for the directories --- but it's something we can change in a follow-up PR. Also, it's possible that we slice & dice the rather long (52K lines...!) images.yaml file into smaller chunks. Probably the first subset to move out to a separate manifest/image pair is the set of images listed on https://github.com/kubernetes/sig-release/blob/master/release-engineering/artifacts.md#docker-images for the Kubernetes releases. @justaugustus ^

@justaugustus
Copy link
Member

Probably the first subset to move out to a separate manifest/image pair is the set of images listed on https://github.com/kubernetes/sig-release/blob/master/release-engineering/artifacts.md#docker-images for the Kubernetes releases.

@listx -- Yes, please. It'd be great to have those first-class images in their own config file; but agreed that that can be a follow-up.

@justaugustus
Copy link
Member

For SIG Release:
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 2, 2020
This is part of
https://github.com/kubernetes/k8s.io/blob/master/k8s.gcr.io/Vanity-Domain-Flip.md,
in preparation for the domain flip. Backfilling all images to the
top-level (root) directory ensures that the transition for existing
images post-flip will be painless.
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 2, 2020
@listx
Copy link
Contributor Author

listx commented Mar 3, 2020

I had to update one of the tags in the images.yaml to reflect a manual change I just performed for cadvisor:latest. I guess I should have just added a commit on top so you can see the diff --- oh well, too late.

@justaugustus you can generate the EXACT same images.yaml file as in this PR with:

cd
git clone https://github.com/kubernetes-sigs/k8s-container-image-promoter go/src/sigs.k8s.io/k8s-container-image-promoter
cd go/src/sigs.k8s.io/k8s-container-image-promoter
bazel run \
    --workspace_status_command=$PWD/workspace_status.sh \
    --host_force_python=PY2 //:cip -- -snapshot=gcr.io/google-containers \
    -output-format=YAML > images.yaml

Can you do the above as a sanity check?

@justaugustus
Copy link
Member

@justaugustus
Copy link
Member

A diff -u between my locally generated copy and this PR, comes up empty, so this should be clear to merge, I think.

@listx
Copy link
Contributor Author

listx commented Mar 3, 2020

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 3, 2020
@listx
Copy link
Contributor Author

listx commented Mar 3, 2020

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 3, 2020
@justaugustus
Copy link
Member

/lgtm
/approve
/hold for Tim's review
/assign @thockin

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 3, 2020
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please get a comment in each file, and a README.md in each directory explaining that these images are effectively frozen, new changes to them will be rejected, and any new promotions MUST happen through individual sub-project staging repos?

Specifically, I want there to be NO AMBIGUITY about whether we want PRs against these, and when the next wave of community maintainers step up, they don't lose all the context.

This was promoted around 2020-03-03 05:41 -08:00 from the
Google-internal promoter.
This way, users will be forced to read this message even if the aren't
looking at the README.md.
@listx
Copy link
Contributor Author

listx commented Mar 3, 2020

Dropped .gitignores, but also added a "DO-NOT-MODIFY" prefix to the folder names to be that much more explicit.

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this kick off a giant import?

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 3, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justaugustus, listx, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 3, 2020
@justaugustus
Copy link
Member

Just to weave the tracking together a little...
ref: kubernetes/release#270

@listx
Copy link
Contributor Author

listx commented Mar 4, 2020

Will this kick off a giant import?

Yes. It will be interesting to see how ~50K images actually get copied over.

@thockin
Copy link
Member

thockin commented Mar 4, 2020 via email

@justaugustus
Copy link
Member

...and away we go!
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 4, 2020
@k8s-ci-robot k8s-ci-robot merged commit ecfc539 into kubernetes:master Mar 4, 2020
@thockin
Copy link
Member

thockin commented Mar 4, 2020 via email

@listx
Copy link
Contributor Author

listx commented Mar 4, 2020

@listx
Copy link
Contributor Author

listx commented Mar 4, 2020

It is still chugging along after the 1h mark now. Interestingly, the Prow job logs have been truncated (it now says only 387~ lines of logs, whereas I expect MBs of logs). @cjwagner I guess super long Prow jobs' logs getting truncated is a known (perhaps on-purpose by design) issue?

@listx
Copy link
Contributor Author

listx commented Mar 4, 2020

The job has been killed by Prow because it exceeded the 2h limit. I'm pasting all my findings below, though, because I have some thoughts I should collect: https://docs.google.com/document/d/1R7RhuKwOUHsi4i3RMpaSPYLJCv16EazSww5kUQXRPlk/edit?usp=sharing

TL;DR is that I really have to fix kubernetes-sigs/promo-tools#185.

@thockin
Copy link
Member

thockin commented Mar 4, 2020 via email

@cjwagner
Copy link
Member

cjwagner commented Mar 4, 2020

The job has been killed by Prow because it exceeded the 2h limit.

Note that this is just the default timeout:
https://github.com/kubernetes/test-infra/blob/268aa2ea35ddce4fa2696bb7f958327540e051b4/config/prow/config.yaml#L10

You can configure the job to have a different timeout like this: https://github.com/kubernetes/test-infra/blob/28c7418a8c8145a2b0203591b4a69262804e283a/config/jobs/kubernetes/kops/kops-presubmits.yaml#L13-L14

@listx
Copy link
Contributor Author

listx commented Mar 4, 2020

Cool failure mode. Did we get a clean signal that it failed?
Yes: https://prow.k8s.io/job-history/kubernetes-jenkins/logs/post-k8sio-cip

Will a periodic run pick up the debris and carry on cleanly? Or does a human have to intervene?

Yes. The promoter is designed to only perform required promotions.

That being said I should open a PR to make the periodic job run more frequently than once a day.

@listx
Copy link
Contributor Author

listx commented Mar 4, 2020

OK so the promoter is hitting an index-out-of-range error from the scheduled (daily) ci-k8sio-cip job run: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-k8sio-cip/1235320791898263552

I can reproduce locally. Will dig to see which commit in this repo caused it (I'm assuming that this PR itself did not cause this issue, but I'll have to see).

listx pushed a commit to listx/k8s.io that referenced this pull request Mar 4, 2020
This reverts commit ecfc539, reversing
changes made to 670fb0c.
@listx
Copy link
Contributor Author

listx commented Mar 4, 2020

So for some reason the read is failing because of this merge. Reverting this PR to unblock the promoter for possible other image promotions while I work on a fix.

Revert PR here: #629

listx pushed a commit to listx/k8s.io that referenced this pull request Mar 4, 2020
This reverts commit ecfc539, reversing
changes made to 670fb0c.

This reverts kubernetes#623. For some reason, with this change there is now an
index out-of-bounds failure in the ci-k8sio-cip job here:
https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-k8sio-cip/1235320791898263552.

I am puzzled why this did not manifest itself in the post-k8sio-cip
job (postsubmit) when kubernetes#623 was merged. However, the simple fix is to
revert while I figure this out.
k8s-ci-robot added a commit that referenced this pull request Mar 4, 2020
Revert "Merge pull request #623 from listx/master"
@listx
Copy link
Contributor Author

listx commented Mar 5, 2020

The error is happening because of a parsing failure in the function SplitByKnownRegistries. See more details here: kubernetes-sigs/promo-tools#188

listx pushed a commit to listx/k8s.io that referenced this pull request Mar 5, 2020
This reverts commit 0ec285c.

This is the second time we are trying to backfill with legacy images.
The first attempt failed, as detailed here:
kubernetes/kubernetes#88553
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants