
Generate images for CAPD machines to be used in our E2E testing #4620

Closed
fabriziopandini opened this issue May 14, 2021 · 20 comments
Labels
area/testing Issues or PRs related to testing kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@fabriziopandini
Member

As part of the effort to make our E2E tests more reliable, I'm opening this issue to discuss pre-generating the images used for CAPD machines in our E2E testing and storing them in gcr.io/k8s-staging-cluster-api.

Context
Our E2E tests use kind images from Docker Hub to create CAPD machines; Docker Hub has rate-limit constraints, and by relying on it we also depend on the kind team publishing images for new Kubernetes releases in a timely manner.

As of today we compensate for the lack of images for specific Kubernetes versions by building them on the fly inside our E2E test jobs, but this has a negative impact on the overall test duration (building a node image takes 10 to 15 minutes).

How will pre-building images help?
By pre-building the required images and publishing them to gcr.io/k8s-staging-cluster-api, we can speed up most of our E2E tests in the case where the kind image is missing.

Implementation details
In scripts/ci-e2e-lib.sh we already have the scripting required to resolve a given release label to a published Kubernetes release (e.g. stable-1.20 returns the latest stable release in the 1.20 release series).
The same script also contains all the machinery for checking whether a node image exists for a given Kubernetes version and rebuilding it if necessary. It makes sense to reuse this machinery for the implementation required to address this issue.
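
To illustrate, a minimal sketch of that check (not the actual content of scripts/ci-e2e-lib.sh; the image name and tag layout under gcr.io/k8s-staging-cluster-api are placeholders we would still need to agree on):

```bash
# Minimal sketch of the version-resolution and image-existence check;
# the image name under gcr.io/k8s-staging-cluster-api is hypothetical.
set -o errexit -o nounset -o pipefail

RELEASE_LABEL="stable-1.20"

# Resolve the release label to a concrete version via the official release markers.
K8S_VERSION="$(curl -fsSL "https://dl.k8s.io/release/${RELEASE_LABEL}.txt")"   # e.g. v1.20.7

# Check whether a node image for that version has already been published.
IMAGE="gcr.io/k8s-staging-cluster-api/kind-node:${K8S_VERSION}"
if docker manifest inspect "${IMAGE}" >/dev/null 2>&1; then
  echo "${IMAGE} already exists, nothing to build"
else
  echo "${IMAGE} is missing and needs to be (re)built"
fi
```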

I see two options to do so:

  • create a separate periodic job, similar to the one we use for publishing nightly image builds
  • change the current E2E jobs to publish images when they are missing (build and publish instead of build only); this requires investigating whether the Prow identity is allowed to publish images (see the sketch after this list)
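
For either option, the "build and publish if missing" step could look roughly like this (a sketch only; registry layout, image name, and push permissions are assumptions, and the exact kind flags depend on the pinned kind version):

```bash
# Sketch only: build and push a node image when it is not yet published.
# Assumes push access to gcr.io/k8s-staging-cluster-api and a Kubernetes
# source tree available where kind expects it; the image name is hypothetical.
set -o errexit -o nounset -o pipefail

K8S_VERSION="$(curl -fsSL "https://dl.k8s.io/release/stable-1.21.txt")"
IMAGE="gcr.io/k8s-staging-cluster-api/kind-node:${K8S_VERSION}"

if ! docker manifest inspect "${IMAGE}" >/dev/null 2>&1; then
  kind build node-image --image "${IMAGE}"   # takes roughly 10 to 15 minutes
  docker push "${IMAGE}"
fi
```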

Notes
The images we are going to build are de facto kind node images; no additional changes will be applied.
It is also important to note that we are not planning to replace the kind image repository or to provide a mirror for it. We are publishing those images only for CAPD usage in E2E tests (other use cases not agreed upon in this issue won't be supported).

TBD
Eventually, we want to make CAPD always use those node images, also outside E2E tests. In order to do so we would need to publish node images for all the Kubernetes versions in the supported skew.

Eventually, we also want to build node images for the latest Kubernetes version and make our latest tests use them.

/kind feature
/area testing

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. area/testing Issues or PRs related to testing labels May 14, 2021
@sbueringer
Member

sbueringer commented May 14, 2021

Overall I think that's a great idea. I would prefer the separate job, as it's then cleanly separated from the e2e test job.

A few questions:

  • We wouldn't publish images for latest (e.g. ci/latest-1.22), right? We would have to publish those images after every k/k merge.
  • I assume the logic in our e2e test would then be: try to use one of our published images and, if that's not available, build it locally (see the sketch after this list)? (especially important for ci/latest-1.22)
  • How do we handle the kind version? I see the following options:
    • just build with whatever we have on the main branch (that's basically how kind itself does it on release)
      • the downside is that sometimes the kind version with which the image is built actually matters, though most of the time it doesn't (e.g. entrypoint fix, cgroup-parent=/kubelet change)
    • build images for every kind<=>Kubernetes version combination we want/need
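
Something along these lines, as a sketch only (image name and registry are placeholders):

```bash
# Sketch of the proposed fallback in the e2e setup: prefer a pre-published
# node image, otherwise build it locally with kind. Names are placeholders.
set -o errexit -o nounset -o pipefail

K8S_VERSION="$(curl -fsSL "https://dl.k8s.io/release/stable-1.21.txt")"
IMAGE="gcr.io/k8s-staging-cluster-api/kind-node:${K8S_VERSION}"

if docker pull "${IMAGE}" >/dev/null 2>&1; then
  echo "using pre-built node image ${IMAGE}"
else
  echo "no published image for ${K8S_VERSION}, building locally (10 to 15 minutes)"
  kind build node-image --image "${IMAGE}"
fi
```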

@fabriziopandini
Member Author

fabriziopandini commented May 16, 2021

I would prefer the separate job as it's then cleanly separated from the e2e test job.

A nightly job is more than fine for me.

We wouldn't publish images for latest (e.g. ci/latest-1.22), right? We would have to publish those images after every k/k merge.

I think that's the easiest way to start, also because there is no automation that triggers our builds after every k/k merge (and TBH we don't need all those images).
We can iterate later and create a nightly image instead.

I assume the logic in our e2e test would then be: try to use one of our published images and, if that's not available, build it locally? (especially important for ci/latest-1.22)

I agree.
This also helps given that there is no automation to ensure our image generation is in sync with Kubernetes release publishing, so our jobs might still have to generate images occasionally, but that should be the exception.

How do we handle the kind version

Let's start simple and use kind from master (the same as we have in our E2E tests today).

@sbueringer
Member

@fabriziopandini Sounds good to me. I would like to hold off on publishing the first images until after the kind v0.11 release (or alternatively overwrite them after the kind release). Otherwise we would be stuck with the PIPEFAIL issue for a while.

@fabriziopandini
Member Author

How do we handle the kind version

Let's start simple and use kind from master (the same as we have in our E2E tests today).

Maybe I'm wrong here and we actually pin the kind version under hack/tools; if so, let's be consistent and do the same.

@sbueringer
Member

@fabriziopandini Currently we pin the kind version in:

  • go.mod
  • test/infrastructure/docker/go.mod
  • hack/ensure-kind.sh (MINIMUM_KIND_VERSION=v0.9.0)

But I agree we should use the same version for e2e testing and image publishing.
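
For reference, a minimum-version check in the spirit of hack/ensure-kind.sh could look like the sketch below (the real script's logic may differ):

```bash
# Sketch of a kind minimum-version check, loosely modeled on hack/ensure-kind.sh;
# the real script's logic may differ.
set -o errexit -o nounset -o pipefail

MINIMUM_KIND_VERSION="v0.9.0"

# "kind version" prints something like "kind v0.9.0 go1.15.2 linux/amd64".
INSTALLED="$(kind version 2>/dev/null | awk '{print $2}' || true)"
if [[ -z "${INSTALLED}" ]]; then
  echo "kind is not installed; need at least ${MINIMUM_KIND_VERSION}" >&2
  exit 1
fi

# Version comparison via sort -V: the minimum must sort first (or be equal).
if [[ "$(printf '%s\n%s\n' "${MINIMUM_KIND_VERSION}" "${INSTALLED}" | sort -V | head -n1)" != "${MINIMUM_KIND_VERSION}" ]]; then
  echo "kind ${INSTALLED} is older than the required minimum ${MINIMUM_KIND_VERSION}" >&2
  exit 1
fi
```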

@vincepri
Member

vincepri commented Jul 6, 2021

/milestone Next
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added this to the Next milestone Jul 6, 2021
@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jul 6, 2021
@fabriziopandini
Member Author

As per discussion:
We should build images for stable releases only (not for CI latest).
Ideally, we should have tests for both the latest stable and CI latest for each Kubernetes minor.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 5, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 4, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member

/reopen

@k8s-ci-robot
Contributor

@sbueringer: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Dec 6, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member

/reopen
/remove-lifecycle rotten

@k8s-ci-robot
Contributor

@sbueringer: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Jan 5, 2022
@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 5, 2022
@sbueringer
Member

/close
in favor of generating the images in kind itself

@k8s-ci-robot
Contributor

@sbueringer: Closing this issue.

In response to this:

/close
in favor of generating the images in kind itself

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member

xref: kubernetes-sigs/kind#197
