
Generate images for CAPD machines to be used in our E2E testing #4620

Closed
fabriziopandini opened this issue May 14, 2021 · 20 comments
Labels
area/testing Issues or PRs related to testing kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@fabriziopandini
Member

As part of the effort to make our E2E tests more reliable, I'm opening this issue to discuss pre-generating the images used for CAPD machines in our E2E testing and storing them in gcr.io/k8s-staging-cluster-api.

Context
Our E2E tests use kind images from Docker Hub to create CAPD machines; Docker Hub has rate-limit constraints, and by relying on it we also depend on the kind team publishing images for new Kubernetes releases in a timely manner.

As of today we compensate for the lack of images for specific Kubernetes versions by building them on the fly inside our E2E test jobs, but this has a negative impact on the overall test duration (building a node image takes 10 to 15 minutes).

How will pre-building images help?
By pre-building the required images and publishing them to gcr.io/k8s-staging-cluster-api, we can speed up most of our E2E tests in the case where the kind image is missing.

Implementation details
In scripts/ci-e2e-lib.sh we already have the scripting required to resolve a given release label to a published Kubernetes release (e.g. stable-1.20 returns the latest stable release in the 1.20 release series).
The same script also contains all the machinery for checking whether a node image exists for a given Kubernetes version and rebuilding it if necessary. It makes sense to reuse this machinery for the implementation required to address this issue.
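
To illustrate, a minimal sketch of that check (not the actual content of scripts/ci-e2e-lib.sh; the image name and tag layout under gcr.io/k8s-staging-cluster-api are placeholders we would still need to agree on):

```bash
# Minimal sketch of the version-resolution and image-existence check;
# the image name under gcr.io/k8s-staging-cluster-api is hypothetical.
set -o errexit -o nounset -o pipefail

RELEASE_LABEL="stable-1.20"

# Resolve the release label to a concrete version via the official release markers.
K8S_VERSION="$(curl -fsSL "https://dl.k8s.io/release/${RELEASE_LABEL}.txt")"   # e.g. v1.20.7

# Check whether a node image for that version has already been published.
IMAGE="gcr.io/k8s-staging-cluster-api/kind-node:${K8S_VERSION}"
if docker manifest inspect "${IMAGE}" >/dev/null 2>&1; then
  echo "${IMAGE} already exists, nothing to build"
else
  echo "${IMAGE} is missing and needs to be (re)built"
fi
```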

I see two options to do so:

  • create a separate periodic job, similar to the one we use for publishing nightly image builds
  • change the current E2E jobs to publish images when they are missing (build and publish instead of build only); this requires investigating whether the Prow identity is allowed to publish images (see the sketch after this list)
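
For either option, the "build and publish if missing" step could look roughly like this (a sketch only; registry layout, image name, and push permissions are assumptions, and the exact kind flags depend on the pinned kind version):

```bash
# Sketch only: build and push a node image when it is not yet published.
# Assumes push access to gcr.io/k8s-staging-cluster-api and a Kubernetes
# source tree available where kind expects it; the image name is hypothetical.
set -o errexit -o nounset -o pipefail

K8S_VERSION="$(curl -fsSL "https://dl.k8s.io/release/stable-1.21.txt")"
IMAGE="gcr.io/k8s-staging-cluster-api/kind-node:${K8S_VERSION}"

if ! docker manifest inspect "${IMAGE}" >/dev/null 2>&1; then
  kind build node-image --image "${IMAGE}"   # takes roughly 10 to 15 minutes
  docker push "${IMAGE}"
fi
```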

Notes
The images we are going to build are de facto kind node images; no additional changes will be applied.
It is also important to note that we are not planning to replace the kind image repository or to provide a mirror for it. We are publishing those images only for CAPD usage in E2E tests (other use cases not agreed upon in this issue won't be supported).

TBD
Eventually, we want to make CAPD always use those node images, also outside E2E tests. In order to do so we would need to publish node images for all the Kubernetes versions in the supported skew.

Eventually, we also want to build node images for the latest Kubernetes version and make our latest tests use them.

/kind feature
/area testing

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. area/testing Issues or PRs related to testing labels May 14, 2021
@sbueringer
Member

sbueringer commented May 14, 2021

Overall I think that's a great idea. I would prefer the separate job, as it's then cleanly separated from the e2e test job.

A few questions:

  • We wouldn't publish images for latest (e.g. ci/latest-1.22), right? We would have to publish those images after every k/k merge.
  • I assume the logic in our e2e test would then be: try to use one of our published images and, if that's not available, build it locally (see the sketch after this list)? (especially important for ci/latest-1.22)
  • How do we handle the kind version? I see the following options:
    • just build with whatever we have on the main branch (that's basically how kind itself does it on release)
      • the downside is that sometimes the kind version with which the image is built actually matters, though most of the time it doesn't (e.g. entrypoint fix, cgroup-parent=/kubelet change)
    • build images for every kind<=>Kubernetes version combination we want/need
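
Something along these lines, as a sketch only (image name and registry are placeholders):

```bash
# Sketch of the proposed fallback in the e2e setup: prefer a pre-published
# node image, otherwise build it locally with kind. Names are placeholders.
set -o errexit -o nounset -o pipefail

K8S_VERSION="$(curl -fsSL "https://dl.k8s.io/release/stable-1.21.txt")"
IMAGE="gcr.io/k8s-staging-cluster-api/kind-node:${K8S_VERSION}"

if docker pull "${IMAGE}" >/dev/null 2>&1; then
  echo "using pre-built node image ${IMAGE}"
else
  echo "no published image for ${K8S_VERSION}, building locally (10 to 15 minutes)"
  kind build node-image --image "${IMAGE}"
fi
```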

@fabriziopandini
Member Author

fabriziopandini commented May 16, 2021

I would prefer the separate job as it's then cleanly separated from the e2e test job.

A nightly job is more than fine for me.

We wouldn't publish images for latest (e.g. ci/latest-1.22), right? We would have to publish those images after every k/k merge.

I think that's the easiest way to start, also because there is no automation that triggers our builds after every k/k merge (and TBH we don't need all those images).
We can iterate later and create a nightly image instead.

I assume the logic in our e2e test would then be: try to use one of our published images and, if that's not available, build it locally? (especially important for ci/latest-1.22)

I agree.
This also helps given that there is no automation to ensure our image generation is in sync with Kubernetes release publishing, so our jobs might still have to generate images occasionally, but that should be the exception.

How do we handle the kind version

Let's start simple and use kind from master (the same as we have in our E2E tests today).

@sbueringer
Member

@fabriziopandini Sounds good to me. I would like to hold off on publishing the first images until after the kind v0.11 release (or alternatively overwrite them after the kind release). Otherwise we would be stuck with the PIPEFAIL issue for a while.

@fabriziopandini
Member Author

How do we handle the kind version

Let's start simple and use kind from master (the same as we have in our E2E tests today).

Maybe I'm wrong here and we actually pin the kind version under hack/tools; if so, let's be consistent and do the same.

@sbueringer
Member

@fabriziopandini Currently we pin the kind version in:

  • go.mod
  • test/infrastructure/docker/go.mod
  • hack/ensure-kind.sh (MINIMUM_KIND_VERSION=v0.9.0)

But I agree we should use the same version for e2e testing and image publishing.
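
For reference, a minimum-version check in the spirit of hack/ensure-kind.sh could look like the sketch below (the real script's logic may differ):

```bash
# Sketch of a kind minimum-version check, loosely modeled on hack/ensure-kind.sh;
# the real script's logic may differ.
set -o errexit -o nounset -o pipefail

MINIMUM_KIND_VERSION="v0.9.0"

# "kind version" prints something like "kind v0.9.0 go1.15.2 linux/amd64".
INSTALLED="$(kind version 2>/dev/null | awk '{print $2}' || true)"
if [[ -z "${INSTALLED}" ]]; then
  echo "kind is not installed; need at least ${MINIMUM_KIND_VERSION}" >&2
  exit 1
fi

# Version comparison via sort -V: the minimum must sort first (or be equal).
if [[ "$(printf '%s\n%s\n' "${MINIMUM_KIND_VERSION}" "${INSTALLED}" | sort -V | head -n1)" != "${MINIMUM_KIND_VERSION}" ]]; then
  echo "kind ${INSTALLED} is older than the required minimum ${MINIMUM_KIND_VERSION}" >&2
  exit 1
fi
```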

@vincepri
Member

vincepri commented Jul 6, 2021

/milestone Next
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added this to the Next milestone Jul 6, 2021
@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jul 6, 2021
@fabriziopandini
Member Author

As per discussion:
We should build images for stable releases only (not for CI latest).
Ideally, we should have tests for both the latest stable and CI latest for each Kubernetes minor.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 5, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 4, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member

/reopen

@k8s-ci-robot
Contributor

@sbueringer: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Dec 6, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member

/reopen
/remove-lifecycle rotten

@k8s-ci-robot
Contributor

@sbueringer: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Jan 5, 2022
@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 5, 2022
@sbueringer
Member

/close
in favor of generating the images in kind itself

@k8s-ci-robot
Contributor

@sbueringer: Closing this issue.

In response to this:

/close
in favor of generating the images in kind itself

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member

xref: kubernetes-sigs/kind#197
