Showing 1 changed file with 275 additions and 0 deletions.
enhancements/testing/ephemeral-agent-installer-e2e-tests.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,275 @@ | ||
--- | ||
title: ephemeral-agent-installer-e2e-tests | ||
authors: | ||
- "@lranjbar" | ||
reviewers: | ||
- "@rwsu" | ||
- "@zaneb" | ||
- "@pawanpinjarkar" | ||
approvers: | ||
- "@dhellmann" | ||
- "@celebdor" | ||
api-approvers: # in case of new or modified APIs or API extensions (CRD, aggregated api servers, webhooks, finalizers) | ||
- N/A | ||
creation-date: 2022-02-10 | ||
last-updated: 2022-02-15 | ||
tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement | ||
- https://issues.redhat.com/browse/AGENT-20 | ||
--- | ||
|
||
# Ephemeral Agent Installer e2e Tests | ||
|
||
## Release Signoff Checklist | ||
|
||
- [ ] Enhancement is `implementable` | ||
- [ ] Design details are appropriately documented from clear requirements | ||
- [ ] Test plan is defined | ||
- [ ] Operational readiness criteria is defined | ||
- [ ] Graduation criteria for dev preview, tech preview, GA | ||
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/) | ||
|
||
## Summary | ||
|
||
Create a new e2e testing framework for the new agent based installer project. | ||
Two existing e2e frameworks could potentially be used for this project: | ||
dev-scripts and assisted-test-infra. However, both present | ||
significant challenges when re-purposed for this new project. | ||
|
||
The main challenge with both existing frameworks is that the environment | ||
setup, installation and subsequent tests they run are all tightly coupled. | ||
For the ephemeral agent installer image, parts of the potential | ||
e2e flow appear to exist in both frameworks. However, there is no way to mix and match the | ||
pieces we need without a significant re-factoring effort in both | ||
of these frameworks. | ||
|
||
A more expected challenge is that some of the routines we will likely | ||
need are completely new. For that reason, I argue that it will be less | ||
risky to create a new framework for the ephemeral agent installer image. | ||
|
||
These challenges will be explained in more detail in the User Stories section below. | ||
|
||
## Motivation | ||
|
||
The new agent based installer project will need an e2e testing framework for validation. | ||
We should look at the existing frameworks and understand what re-purposing them will | ||
entail for this new project. | ||
|
||
This enhancement looks closely at the existing e2e frameworks, dev-scripts and assisted-test-infra, | ||
since they are the previous work in this space. The motivation is to build on | ||
what we have learned from those frameworks and to come up with a solution that we can | ||
reuse for new installers without these challenges in the future. | ||
|
||
A second motivation is that after working with both of these frameworks over the past year, | ||
I can tell from experience that the understanding of what they actually do is extremely | ||
low in at least one of the teams that use them. To be frank, when a failure happens in CI | ||
the usual response among the Assisted Installer development team is that the CI broke | ||
something, not that the code changes they are introducing broke something. | ||
This is mostly because there isn't sufficient information about the actual failure when | ||
it happens and there are many red herring errors in the logs. There have been significant | ||
efforts to fix these issues in the past, but they almost never get prioritized | ||
over new feature development. With every new feature added the complexity grows, and so does | ||
the knowledge gap between the e2e tests and the average developer on the team. With this new | ||
project I would like to avoid this outcome as much as possible. | ||
|
||
### Goals | ||
|
||
This proposal focuses on using this new e2e framework in the new ephemeral agent based installer | ||
image project only. | ||
|
||
### Non-Goals | ||
|
||
* Re-factoring dev-scripts and assisted-test-infra for reasons not related to the new | ||
ephemeral agent based installer image project. | ||
* Getting buy-in from other installer teams (Assisted Installer, IPI) to use this new | ||
framework in their e2e tests for the upcoming release. | ||
|
||
This is not to say that we shouldn't talk to other installer teams about this and | ||
collaborate with them. It is just not a goal to have them use the same e2e framework as the | ||
new project. | ||
|
||
## Proposal | ||
|
||
Create an e2e testing framework for the ephemeral agent based installer image. | ||
|
||
### User Stories | ||
|
||
#### Run Assisted Installer Outside k8s | ||
|
||
As a developer, I need to run Assisted Installer without installing a k8s platform first | ||
so that I can test the installation of a user's first cluster, cluster 0. | ||
|
||
This is the overall purpose of the ephemeral agent based installer image project. | ||
Unfortunately, neither of the existing frameworks can actually assist with this. | ||
|
||
* The Assisted Installer e2e tests that run in dev-scripts first use the IPI installer | ||
to install OCP and then the infrastructure-operator on this OCP cluster making it a hub | ||
cluster. | ||
|
||
* The Assisted Installer e2e tests that run in assisted-test-infra install minikube first | ||
and then run Assisted Installer on top of it. | ||
|
||
This means that both existing e2e frameworks first set up some form of k8s. Given that | ||
the goal of the ephemeral agent based installer project is to install a user's first cluster, | ||
cluster 0, this is highly undesirable. | ||
|
||
While there is a flow to run Assisted Installer in podman, it is only supported as a manual | ||
flow. Support for it in the e2e frameworks has mostly been abandoned by the Assisted Installer | ||
team in favor of infrastructure-operator and the hosted SaaS solution. | ||
|
||
#### Setup Environment for Install Without Installing OCP or k8s | ||
|
||
As a developer, I need to set up VMs so that I can simulate the installation of a user's | ||
first cluster using the ephemeral agent based installer image. | ||
|
||
Both existing frameworks set up VMs using libvirt today to test a user's OCP install. | ||
However, they also both immediately use those VMs to start the installation process. There | ||
isn't an easy way to stop the current frameworks from running IPI or Assisted Installer | ||
after the VM creation. | ||
|
||
|
||
#### Launch a VM with the Ephemeral Agent Based Installer Image | ||
|
||
As a developer, I need to launch a VM with the built ephemeral agent based installer | ||
image so that I can test and validate this image. | ||
|
||
Neither existing framework has an easy way to do this, which makes sense because | ||
they predate the ephemeral agent based installer project. There also isn't an | ||
existing routine that simply launches a generic ISO image, which would have been | ||
something we could have re-purposed. | ||
|
||
### API Extensions | ||
|
||
Not applicable at this time. | ||
|
||
### Implementation Details/Notes/Constraints [optional] | ||
|
||
What are the caveats to the implementation? What are some important details that | ||
didn't come across above. Go in to as much detail as necessary here. This might | ||
be a good place to talk about core concepts and how they relate. | ||
|
||
### Risks and Mitigations | ||
|
||
The risks for this proposal are highlighted in the motivation section. The risks we | ||
have are: | ||
|
||
1) Creating a third e2e framework for installers that isn't reusable, meaning that | ||
we will have this same issue in the future if a new installer project comes online. | ||
|
||
2) The third e2e framework becomes very complex, making it hard to understand where | ||
failures are coming from. | ||
|
||
To mitigate the first risk, I believe that we can create a modular e2e framework | ||
that future installer projects can reuse. To mitigate the second risk | ||
we need to make sure that each test has a clear failure message. We also need to encapsulate | ||
red herring messages that happen during setup and installation of components and mark them | ||
accordingly. | ||
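
One way the second mitigation could look in Ansible is sketched below. This is only an illustration under assumptions, not a committed design: the VM name `example-vm` and the script `./wait-for-install.sh` are hypothetical placeholders. The cleanup task marks its expected errors as harmless in the task name and suppresses them with `failed_when: false`, while the `rescue` section turns a real failure into a single message that names the failing step.

```yaml
# Hypothetical playbook; the VM name and the wait script are placeholders.
- name: Install cluster 0 with clear failure reporting
  hosts: localhost
  gather_facts: false
  tasks:
    # Errors are expected when no VM is left over from a previous run,
    # so they are suppressed and labeled as harmless in the task name.
    - name: Remove leftover VM from a previous run (errors here are expected and harmless)
      ansible.builtin.command:
        cmd: virsh undefine example-vm
      register: cleanup
      failed_when: false
      changed_when: cleanup.rc == 0

    - name: Run the installation step
      block:
        - name: Wait for the installation to complete
          ansible.builtin.command:
            cmd: ./wait-for-install.sh
      rescue:
        # ansible_failed_task and ansible_failed_result are set by Ansible
        # inside a rescue section.
        - name: Report which step failed and why
          ansible.builtin.fail:
            msg: >-
              e2e step '{{ ansible_failed_task.name }}' failed
              (rc={{ ansible_failed_result.rc | default('n/a') }}):
              {{ ansible_failed_result.stderr | default('no stderr captured') }}
```

Keeping the expected-error handling next to the task that produces it means a reader of the log does not have to guess which errors matter.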
|
||
## Design Details | ||
|
||
* The e2e framework will be a collection of Ansible playbooks and roles | ||
* Use roles to define reusable steps scoped to units of work | ||
* Use playbooks to set up common e2e workflows and chains for the project | ||
|
||
A general pattern for organizing the repository could look like this: | ||
|
||
``` | ||
playbooks/ | ||
  example-e2e-workflow.yml | ||
roles/ | ||
  common/ | ||
    tasks/ | ||
    handlers/ | ||
    library/ | ||
    files/ | ||
    templates/ | ||
    vars/ | ||
    defaults/ | ||
    meta/ | ||
  example_role/ | ||
    tasks/ | ||
    defaults/ | ||
    meta/ | ||
|
||
This is intentionally open ended. I believe that focusing on making | ||
reusable roles for the individual steps is key. | ||
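
As a rough illustration of a role scoped to one unit of work, the sketch below shows what the `tasks/main.yml` of a role that launches a VM from a built ISO might contain. Everything specific here is an assumption made for the example: the variables `agent_iso_path`, `vm_name`, `vm_memory_mib`, `vm_vcpus` and `vm_disk_gib` are hypothetical and would live in the role's `defaults/`, and using `virt-install` is just one possible way to drive libvirt.

```yaml
# roles/example_role/tasks/main.yml (hypothetical contents)
# Assumes virt-install and libvirt are available on the host and that
# agent_iso_path points at an already-built installer ISO.
- name: Check that the installer ISO exists
  ansible.builtin.stat:
    path: "{{ agent_iso_path }}"
  register: agent_iso

- name: Fail early with a clear message when the ISO is missing
  ansible.builtin.fail:
    msg: "Installer ISO not found at {{ agent_iso_path }}; build the image before running this role."
  when: not agent_iso.stat.exists

- name: Launch a VM booted from the installer ISO
  ansible.builtin.command:
    argv:
      - virt-install
      - --name
      - "{{ vm_name }}"
      - --memory
      - "{{ vm_memory_mib }}"
      - --vcpus
      - "{{ vm_vcpus }}"
      - --disk
      - "size={{ vm_disk_gib }}"
      - --cdrom
      - "{{ agent_iso_path }}"
      - --os-variant
      - generic
      - --noautoconsole
```

Because the role only launches a VM and does nothing else, a playbook can combine it with other roles without pulling in an installation flow.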
|
||
To make the framework easy to understand alongside OpenShift CI and the step registry, | ||
copy the `pre`, `test`, and `post` pattern locally in the playbooks and include the playbooks like so: | ||
|
||
``` | ||
# File structure: | ||
playbooks/ | ||
  example-e2e-workflow.yml | ||
  example-e2e-workflow_pre.yml | ||
  example-e2e-workflow_test.yml | ||
  example-e2e-workflow_post.yml | ||
# example-e2e-workflow.yml | ||
- hosts: localhost | ||
- name: Run e2e-workflow Pre steps | ||
  import_playbook: example-e2e-workflow_pre.yml | ||
- name: Run e2e-workflow Test steps | ||
  import_playbook: example-e2e-workflow_test.yml | ||
- name: Run e2e-workflow Post steps | ||
  import_playbook: example-e2e-workflow_post.yml | ||
``` | ||
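
Each imported playbook could then be a thin wrapper that only selects roles, so the steps themselves stay reusable. The sketch below is a guess at what the `pre` playbook might look like; the role names are taken from the example repository layout in this document and are placeholders rather than real roles.

```yaml
# example-e2e-workflow_pre.yml (hypothetical contents)
- name: Set up the environment for the e2e workflow
  hosts: localhost
  gather_facts: false
  roles:
    # Placeholder role names from the example repository layout.
    - role: common
    - role: example_role
```

With this split, a developer can run only the `pre` playbook locally to get an environment, and the same playbooks could later be wired to matching `pre`, `test`, and `post` steps in CI.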
|
||
This is just a general idea, and I would like to keep it pretty open ended at the moment. | ||
The first priority will be to run these playbooks locally, but how they map onto the CI | ||
is being taken into consideration. | ||
|
||
### Open Questions [optional] | ||
|
||
### Test Plan | ||
|
||
* e2e tests for the three install scenarios of cluster 0 (SNO, Compact, HA) | ||
* Run OpenShift compliance tests against these installed clusters | ||
* Unit tests for new code to reach 50% code coverage | ||
|
||
### Graduation Criteria | ||
|
||
#### Dev Preview -> Tech Preview | ||
|
||
* Ability to use the e2e framework locally to aid in development | ||
* OpenShift compliance tests are run against the installed clusters | ||
* The e2e framework is onboarded into OpenShift CI for the team's use | ||
* Sufficient test coverage (50% coverage for unit tests) | ||
|
||
#### Tech Preview -> GA | ||
|
||
* Add upgrade/downgrade e2e tests | ||
* Increase unit test code coverage to 75% or higher | ||
|
||
#### Removing a deprecated feature | ||
|
||
### Upgrade / Downgrade Strategy | ||
|
||
### Version Skew Strategy | ||
|
||
Since the ephemeral agent based installer is targeted to be released with OCP it would | ||
be easiest to version the e2e framework the same as OCP x.y. This is the strategy | ||
that openshift-tests takes. | ||
|
||
### Operational Aspects of API Extensions | ||
|
||
#### Failure Modes | ||
|
||
#### Support Procedures | ||
|
||
## Implementation History | ||
|
||
## Drawbacks | ||
|
||
The biggest argument against this proposal is that by creating a new framework we don't | ||
inherit the work that was done previously in validating Assisted Installer. | ||
|
||
## Alternatives | ||
|
||
Alternatively we can re-factor one of the existing frameworks to include this | ||
new project. | ||
|
||
## Infrastructure Needed [optional] |