Ephemeral Agent Installer e2e Tests
lranjbar committed Feb 15, 2022
1 parent 53bf422 commit 5df4e4e
Showing 1 changed file with 275 additions and 0 deletions.
275 changes: 275 additions & 0 deletions enhancements/testing/ephemeral-agent-installer-e2e-tests.md
@@ -0,0 +1,275 @@
---
title: ephemeral-agent-installer-e2e-tests
authors:
- "@lranjbar"
reviewers:
- "@rwsu"
- "@zaneb"
- "@pawanpinjarkar"
approvers:
- "@dhellmann"
- "@celebdor"
api-approvers: # in case of new or modified APIs or API extensions (CRD, aggregated api servers, webhooks, finalizers)
- N/A
creation-date: 2022-02-10
last-updated: 2022-02-15
tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement
- https://issues.redhat.com/browse/AGENT-20
---

# Ephemeral Agent Installer e2e Tests

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

Create a new e2e testing framework for the new agent based installer project.
Two existing e2e frameworks could potentially be used for this project:
dev-scripts and assisted-test-infra. However, both present significant
challenges when re-purposed for this new project.

The main challenge with both existing frameworks is that the environment
setup, installation, and subsequent tests they run are all tightly coupled.
Parts of the potential e2e flow for the ephemeral agent installer image exist
in each of these frameworks. However, there is no way to mix and match just the
pieces we need without a significant re-factoring effort in both frameworks.

A more expected challenge is that some of the routines we will likely need are
completely new. For these reasons I argue that it will be less risky to create
a new framework for the ephemeral agent installer image.

These challenges will be explained in more detail in the User Stories section below.

## Motivation

The new agent based installer project will need an e2e testing framework for validation.
We should look at the existing frameworks and understand what re-purposing them will
entail for this new project.

This enhancement looks closely at the existing e2e frameworks, dev-scripts and
assisted-test-infra, since they are the previous work in this space. The motivation is
to build on what we have learned from those frameworks and come up with a solution that
we can reuse for new installers without these challenges in the future.

A second motivation is that, after working with both of these frameworks for the past year,
I can tell from experience that the understanding of what they actually do is extremely
low in at least one of the teams that use them. To be frank, when a failure happens in the CI,
the usual response among the Assisted Installer development team is that the CI broke
something, not that the code changes being introduced broke something.
This is mostly because there isn't sufficient information about the actual failure when
it happens, and the logs are full of red herring errors. There have been significant
efforts to fix these issues in the past, but they almost never get prioritized
over new feature development. With every new feature added, the complexity grows, and so does
the gap between the e2e tests and the average developer's understanding of them. With this new
project I would like to avoid this outcome as much as possible.

### Goals

This proposal focuses on using the new e2e framework in the new ephemeral agent based
installer image project only.

### Non-Goals

* Re-factoring dev-scripts and assisted-test-infra for reasons not related to the new
ephemeral agent based installer image project.
* Getting buy-in from other installer teams (Assisted Installer, IPI) to use this new
framework in their e2e tests for the upcoming release.

This is not to say that we shouldn't talk to other installer teams about this and
collaborate with them. It is just not a goal to have them use the same e2e framework as the
new project.

## Proposal

Create an e2e testing framework for the ephemeral agent based installer image.

### User Stories

#### Run Assisted Installer Outside k8s

As a developer, I need to run Assisted Installer without installing a k8s platform first
so that I can test the installation of a user's first cluster, cluster 0.

This is the overall purpose of this ephemeral agent based installer image project.
Unfortunately, neither of the existing frameworks can actually assist with this.

* The Assisted Installer e2e tests that run in dev-scripts first use the IPI installer
to install OCP and then install the infrastructure-operator on this OCP cluster, making
it a hub cluster.

* The Assisted Installer e2e tests that run in assisted-test-infra install minikube first
and then run Assisted Installer on top of it.

This means that both existing e2e frameworks set up some form of k8s first. Given that
the goal of the ephemeral agent based installer project is to install a user's first
cluster, cluster 0, this is highly undesirable.

While there is a flow to run Assisted Installer in podman, this is a manually supported
flow. Support for it in the e2e frameworks has been mostly abandoned by the Assisted
Installer team in favor of infrastructure-operator and the hosted SaaS solution.

#### Setup Environment for Install Without Installing OCP or k8s

As a developer, I need to set up VMs so that I can simulate the installation of a user's
first cluster using the ephemeral agent based installer image.

Both existing frameworks set up VMs using libvirt today to test a user's OCP install.
However, they also both immediately use those VMs to start the installation process. There
isn't an easy way to stop the current frameworks from running IPI or Assisted Installer
after the VM creation.


#### Launch a VM with the Ephemeral Agent Based Installer Image

As a developer, I need to launch a VM with the built ephemeral agent based installer
image so that I can test and validate this image.

Neither existing framework has an easy way to do this, which makes sense because
they predate the ephemeral agent based installer project. There also isn't an existing
routine that simply launches a generic ISO image, which would have been
something we could have re-purposed.
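
A generic "boot this ISO" step could be a small role task along the following lines. This is a minimal sketch, not a routine that exists in either framework: the role name, the variable names (`agent_iso_path`, `agent_vm_name`, and so on), their default values, and the libvirt network name `default` are all assumptions for illustration. It also assumes the `virt-install` CLI is available on the test host.

```yaml
# roles/launch_iso_vm/tasks/main.yml (hypothetical role name)
# Sketch only: boots one libvirt VM from an arbitrary ISO and returns
# without waiting for, or starting, any installer-specific flow.
- name: Launch a VM from the supplied ISO image
  ansible.builtin.command:
    argv:
      - virt-install
      - --name
      - "{{ agent_vm_name | default('agent-e2e-0') }}"    # hypothetical variable
      - --memory
      - "{{ agent_vm_memory_mib | default(16384) }}"      # hypothetical variable
      - --vcpus
      - "{{ agent_vm_vcpus | default(8) }}"               # hypothetical variable
      - --disk
      - "size={{ agent_vm_disk_gib | default(120) }}"     # hypothetical variable
      - --cdrom
      - "{{ agent_iso_path }}"                            # path to the built image
      - --network
      - "network={{ agent_vm_network | default('default') }}"
      - --os-variant
      - generic
      - --noautoconsole    # return as soon as the VM has been started
  register: launch_result
  changed_when: launch_result.rc == 0
```

Because the task takes the ISO path as a variable and does nothing else, the same role could be pointed at any bootable image, which is the reuse property the existing frameworks lack.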

### API Extensions

Not applicable at this time.

### Implementation Details/Notes/Constraints [optional]

What are the caveats to the implementation? What are some important details that
didn't come across above. Go in to as much detail as necessary here. This might
be a good place to talk about core concepts and how they relate.

### Risks and Mitigations

The risks for this proposal are highlighted in the Motivation section. They are:

1) Creating a third e2e framework for installers that isn't reusable, meaning that
we will face this same issue in the future if a new installer project comes online.

2) The third e2e framework becoming so complex that it is hard to understand where
the failures are coming from.

To mitigate the first risk, I believe that we can create a modular e2e framework that
future installer projects can reuse. To mitigate the second risk, we need to make sure
that each test has a clear failure message. We also need to encapsulate the red herring
messages that occur during setup and installation of components and mark them
accordingly.
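
One way to apply the second mitigation in Ansible is to wrap each step in a `block`/`rescue` pair, so that a failure ends with a single explicit message, and to label tasks whose errors are expected so they are not mistaken for the root cause. The following is a minimal sketch under stated assumptions: the helper scripts `wait-for-agent-api.sh` and `cleanup-previous-run.sh` and the task names are hypothetical.

```yaml
- name: Example of clear failure reporting (sketch)
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Step - wait for the installer service to respond
      block:
        - name: Probe the service (hypothetical helper script)
          ansible.builtin.command: ./wait-for-agent-api.sh
          register: probe
          changed_when: false
      rescue:
        # ansible_failed_task and ansible_failed_result are set by Ansible
        # inside a rescue section and describe the task that failed.
        - name: Report the failure in one place
          ansible.builtin.fail:
            msg: >-
              E2E FAILURE in step 'wait for the installer service'
              (failed task: {{ ansible_failed_task.name }}):
              {{ ansible_failed_result.msg | default('no message available') }}

    # Errors from this task are tolerated and labeled so that they do not
    # read as the cause of a later failure.
    - name: "[EXPECTED NOISE] Remove leftovers from a previous run"
      ansible.builtin.command: ./cleanup-previous-run.sh
      register: cleanup
      changed_when: false
      failed_when: false
```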

## Design Details

* The e2e framework will be a collection of Ansible playbooks and roles
* Use roles to define reusable steps scoped to units of work
* Use playbooks to set up common e2e workflows and chains for the project

Thinking about a general pattern of how to organize the repository:

```
playbooks/
    example-e2e-workflow.yml
roles/
    common/
        tasks/
        handlers/
        library/
        files/
        templates/
        vars/
        defaults/
        meta/
    example_role/
        tasks/
        defaults/
        meta/
```

This is intentionally open ended. I just believe that focusing on making
reusable roles for the steps is key.
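
To illustrate how roles scoped to units of work could be chained, a playbook might do little more than import them in order and tag each step. This is a sketch only: the role names `create_vms` and `launch_iso_vm` and the tag names are hypothetical placeholders, not roles that exist today.

```yaml
- name: Example e2e workflow composed of reusable roles (sketch)
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Create the VMs and nothing else
      ansible.builtin.import_role:
        name: create_vms        # hypothetical role
      tags: [create_vms]

    - name: Boot the VMs from the installer image
      ansible.builtin.import_role:
        name: launch_iso_vm     # hypothetical role
      tags: [launch]
```

With tags on each step, running `ansible-playbook` with `--tags create_vms` against this playbook would stop after VM creation, which is the kind of mix and match that the existing frameworks make difficult.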

My thinking here is shaped by OpenShift CI and its step registry, with the aim of keeping the workflows easy to understand.
The idea is to copy the `pre`, `test`, and `post` pattern locally in the playbooks and import the playbooks like so:

```
# File structure:
# playbooks/
#     example-e2e-workflow.yml
#     example-e2e-workflow_pre.yml
#     example-e2e-workflow_test.yml
#     example-e2e-workflow_post.yml

# example-e2e-workflow.yml
# Each imported playbook declares its own hosts (for example, localhost).
- name: Run e2e-workflow Pre steps
  import_playbook: example-e2e-workflow_pre.yml
- name: Run e2e-workflow Test steps
  import_playbook: example-e2e-workflow_test.yml
- name: Run e2e-workflow Post steps
  import_playbook: example-e2e-workflow_post.yml
```

This is just a general idea, and I would like to keep it fairly open ended for now.
The first priority will be to run these playbooks locally, but how they map onto the CI
is also being taken into consideration.

### Open Questions [optional]

### Test Plan

* e2e tests for the three install scenarios of cluster 0 (SNO, Compact, HA)
* Run OpenShift compliance tests against these installed clusters
* Unit tests for new code to reach 50% code coverage

### Graduation Criteria

#### Dev Preview -> Tech Preview

* Ability to use the e2e framework locally to aid in development
* OpenShift compliance tests are run against the installed clusters
* The e2e framework is onboarded into OpenShift CI for the team's use
* Sufficient test coverage (50% coverage for unit tests)

#### Tech Preview -> GA

* Add upgrade/downgrade e2e tests
* Increase unit test code coverage to 75% or higher

#### Removing a deprecated feature

### Upgrade / Downgrade Strategy

### Version Skew Strategy

Since the ephemeral agent based installer is targeted to be released with OCP it would
be easiest to version the e2e framework the same as OCP x.y. This is the strategy
that openshift-tests takes.

### Operational Aspects of API Extensions

#### Failure Modes

#### Support Procedures

## Implementation History

## Drawbacks

The biggest argument against this proposal is that by creating a new framework we don't
inherit the work that was done previously in validating Assisted Installer.

## Alternatives

Alternatively we can re-factor one of the existing frameworks to include this
new project.

## Infrastructure Needed [optional]
