Ephemeral Agent Installer e2e Tests

openshift · Feb 15, 2022 · 5df4e4e
1 parent 53bf422
commit 5df4e4e
Showing 1 changed file with 275 additions and 0 deletions.
diff --git a/enhancements/testing/ephemeral-agent-installer-e2e-tests.md b/enhancements/testing/ephemeral-agent-installer-e2e-tests.md
@@ -0,0 +1,275 @@
+---
+title: ephemeral-agent-installer-e2e-tests
+authors:
+  - "@lranjbar"
+reviewers:
+  - "@rwsu"
+  - "@zaneb"
+  - "@pawanpinjarkar"
+approvers:
+  - "@dhellmann"
+  - "@celebdor"
+api-approvers: # in case of new or modified APIs or API extensions (CRD, aggregated api servers, webhooks, finalizers)
+  - N/A
+creation-date: 2022-02-10
+last-updated: 2022-02-15
+tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement
+  - https://issues.redhat.com/browse/AGENT-20
+---
+
+# Ephemeral Agent Installer e2e Tests
+
+## Release Signoff Checklist
+
+- [ ] Enhancement is `implementable`
+- [ ] Design details are appropriately documented from clear requirements
+- [ ] Test plan is defined
+- [ ] Operational readiness criteria is defined
+- [ ] Graduation criteria for dev preview, tech preview, GA
+- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)
+
+## Summary
+
+Create a new e2e testing framework for the new agent-based installer project.
+Two existing e2e frameworks could potentially be used for this project: dev-scripts
+and assisted-test-infra. However, re-purposing either of them for this new project
+poses significant challenges.
+
+The main challenge with both of these frameworks is that the environment setup, the
+installation, and the subsequent tests they run are all tightly coupled together.
+For the ephemeral agent installer image, parts of the e2e flow we need appear to
+exist in each of these frameworks. However, there is no way to mix and match the
+pieces we need without a significant re-factoring effort in both frameworks.
+
+A second, less surprising challenge is that some of the routines we appear to need
+are completely new. For these reasons, I argue that it will be less risky to create
+a new framework for the ephemeral agent installer image.
+
+These challenges will be explained in more detail in the User Stories section below.
+
+## Motivation
+
+The new agent based installer project will need an e2e testing framework for validation.
+We should look at the existing frameworks and understand what re-purposing them will 
+entail for this new project.
+
+This enhancement looks closely at the existing e2e frameworks, dev-scripts and
+assisted-test-infra, since they represent the previous work in this space. The
+motivation is to build on what we have learned from those frameworks and to come
+up with a solution that we can reuse for new installers without these challenges
+in the future.
+
+A second motivation is that, after working with both of these frameworks over the past
+year, I can tell from experience that at least one of the teams that use them has a very
+low understanding of what they actually do. To be frank, when a failure happens in CI,
+the usual response among the Assisted Installer development team is that the CI broke
+something, not necessarily that the code changes they are introducing broke something.
+This is mostly because there isn't sufficient information about the actual failure when
+it happens, and there are tons of red herring errors in the logs. There have been
+significant efforts to fix these issues in the past, but they almost never get
+prioritized over new feature development. With every new feature, the complexity grows,
+and so does the knowledge gap between the e2e tests and the average developer on the
+team. With this new project, I would like to avoid this outcome as much as possible.
+
+### Goals
+
+This proposal focuses on using this new e2e framework in the ephemeral agent-based
+installer image project only.
+
+### Non-Goals
+
+* Re-factoring dev-scripts and assisted-test-infra for reasons not related to the new 
+ephemeral agent based installer image project.
+* Getting buy-in from other installer teams (Assisted Installer, IPI) to use this new
+framework in their e2e tests for the upcoming release.
+
+This is not to say that we shouldn't talk to other installer teams about this and 
+collaborate with them. It is just not a goal to have them use the same e2e framework as the 
+new project.
+
+## Proposal
+
+Create an e2e testing framework for the ephemeral agent based installer image.
+
+### User Stories
+
+#### Run Assisted Installer Outside k8s
+
+As a developer, I need to run Assisted Installer without installing a k8s platform first
+so that I can test the installation of a user's first cluster, cluster 0.
+
+This is the overall purpose of the ephemeral agent-based installer image project.
+Unfortunately, neither of the existing frameworks can actually help with this.
+
+* The Assisted Installer e2e tests that run in dev-scripts first use the IPI installer
+to install OCP and then install the infrastructure-operator on this OCP cluster, making
+it a hub cluster.
+
+* The Assisted Installer e2e tests that run in assisted-test-infra install minikube
+first and then run Assisted Installer on top of it.
+
+This means that both existing e2e frameworks set up some form of k8s first. Given
+that the goal of the ephemeral agent-based installer project is to install a user's
+first cluster, cluster 0, this is highly undesirable.
+
+While there is a flow to run Assisted Installer in podman, it is only supported as a
+manual flow. Its support in the e2e frameworks has mostly been abandoned by the
+Assisted Installer team in favor of infrastructure-operator and the hosted SaaS
+solution.
+
+#### Setup Environment for Install Without Installing OCP or k8s
+
+As a developer, I need to set up VMs so that I can simulate the installation of a user's
+first cluster using the ephemeral agent-based installer image.
+
+Both existing frameworks set up VMs using libvirt today to test a user's OCP install.
+However, they both also immediately use those VMs to start the installation process.
+There isn't an easy way to stop the current frameworks from running IPI or Assisted
+Installer after the VMs are created.
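+
+A minimal sketch of how an environment-only step could look as a reusable role,
+assuming the `community.libvirt` Ansible collection and `qemu-img` are available on
+the host. The role name `libvirt_env`, the variables `libvirt_uri`,
+`e2e_network_name`, `e2e_nodes` (a list of node names), `e2e_disk_dir`, and
+`e2e_disk_size`, and the network addresses are all hypothetical. The role only
+defines a NAT network and creates empty node disks. It boots nothing and starts no
+installer.
+
+```yaml
+# roles/libvirt_env/tasks/main.yml (hypothetical role)
+# Prepares libvirt resources only: no VM is started and no installer runs.
+- name: Define the e2e NAT network
+  community.libvirt.virt_net:
+    command: define
+    name: "{{ e2e_network_name }}"
+    uri: "{{ libvirt_uri }}"
+    xml: |
+      <network>
+        <name>{{ e2e_network_name }}</name>
+        <forward mode='nat'/>
+        <!-- Example addressing only -->
+        <ip address='192.168.126.1' netmask='255.255.255.0'>
+          <dhcp>
+            <range start='192.168.126.10' end='192.168.126.100'/>
+          </dhcp>
+        </ip>
+      </network>
+
+- name: Start the e2e network
+  community.libvirt.virt_net:
+    name: "{{ e2e_network_name }}"
+    state: active
+    uri: "{{ libvirt_uri }}"
+
+- name: Create an empty qcow2 disk for each node
+  ansible.builtin.command:
+    cmd: "qemu-img create -f qcow2 {{ e2e_disk_dir }}/{{ item }}.qcow2 {{ e2e_disk_size }}"
+    creates: "{{ e2e_disk_dir }}/{{ item }}.qcow2"
+  loop: "{{ e2e_nodes }}"
+```
+
+To stop after this role, a workflow simply does not run a later role. That is the
+separation the existing frameworks make difficult.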
+
+
+#### Launch a VM with the Ephemeral Agent Based Installer Image
+
+As a developer, I need to launch a VM with the built ephemeral agent-based installer
+image so that I can test and validate this image.
+
+Neither existing framework has an easy way to do this, which makes sense because
+they predate the ephemeral agent-based installer project. There also isn't an existing
+routine that just launches a generic ISO image, which is something we could have
+re-purposed.
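+
+A minimal sketch of a generic "boot a VM from an ISO" role, assuming the
+`community.libvirt` Ansible collection and a libvirt host with KVM. The role name
+`boot_iso_vm` and all variables (`libvirt_uri`, `vm_name`, `vm_memory_mib`,
+`vm_vcpus`, `vm_disk_path`, `vm_disk_size`, `iso_path`, `e2e_network_name`) are
+hypothetical, and the domain XML is deliberately minimal. The ISO is only a path, so
+the same role could launch the ephemeral agent installer image or any other ISO.
+
+```yaml
+# roles/boot_iso_vm/tasks/main.yml (hypothetical role)
+- name: Create the VM's system disk if it does not exist yet
+  ansible.builtin.command:
+    cmd: "qemu-img create -f qcow2 {{ vm_disk_path }} {{ vm_disk_size }}"
+    creates: "{{ vm_disk_path }}"
+
+- name: Define a VM with the ISO attached as a CD-ROM
+  community.libvirt.virt:
+    command: define
+    uri: "{{ libvirt_uri }}"
+    xml: |
+      <domain type='kvm'>
+        <name>{{ vm_name }}</name>
+        <memory unit='MiB'>{{ vm_memory_mib }}</memory>
+        <vcpu>{{ vm_vcpus }}</vcpu>
+        <os>
+          <type arch='x86_64' machine='q35'>hvm</type>
+          <!-- The disk is empty on first boot, so the firmware typically
+               falls through to the ISO; later reboots use the disk. -->
+          <boot dev='hd'/>
+          <boot dev='cdrom'/>
+        </os>
+        <devices>
+          <disk type='file' device='disk'>
+            <driver name='qemu' type='qcow2'/>
+            <source file='{{ vm_disk_path }}'/>
+            <target dev='vda' bus='virtio'/>
+          </disk>
+          <disk type='file' device='cdrom'>
+            <driver name='qemu' type='raw'/>
+            <source file='{{ iso_path }}'/>
+            <target dev='sda' bus='sata'/>
+            <readonly/>
+          </disk>
+          <interface type='network'>
+            <source network='{{ e2e_network_name }}'/>
+            <model type='virtio'/>
+          </interface>
+        </devices>
+      </domain>
+
+- name: Boot the VM
+  community.libvirt.virt:
+    name: "{{ vm_name }}"
+    state: running
+    uri: "{{ libvirt_uri }}"
+```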
+
+### API Extensions
+
+Not applicable at this time.
+
+### Implementation Details/Notes/Constraints [optional]
+
+What are the caveats to the implementation? What are some important details that
+didn't come across above. Go into as much detail as necessary here. This might
+be a good place to talk about core concepts and how they relate.
+
+### Risks and Mitigations
+
+The risks of this proposal, which are highlighted in the Motivation section, are:
+
+1) Creating a third e2e framework for installers that isn't reusable, meaning that
+we will face this same issue again when a new installer project comes online.
+
+2) The third e2e framework becomes so complex that it is hard to understand where
+failures are coming from.
+
+To mitigate the first risk, I believe we can create a modular e2e framework that
+future installer projects can reuse. To mitigate the second risk, we need to make sure
+that each test has a clear failure message. We also need to encapsulate the red herring
+messages that occur during setup and installation of components and mark them
+accordingly.
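+
+A minimal sketch of both mitigations as Ansible tasks. Setup failures are caught
+with `block`/`rescue` and re-raised with an explicit "SETUP FAILURE" label. Test
+checks use `ansible.builtin.assert` with a `fail_msg` that names what broke. The role
+`example_setup_role` is hypothetical, and the check assumes `oc` is on the `PATH`
+with a `KUBECONFIG` pointing at the installed cluster.
+
+```yaml
+# roles/example_checks/tasks/main.yml (hypothetical role)
+- name: Set up components, labeling any failure as a setup failure
+  block:
+    - name: Run the setup role
+      ansible.builtin.include_role:
+        name: example_setup_role
+  rescue:
+    - name: Re-raise the error with a clear category
+      ansible.builtin.fail:
+        msg: >-
+          SETUP FAILURE (not a test failure) in task
+          '{{ ansible_failed_task.name }}':
+          {{ ansible_failed_result.msg | default('no message') }}
+
+- name: Read the Available condition of every cluster operator
+  ansible.builtin.command:
+    cmd: >-
+      oc get clusteroperators -o
+      jsonpath='{range .items[*]}{.metadata.name}={.status.conditions[?(@.type=="Available")].status}{"\n"}{end}'
+  register: co_status
+  changed_when: false
+
+- name: Fail with a message that names the unavailable operators
+  ansible.builtin.assert:
+    that:
+      - unavailable_operators | length == 0
+    fail_msg: "Cluster operators not Available: {{ unavailable_operators | join(', ') }}"
+    success_msg: "All cluster operators are Available"
+  vars:
+    # Each output line is "<operator>=<status>"; keep those not ending in "=True".
+    unavailable_operators: "{{ co_status.stdout_lines | reject('search', '=True$') | list }}"
+```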
+
+## Design Details
+
+* The e2e framework will be a collection of Ansible playbooks and roles.
+* Roles define reusable steps, each scoped to a unit of work.
+* Playbooks set up the common e2e workflows and chains for the project.
+
+A general pattern for organizing the repository could look like this:
+
+```
+playbooks/
+  example-e2e-workflow.yml
+roles/
+  common/
+    tasks/
+    handlers/
+    library/
+    files/
+    templates/
+    vars/
+    defaults/
+    meta/
+  example_role/
+    tasks/
+    defaults/
+    meta/
+```
+
+This layout is intentionally open-ended. I believe the key is to focus on making the
+steps reusable roles.
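+
+As an illustration of reuse, a playbook can call the same role once per node and
+vary only its inputs. This is a sketch: the `example_role_*` variables (prefixed
+with the role name, a common Ansible convention to avoid collisions) and the node
+list are hypothetical. Ansible looks for roles in a `roles/` directory next to the
+playbook or in the configured `roles_path`. A top-level `roles/` directory would
+therefore need to be on that path, for example through `roles_path` in an
+`ansible.cfg`.
+
+```yaml
+# playbooks/example-e2e-workflow_pre.yml (hypothetical contents)
+- name: Launch cluster nodes with one reusable role
+  hosts: localhost
+  gather_facts: false
+  vars:
+    cluster_nodes:
+      - name: master-0
+        memory_mib: 16384
+      - name: master-1
+        memory_mib: 16384
+      - name: master-2
+        memory_mib: 16384
+  tasks:
+    - name: Run example_role once per node
+      ansible.builtin.include_role:
+        name: example_role
+      vars:
+        example_role_vm_name: "{{ item.name }}"
+        example_role_memory_mib: "{{ item.memory_mib }}"
+      loop: "{{ cluster_nodes }}"
+      loop_control:
+        label: "{{ item.name }}"
+```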
+
+To keep the framework easy to understand and to map onto OpenShift CI and its step
+registry, the playbooks can copy the `pre`, `test`, and `post` pattern locally and
+import one another like so:
+
+```
+# File structure:
+playbooks/
+  example-e2e-workflow.yml
+  example-e2e-workflow_pre.yml
+  example-e2e-workflow_test.yml
+  example-e2e-workflow_post.yml
+
+# example-e2e-workflow.yml
+- name: Run e2e-workflow Pre steps
+  import_playbook: example-e2e-workflow_pre.yml
+
+- name: Run e2e-workflow Test steps
+  import_playbook: example-e2e-workflow_test.yml
+
+- name: Run e2e-workflow Post steps
+  import_playbook: example-e2e-workflow_post.yml
+```
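+
+One caveat applies to importing `_post` as a separate playbook. Once every host in a
+play has failed, `ansible-playbook` ends the run. The imported post steps (for
+example, log collection or teardown) are therefore skipped after a failed test. The
+following minimal sketch of the test playbook keeps artifact collection inside the
+test play with `block`/`always`. The roles `example_test_role` and
+`example_gather_role` are hypothetical.
+
+```yaml
+# playbooks/example-e2e-workflow_test.yml (hypothetical contents)
+- name: Run e2e-workflow Test steps
+  hosts: localhost
+  gather_facts: false
+  tasks:
+    - name: Run the tests, then always collect artifacts
+      block:
+        - name: Run the test role
+          ansible.builtin.include_role:
+            name: example_test_role
+      always:
+        # Runs whether the tests passed or failed, so logs survive a failure.
+        - name: Collect artifacts for debugging
+          ansible.builtin.include_role:
+            name: example_gather_role
+```
+
+An alternative is to run the `_post` playbook as a separate `ansible-playbook`
+invocation that the caller always executes.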
+
+This is just a general idea, and I would like to keep it open-ended for now. The
+first priority is to run these playbooks locally, but how they will map onto CI is
+being taken into consideration.
+
+### Open Questions [optional]
+
+### Test Plan
+
+* e2e tests for the three install scenarios of cluster 0 (SNO, Compact, HA)
+* Run OpenShift compliance tests against these installed clusters
+* Unit tests for new code to reach 50% code coverage
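+
+The three scenarios could be parametrized rather than duplicated. A minimal sketch,
+assuming a hypothetical `topology` variable. The node counts follow the usual
+topologies: one node for SNO, three schedulable control plane nodes for Compact, and
+three control plane nodes plus workers for HA (two workers is an example value).
+
+```yaml
+# playbooks/e2e-topology.yml (hypothetical)
+- name: Run the e2e workflow for one cluster 0 topology
+  hosts: localhost
+  gather_facts: false
+  vars:
+    topology: sno  # override with -e topology=compact or -e topology=ha
+    cluster_topologies:
+      sno: {control_plane: 1, workers: 0}
+      compact: {control_plane: 3, workers: 0}
+      ha: {control_plane: 3, workers: 2}
+  tasks:
+    - name: Fail early on an unknown topology
+      ansible.builtin.assert:
+        that:
+          - topology in cluster_topologies
+        fail_msg: "Unknown topology '{{ topology }}', expected one of: {{ cluster_topologies | list | join(', ') }}"
+
+    - name: Show the node counts for the selected topology
+      ansible.builtin.debug:
+        msg: >-
+          {{ topology }}: {{ cluster_topologies[topology].control_plane }} control plane
+          and {{ cluster_topologies[topology].workers }} worker node(s)
+```
+
+For example, `ansible-playbook playbooks/e2e-topology.yml -e topology=compact` would
+select the Compact scenario.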
+
+### Graduation Criteria
+
+#### Dev Preview -> Tech Preview
+
+* Ability to utilize the e2e framework locally to aid in development
+* OpenShift compliance tests are run against the installed clusters
+* The e2e framework is onboarded into OpenShift CI for the team's use
+* Sufficient test coverage (50% coverage for unit tests)
+
+#### Tech Preview -> GA
+
+* Add upgrade/downgrade e2e tests
+* Increase unit test code coverage to 75% or higher
+
+#### Removing a deprecated feature
+
+### Upgrade / Downgrade Strategy
+
+### Version Skew Strategy
+
+Since the ephemeral agent-based installer is targeted to be released with OCP, it
+would be easiest to version the e2e framework the same way as OCP (x.y). This is the
+strategy that openshift-tests takes.
+
+### Operational Aspects of API Extensions
+
+#### Failure Modes
+
+#### Support Procedures
+
+## Implementation History
+
+## Drawbacks
+
+The biggest argument against this proposal is that, by creating a new framework, we
+don't inherit the work that was previously done to validate Assisted Installer.
+
+## Alternatives
+
+Alternatively, we could re-factor one of the existing frameworks to include this
+new project.
+
+## Infrastructure Needed [optional]