Agent Installer Development Environment #1034

Merged
270 changes: 270 additions & 0 deletions enhancements/agent-installer/development-environment.md
@@ -0,0 +1,270 @@
---
title: development-environment
authors:
- "@lranjbar"
reviewers:
- "@andfasano"
- "@bfournie"
- "@hardys"
- "@pawanpinjarkar"
- "@rwsu"
- "@zaneb"
approvers:
- "@dhellmann"
- "@celebdor"
api-approvers: # in case of new or modified APIs or API extensions (CRD, aggregated api servers, webhooks, finalizers)
- N/A
creation-date: 2022-02-10
last-updated: 2022-04-01
tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement
- https://issues.redhat.com/browse/AGENT-71
---

# Ephemeral Agent Installer Development Infrastructure

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

Create a development environment for the ephemeral agent-based installer image in the
[dev-scripts](https://github.com/openshift-metal3/dev-scripts) repository and refactor
components as needed for the new project. In addition, we will add base-case e2e test flows
integrated into OpenShift CI to get the project started.

## Motivation

The new agent-based installer project needs a standard development environment for
the team to use, and an e2e testing framework for validation in OpenShift CI.
This enhancement explains our desire to reuse existing frameworks for the new
project while limiting modifications to them to what the new project requires.

### Goals

* Create a development environment for the agent installer project that can be used for local
development and for automating the flows in OpenShift CI.
* Create predefined configurations to deploy OpenShift in HA, compact and SNO topologies.
* Create predefined network configurations for OpenShift including but not limited to IP type
(IPv4, IPv6, dual-stack).
* A reusable and reproducible environment configuration with predefined flows for development
and testing teams to use.
* Convergence of the underlying scripts and environment setup code between the Metal Platform (IPI)
team and the new Agent Installer team.

### Non-Goals

* Refactoring dev-scripts and assisted-test-infra for reasons not related to the new
ephemeral agent-based installer image project.
* Overall convergence of scripts and e2e test flows across all the installer development teams.

## Proposal

Create a development environment for the ephemeral agent-based installer image in the
[dev-scripts](https://github.com/openshift-metal3/dev-scripts) repository and refactor
components as needed for the new project.

### User Stories

* As a developer, I need to set up an environment so that I can install an OpenShift
cluster locally using the agent-based installer for development and testing.

* As a developer, I need to launch the built ephemeral agent-based installer image so that
I can test and validate this image during development.

* As a developer, I want my environment configuration to be reusable and reproducible so that
I can share it with my colleagues and we can easily troubleshoot the same problems.

* As a developer, I need to check the progress of the OpenShift cluster being installed by
the agent installer and, when the installation is complete, validate the OpenShift
cluster.

* As a developer, I want to extract logs from the agent installer and the OpenShift cluster
being installed so that I can troubleshoot and debug.

### API Extensions

Not applicable.

### Risks and Mitigations

**Risk:** A third team will now be using dev-scripts for slightly different purposes.
This increases the risk of breaking things in the central framework and gives any
breakage a larger "splash zone."

We can mitigate this for the new ephemeral agent installer by running e2e tests using the
agent flows for changes to the dev-scripts repository. The purpose of this would be to catch
breakages in the tests before the changes merge.

**Risk:** The dev-scripts repository clones the [metal3-dev-env](https://github.com/metal3-io/metal3-dev-env)
repository for most of its Ansible roles, so changes in that repository can break things
in our repository as well. We have less control over this repository
than over others inside OpenShift.

This is partially mitigated already because the repository is cloned at a specific
Git SHA. Extra testing and sometimes changes are needed when this Git SHA pin for metal3-dev-env
moves.

**OPTIONAL SUGGESTION:** Add a test in dev-scripts that runs maybe once a week that clones
the HEAD of metal3-dev-env repository. We would be alerted for breakages more frequently
and mitigate them more often.
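
The pin-versus-HEAD idea can be sketched as follows. This is a minimal illustration, not how dev-scripts performs the clone today: `checkout_commands` and `PINNED_SHA` are hypothetical names, the SHA is a placeholder, and the function only builds the git command lines rather than running them.

```python
from typing import List, Optional

# Repository URL as linked above; the SHA is a placeholder, not the real pin.
METAL3_DEV_ENV_URL = "https://github.com/metal3-io/metal3-dev-env"
PINNED_SHA = "0123456789abcdef0123456789abcdef01234567"  # hypothetical value


def checkout_commands(dest: str, pinned_sha: Optional[str]) -> List[List[str]]:
    """Return the git commands that clone metal3-dev-env into dest.

    With pinned_sha set, the clone is moved to that commit (normal flow).
    With pinned_sha=None, the clone stays at the default branch HEAD
    (the periodic job suggested above).
    """
    commands = [["git", "clone", METAL3_DEV_ENV_URL, dest]]
    if pinned_sha is not None:
        commands.append(["git", "-C", dest, "checkout", pinned_sha])
    return commands
```

A weekly job would call `checkout_commands(dest, None)` while every other flow keeps passing the pinned SHA, so the only difference between the two runs is the revision under test.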


## Design Details

### General Strategy

* Using the scope of the agent installer project, we will identify possible refactors to
existing scripts and new functionality required by this new project.
* We will write new functionality in Ansible, unless it is tightly coupled with existing
shell scripts.
* We will create Ansible roles to wrap the existing code, making it easier to use within
a pure Ansible solution.
* The existing bash scripts will be left as is. Refactoring them will be done piece by
piece as needed.

### Phase 1 (Crawl): Add development scripts into dev-scripts for Agent Installer

1) Add development scripts into dev-scripts for the new agent installer project using
existing conventions. [1]
2) Add Ansible roles and tasks to wrap existing shell scripts to make them easier
to integrate with an Ansible solution.
3) Add a base e2e test flow to run the ephemeral agent installer image in OpenShift CI. [2]
4) Write documentation for new developers joining the agent installer team.

In this phase it should also become clearer which scripts and code overlap
with the existing code base.

[1]: The development scripts for agent installer will include the following scenarios:

* Building the agent installer from a configurable Git SHA.
* Setting up the environment for install and then stopping, so that a person can manually
test an install.
* Setting up the environment for install and automatically running the install using the
agent installer.

[2]: This flow will be run against the repository in which the agent installer is developed.
The base e2e test flow for agent installer is the following scenario:

* Build the agent installer
* Set up the environment for OpenShift installation
* Using the built agent installer, start the OpenShift installation
* Report when the OpenShift installation is complete
* Run existing OpenShift conformance tests and validations against the newly installed cluster
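
The flow above, together with the "set up and stop" scenario from [1], can be sketched as an ordered list of steps. This is a rough illustration only: the step names, `run_flow`, and the placeholder actions are hypothetical, and each real step would call into dev-scripts rather than return `True`.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_flow(steps: List[Step], stop_after: Optional[str] = None) -> List[str]:
    """Run steps in order and return the names of the steps that completed.

    Raises RuntimeError at the first step that returns False. If stop_after
    names a step, the flow ends right after it, for example to leave the
    environment ready for a manual install.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"step failed: {name}")
        completed.append(name)
        if name == stop_after:
            break
    return completed


def _placeholder() -> bool:
    # Stand-in for real work; always reports success.
    return True


# Hypothetical step names mirroring the scenario listed above.
BASE_E2E_FLOW: List[Step] = [
    ("build-agent-installer", _placeholder),
    ("setup-environment", _placeholder),
    ("start-install", _placeholder),
    ("report-install-complete", _placeholder),
    ("run-conformance-tests", _placeholder),
]
```

Calling `run_flow(BASE_E2E_FLOW, stop_after="setup-environment")` corresponds to the manual-testing scenario, while calling it without `stop_after` corresponds to the full CI flow.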

### Phase 2 (Walk): Define common environment default configurations for Agent Installer

1) Add basic cluster validations to be run by a local developer after the OpenShift cluster
is installed. [1]
2) Define an Ansible role that will pipe in values to existing shell scripts and manage
the environment variables needed for these scripts.
3) Add in more configuration examples like
[config_example.sh](https://github.com/openshift-metal3/dev-scripts/blob/master/config_example.sh)
for dev-scripts that define common configurations for OCP installs.
4) Use the above examples to define default configurations for VMs (HA, compact, SNO)
and networks (IPv4, IPv6, dual-stack) in Ansible.
5) Identify minor refactors to existing scripts deemed helpful for the two projects and
move them into Ansible.
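
As a rough sketch of steps 2 and 4, named presets could be translated into the environment variables that the shell scripts consume. The proposal intends this to live in Ansible; Python is used here only to show the data flow. The node counts, the `render_env` helper, and the variable names `NUM_MASTERS`, `NUM_WORKERS` and `IP_STACK` (modeled on the style of `config_example.sh`) are assumptions and should be checked against dev-scripts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Topology:
    masters: int
    workers: int


# Node counts are illustrative assumptions, not values taken from dev-scripts.
TOPOLOGIES: Dict[str, Topology] = {
    "ha": Topology(masters=3, workers=2),
    "compact": Topology(masters=3, workers=0),
    "sno": Topology(masters=1, workers=0),
}

# Assumed mapping from a readable IP type to the value the scripts expect.
IP_STACKS: Dict[str, str] = {"ipv4": "v4", "ipv6": "v6", "dual-stack": "v4v6"}


def render_env(topology: str, ip_type: str) -> Dict[str, str]:
    """Translate a named preset into environment variables for the shell scripts."""
    if topology not in TOPOLOGIES:
        raise ValueError(f"unknown topology: {topology}")
    if ip_type not in IP_STACKS:
        raise ValueError(f"unknown IP type: {ip_type}")
    topo = TOPOLOGIES[topology]
    return {
        "NUM_MASTERS": str(topo.masters),
        "NUM_WORKERS": str(topo.workers),
        "IP_STACK": IP_STACKS[ip_type],
    }
```

Keeping the presets as data means a new topology or network type is a one-line addition, and the wrapped shell scripts never need to know which preset was chosen.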

[1]: These local cluster validations check that the kubeconfig is created and accessible, that the
OpenShift control plane came up successfully, and that the OpenShift cluster is in the "Installed" state.
They also include a generic capture of the errors that are output when the OpenShift cluster fails installation.
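
One possible shape for these local validations is a set of named checks where any exception counts as a failure. This is a minimal sketch under stated assumptions: `kubeconfig_accessible` and `run_validations` are hypothetical helpers, and the control plane and "Installed" state checks would be supplied as additional callables.

```python
import os
from typing import Callable, Dict, List


def kubeconfig_accessible(path: str) -> bool:
    """Return True if the kubeconfig file exists and is readable by the current user."""
    return os.path.isfile(path) and os.access(path, os.R_OK)


def run_validations(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check and return the names of the checks that failed.

    A check that raises is recorded as a failure instead of aborting the
    run, which acts as the generic error capture.
    """
    failures: List[str] = []
    for name, check in checks.items():
        try:
            passed = check()
        except Exception:
            passed = False
        if not passed:
            failures.append(name)
    return failures
```

Running all checks before reporting, rather than stopping at the first failure, gives a developer the full picture of what went wrong in a single pass.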

### Phase 3 (Run): Update existing e2e IPI flows to use the refactored Ansible in dev-scripts

1) Update Agent Installer flows to use the new Ansible roles to configure the flows instead
of using the environment variables directly.
2) Update the e2e IPI flows to use these new Ansible configuration roles as needed.


At the end of Phase 3 this enhancement should be considered complete. This enhancement
defines the baseline for our development environment. Phase 4 includes the next steps
that we've already identified.

### Phase 4 (Fly): Continue adding features, flows and configurations as needed for Agent Installer

1) Add more advanced configurations for Agent Installer flows including: connected / disconnected,
network configurations using multiple L2 segments and DHCP configurations.
2) Add new scenarios and flows into the environment as new features require them.
3) Continue to evaluate the existing code base for improvements.

### Open Questions [optional]

### Test Plan

* Test locally while creating the development environment.
* Onboard the base e2e flow into OpenShift CI for agent installer.

### Graduation Criteria

Not applicable.

#### Dev Preview -> Tech Preview

Not applicable.

#### Tech Preview -> GA

Not applicable.

#### Removing a deprecated feature

Not applicable.

### Upgrade / Downgrade Strategy

Not applicable.

### Version Skew Strategy

* Currently the dev-scripts repository is not versioned. This will stay the same.

* Our test scenarios will be versioned in openshift-tests following the existing
versioning scheme.

### Operational Aspects of API Extensions

Not applicable.

#### Failure Modes

Not applicable.

#### Support Procedures

Not applicable.

## Implementation History

TBD

## Drawbacks

The biggest argument against this approach is that by using dev-scripts as the base we don't
inherit the work that was done previously in validating Assisted Installer in
assisted-test-infra.

## Alternatives

The choices for making a new development environment for the agent installer project were:

1) Make an entirely new framework
2) Extend assisted-test-infra for agent-installer
3) Extend dev-scripts for agent-installer (Chosen)

Making a new framework (Option #1) was not chosen due to the amount of progress
that would be lost by starting over. The existing frameworks have already solved many
problems, for example image registry mirroring, proxies and upgrades. Redoing
this work is not trivial, and the risks of starting over were deemed likely to materialize.

Extending assisted-test-infra (Option #2) for the agent installer was not pursued
because the e2e test flow in that framework first sets up a minikube cluster, which
the agent installer does not need. The beginning of the
e2e test flow used in dev-scripts is significantly closer to the flow we desire for the
agent installer.