Add enhancement: IPI kubevirt provider #417

Closed
ravidbro wants to merge 8 commits

Conversation


@ravidbro ravidbro commented Aug 4, 2020

Signed-off-by: Ravid Brown <ravid@redhat.com>

## Summary

This document describes how `kubevirt` becomes a platform provider for Openshift. \
Member

What are the trailing slashes?

also, s/Openshift/OpenShift/

Author
@ravidbro ravidbro Aug 6, 2020

The trailing backslashes are for a new line in the .md file

Author

Fixed


1. Survey

The installation starts and right after the user supplies his public ssh key,\
Contributor

s/his/their/

@derekwaynecarr
Member

Is there a plan to build a KubeVirt cloud provider so node lifecycle controller, failure domain zone labeling, and other related items like service load balancing would function?

@ravidbro
Author

ravidbro commented Aug 6, 2020

Is there a plan to build a KubeVirt cloud provider so node lifecycle controller, failure domain zone labeling, and other related items like service load balancing would function?

I think this is part of the plan

Contributor
@dankenigsberg dankenigsberg left a comment

Hi, sorry for picking mostly nits; pardon the partial review.

## Motivation

- Achieve true multi-tenancy of OpenShift were each tenant has dedicated control plane \
and have full control on its configuration.
Contributor

English: s/have full/has full/

Author

fixed

## Summary

This document describes how `KubeVirt` becomes a platform provider for OpenShift. \
`KubeVirt` is a virtualization platform running as an extension of Kubernetes. \
Contributor

Should we add a URL to https://kubevirt.io?

Author

added


## Summary

This document describes how `KubeVirt` becomes a platform provider for OpenShift. \
Contributor

later you use the term "infra cluster" rather than "platform". I'd prefer using one term.

Author

I think that KubeVirt as a technology/product is a platform, but when we name the clusters involved in the solution we call them 'infra cluster' and 'tenant cluster'.

virtual machines by KubeVirt for every node in the tenant cluster (master and workers nodes)
and other Openshift/Kubernetes resources to allow **users** (not admins) of the infra cluster
to create a tenant cluster as it was an application running on the infra cluster.
To achieve that we will implement all the components needed for the installer and cluster-api-provider
Contributor

s/To achieve that // - I find that it just makes the sentence cumbersome.

Author

done

- (KubeVirt gap) Interface binding - Currently the only supported binding on the pods
network is masquerade which means that all nodes are behind NAT, each VM
behind the NAT of his own pod.
- (KubeVirt gap) Static IP - Currently the VM egress IP is always the pod IP which is
Contributor

I'd say that this is also a k8s gap, assuming the node IP never changes.

- With this approach admin of the infra cluster will need to be involved in
the creation of each new tenant cluster since NADs need to be created and
probably also nmstate will need to be used to create the topology on the hosts.
At the moment, we will assume that admin created all network resources before running the installer,
Contributor

s/at the moment/In this proposal/

Contributor

I'd mention that the admin has to predefine the target namespace, too. Adding a NAD into it is just a tweak.

At the moment, we will assume that admin created all network resources before running the installer,
and the created networkName (NAD) will be the input for the installer.

- Guest-agent is not available at the moment for RHCOS and when Multus is used \
Contributor

The reader may not know what "guest-agent" is.

When you say "at the moment" it sounds as if you have a plan to change that. Is that true? Do you have a PR?

It is not clear to me why a guest-agent is needed with the KubeVirt provider but not with other IaaSs. Can you explain?

Author
@ravidbro ravidbro Sep 9, 2020

The qemu-guest-agent is needed because, without it, when running VMs with Multus the hypervisor isn't aware of the IP that the guest allocates from an external DHCP server (the issue doesn't exist when using the pod network).
In KubeVirt, that means the IP attribute on the VMI resource will be empty.
This is true not just for KubeVirt but for every QEMU-based virtualization platform (oVirt/OpenStack).
Today (a fix is in progress) none of these platforms will show the IP if the guest OS is RHCOS.

Author

It's not a gap anymore; it was released after I sent this PR.

Contributor

It's not a gap anymore; it was released after I sent this PR.

which PR? Can you share a URL?

Author

I meant the current enhancement PR; since I wrote it, the guest-agent container has been released in an errata and is available now.
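
For readers unfamiliar with where that address surfaces: below is a minimal sketch, using only the Kubernetes dynamic client (no KubeVirt client library), of how a consumer such as the machine provider might read the IP that the guest agent reports into the VMI status (`status.interfaces[].ipAddress`). The package and function names are illustrative, not the enhancement's actual implementation, and the API version depends on the KubeVirt release.

```go
package kubevirtutil

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// vmiIPAddress returns the first IP reported in a VirtualMachineInstance's
// status.interfaces; with Multus this stays empty until the guest agent
// (or a DHCP-aware binding) reports it.
func vmiIPAddress(ctx context.Context, dyn dynamic.Interface, namespace, name string) (string, error) {
	vmiGVR := schema.GroupVersionResource{
		Group:    "kubevirt.io",
		Version:  "v1alpha3", // "v1" on newer KubeVirt releases
		Resource: "virtualmachineinstances",
	}
	vmi, err := dyn.Resource(vmiGVR).Namespace(namespace).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	ifaces, found, err := unstructured.NestedSlice(vmi.Object, "status", "interfaces")
	if err != nil || !found || len(ifaces) == 0 {
		return "", fmt.Errorf("no interfaces reported for VMI %s/%s", namespace, name)
	}
	first, ok := ifaces[0].(map[string]interface{})
	if !ok {
		return "", fmt.Errorf("unexpected interface format for VMI %s/%s", namespace, name)
	}
	ip, _, _ := unstructured.NestedString(first, "ipAddress")
	return ip, nil
}
```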

and the created networkName (NAD) will be the input for the installer.

- Guest-agent is not available at the moment for RHCOS and when Multus is used \
the VMI is depended on guest agent running inside the guest to report the IP address.
Contributor

I know what a VMI is, but a typical reader of this doc may not.

- Storage

Currently, attaching PV to a running VM (hot-plug) is not supported and may needed to develop CSI
driver for `KubeVirt`.
Contributor

It is not clear from this sentence why a CSI is needed; hot-plug is only one possible implementation for a kubevirt-csi, so I don't understand the semantics of the sentence.

- (KubeVirt gap) Static IP - Currently the VM egress IP is always the pod IP which is
changing every time the VM restarts (and new pod is being created).
- (OpenShift/KubeVirt gap) Static IP - Currently OpenShift assumes that node's IP addresses are static,
and the VM egress IP is always the pod IP which is changing every time the VM restarts (and new pod is being created).
Contributor

this sentence is too complex for me to parse.

Contributor
@crawford crawford left a comment

The high-level strategies look good to me. There are a few things blocking some of these paths, but I believe they are all mentioned.

- Provide multi-tenancy and isolation between the tenant clusters

### Non-Goals
- UPI implementation will be provided separately.
Contributor

This is a bit of a double negative. The non-goal is to provide a UPI implementation (right?).

and then choose `KubeVirt` the installation will ask for all the relevant details
of the installation: **kubeconfig** for the infrastructure OpenShift, **namespace**, **storageClass**,
**networkName (NAD)** and other KubeVirt specific attributes.
The installer will validate it can communicate with the api, otherwise it will fail to proceed.

Have you thought through the negative flow and remediation when a failure occurs? Ideally the user knows exactly why this failed and how to diagnose, address, and fix it. We also need to retain any input values the user chose or entered leading up to this point so they do not have to enter them again. The same concept should apply to all input values we require.

Contributor

@robyoungky I don't see how this should be different from other platforms. In the end the user will have the same experience as with any other IPI. (While I agree this information can probably be improved, it is out of this enhancement's scope.)

The installer config file from any of the IPI installers is persisted, and can be used to deploy the cluster again.
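
On the validation step quoted above ("The installer will validate it can communicate with the api"), here is a minimal sketch, assuming client-go and hypothetical function and message names, of what such a pre-flight check against the supplied infra-cluster kubeconfig could look like; it is an illustration, not the installer's actual code.

```go
package preflight

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// validateInfraAccess loads the kubeconfig supplied in the survey and makes one
// cheap API call, so the installer can fail early with a clear error instead of
// partway through provisioning.
func validateInfraAccess(kubeconfigPath string) error {
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	if err != nil {
		return fmt.Errorf("invalid infra-cluster kubeconfig %q: %w", kubeconfigPath, err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return fmt.Errorf("building client for infra cluster: %w", err)
	}
	version, err := clientset.Discovery().ServerVersion()
	if err != nil {
		return fmt.Errorf("cannot reach infra-cluster API at %s: %w", cfg.Host, err)
	}
	fmt.Printf("infra cluster reachable, server version %s\n", version.GitVersion)
	return nil
}
```

Because the install-config is persisted, a failed validation would not force the user to re-enter the values they already supplied.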


**Note:** *Section not required until targeted at a release.*

Consider the following in developing a test plan for this enhancement:

Negative flow outcomes and recovery, with user and automated remediation.

cluster with platform services as we can or pods deployed in the infrastructure cluster to supply the services
as DNS and load balancing.

We see two main network options for deployment over KubeVirt:

Will we be able to support clusters using third-party network SDNs like Calico or similar?

Author

We will be able to support only CNIs that are supported by CNV.
Right now, the only supported option is Multus with the bridge CNI.
In theory, every Multus CNI that supplies DHCP and routing in and out should work too.
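
To make the networkName (NAD) input concrete, here is a sketch, using the dynamic client, of the kind of NetworkAttachmentDefinition with a bridge CNI config that the infra-cluster admin is assumed to pre-create in the target namespace before running the installer. The bridge name `br1` and the CNI config are placeholders; DHCP and routing on that network come from outside the cluster, as described above.

```go
package netprep

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

var nadGVR = schema.GroupVersionResource{
	Group:    "k8s.cni.cncf.io",
	Version:  "v1",
	Resource: "network-attachment-definitions",
}

// createBridgeNAD pre-creates the NetworkAttachmentDefinition whose name is later
// handed to the installer as networkName. "br1" is a bridge the admin is assumed
// to have configured on the hosts (e.g. with nmstate).
func createBridgeNAD(ctx context.Context, dyn dynamic.Interface, namespace, name string) error {
	nad := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "k8s.cni.cncf.io/v1",
		"kind":       "NetworkAttachmentDefinition",
		"metadata":   map[string]interface{}{"name": name, "namespace": namespace},
		"spec": map[string]interface{}{
			// Empty IPAM: the guests get their addresses from external DHCP.
			"config": `{"cniVersion":"0.3.1","type":"bridge","bridge":"br1","ipam":{}}`,
		},
	}}
	_, err := dyn.Resource(nadGVR).Namespace(namespace).Create(ctx, nad, metav1.CreateOptions{})
	return err
}
```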

- (KubeVirt gap) Interface binding - Currently the only supported binding on the pods
network is masquerade which means that all nodes are behind NAT, each VM
behind the NAT of his own pod.
- (OpenShift/KubeVirt gap) Static IP - OpenShift assumes that node's IP addresses are static,

Related topic: some customers insist on allocating IPs to VMs through static addressing; DHCP is not allowed on their production networks.

Author

Is that something that is supported on other platforms?
I don't see a way in the API/YAML to supply an IP per node.
Also, it contradicts the concept of the Machine API with MachineSets: a MachineSet treats machines as cattle rather than pets; it has a 'replicas' property, which is just a number — you modify this number, and VMs are created or destroyed.
I don't see how that can work with static IPs.
What am I missing here?
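
To illustrate the replicas-as-a-number point, here is a sketch, using the dynamic client and hypothetical names, of scaling a MachineSet; the Machine API controller then creates or deletes the backing KubeVirt VMs to converge on the count, which is why per-node static IP assignment does not fit the model naturally.

```go
package scale

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
)

var machineSetGVR = schema.GroupVersionResource{
	Group:    "machine.openshift.io",
	Version:  "v1beta1",
	Resource: "machinesets",
}

// scaleMachineSet patches spec.replicas; the machine controller reconciles by
// creating or deleting VMs until the actual count matches the desired one.
func scaleMachineSet(ctx context.Context, dyn dynamic.Interface, namespace, name string, replicas int) error {
	patch := []byte(fmt.Sprintf(`{"spec":{"replicas":%d}}`, replicas))
	_, err := dyn.Resource(machineSetGVR).Namespace(namespace).Patch(
		ctx, name, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```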


- Storage

CSI driver for `KubeVirt` is not available yet.

dynamic storage provisioning should be part of the MVP. We previously made this mistake with OCP on RHV IPI

Author

AFAIK, the KubeVirt CSI driver is planned for Feb 2021, but I guess you know this effort better.

Author

If this is a blocker even for pre-GA versions then we have a problem.

This is not a blocker; we can release this feature without the KubeVirt CSI.


For each tenant cluster we will create a namespace with the ClusterID.

*Open question: should the namespace creation done by the user or by the installer.*

IPI implies we do everything. However, some users may have prescribed naming schemes that we should conform to. I'd recommend we give them the option to specify a name, but we create it.

Author

When we moved to the networking option of using Multus, we were no longer able to create the namespace, since the input to the installer is a NAD resource name and that resource must exist in the namespace before we start the installation.

from the infra cluster storageClass and attach it to the relevant VM where the PV will be exposed to the guest
as block device that the driver will attach to the requested pods.

- Anti-affinity

We definitely need to do this, to prevent the cluster from being unrecoverable in the event of the loss of two master nodes. We can use soft affinity, so things can come up in a demo environment.
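
A sketch of that soft-affinity idea, using the standard Kubernetes affinity types (which KubeVirt VM/VMI specs embed): control-plane VMs of a tenant cluster prefer to land on distinct infra-cluster nodes, but scheduling still succeeds on a small demo environment. The label keys and values are placeholders, not the provider's actual labels.

```go
package placement

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// softAntiAffinity spreads the control-plane VMs of one tenant cluster across
// infra-cluster nodes as a preference (weight-based), not a hard requirement,
// so a resource-constrained demo environment can still schedule all of them.
func softAntiAffinity(clusterID string) *corev1.Affinity {
	return &corev1.Affinity{
		PodAntiAffinity: &corev1.PodAntiAffinity{
			PreferredDuringSchedulingIgnoredDuringExecution: []corev1.WeightedPodAffinityTerm{{
				Weight: 100,
				PodAffinityTerm: corev1.PodAffinityTerm{
					LabelSelector: &metav1.LabelSelector{
						MatchLabels: map[string]string{
							"cluster.example.com/id":   clusterID, // placeholder label
							"cluster.example.com/role": "master",  // placeholder label
						},
					},
					TopologyKey: "kubernetes.io/hostname",
				},
			}},
		},
	}
}
```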


**Note:** *Section not required until targeted at a release.*

Consider the following in developing a test plan for this enhancement:

Upgrades of different versions of OCP will be interesting. How far will we allow versions to drift? I.e., could we support an OCP 4.2 tenant cluster alongside an OCP 4.10 cluster, all hosted on an OCP 4.8 cluster?

Author

I don't see a reason why not; we are not relying on any feature/resource that is missing from any OCP 4.x version.

work because we control the dnsmasq inside VMs network.

What could go wrong?
- we may not be able to make the CI play nicely on time and we need as much help

CI is a requirement for going GA with any OCP feature. No exceptions, AFAIK.

Author

You're right, fixing this one.

before running the installer, and the created networkName (NAD) will be the input for the installer.


- Storage

One of the other risks to call out is the etcd performance and latency requirements. We've had issues where people deploy OCP clusters in virtual environments with insufficient hardware, and they have all sorts of problems installing the OCP clusters. Worse, the cluster install may go fine, but the cluster goes unhealthy after a few days. I'm not sure what the right answer is, but we should have a discussion about how we can make this easier to validate and troubleshoot.

Author

Good point. I don't know how to solve it, especially when we are running on bare metal and not on a public cloud.


Ravid Brown added 2 commits September 23, 2020 15:04
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ravidbro
To complete the pull request process, please assign mrunalp after the PR has been reviewed.
You can assign the PR to them by writing /assign @mrunalp in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

- End user documentation, relative API stability
- Sufficient test coverage
- Gather feedback from users rather than just developers
- Approved review by the installer team.
@ravidbro
Author

/assign @mrunalp

that would count as tricky in the implementation and anything particularly
challenging to test should be called out.

All code is expected to have adequate tests (eventually with coverage
Contributor

I think there should be a test plan here.


TODO

[maturity levels][maturity-levels].
Contributor

I'd like to see what dev preview entails here


- Ability to utilize the enhancement end to end
- End user documentation, relative API stability
- Sufficient test coverage
Contributor

There is no real test plan above, so it feels like this also needs more detail.

ravidbro pushed a commit to ravidbro/machine-config-operator that referenced this pull request Nov 19, 2020
More details at openshift/enhancements#417

Comment on lines +102 to +103
- secrets for the Ignition configs of the VMs
- 1 bootstrap machine
Contributor

Isn't the bootstrap ignition config going to be too large to fit into a secret?

Contributor

Ignore this. I'm getting my orders of magnitude mixed up.
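
For context on the quoted resource list, here is a sketch, with client-go and made-up names, of storing a machine's Ignition config in a Secret that the corresponding KubeVirt VM would consume as user data. Individual Secrets are capped at roughly 1 MiB, which is why the size of the bootstrap Ignition config was worth asking about.

```go
package ignition

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// storeIgnitionSecret puts a machine's Ignition config into a Secret in the
// tenant cluster's namespace on the infra cluster; the corresponding VM would
// reference it as its user data. Names and keys here are illustrative.
func storeIgnitionSecret(ctx context.Context, client kubernetes.Interface, namespace, machineName string, ignition []byte) error {
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      machineName + "-ignition",
			Namespace: namespace,
		},
		Data: map[string][]byte{"userdata": ignition},
	}
	_, err := client.CoreV1().Secrets(namespace).Create(ctx, secret, metav1.CreateOptions{})
	return err
}
```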

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 19, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 21, 2021
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
