Add enhancement: IPI kubevirt provider #417

Closed
ravidbro wants to merge 8 commits

Conversation


@ravidbro ravidbro commented Aug 4, 2020

Signed-off-by: Ravid Brown <ravid@redhat.com>

## Summary

This document describes how `kubevirt` becomes a platform provider for Openshift. \
Member

What are the trailing slashes?

also, s/Openshift/OpenShift/

Author
@ravidbro ravidbro Aug 6, 2020

The trailing backslashes are for a new line in the .md file

Author

Fixed


1. Survey

The installation starts and right after the user supplies his public ssh key,\
Contributor

s/his/their/

@derekwaynecarr
Member

Is there a plan to build a KubeVirt cloud provider so node lifecycle controller, failure domain zone labeling, and other related items like service load balancing would function?

@ravidbro
Author

ravidbro commented Aug 6, 2020

Is there a plan to build a KubeVirt cloud provider so node lifecycle controller, failure domain zone labeling, and other related items like service load balancing would function?

I think this is part of the plan

Contributor
@dankenigsberg dankenigsberg left a comment

Hi, sorry for picking mostly nits; pardon the partial review.

## Motivation

- Achieve true multi-tenancy of OpenShift were each tenant has dedicated control plane \
and have full control on its configuration.
Contributor

English: s/have full/has full/

Author

fixed

## Summary

This document describes how `KubeVirt` becomes a platform provider for OpenShift. \
`KubeVirt` is a virtualization platform running as an extension of Kubernetes. \
Contributor

Should we add a URL to https://kubevirt.io?

Author

added


## Summary

This document describes how `KubeVirt` becomes a platform provider for OpenShift. \
Contributor

later you use the term "infra cluster" rather than "platform". I'd prefer using one term.

Author

I think that KubeVirt as a technology/product is a platform, but when we name the clusters involved in the solution we call them 'infra cluster' and 'tenant cluster'.

virtual machines by KubeVirt for every node in the tenant cluster (master and workers nodes)
and other Openshift/Kubernetes resources to allow **users** (not admins) of the infra cluster
to create a tenant cluster as it was an application running on the infra cluster.
To achieve that we will implement all the components needed for the installer and cluster-api-provider
Contributor

s/To achieve that // - I find that it just makes the sentence cumbersome.

Author

done

- (KubeVirt gap) Interface binding - Currently the only supported binding on the pods
network is masquerade which means that all nodes are behind NAT, each VM
behind the NAT of his own pod.
- (KubeVirt gap) Static IP - Currently the VM egress IP is always the pod IP which is
Contributor

I'd say that this is also a k8s gap, assuming the node IP never changes.

- With this approach admin of the infra cluster will need to be involved in
the creation of each new tenant cluster since NADs need to be created and
probably also nmstate will need to be used to create the topology on the hosts.
At the moment, we will assume that admin created all network resources before running the installer,
Contributor

s/at the moment/In this proposal/

Contributor

I'd mention that the admin has to predefine the target namespace, too. Adding a NAD into it is just a tweak.

At the moment, we will assume that admin created all network resources before running the installer,
and the created networkName (NAD) will be the input for the installer.

- Guest-agent is not available at the moment for RHCOS and when Multus is used \
Contributor

The reader may not know what "guest-agent" is.

When you say "at the moment" it sounds as if you have a plan to change that. Is that true? Do you have a PR?

It is not clear to me why a guest-agent is needed with the KubeVirt provider but not with other IaaSs. Can you explain?

Author
@ravidbro ravidbro Sep 9, 2020

The qemu-guest-agent is needed because, without it, when running VMs with Multus the hypervisor isn't aware of the IP that the guest allocates from an external DHCP server (the issue doesn't exist when using the pod network).
In KubeVirt, that means the IP attribute on the VMI resource will be empty.
This is true not just for KubeVirt but for every QEMU-based virtualization platform (oVirt/OpenStack).
Today (a fix is in progress) none of these platforms will show the IP if the guest OS is RHCOS.

Author

It's not a gap anymore; it was released after I sent this PR.

Contributor

It's not a gap anymore; it was released after I sent this PR.

which PR? Can you share a URL?

Author

I meant the current enhancement PR; since I wrote it, the guest-agent container has been released in an errata and is available now.
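
For readers unfamiliar with where that address surfaces: below is a minimal sketch, using only the Kubernetes dynamic client (no KubeVirt client library), of how a consumer such as the machine provider might read the IP that the guest agent reports into the VMI status (`status.interfaces[].ipAddress`). The package and function names are illustrative, not the enhancement's actual implementation, and the API version depends on the KubeVirt release.

```go
package kubevirtutil

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// vmiIPAddress returns the first IP reported in a VirtualMachineInstance's
// status.interfaces; with Multus this stays empty until the guest agent
// (or a DHCP-aware binding) reports it.
func vmiIPAddress(ctx context.Context, dyn dynamic.Interface, namespace, name string) (string, error) {
	vmiGVR := schema.GroupVersionResource{
		Group:    "kubevirt.io",
		Version:  "v1alpha3", // "v1" on newer KubeVirt releases
		Resource: "virtualmachineinstances",
	}
	vmi, err := dyn.Resource(vmiGVR).Namespace(namespace).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	ifaces, found, err := unstructured.NestedSlice(vmi.Object, "status", "interfaces")
	if err != nil || !found || len(ifaces) == 0 {
		return "", fmt.Errorf("no interfaces reported for VMI %s/%s", namespace, name)
	}
	first, ok := ifaces[0].(map[string]interface{})
	if !ok {
		return "", fmt.Errorf("unexpected interface format for VMI %s/%s", namespace, name)
	}
	ip, _, _ := unstructured.NestedString(first, "ipAddress")
	return ip, nil
}
```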

and the created networkName (NAD) will be the input for the installer.

- Guest-agent is not available at the moment for RHCOS and when Multus is used \
the VMI is depended on guest agent running inside the guest to report the IP address.
Contributor

I know what a VMI is, but a typical reader of this doc may not.

- Storage

Currently, attaching PV to a running VM (hot-plug) is not supported and may needed to develop CSI
driver for `KubeVirt`.
Contributor

It is not clear from this sentence why a CSI is needed; hot-plug is only one possible implementation for a kubevirt-csi, so I don't understand the semantics of the sentence.

- (KubeVirt gap) Static IP - Currently the VM egress IP is always the pod IP which is
changing every time the VM restarts (and new pod is being created).
- (OpenShift/KubeVirt gap) Static IP - Currently OpenShift assumes that node's IP addresses are static,
and the VM egress IP is always the pod IP which is changing every time the VM restarts (and new pod is being created).
Contributor

this sentence is too complex for me to parse.

Contributor
@crawford crawford left a comment

The high-level strategies look good to me. There are a few things blocking some of these paths, but I believe they are all mentioned.

- Provide multi-tenancy and isolation between the tenant clusters

### Non-Goals
- UPI implementation will be provided separately.
Contributor

This is a bit of a double negative. The non-goal is to provide a UPI implementation (right?).

and then choose `KubeVirt` the installation will ask for all the relevant details
of the installation: **kubeconfig** for the infrastructure OpenShift, **namespace**, **storageClass**,
**networkName (NAD)** and other KubeVirt specific attributes.
The installer will validate it can communicate with the api, otherwise it will fail to proceed.

Have you thought through the negative flow and remediation when a failure occurs? Ideally the user knows exactly why this failed and how to diagnose, address, and fix it. We also need to retain any input values the user chose or entered leading up to this point so they do not have to enter them again. The same concept should apply to all input values we require.

Contributor

@robyoungky I don't see how this should be different from other platforms. In the end the user will have the same experience as with any other IPI. (While I agree this information can probably be improved, it is out of this enhancement's scope.)

The installer config file from any of the IPI installers is persisted, and can be used to deploy the cluster again.
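
On the validation step quoted above ("The installer will validate it can communicate with the api"), here is a minimal sketch, assuming client-go and hypothetical function and message names, of what such a pre-flight check against the supplied infra-cluster kubeconfig could look like; it is an illustration, not the installer's actual code.

```go
package preflight

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// validateInfraAccess loads the kubeconfig supplied in the survey and makes one
// cheap API call, so the installer can fail early with a clear error instead of
// partway through provisioning.
func validateInfraAccess(kubeconfigPath string) error {
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	if err != nil {
		return fmt.Errorf("invalid infra-cluster kubeconfig %q: %w", kubeconfigPath, err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return fmt.Errorf("building client for infra cluster: %w", err)
	}
	version, err := clientset.Discovery().ServerVersion()
	if err != nil {
		return fmt.Errorf("cannot reach infra-cluster API at %s: %w", cfg.Host, err)
	}
	fmt.Printf("infra cluster reachable, server version %s\n", version.GitVersion)
	return nil
}
```

Because the install-config is persisted, a failed validation would not force the user to re-enter the values they already supplied.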


**Note:** *Section not required until targeted at a release.*

Consider the following in developing a test plan for this enhancement:

Negative flow outcomes and recovery, with user and automated remediation.

cluster with platform services as we can or pods deployed in the infrastructure cluster to supply the services
as DNS and load balancing.

We see two main network options for deployment over KubeVirt:

Will we be able to support clusters using third-party network SDNs like Calico or similar?

Author

We will be able to support only CNIs that are supported by CNV.
Right now, the only supported option is Multus with the bridge CNI.
In theory, every Multus CNI that supplies DHCP and routing in and out should work too.
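
To make the networkName (NAD) input concrete, here is a sketch, using the dynamic client, of the kind of NetworkAttachmentDefinition with a bridge CNI config that the infra-cluster admin is assumed to pre-create in the target namespace before running the installer. The bridge name `br1` and the CNI config are placeholders; DHCP and routing on that network come from outside the cluster, as described above.

```go
package netprep

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

var nadGVR = schema.GroupVersionResource{
	Group:    "k8s.cni.cncf.io",
	Version:  "v1",
	Resource: "network-attachment-definitions",
}

// createBridgeNAD pre-creates the NetworkAttachmentDefinition whose name is later
// handed to the installer as networkName. "br1" is a bridge the admin is assumed
// to have configured on the hosts (e.g. with nmstate).
func createBridgeNAD(ctx context.Context, dyn dynamic.Interface, namespace, name string) error {
	nad := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "k8s.cni.cncf.io/v1",
		"kind":       "NetworkAttachmentDefinition",
		"metadata":   map[string]interface{}{"name": name, "namespace": namespace},
		"spec": map[string]interface{}{
			// Empty IPAM: the guests get their addresses from external DHCP.
			"config": `{"cniVersion":"0.3.1","type":"bridge","bridge":"br1","ipam":{}}`,
		},
	}}
	_, err := dyn.Resource(nadGVR).Namespace(namespace).Create(ctx, nad, metav1.CreateOptions{})
	return err
}
```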

- (KubeVirt gap) Interface binding - Currently the only supported binding on the pods
network is masquerade which means that all nodes are behind NAT, each VM
behind the NAT of his own pod.
- (OpenShift/KubeVirt gap) Static IP - OpenShift assumes that node's IP addresses are static,

Related topic: some customers insist on allocating IPs to VMs through static addressing; DHCP is not allowed on their production networks.

Author

Is that something that is supported on other platforms?
I don't see a way in the API/YAML to supply an IP per node.
Also, it contradicts the concept of the Machine API with MachineSets: a MachineSet treats machines as cattle rather than pets; it has a 'replicas' property, which is just a number — you modify this number, and VMs are created or destroyed.
I don't see how that can work with static IPs.
What am I missing here?
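
To illustrate the replicas-as-a-number point, here is a sketch, using the dynamic client and hypothetical names, of scaling a MachineSet; the Machine API controller then creates or deletes the backing KubeVirt VMs to converge on the count, which is why per-node static IP assignment does not fit the model naturally.

```go
package scale

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
)

var machineSetGVR = schema.GroupVersionResource{
	Group:    "machine.openshift.io",
	Version:  "v1beta1",
	Resource: "machinesets",
}

// scaleMachineSet patches spec.replicas; the machine controller reconciles by
// creating or deleting VMs until the actual count matches the desired one.
func scaleMachineSet(ctx context.Context, dyn dynamic.Interface, namespace, name string, replicas int) error {
	patch := []byte(fmt.Sprintf(`{"spec":{"replicas":%d}}`, replicas))
	_, err := dyn.Resource(machineSetGVR).Namespace(namespace).Patch(
		ctx, name, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```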


- Storage

CSI driver for `KubeVirt` is not available yet.

dynamic storage provisioning should be part of the MVP. We previously made this mistake with OCP on RHV IPI

Author

AFAIK, the KubeVirt CSI driver is planned for Feb 2021, but I guess you know this effort better.

Author

If this is a blocker even for pre-GA versions then we have a problem.

This is not a blocker; we can release this feature without the KubeVirt CSI.


For each tenant cluster we will create a namespace with the ClusterID.

*Open question: should the namespace creation done by the user or by the installer.*

IPI implies we do everything. However, some users may have prescribed naming schemes that we should conform to. I'd recommend we give them the option to specify a name, but we create it.

Author

When we moved to the networking option of using Multus, we were no longer able to create the namespace, since the input to the installer is a NAD resource name and that resource must exist in the namespace before we start the installation.

from the infra cluster storageClass and attach it to the relevant VM where the PV will be exposed to the guest
as block device that the driver will attach to the requested pods.

- Anti-affinity

We definitely need to do this, to prevent the cluster from being unrecoverable in the event of the loss of two master nodes. We can use soft affinity, so things can come up in a demo environment.
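
A sketch of that soft-affinity idea, using the standard Kubernetes affinity types (which KubeVirt VM/VMI specs embed): control-plane VMs of a tenant cluster prefer to land on distinct infra-cluster nodes, but scheduling still succeeds on a small demo environment. The label keys and values are placeholders, not the provider's actual labels.

```go
package placement

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// softAntiAffinity spreads the control-plane VMs of one tenant cluster across
// infra-cluster nodes as a preference (weight-based), not a hard requirement,
// so a resource-constrained demo environment can still schedule all of them.
func softAntiAffinity(clusterID string) *corev1.Affinity {
	return &corev1.Affinity{
		PodAntiAffinity: &corev1.PodAntiAffinity{
			PreferredDuringSchedulingIgnoredDuringExecution: []corev1.WeightedPodAffinityTerm{{
				Weight: 100,
				PodAffinityTerm: corev1.PodAffinityTerm{
					LabelSelector: &metav1.LabelSelector{
						MatchLabels: map[string]string{
							"cluster.example.com/id":   clusterID, // placeholder label
							"cluster.example.com/role": "master",  // placeholder label
						},
					},
					TopologyKey: "kubernetes.io/hostname",
				},
			}},
		},
	}
}
```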


**Note:** *Section not required until targeted at a release.*

Consider the following in developing a test plan for this enhancement:

Upgrades of different versions of OCP will be interesting. How far will we allow versions to drift? I.e., could we support an OCP 4.2 tenant cluster alongside an OCP 4.10 cluster, all hosted on an OCP 4.8 cluster?

Author

I don't see a reason why not; we are not relying on any feature/resource that is missing from any OCP 4.x version.

work because we control the dnsmasq inside VMs network.

What could go wrong?
- we may not be able to make the CI play nicely on time and we need as much help

CI is a requirement for going GA with any OCP feature. No exceptions, AFAIK.

Author

You're right, fixing this one.

before running the installer, and the created networkName (NAD) will be the input for the installer.


- Storage

One of the other risks to call out is the etcd performance and latency requirements. We've had issues where people deploy OCP clusters in virtual environments with insufficient hardware, and they have all sorts of problems installing the OCP clusters. Worse, the cluster install may go fine, but the cluster goes unhealthy after a few days. I'm not sure what the right answer is, but we should have a discussion about how we can make this easier to validate and troubleshoot.

Author

Good point. I don't know how to solve it, especially when we are running on bare metal and not on a public cloud.


Ravid Brown added 2 commits September 23, 2020 15:04
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ravidbro
To complete the pull request process, please assign mrunalp after the PR has been reviewed.
You can assign the PR to them by writing /assign @mrunalp in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

- End user documentation, relative API stability
- Sufficient test coverage
- Gather feedback from users rather than just developers
- Approved review by the installer team.
@ravidbro
Author

/assign @mrunalp

that would count as tricky in the implementation and anything particularly
challenging to test should be called out.

All code is expected to have adequate tests (eventually with coverage
Contributor

I think there should be a test plan here.


TODO

[maturity levels][maturity-levels].
Contributor

I'd like to see what dev preview entails here


- Ability to utilize the enhancement end to end
- End user documentation, relative API stability
- Sufficient test coverage
Contributor

There is no real test plan above, so it feels like this also needs more detail.

ravidbro pushed a commit to ravidbro/machine-config-operator that referenced this pull request Nov 19, 2020
More details at openshift/enhancements#417

Comment on lines +102 to +103
- secrets for the Ignition configs of the VMs
- 1 bootstrap machine
Contributor

Isn't the bootstrap ignition config going to be too large to fit into a secret?

Contributor

Ignore this. I'm getting my orders of magnitude mixed up.
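
For context on the quoted resource list, here is a sketch, with client-go and made-up names, of storing a machine's Ignition config in a Secret that the corresponding KubeVirt VM would consume as user data. Individual Secrets are capped at roughly 1 MiB, which is why the size of the bootstrap Ignition config was worth asking about.

```go
package ignition

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// storeIgnitionSecret puts a machine's Ignition config into a Secret in the
// tenant cluster's namespace on the infra cluster; the corresponding VM would
// reference it as its user data. Names and keys here are illustrative.
func storeIgnitionSecret(ctx context.Context, client kubernetes.Interface, namespace, machineName string, ignition []byte) error {
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      machineName + "-ignition",
			Namespace: namespace,
		},
		Data: map[string][]byte{"userdata": ignition},
	}
	_, err := client.CoreV1().Secrets(namespace).Create(ctx, secret, metav1.CreateOptions{})
	return err
}
```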

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 19, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 21, 2021
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
