Add enhancement: IPI kubevirt provider #417
Conversation
Signed-off-by: Ravid Brown <ravid@redhat.com>
## Summary

This document describes how `kubevirt` becomes a platform provider for Openshift. \
What are the trailing slashes?
also, s/Openshift/OpenShift/
The trailing backslashes are for a new line in the .md file
Fixed
1. Survey

The installation starts and right after the user supplies his public ssh key,\
s/his/their/
Is there a plan to build a KubeVirt cloud provider so the node lifecycle controller, failure domain zone labeling, and other related items like service load balancing would function?
I think this is part of the plan.
Signed-off-by: Ravid Brown <ravid@redhat.com>
Hi, sorry for picking mostly nits; pardon the partial review.
## Motivation

- Achieve true multi-tenancy of OpenShift were each tenant has dedicated control plane \
and have full control on its configuration.
English: s/have full/has full/
fixed
## Summary

This document describes how `KubeVirt` becomes a platform provider for OpenShift. \
`KubeVirt` is a virtualization platform running as an extension of Kubernetes. \
should we add a URL to https://kubevirt.io ?
added
## Summary

This document describes how `KubeVirt` becomes a platform provider for OpenShift. \
later you use the term "infra cluster" rather than "platform". I'd prefer using one term.
I think that KubeVirt as a technology/product is a platform, but when we name the clusters involved in the solution we use the names 'infra cluster' and 'tenant cluster'.
virtual machines by KubeVirt for every node in the tenant cluster (master and workers nodes)
and other Openshift/Kubernetes resources to allow **users** (not admins) of the infra cluster
to create a tenant cluster as it was an application running on the infra cluster.
To achieve that we will implement all the components needed for the installer and cluster-api-provider
s/To achieve that // - I find that it just makes the sentence cumbersome.
done
- (KubeVirt gap) Interface binding - Currently the only supported binding on the pods
network is masquerade which means that all nodes are behind NAT, each VM
behind the NAT of his own pod.
- (KubeVirt gap) Static IP - Currently the VM egress IP is always the pod IP which is
I'd say that this is also a k8s gap, assuming the node IP never changes.
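To make the binding discussion concrete, here is a minimal sketch of the two KubeVirt interface bindings that come up in this conversation: masquerade on the pod network as described above (the VM sits behind its own pod's NAT) versus a bridge binding on a secondary Multus network, which is discussed later. The VM name, network name, and NAD reference are assumptions for illustration.

```yaml
# Sketch only: illustrates the two bindings discussed above; names are hypothetical.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: tenant-worker-0
spec:
  domain:
    resources:
      requests:
        memory: 2Gi
    devices:
      interfaces:
        - name: default
          masquerade: {}        # pod network: the VM is NATed behind its own pod IP
        - name: tenant-net
          bridge: {}            # Multus secondary network: the VM takes an IP from the external network
  networks:
    - name: default
      pod: {}
    - name: tenant-net
      multus:
        networkName: tenant-nad   # assumed NetworkAttachmentDefinition name
```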
- With this approach admin of the infra cluster will need to be involved in
the creation of each new tenant cluster since NADs need to be created and
probably also nmstate will need to be used to create the topology on the hosts.
At the moment, we will assume that admin created all network resources before running the installer,
s/at the moment/In this proposal/
I'd mention that the admin has to predefine the target namespace, too. Adding a NAD into it is just a tweak.
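To illustrate what the admin would pre-create, here is a minimal sketch of such a NetworkAttachmentDefinition in a pre-created target namespace; the NAD name, namespace, and bridge name are assumptions, and IPAM is left empty on the assumption that addresses come from an external DHCP server on that network.

```yaml
# Sketch only: a NAD the infra-cluster admin creates up front; its name is then
# passed to the installer as networkName. All names are assumptions.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: tenant-nad
  namespace: tenant-cluster-1
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "tenant-nad",
      "type": "bridge",
      "bridge": "br1",
      "ipam": {}
    }
```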
At the moment, we will assume that admin created all network resources before running the installer,
and the created networkName (NAD) will be the input for the installer.

- Guest-agent is not available at the moment for RHCOS and when Multus is used \
The reader may not know what "guest-agent" is.
When you say "at the moment" it appears as if you're having a plan to change that. Is that true? Do you have a PR?
It is not clear to me why a guest-agent is needed with the KubeVirt provider but not with other IaaSs. Can you explain?
The qemu-guest-agent is needed because, without it, when running VMs with Multus the hypervisor isn't aware of the IP that the guest obtains from the external DHCP server (the issue doesn't exist when using the pod network).
In KubeVirt, that means that the IP attribute on the VMI resource will be empty.
It's true not just for KubeVirt but for every QEMU-based virtualization platform (oVirt/OpenStack).
Today (it's in progress), none of these platforms will show the IP if the guest OS is RHCOS.
it's not a gap anymore, it was released since I sent this PR
> it's not a gap anymore, it was released since I sent this PR

Which PR? Can you share a URL?
I meant the current enhancement PR; since I wrote it, the guest-agent container was released in an errata and is available now.
and the created networkName (NAD) will be the input for the installer.

- Guest-agent is not available at the moment for RHCOS and when Multus is used \
the VMI is depended on guest agent running inside the guest to report the IP address.
I know what's a VMI, but a typical reader of this doc may not know.
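For readers unfamiliar with the term: a VMI (VirtualMachineInstance) is the KubeVirt resource representing a running VM. The IP-reporting dependency discussed above shows up in its status; a rough sketch of the relevant fragment, assuming the guest agent is running (values are illustrative):

```yaml
# Sketch of a VMI status fragment. On a Multus/bridge network the ipAddress
# field is populated only when the qemu-guest-agent runs inside the guest;
# without it, the field stays empty. Values are illustrative.
status:
  interfaces:
    - name: tenant-net
      ipAddress: 192.168.10.15
      mac: "02:42:ac:11:00:02"
```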
- Storage

Currently, attaching PV to a running VM (hot-plug) is not supported and may needed to develop CSI
driver for `KubeVirt`.
It is not clear from this sentence why a CSI driver is needed; hot-plug is only one possible implementation for a kubevirt-csi, so I don't understand the semantics of the sentence.
Signed-off-by: Ravid Brown <ravid@redhat.com>
- (KubeVirt gap) Static IP - Currently the VM egress IP is always the pod IP which is
changing every time the VM restarts (and new pod is being created).
- (OpenShift/KubeVirt gap) Static IP - Currently OpenShift assumes that node's IP addresses are static,
and the VM egress IP is always the pod IP which is changing every time the VM restarts (and new pod is being created).
this sentence is too complex for me to parse.
The high-level strategies look good to me. There are a few things blocking some of these paths, but I believe they are all mentioned.
- Provide multi-tenancy and isolation between the tenant clusters

### Non-Goals
- UPI implementation will be provided separately.
This is a bit of a double negative. The non-goal is to provide a UPI implementation (right?).
and then choose `KubeVirt` the installation will ask for all the relevant details
of the installation: **kubeconfig** for the infrastructure OpenShift, **namespace**, **storageClass**,
**networkName (NAD)** and other KubeVirt specific attributes.
The installer will validate it can communicate with the api, otherwise it will fail to proceed.
Have you thought through the negative flow and remediation when a failure occurs? Ideally the user knows exactly why this failed and how to diagnose, address, and fix it. And we need to retain any input values the user chose or entered leading up to this point so he/she does not have to enter them again. The same concept should apply to all input values we require.
@robyoungky I don't see how this should be different from other platforms. In the end, the user will have the same user experience as with any other IPI. (While I agree this information can probably be improved, it is out of this enhancement's scope.)
The installer config file from any of the IPI installers is persisted, and can be used to deploy the cluster again.
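For illustration, here is a sketch of how the collected answers could look in `install-config.yaml`. The field names under `platform.kubevirt` are assumptions derived from the attributes listed above, not the final installer schema.

```yaml
# Sketch only: field names under platform.kubevirt are assumptions, not the final schema.
apiVersion: v1
baseDomain: example.com
metadata:
  name: tenant-cluster-1
platform:
  kubevirt:
    infraClusterKubeconfig: /path/to/infra-kubeconfig   # kubeconfig for the infrastructure OpenShift
    namespace: tenant-cluster-1                         # namespace on the infra cluster
    storageClass: standard                              # infra-cluster StorageClass for VM disks
    networkName: tenant-nad                             # pre-created NetworkAttachmentDefinition
sshKey: ssh-ed25519 AAAA...                             # public key collected by the survey
pullSecret: '...'
```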
**Note:** *Section not required until targeted at a release.*

Consider the following in developing a test plan for this enhancement:
Negative flow outcomes and recovery, with user and automated remediation.
Signed-off-by: Ravid Brown <ravid@redhat.com>
cluster with platform services as we can or pods deployed in the infrastructure cluster to supply the services
as DNS and load balancing.

We see two main network options for deployment over KubeVirt:
Will we be able to support clusters using third-party network SDNs like Calico or similar?
We will be able to support only CNIs that are supported by CNV.
Right now, the only supported option is Multus with the bridge CNI.
In theory, every Multus CNI that supplies DHCP and routes in and out should work too.
- (KubeVirt gap) Interface binding - Currently the only supported binding on the pods
network is masquerade which means that all nodes are behind NAT, each VM
behind the NAT of his own pod.
- (OpenShift/KubeVirt gap) Static IP - OpenShift assumes that node's IP addresses are static,
Related topic: some customers insist on allocating IPs to VMs through static addressing; DHCP is not allowed on their production networks.
Is that something that is supported on other platforms?
I don't see a way in the API/YAML to supply an IP per node.
Also, it contradicts the concept of the Machine API with MachineSets: a MachineSet treats machines as cattle, not pets; it has a 'replicas' property, which is a number, and when you modify that number VMs are created or destroyed.
I don't see how that can work with static IPs.
What am I missing here?
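To make the cattle-not-pets point concrete, here is a rough sketch of the Machine API shape being described; the KubeVirt-specific part under `providerSpec` is omitted and all names are assumptions.

```yaml
# Sketch only: scaling happens by changing spec.replicas; individual VMs are
# created and destroyed by the controller, which is hard to reconcile with
# hand-assigned static IPs per node. Names are assumptions.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: tenant-cluster-1-worker
  namespace: openshift-machine-api
spec:
  replicas: 3                   # bump to add workers, lower to remove them
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machineset: tenant-cluster-1-worker
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-machineset: tenant-cluster-1-worker
    spec:
      providerSpec: {}          # KubeVirt-specific VM template would go here (omitted)
```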
- Storage

CSI driver for `KubeVirt` is not available yet.
Dynamic storage provisioning should be part of the MVP. We previously made this mistake with OCP on RHV IPI.
AFAIK, the KubeVirt CSI driver is planned for Feb 2021, but I guess you know this effort better.
If this is a blocker even for pre-GA versions then we have a problem.
This is not a blocker, we can release this feature without KubeVirt CSI.
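For context only: once such a driver exists, tenant-side consumption would look like any other CSI driver. A sketch under the assumption that the driver registers a provisioner name such as `csi.kubevirt.io` and maps tenant volumes to an infra-cluster StorageClass; both the provisioner name and the parameter key are assumptions, not a released API.

```yaml
# Sketch only: provisioner name and parameter key are assumptions about a
# future kubevirt-csi driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kubevirt-csi
provisioner: csi.kubevirt.io
parameters:
  infraStorageClassName: standard   # assumed mapping to an infra-cluster StorageClass
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: kubevirt-csi
  resources:
    requests:
      storage: 10Gi
```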
For each tenant cluster we will create a namespace with the ClusterID.

*Open question: should the namespace creation done by the user or by the installer.*
IPI implies we do everything. However, some users may have prescribed naming schemes that we should conform to. I'd recommend we give them the option to specify a name, but we create it.
When we moved to the networking option of using Multus, we were no longer able to create the namespace, since the installer's input is the NAD resource name and that resource must exist in the namespace before we start the installation.
from the infra cluster storageClass and attach it to the relevant VM where the PV will be exposed to the guest
as block device that the driver will attach to the requested pods.

- Anti-affinity
We definitely need to do this, to prevent the cluster from being unrecoverable in the event of the loss of two master nodes. We can use soft affinity, so things can come up in a demo environment.
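A sketch of the kind of soft (preferred) anti-affinity that could be set on the control-plane VMs, so the three masters prefer to land on different infra-cluster hosts while still scheduling in a small demo environment; the label key/value and VM name are assumptions.

```yaml
# Sketch only: preferred (soft) pod anti-affinity between master VMs; the label
# key/value are assumptions.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: tenant-cluster-1-master-0
spec:
  running: true
  template:
    metadata:
      labels:
        tenant.example.com/role: master
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    tenant.example.com/role: master
      domain:
        devices: {}
        resources:
          requests:
            memory: 8Gi
```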
**Note:** *Section not required until targeted at a release.*

Consider the following in developing a test plan for this enhancement:
Upgrades of different versions of OCP will be interesting. How far will we allow versions to drift? I.e., could we support an OCP 4.2 tenant cluster alongside an OCP 4.10 cluster, all hosted on an OCP 4.8 cluster?
I don't see a reason why not; we are not relying on any feature/resource that is missing from any OCP 4.x version.
work because we control the dnsmasq inside VMs network.

What could go wrong?
- we may not be able to make the CI play nicely on time and we need as much help
CI is a requirement for going GA with any OCP features. No exceptions, AFAIK.
You're right, fixing this one.
before running the installer, and the created networkName (NAD) will be the input for the installer.

- Storage
One of the other risks to call out is the etcd performance and latency requirements. We've had issues where people deploy OCP clusters in virtual environments with insufficient hardware, and they have all sorts of problems installing the OCP clusters. Worse, the cluster install may go fine, but the cluster goes unhealthy after a few days. I'm not sure what the right answer is, but we should have a discussion about how we can make this easier to validate and troubleshoot.
Good point. I don't know how to solve it, especially when we are running on bare metal and not on a public cloud.
Signed-off-by: Ravid Brown <ravid@redhat.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: ravidbro. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
- End user documentation, relative API stability
- Sufficient test coverage
- Gather feedback from users rather than just developers
- Approved review by the installer team.
@chenyosef @sdodson FYI
/assign @mrunalp
that would count as tricky in the implementation and anything particularly
challenging to test should be called out.

All code is expected to have adequate tests (eventually with coverage
I think there should be a test plan here.
TODO

[maturity levels][maturity-levels].
I'd like to see what dev preview entails here
- Ability to utilize the enhancement end to end
- End user documentation, relative API stability
- Sufficient test coverage
There is no real test plan above, so it feels like this also needs more detail.
More details at openshift/enhancements#417 Signed-off-by: Ravid Brown <ravid@redhat.com>
- secrets for the Ignition configs of the VMs
- 1 bootstrap machine
Isn't the bootstrap ignition config going to be too large to fit into a secret?
Ignore this. I'm getting my orders of magnitude mixed up.
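For reference, the pattern discussed here is roughly: store each node's Ignition config in a Secret and reference it from the VM's config-drive volume. Kubernetes Secrets are capped at about 1 MiB, which is why the size of the (larger) bootstrap config came up. The wiring below is a sketch under the assumption that Ignition is delivered as config-drive user data; all names are hypothetical.

```yaml
# Sketch only: Ignition stored in a Secret and delivered via a config-drive
# volume; names are hypothetical.
apiVersion: v1
kind: Secret
metadata:
  name: tenant-cluster-1-master-0-ignition
  namespace: tenant-cluster-1
stringData:
  userdata: |
    {"ignition": {"version": "3.1.0"}}
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: tenant-cluster-1-master-0
  namespace: tenant-cluster-1
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
            - name: cloudinitdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 8Gi
      volumes:
        - name: cloudinitdisk
          cloudInitConfigDrive:
            userDataSecretRef:
              name: tenant-cluster-1-master-0-ignition   # the Secret above
```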
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. /close
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Signed-off-by: Ravid Brown <ravid@redhat.com>