Add enhancement: IPI kubevirt provider #417
Will we be able to support clusters using third-party SDNs like Calico or similar?
We will only be able to support CNIs that are supported by CNV.
Right now, the only supported option is Multus with the bridge CNI.
In theory, every Multus CNI that supplies DHCP and routes in and out should work too.
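For illustration, a Multus NetworkAttachmentDefinition using the bridge CNI with DHCP IPAM might look like the sketch below (the names, namespace, and bridge device are hypothetical):

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: tenant-bridge        # hypothetical name
  namespace: tenant-ns       # hypothetical namespace
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "bridge",
      "bridge": "br1",
      "ipam": { "type": "dhcp" }
    }
```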
Doesn't this break the "the machine's network will be private" statement from the first paragraph?
I don't think so. A secondary Multus network with its own bridge and its own CIDR (e.g. using the Whereabouts IPAM) will be a private network for the cluster.
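As a sketch, such a secondary private network using the Whereabouts IPAM could be defined like this (the bridge name, CIDR, and resource names are assumptions, not part of the proposal):

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: tenant-private       # hypothetical name
  namespace: tenant-ns       # hypothetical namespace
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "bridge",
      "bridge": "br-tenant",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.100.0/24"
      }
    }
```

Whereabouts assigns each pod/VM a unique address from the range without needing a DHCP server on the bridge.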
Have you thought through the negative flow and remediation when a failure occurs? Ideally the user knows exactly why this failed and how to diagnose, address, and fix it. And we need to retain any input values the user chose or entered leading up to this point so they do not have to enter them again. The same concept should apply to all input values we require.
@robyoungky I don't see how this should be different from other platforms. In the end, the user will have the same experience as with any other IPI. (While I agree this information can probably be improved, it is out of this enhancement's scope.)
The installer config file from any of the IPI installers is persisted, and can be used to deploy the cluster again.
Isn't the bootstrap ignition config going to be too large to fit into a secret?
Ignore this. I'm getting my orders of magnitude mixed up.
Related topic: some customers insist on allocating IPs to VMs through static addressing; DHCP is not allowed on their production networks.
Is that something that is supported on other platforms?
I don't see a way in the API/YAML to supply an IP per node.
Also, it contradicts the machine-API concept of a MachineSet: a MachineSet treats machines like cattle, not pets. It has a 'replicas' property, which is a number; you modify this number, and VMs are created/destroyed.
I don't see how that can work with static IPs.
What am I missing here?
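To illustrate the cattle-not-pets point: a MachineSet only carries a replicas count, and there is no per-machine field where a static IP could be supplied. A minimal sketch (the name and labels are hypothetical, and the platform-specific providerSpec is elided):

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: example-worker                # hypothetical name
  namespace: openshift-machine-api
spec:
  replicas: 3                         # the only scaling knob: change it and machines come and go
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machineset: example-worker
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-machineset: example-worker
    spec:
      providerSpec: {}                # platform-specific details elided
```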
One of the other risks to call out is the etcd performance and latency requirements. We've had issues where people deploy OCP clusters in virtual environments with insufficient hardware, and they have all sorts of problems installing the OCP clusters. Worse, the cluster install may go fine, but the cluster goes unhealthy after a few days. I'm not sure what the right answer is, but we should have a discussion about how we can make this easier to validate and troubleshoot.
Good point. I don't know how to solve it, especially when we are running on bare metal and not on a public cloud.
Dynamic storage provisioning should be part of the MVP. We previously made this mistake with OCP on RHV IPI.
AFAIK, the KubeVirt CSI driver is planned for Feb 2021, but I guess you know better on this effort.
If this is a blocker even for pre-GA versions then we have a problem.
This is not a blocker; we can release this feature without KubeVirt CSI.
IPI implies we do everything. However, some users may have prescribed naming schemes that we should conform to. I'd recommend we give them the options to specify a name, but we create it.
When we moved to the networking option of using Multus, we are no longer able to create the namespace, since the input to the installer is a NAD resource name, and that resource must already exist in the namespace before we start the installation.
We definitely need to do this, to prevent the cluster from being unrecoverable in the event of the loss of two master nodes. We can use soft affinity, so things can come up in a demo environment.
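A minimal sketch of what soft (preferred) anti-affinity for the control-plane VM pods could look like, assuming a hypothetical cluster-role label on those pods:

```yaml
# Fragment of a pod/VM template spec; the label key and value are assumptions.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            cluster-role: master          # hypothetical label on master VM pods
        topologyKey: kubernetes.io/hostname
```

Because the rule is "preferred" rather than "required", the scheduler spreads masters across nodes when it can, but still lets all of them land on one node in a single-node demo environment.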
Negative flow outcomes and recovery, with user and automated remediation.
Upgrades across different versions of OCP will be interesting. How far will we allow versions to drift? I.e., could we support an OCP 4.2 tenant cluster alongside an OCP 4.10 tenant cluster, both hosted on an OCP 4.8 cluster?
I don't see a reason why not; we are not relying on any feature/resource that is missing from any OCP 4.x version.
I think there should be a test plan here.
I'd like to see what dev preview entails here
There is no real test plan above, so it feels like this also needs more detail.
@chenyosef @sdodson FYI