Introduce DPU Network Operator #890
Conversation
(force-pushed from bb7d3fe to e021d60)
> The DPU OVNKube Operator will be responsible for enable the switchdev mode on
> BF2 cards. We assume all the BF2 cards will be put into a custom
> MachineConfigPool,
comma should be a period, or else there's something missing at the end?
Who is responsible for the custom MachineConfigPool? When are they supposed to create it?
> Who is responsible for the custom MachineConfigPool? When are they supposed to create it?
That is a good question. In my assumption, the cluster-admin shall label the nodes and create the MCP. WDYT?
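For illustration, the custom MCP the cluster-admin would create could look roughly like this. This is only a sketch of the assumption above; the pool name (`dpu`) and the node label (`node-role.kubernetes.io/dpu`) are illustrative guesses, not part of the proposal:

```yaml
# Hypothetical sketch: a custom MachineConfigPool selecting the labeled BF2/DPU
# nodes. The pool name and node label are assumptions for illustration only.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: dpu
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, dpu]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/dpu: ""
```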
> We will create a namespaced CRD OVNKubeConfig.smartnic.openshift.io for the
> DPU OVNKube Operator. One custom resource of this CRD shall be created for
> the tenant cluster. The operator will take this custom resource as input to
it's created "for the tenant cluster", but it's created in the infra cluster, which again probably needs to be clarified.
How is the admin expected to create this resource? We should probably provide at least a script or something to grab the necessary information from the tenant cluster and create the resource in the infra cluster...
I don't quite get the question. This operator only needs the kubeconfig of the tenant cluster, then it can access the apiserver to grab all the info. Do you mean we need a script to generate the kubeconfig file for the tenant cluster?
In my assumption, the whole process shall be like this:
- After infra cluster installation, the admin labels the DPUs on which he wants to enable hardware offload.
- The admin creates a custom MCP that contains all the selected DPUs.
- The admin configures the CNO to deprovision the ovn-kube components from the DPUs.
- The admin installs the DPU operator and creates an OVNKubeConfig CR with the `poolName` but leaves `kubeConfigFile` blank. The operator then does the host network configuration.
- Install the tenant cluster. Scale up the x86 workers with DPU.
- The CNO of the tenant cluster puts those x86 hosts into smart-nic host mode.
- The admin configures VFs on the x86 nodes with the SR-IOV Network Operator.
- The admin stores the kubeconfig file in the infra cluster, then adds `kubeConfigFile` to the existing OVNKubeConfig CR. The operator installs the ovn-kube components on the selected DPUs.
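Under those assumptions, the two-phase OVNKubeConfig CR might look roughly like this. The group/version, namespace, and the idea of referencing the kubeconfig through a Secret name are guesses based on the discussion; only `poolName` and `kubeConfigFile` come from the text above:

```yaml
# Hypothetical sketch of the OVNKubeConfig CR described in the workflow above.
# Phase 1: only poolName is set; the operator performs the host network
# configuration on the selected MCP.
apiVersion: smartnic.openshift.io/v1
kind: OVNKubeConfig
metadata:
  name: ovnkubeconfig-sample
  namespace: dpu-network-operator        # assumed operator namespace
spec:
  poolName: dpu                          # the custom MCP created earlier
  # Phase 2: after the tenant cluster is installed, the admin stores its
  # kubeconfig in the infra cluster (e.g. as a Secret) and references it here:
  # kubeConfigFile: tenant-cluster-kubeconfig
```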
(force-pushed from f918873 to 0eaa1c7)
> - enable the switchdev mode on DPUs.
> - add PF representor to the br-ex bridge, so that the x86 host can connect to
>   the cluster network of the tenant cluster.
> - Define the ovn-kube upgrade mechanism in such a deployment.
I don't see the ovn-kube upgrade mechanism discussed in this doc (we talked about the ovnkube operator upgrade in the lifecycle management section).
How can we ensure the version of ovnkube-node in the infra cluster is compatible with the version of the ovn-kube components in the tenant cluster during cluster upgrade, or during ovnkube operator installation?
For example, should this OVNKube operator fetch the image information from tenant cluster as well and use the image to spin up ovnkube-node in infra cluster?
Can we assume the two clusters will always be in the same version, and upgrade together? Then we can let the operator use the images which are currently used in the infra cluster.
I think we need to define the version compatibility between the infra and tenant cluster, i.e. what is supported and what is not. For example, we may say that a 4.x tenant is compatible with a 4.x+1 or 4.x+n infra.
This would allow new features to be introduced without updating the tenant cluster. For example, an update of the RHCOS image in the infra cluster (openvswitch, drivers, etc.) may bring in new features, and an operator designed to run on the infra cluster would also benefit from it.
Thinking out loud: if we were to move ovn-central to the infra cluster, it would further decouple the infra and tenant in terms of cluster or ovnkube versions, which would allow more features to be added without updating the tenant cluster.
> ## Design Details
>
> The DPU OVNKube Operator will fetch the necessary information from the
> tenant cluster, then create the following objects in the infra cluster:
Shall we add that the OVNKube operator will watch those runtime ovn-k8s changes in the tenant cluster and reflect them accordingly in the infra cluster? Like whenever the ConfigMap or Secret is changed in the tenant cluster.
Yeah, it's a good point. AFAIK, those ovnkube objects cannot be changed after cluster installation, because CNO doesn't allow them to be. But the certificates do expire, so we need to watch for that change in the tenant cluster.
One example I had in mind was the ipsec enablement: it can be done after initial cluster installation, and the ovnkube Operator will need to render the ipsec daemonset in the infra cluster.
> ## Design Details
>
> The DPU OVNKube Operator will fetch the necessary information from the
Does this Operator need to handle kube API compatibility between infra and tenant clusters?
Is that necessary? I assume both clusters shall be in the same version.
If we allow tenant and infra to be in different cluster versions (which I think we should if not in the initial delivery), then yes.
(force-pushed from 0eaa1c7 to 16af993)
> ## Motivation
>
> Normally, the Cluster Network Operator watches the Network customer resource,
> then provision the ovn-kube components to the Openshift cluster.
Normally CNO manages the ovn-kube components. If this new operator is taking over that role for the infra cluster, do we need to document somewhere that CNO needs to be changed to not manage OVN-Kube on infra clusters and what's the CNO trigger for that?
This operator is not taking over that role for the infra cluster, but managing the ovnkube that serves the tenant cluster. The CNO will still manage the ovnkube of the infra cluster on non-BF2 nodes.
@zshi-redhat Shall we mention these CNO changes in #732?
I think we need a similar patch in CNO, like openshift/cluster-network-operator#1138, to isolate the CNO-managed ovnkube from being scheduled on DPUs in the infra cluster. For example, introduce the label `network.operator.openshift.io/dpu`, which will be used by CNO in the infra cluster.
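As a sketch of what that could mean in practice (the actual mechanism is up to the linked CNO patch; this fragment is only an assumption), the CNO-rendered ovnkube-node pod spec in the infra cluster could carry a node-affinity rule that excludes nodes carrying that label:

```yaml
# Hypothetical pod-spec excerpt for CNO-managed ovnkube-node in the infra
# cluster: keep it off any node labeled as a DPU, so the DPU operator's
# ovnkube can run there instead.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: network.operator.openshift.io/dpu
              operator: DoesNotExist
```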
Can you move this into the …? You may need to add some more dummy sections to make the latest markdownlint job happy too...
(force-pushed from d052542 to 52d1b2e)
Signed-off-by: Peng Liu <Peng Liu>
(force-pushed from 52d1b2e to 6caff80)
Done
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: danwinship. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Signed-off-by: Peng Liu