external control plane #309

Merged

zshi-redhat
Collaborator

@zshi-redhat commented May 26, 2022

Enable the SR-IOV operator in a cluster without master nodes (the Kubernetes API server and other control plane components are managed externally).

@github-actions

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

infra := &configv1.Infrastructure{}
err := c.Get(context.TODO(), types.NamespacedName{Name: infraResourceName}, infra)
if err != nil {
-	return false, err
+	return "", err
Member

Please add context information to the error:

Suggested change
-	return "", err
+	return "", fmt.Errorf("can't get Infrastructure [%s]: %w", infraResourceName, err)

Collaborator Author

updated.

@adrianchiris
Collaborator

@zshi-redhat can you explain a bit more about the use case?

@zshi-redhat
Collaborator Author

> @zshi-redhat can you explain a bit more about the use case?

@adrianchiris The use case is managed Kubernetes services (similar to AWS EKS or Google GKE), where the Kubernetes control plane components (apiserver, etcd, kube-scheduler, etc.) are hosted by the cloud service provider in a provider-managed Kubernetes cluster and only workload components (worker nodes) are visible to the end user. In that case, the end user can still deploy the SR-IOV operator on the worker nodes by accessing the externally managed Kubernetes API server.

The DPU two-cluster design can take advantage of this as well: the infra and tenant control planes are hosted in a separate Kubernetes cluster (the management cluster) to reduce the master-node footprint. dpu-network-operator runs in the management cluster, which has access to both the infra and tenant APIs.

                operator: Exists
            - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
      tolerations:
      {{ if .ExternalControlPlane }}
      - key: "node-role.kubernetes.io/worker"
Collaborator

Worker taint? Is that a thing?

Collaborator Author

removed.
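The question comes down to how tolerations work: a toleration only has an effect if some node carries a matching taint, and worker nodes are not normally tainted with a role key. A simplified sketch of the matching rule, assuming only the Exists operator (an empty toleration key matches every key, an empty effect matches every effect). Taint, Toleration, tolerates, and schedulable are hypothetical trimmed-down stand-ins for the corev1 types and the scheduler's check; Equal/value matching is omitted.

```go
package main

import "fmt"

// Taint is a hypothetical, trimmed-down mirror of corev1.Taint.
type Taint struct {
	Key    string
	Effect string // e.g. "NoSchedule", "NoExecute", "PreferNoSchedule"
}

// Toleration is a hypothetical, trimmed-down mirror of corev1.Toleration.
type Toleration struct {
	Key      string
	Operator string // only "Exists" is modeled here
	Effect   string // empty means "any effect"
}

// tolerates reports whether tol matches taint under the Exists operator:
// an empty key matches every key and an empty effect matches every effect.
func tolerates(tol Toleration, taint Taint) bool {
	if tol.Operator != "Exists" {
		return false // Equal/value matching is intentionally not modeled
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	return tol.Effect == "" || tol.Effect == taint.Effect
}

// schedulable reports whether every hard taint on a node is tolerated.
// PreferNoSchedule is only a soft preference, so it is skipped.
func schedulable(tols []Toleration, taints []Taint) bool {
	for _, taint := range taints {
		if taint.Effect == "PreferNoSchedule" {
			continue
		}
		tolerated := false
		for _, tol := range tols {
			if tolerates(tol, taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	cpTaints := []Taint{{Key: "node-role.kubernetes.io/control-plane", Effect: "NoSchedule"}}
	cpTol := Toleration{Key: "node-role.kubernetes.io/control-plane", Operator: "Exists", Effect: "NoSchedule"}
	workerTol := Toleration{Key: "node-role.kubernetes.io/worker", Operator: "Exists"}

	fmt.Println(schedulable([]Toleration{cpTol}, cpTaints))     // true
	fmt.Println(schedulable([]Toleration{workerTol}, cpTaints)) // false: worker toleration does not match the control-plane taint
	fmt.Println(schedulable(nil, nil))                          // true: an untainted node needs no toleration
}
```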

@@ -32,13 +32,21 @@ spec:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
            {{ if .ExternalControlPlane }}
Collaborator

@adrianchiris Jun 21, 2022

In the case of an external control plane, maybe drop the entire 'requiredDuringSchedulingIgnoredDuringExecution' block,

or alternatively just add another '- matchExpressions' term, since the terms are ORed.

Replacing one with the other is not good IMO.
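The reviewer's "ORed" refers to node affinity semantics: the entries of nodeSelectorTerms are ORed, while the requirements inside one matchExpressions term are ANDed. Adding a second term therefore widens the set of eligible nodes, whereas replacing a term swaps one set for another. A minimal sketch that models only the Exists operator; Requirement, Term, matchesTerm, and matchesAny are hypothetical simplifications, not the corev1 API.

```go
package main

import "fmt"

// Requirement is a hypothetical simplification of a node selector requirement
// that only models the "Exists" operator.
type Requirement struct {
	Key string
}

// Term is one entry of nodeSelectorTerms; its requirements are ANDed.
type Term []Requirement

// matchesTerm reports whether every requirement in the term is satisfied.
// An empty term matches no nodes, mirroring Kubernetes behavior.
func matchesTerm(labels map[string]string, term Term) bool {
	if len(term) == 0 {
		return false
	}
	for _, req := range term {
		if _, ok := labels[req.Key]; !ok {
			return false
		}
	}
	return true
}

// matchesAny reports whether at least one term matches: terms are ORed.
func matchesAny(labels map[string]string, terms []Term) bool {
	for _, term := range terms {
		if matchesTerm(labels, term) {
			return true
		}
	}
	return false
}

func main() {
	terms := []Term{
		{{Key: "node-role.kubernetes.io/master"}},
		{{Key: "node-role.kubernetes.io/control-plane"}},
	}
	newCP := map[string]string{"node-role.kubernetes.io/control-plane": ""}
	worker := map[string]string{"node-role.kubernetes.io/worker": ""}

	fmt.Println(matchesAny(newCP, terms))  // true: the second term matches
	fmt.Println(matchesAny(worker, terms)) // false: no term matches
}
```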

Collaborator Author

dropped requiredDuringScheduling in External mode.

@@ -35,13 +35,21 @@ spec:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
            {{ if .ExternalControlPlane }}
Collaborator

same comment as above

Collaborator Author

fixed.

                operator: Exists
            - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
      tolerations:
      {{ if .ExternalControlPlane }}
      - key: "node-role.kubernetes.io/worker"
Collaborator

same question about worker taint

Collaborator Author

removed "node-role.kubernetes.io/worker"

@@ -47,6 +47,10 @@ spec:
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: K8S_POD_NAME
Collaborator

  1. You already have POD_NAME defined below.
  2. You can use spec.nodeName to expose the node name directly and avoid the added logic that gets the node name from the pod.

Collaborator Author

removed K8S_POD_NAME and added NODE_NAME

if err != nil {
	return false, err
}
switch role {
Collaborator

No need for a switch here, IMO:

if role == worker { return true, nil }

Collaborator Author

fixed.

}

nodeList := &corev1.NodeList{}
err = c.List(context.TODO(), nodeList)
Collaborator

You can get the node by name, no? No need to list and iterate here.

Collaborator Author

fixed.

-	infraResourceName = "cluster"
+	infraResourceName  = "cluster"
+	workerNodeLabelKey = "node-role.kubernetes.io/worker"
+	masterNodeLabelKey = "node-role.kubernetes.io/master"
Collaborator

What about node-role.kubernetes.io/control-plane?
In Kubernetes 1.24, kubeadm replaced the master label with control-plane (the master label is deprecated).

Collaborator Author

added support for both.
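Supporting both labels means treating either node-role.kubernetes.io/master or node-role.kubernetes.io/control-plane as marking a control plane node. A sketch: workerNodeLabelKey and masterNodeLabelKey come from the diff above, while controlPlaneNodeLabelKey and nodeRole are hypothetical. The control plane check runs first so that a node carrying both a control plane label and the worker label (a combined master,worker node) is not misclassified as a worker.

```go
package main

import "fmt"

const (
	workerNodeLabelKey       = "node-role.kubernetes.io/worker"
	masterNodeLabelKey       = "node-role.kubernetes.io/master"
	controlPlaneNodeLabelKey = "node-role.kubernetes.io/control-plane" // hypothetical constant name
)

// nodeRole (hypothetical helper) classifies a node from its labels. Either the
// legacy master label or the newer control-plane label marks a control plane node.
func nodeRole(labels map[string]string) string {
	_, master := labels[masterNodeLabelKey]
	_, controlPlane := labels[controlPlaneNodeLabelKey]
	if master || controlPlane {
		return "control-plane"
	}
	if _, ok := labels[workerNodeLabelKey]; ok {
		return "worker"
	}
	return "unknown"
}

func main() {
	fmt.Println(nodeRole(map[string]string{controlPlaneNodeLabelKey: ""}))                   // control-plane
	fmt.Println(nodeRole(map[string]string{masterNodeLabelKey: "", workerNodeLabelKey: ""})) // control-plane
	fmt.Println(nodeRole(map[string]string{workerNodeLabelKey: ""}))                         // worker
}
```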

@zshi-redhat requested review from bn222 and SchSeba July 1, 2022 10:42
pkg/utils/cluster.go (outdated, resolved)
@bn222
Collaborator

bn222 commented Jul 4, 2022

Can you put your first comment in the commit message? Or something more elaborate?

@zshi-redhat
Collaborator Author

> Can you put your first comment in the commit message? Or something more elaborate?

added the comment in the commit message.

@bn222
Collaborator

bn222 commented Jul 6, 2022

/lgtm

@github-actions bot added the lgtm label Jul 6, 2022
@SchSeba
Collaborator

SchSeba commented Aug 11, 2022

Hi @zshi-redhat will you please be able to rebase this PR?

@zshi-redhat
Collaborator Author

> Hi @zshi-redhat will you please be able to rebase this PR?

rebased, thanks for reminding!

@coveralls

coveralls commented Aug 18, 2022

Pull Request Test Coverage Report for Build 3011830441

  • 17 of 54 (31.48%) changed or added relevant lines in 2 files are covered.
  • 8 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.1%) to 15.993%

Changes Missing Coverage (covered lines / changed or added lines, %):
  • pkg/utils/cluster.go: 11 / 48, 22.92%
Files with Coverage Reduction (new missed lines, %):
  • controllers/sriovibnetwork_controller.go: 2, 67.33%
  • controllers/sriovnetwork_controller.go: 6, 65.0%
Totals:
  • Change from base Build 3010210970: 0.1%
  • Covered Lines: 1176
  • Relevant Lines: 7353

💛 - Coveralls

@bn222
Collaborator

bn222 commented Sep 1, 2022

@zshi-redhat We could label the workers in the guest cluster with the "master" label (so that the role becomes master,worker). In that case, this would not be needed. For the sake of my understanding, is that correct? I still think it's nice to allow running on workers, so we still want this PR in.

@zshi-redhat
Collaborator Author

> @zshi-redhat We could label the workers in guest cluster with label "master" (such that the role will be master,worker). In that case, this would not be needed. For the sake of my understanding, is that correct?

@bn222 From my understanding this may not be desirable, since the node pool is supposed to contain only worker nodes; I'm not sure whether there would be side effects on other pods.

BTW, do we still need this PR, or is it going to be integrated into Tyler's?

@zshi-redhat
Collaborator Author

@adrianchiris PTAL.

Collaborator

@adrianchiris left a comment

A couple of nits, otherwise LGTM.

pkg/utils/cluster.go (resolved)
pkg/utils/cluster.go (outdated, resolved)
The use case is managed Kubernetes services (similar to
AWS EKS or Google GKE), where the Kubernetes control plane
components (apiserver, etcd, kube-scheduler, etc.) are hosted
by the cloud service provider in a provider-managed Kubernetes
cluster and only workload components (worker nodes) are
visible to the end user. In that case, the end user can still
deploy the SR-IOV operator on the worker nodes by accessing
the externally hosted Kubernetes API server.

Signed-off-by: Zenghui Shi <zshi@redhat.com>
@adrianchiris
Collaborator

/test-all

Collaborator

@adrianchiris left a comment

LGTM, let's wait for the e2e tests to pass.

@bn222 merged commit 54ba95d into k8snetworkplumbingwg:master Sep 8, 2022
6 participants