Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation capturing enablement of NFD-Topology-Updater in NFD #526

Conversation

swatisehgal
Copy link
Contributor

@swatisehgal swatisehgal commented May 13, 2021

Prior to this feature, NFD consisted of only two software components namely
nfd-master and nfd-worker. We have introduced another software component
called nfd-topology-updater.

NFD-Topology-Updater is a daemon responsible for examining allocated resources
on a worker node to account for allocatable resources on a per-zone basis (where
a zone can be a NUMA node). It then communicates the information to nfd-master
which does the CRD creation corresponding to all the nodes in the cluster. One
instance of nfd-topology-updater is supposed to be running on each node of the
cluster.

Design Document: here
Implementation PR: #525
Issue: #333

Signed-off-by: Swati Sehgal swsehgal@redhat.com

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 13, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @swatisehgal. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 13, 2021
@swatisehgal
Copy link
Contributor Author

/cc @zvonkok @marquiz

Copy link
Contributor

@ArangoGutierrez ArangoGutierrez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments

docs/get-started/deployment-and-usage.md Outdated Show resolved Hide resolved
docs/get-started/quick-start.md Outdated Show resolved Hide resolved
docs/get-started/deployment-and-usage.md Outdated Show resolved Hide resolved
Copy link
Contributor

@ArangoGutierrez ArangoGutierrez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

second round of review on my end

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 9, 2021
Copy link
Contributor

@marquiz marquiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc update! Some comments. Need to rebase and align this PR with the latest changes brought in by e.g. kustomize. Better to do a another review when #525 starts to be in shape.

Maybe you could add a sentence or two (e.g. to introduction or then a section under deployment and usage) about the planned usage scenario(s).

docs/advanced/developer-guide.md Outdated Show resolved Hide resolved
docs/advanced/developer-guide.md Outdated Show resolved Hide resolved
docs/advanced/topology-updater-commandline-reference.md Outdated Show resolved Hide resolved
docs/get-started/deployment-and-usage.md Outdated Show resolved Hide resolved
docs/get-started/deployment-and-usage.md Outdated Show resolved Hide resolved
docs/get-started/deployment-and-usage.md Outdated Show resolved Hide resolved
docs/get-started/introduction.md Outdated Show resolved Hide resolved
docs/get-started/introduction.md Outdated Show resolved Hide resolved
Copy link
Contributor

@ArangoGutierrez ArangoGutierrez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, mostly we need to update after the Kustomize rebase

docs/get-started/deployment-and-usage.md Outdated Show resolved Hide resolved
docs/get-started/deployment-and-usage.md Outdated Show resolved Hide resolved
docs/get-started/deployment-and-usage.md Outdated Show resolved Hide resolved
docs/get-started/quick-start.md Outdated Show resolved Hide resolved
docs/get-started/quick-start.md Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 15, 2021
@swatisehgal
Copy link
Contributor Author

@marquiz @ArangoGutierrez The PR has been rebased and updated. PTAL when you have time.

@marquiz
Copy link
Contributor

marquiz commented Sep 16, 2021

The PR has been rebased and updated. PTAL when you have time.

Thanks, I will do

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 16, 2021
Copy link
Contributor

@marquiz marquiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another quick-ish review round. Some small comments (plus the picky md linter errors now that I remembered to enable testing 🙄 ) but I think this looks pretty good overall to get started with nfd-topology-updater 👍

There are parts in the deployment section for example which I skipped for now. Need to do yet another final review round after #525 is in final shape (or merged) to see that these are in sync.

docs/advanced/developer-guide.md Outdated Show resolved Hide resolved
docs/advanced/developer-guide.md Outdated Show resolved Hide resolved
docs/advanced/topology-updater-commandline-reference.md Outdated Show resolved Hide resolved
docs/get-started/deployment-and-usage.md Outdated Show resolved Hide resolved
docs/get-started/deployment-and-usage.md Outdated Show resolved Hide resolved
@swatisehgal
Copy link
Contributor Author

Thanks for the review, I will update this PR once the deployment process is finalized in #525.

@swatisehgal swatisehgal force-pushed the topology-updater-documentation branch 2 times, most recently from 97667b4 to d80b504 Compare October 21, 2021 13:43
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 21, 2021
@swatisehgal swatisehgal force-pushed the topology-updater-documentation branch 3 times, most recently from 78d03e4 to 65e698a Compare October 21, 2021 14:00
@swatisehgal swatisehgal force-pushed the topology-updater-documentation branch 3 times, most recently from 4cde007 to 22a6ec8 Compare October 21, 2021 14:28
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 21, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 21, 2021
@swatisehgal swatisehgal force-pushed the topology-updater-documentation branch 4 times, most recently from c8175e5 to ae396ce Compare October 21, 2021 14:49
@swatisehgal
Copy link
Contributor Author

@marquiz I have squashed the commits and the PR is ready for your review.

Copy link
Contributor

@marquiz marquiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one more comment in the "code".

Could you also rebase on top of latest master in order to get rid of the changes to source/.

Comment on lines 187 to 189
NFD master and NFD Topologyupdater can be configured to be deployed
as separate pods. The `topologyupdater` overlay may be used to
achieve this:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think the "as separate pods" part is somewhat confusing as we do not provide a way to deploy them in the same pod. Food for thought 😀:

In order to deploy just nfd-opologyupdater without nfd-worker use the `topologyupdater` overlay:

Also, from the previous version:

NFD Topologyupdater can be configured along with the default overlay, all
deployed as separate pods. The `topologyupdater` overlay may be used along
with `default` overlay to achieve this:

I think this could/should incorporated in this section as well. Basically, you can deploy topology-updater later on by deploying this overlay.

Copy link
Contributor Author

@swatisehgal swatisehgal Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marquiz Addressed your comments but there is tiny issue I see in explicitly mentioning the deployment of topologyupdater with default overlay here. Both default overlay topologyupdater overlay deploy the master leading to an error message when we deploy the second overlay that the masteralready exists which could give the impression that there has been an error. It is not a big deal but just want us to be conscious of that.

Copy link
Contributor

@marquiz marquiz Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deploy the master leading to an error message when we deploy the second overlay that the masteralready exists which could give the impression that there has been an error. It is not a big deal but just want us to be conscious of that.

Hmm, what kind of an error do you get? I think you should get just smth like "deployment unchanged" or "deployment configured" 🤔 You should basically get identical objects (for master, rbac rules etc) from the two overlays

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errors are because some kubernetes objects (nfd-master, service accounts etc) already exist. Irrespective of the errors topologyupdater is deployed appropriately. Please refer below for how it looks like in action:

# deploying default overlay
$ kubectl create -k .
namespace/node-feature-discovery created
serviceaccount/nfd-master created
clusterrole.rbac.authorization.k8s.io/nfd-master created
clusterrolebinding.rbac.authorization.k8s.io/nfd-master created
configmap/nfd-worker-conf created
service/nfd-master created
deployment.apps/nfd-master created
daemonset.apps/nfd-worker created


# deploying the topologyupdater after deploying the default overlay
$ kubectl create -k deployment/overlays/topologyupdater/
customresourcedefinition.apiextensions.k8s.io/noderesourcetopologies.topology.node.k8s.io created
serviceaccount/nfd-topology-updater created
clusterrole.rbac.authorization.k8s.io/nfd-topology-updater created
clusterrolebinding.rbac.authorization.k8s.io/nfd-topology-updater created
daemonset.apps/nfd-topology-updater created
Error from server (AlreadyExists): error when creating "deployment/overlays/topologyupdater/": namespaces "node-feature-discovery" already exists
Error from server (AlreadyExists): error when creating "deployment/overlays/topologyupdater/": serviceaccounts "nfd-master" already exists
Error from server (AlreadyExists): error when creating "deployment/overlays/topologyupdater/": clusterroles.rbac.authorization.k8s.io "nfd-master" already exists
Error from server (AlreadyExists): error when creating "deployment/overlays/topologyupdater/": clusterrolebindings.rbac.authorization.k8s.io "nfd-master" already exists
Error from server (AlreadyExists): error when creating "deployment/overlays/topologyupdater/": services "nfd-master" already exists
Error from server (AlreadyExists): error when creating "deployment/overlays/topologyupdater/": deployments.apps "nfd-master" already exists

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...so I agree that getting some errors is confusing. I now tested this myself and didn't get any errors

$ kubectl apply -k deployment/overlays/default
namespace/node-feature-discovery created
serviceaccount/nfd-master created
clusterrole.rbac.authorization.k8s.io/nfd-master created
clusterrolebinding.rbac.authorization.k8s.io/nfd-master created
configmap/nfd-worker-conf created
service/nfd-master created
deployment.apps/nfd-master created
daemonset.apps/nfd-worker created

$ kubectl apply -k deployment/overlays/topologyupdater
namespace/node-feature-discovery unchanged
customresourcedefinition.apiextensions.k8s.io/noderesourcetopologies.topology.node.k8s.io created
serviceaccount/nfd-master unchanged
serviceaccount/nfd-topology-updater created
clusterrole.rbac.authorization.k8s.io/nfd-master unchanged
clusterrole.rbac.authorization.k8s.io/nfd-topology-updater created
clusterrolebinding.rbac.authorization.k8s.io/nfd-master unchanged
clusterrolebinding.rbac.authorization.k8s.io/nfd-topology-updater created
service/nfd-master unchanged
deployment.apps/nfd-master configured
daemonset.apps/nfd-topology-updater created

I think we can merge the PR – after we get happy with this pararagraph of the documentation 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Markus for pointing out that I should be using kubectl apply instead of kubectl create as we already have in the documentation :) All looks good:

$kubectl apply -k .
namespace/node-feature-discovery created
serviceaccount/nfd-master created
clusterrole.rbac.authorization.k8s.io/nfd-master created
clusterrolebinding.rbac.authorization.k8s.io/nfd-master created
configmap/nfd-worker-conf created
service/nfd-master created
deployment.apps/nfd-master created
daemonset.apps/nfd-worker created

$ kubectl apply -k deployment/overlays/topologyupdater/
namespace/node-feature-discovery unchanged
customresourcedefinition.apiextensions.k8s.io/noderesourcetopologies.topology.node.k8s.io created
serviceaccount/nfd-master unchanged
serviceaccount/nfd-topology-updater created
clusterrole.rbac.authorization.k8s.io/nfd-master unchanged
clusterrole.rbac.authorization.k8s.io/nfd-topology-updater created
clusterrolebinding.rbac.authorization.k8s.io/nfd-master unchanged
clusterrolebinding.rbac.authorization.k8s.io/nfd-topology-updater created
service/nfd-master unchanged
deployment.apps/nfd-master configured
daemonset.apps/nfd-topology-updater created

Prior to this feature, NFD consisted of only software components namely
nfd-master and nfd-worker. We have introduced another software component
called nfd-topology-updater.

NFD-Topology-Updater is a daemon responsible for examining allocated resources
on a worker node to account for allocatable resources on a per-zone basis (where
a zone can be a NUMA node). It then communicates the information to nfd-master
which does the CRD creation corresponding to all the nodes in the cluster. One
instance of nfd-topology-updater is supposed to be running on each node of the
cluster.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Copy link
Contributor

@marquiz marquiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @swatisehgal for this! Users will appreciate 😄

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marquiz, swatisehgal

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 29, 2021
@marquiz
Copy link
Contributor

marquiz commented Oct 29, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 29, 2021
@k8s-ci-robot k8s-ci-robot merged commit 347b16d into kubernetes-sigs:master Oct 29, 2021
@marquiz marquiz mentioned this pull request Dec 22, 2021
22 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants