Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create extended resources with NodeFeatureRule #1099

Merged

Conversation

ArangoGutierrez
Copy link
Contributor

@ArangoGutierrez ArangoGutierrez commented Mar 28, 2023

Fixes #1081

Add support for management of extended resources with the
NodeFeatureRule CRD API.

There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).

This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable By using
the NodeFeatureRule API.

Co-authored-by: Carlos Eduardo Arango Gutierrez eduardoa@nvidia.com
Co-authored-by: Fabiano Fidêncio fabiano.fidencio@intel.com

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 28, 2023
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 28, 2023
@netlify
Copy link

netlify bot commented Mar 28, 2023

Deploy Preview for kubernetes-sigs-nfd ready!

Name Link
🔨 Latest commit 250aea4
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-nfd/deploys/6430257e9940c70008e4a4cd
😎 Deploy Preview https://deploy-preview-1099--kubernetes-sigs-nfd.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@ArangoGutierrez
Copy link
Contributor Author

/assign marquiz

@ArangoGutierrez
Copy link
Contributor Author

using

apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: my-sample-rule-object
spec:
  rules:
    - name: "my sample rule"
      extendedResources: 
        - "kernel-version.major"
      matchFeatures:
        - feature: kernel.version
          matchExpressions:
            major: {op: Exists}

I was able to get

Capacity:
  cpu:                                              4
  ephemeral-storage:                                17784752Ki
  feature.node.kubernetes.io/kernel-version.major:  5
  hugepages-2Mi:                                    0
  memory:                                           8135008Ki
  pods:                                             110
Allocatable:
  cpu:                                              4
  ephemeral-storage:                                17784752Ki
  feature.node.kubernetes.io/kernel-version.major:  5
  hugepages-2Mi:                                    0
  memory:                                           8135008Ki
  pods:                                             110
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/crio/crio.sock
                    nfd.node.kubernetes.io/extended-resources: kernel-version.major

(used kernel-version just bc is an int, so it was easy to use as example)

@ArangoGutierrez
Copy link
Contributor Author

/cc @fidencio

@k8s-ci-robot
Copy link
Contributor

@ArangoGutierrez: GitHub didn't allow me to request PR reviews from the following users: fidencio.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @fidencio

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

docs/usage/customization-guide.md Outdated Show resolved Hide resolved
docs/usage/customization-guide.md Outdated Show resolved Hide resolved
@fidencio
Copy link
Contributor

fidencio commented Mar 29, 2023

Using this PR, combined with mine , with the following rule ...

apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: tdx-total-keys
spec:
  rules:
    - name: "TDX total keys rule"
      extendedResources:
        - "cpu-security.tdx.total_keys"
      matchFeatures:
        - feature: cpu.security
          matchExpressions:
            tdx.enabled: {op: Exists}

... I was able to get ...

{
  "cpu": "128",
  "ephemeral-storage": "383957772Ki",
  "feature.node.kubernetes.io/cpu-security.tdx.total_keys": "31",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "memory": "131192364Ki",
  "pods": "110"
}

cmd/nfd-master/main.go Outdated Show resolved Hide resolved
@zvonkok
Copy link
Contributor

zvonkok commented Mar 30, 2023

@fidencio Thanks for the examples here and on the depending PR, this explains the bigger picture.

@ArangoGutierrez AMD told me that the only countable resources for SNP is the number of ASIDs. We can do the same for SNP what Fabiano has done here for TDX.

@zvonkok
Copy link
Contributor

zvonkok commented Mar 30, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 30, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 9a0dd119a383f5e5304659a1a8e951d5caba7721

@mythi
Copy link
Contributor

mythi commented Mar 30, 2023

@fidencio Thanks for the examples here and on the depending PR, this explains the bigger picture.

@zvonkok one hiccup we noticed yesterday was that these extended resources are not correctly shown in nodes' Allocatable. However, the accounting works fine. See kubernetes/kubernetes#115190

pkg/apis/nfd/v1alpha1/annotations_labels.go Outdated Show resolved Hide resolved
pkg/nfd-master/nfd-master.go Outdated Show resolved Hide resolved
docs/usage/customization-guide.md Outdated Show resolved Hide resolved
@ArangoGutierrez ArangoGutierrez force-pushed the extended_resources_v2 branch 2 times, most recently from 0c67733 to 96ddbae Compare April 6, 2023 12:59
@ArangoGutierrez ArangoGutierrez requested a review from marquiz April 6, 2023 12:59
@ArangoGutierrez ArangoGutierrez force-pushed the extended_resources_v2 branch 2 times, most recently from b586bc9 to 04707d9 Compare April 6, 2023 15:52
Copy link
Contributor

@marquiz marquiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big hand for this work @ArangoGutierrez. I'd be happy to get this in 😎 Really cool new feature and nice e2e coverage, too. +1

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ArangoGutierrez, marquiz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 6, 2023
@ArangoGutierrez
Copy link
Contributor Author

@PiotrProkop @zvonkok @fidencio PR is ready for your lgtm now :)

@marquiz
Copy link
Contributor

marquiz commented Apr 6, 2023

ping @fidencio @PiotrProkop @mythi @zvonkok

@marquiz
Copy link
Contributor

marquiz commented Apr 6, 2023

Hmm, e2e failed
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 6, 2023
@ArangoGutierrez
Copy link
Contributor Author

Hmm, e2e failed /hold

Ran 8 of 263 Specs in 279.768 seconds
SUCCESS! -- 8 Passed | 0 Failed | 0 Pending | 255 Skipped
--- PASS: TestE2E (279.77s)
PASS
ok      sigs.k8s.io/node-feature-discovery/test/e2e     279.813s

I got a successful run before pushing

@marquiz
Copy link
Contributor

marquiz commented Apr 6, 2023

There's an ancient bug in nfd-master that this PR exposed. Fixed by #1119. I'd suggest to merge that one first either way is fine, we need both of these. Anyway, this PR (1099) still legit
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 6, 2023
@ArangoGutierrez
Copy link
Contributor Author

There's an ancient bug in nfd-master that this PR exposed. Fixed by #1119. I'd suggest to merge that one first either way is fine, we need both of these. Anyway, this PR (1099) still legit /unhold

And we are back in business here!

@ArangoGutierrez
Copy link
Contributor Author

/test all

@PiotrProkop
Copy link
Contributor

I am happy to LGTM this PR but something is wrong with Netlify :(

@PiotrProkop
Copy link
Contributor

/retest

@ArangoGutierrez
Copy link
Contributor Author

/retest

Prow comments don't retrigger GitHub actions.
And I don't think is the PR changes fault for Netify errors .... I think jejeje 🤞

Add support for management of Extended Resources via the
NodeFeatureRule CRD API.

There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).

This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable. By using
the NodeFeatureRule API.

Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
@ArangoGutierrez ArangoGutierrez force-pushed the extended_resources_v2 branch from 04707d9 to 250aea4 Compare April 7, 2023 14:15
@ArangoGutierrez
Copy link
Contributor Author

@PiotrProkop Netlify is happy now

@PiotrProkop
Copy link
Contributor

/lgtm

Good job! @ArangoGutierrez @fidencio

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 7, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 01d6880d520986fbccdd492edea49c36d4fbc501

@k8s-ci-robot k8s-ci-robot merged commit ad07829 into kubernetes-sigs:master Apr 7, 2023
@mythi
Copy link
Contributor

mythi commented Apr 11, 2023

@ArangoGutierrez @fidencio thanks!

@marquiz marquiz mentioned this pull request Apr 12, 2023
24 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create extended resources with NodeFeatureRule
8 participants