Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow specifying different label templates for different PCI devices #508

Closed
kpouget opened this issue Apr 7, 2021 · 15 comments
Closed
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Milestone

Comments

@kpouget
Copy link

kpouget commented Apr 7, 2021

I am currently working with a custom PCI device, identified by a specific vendor and device ID.

I would like to be able to specify a particular set of fields for this device, eg deviceLabelFields: [vendor, device]

and at the same time, I want to keep the existing labeling untouched, in particular I don't want the GPU PCI cards labels to change, as they are mandatory for the deployment of the GPU Operator:

var gpuNodeLabels = map[string]string{
	"feature.node.kubernetes.io/pci-10de.present":      "true",
	"feature.node.kubernetes.io/pci-0302_10de.present": "true",
	"feature.node.kubernetes.io/pci-0300_10de.present": "true",
}
@kpouget kpouget added the kind/bug Categorizes issue or PR as related to a bug. label Apr 7, 2021
@kpouget
Copy link
Author

kpouget commented Apr 7, 2021

@zvonkok FYI

@marquiz
Copy link
Contributor

marquiz commented Apr 7, 2021

I actually thought about this myself when working on the prototype for extending the custom source (#464). I.e. a more flexible PCI device labeling scheme might be needed. But this was a poor fit for the custom source as the rules there create a single static label in contrast to the pci labels where basically a single rule is creating potentially a multitude of labels.

It could be something like the matchOn directives of the custom source, e.g.

sources:
  pci:
    deviceLabels:
    - matchOn:
        vendor: ["8086"]
      fields: ["class", "vendor"]
    - matchOn:
        class: ["02", "03"]
        vendor: ["15b3"]
      fields: ["class", "vendor", "device"]

But, we need to think is this really needed. In your case (a single device) a custom rule could be the right solution.

@zvonkok
Copy link
Contributor

zvonkok commented Apr 7, 2021

@marquiz I think this could also help in the multiple instance issue we had. Since the GPU and Network operators need different/same PCI labels aka configs compared to the default one.

The GPU operator could have pci-<WHATEVER> and the Network operator could have pci-<WHICHEVER>

cc @adrianchiris what is the exact difference between the two configs of the two operators?
AFAIR GPU operator pci-<vendor-id> and the networking pci-<vendor-id> but the default is pci-<vendor-id>-<class-id>

@ArangoGutierrez
Copy link
Contributor

/assign

@adrianchiris
Copy link
Contributor

adrianchiris commented Apr 7, 2021

cc @adrianchiris what is the exact difference between the two configs of the two operators?
AFAIR GPU operator pci- and the networking pci- but the default is pci--

both use the same label format, pci-<vendor-id>, but different class whitelist (network-operator : Network PCI classes, GPU Operator: default).
This was solved with the --instance functionality. to allow each operator to deploy its own NFD instance.

Now if GPU operator starts using this flag then the user can deploy its own NFD and as long as the labels (specifically their value) dont conflict we should be fine.

@kpouget for now if you deploy your instance of NFD with --instance i think it should allow to define your own labels.
downside is having multiple deployments of NFD :\

Having one NFD would require to allow a more flexible labeling for PCI.

If its a specific PCI device then a custom label might fit the bill

e.g

source:
  custom:
  - name: "my.custom.pci.device"
    matchOn:
    - pciId:
        vendor: ["<vendor ID>"]
        device: ["<device ID>"]

@zvonkok
Copy link
Contributor

zvonkok commented Apr 7, 2021

@adrianchiris We do not want to run two instances in an OpenShift cluster, I think this will "confuse" the user and make UX hard.

@marquiz We also merged the PR for per node settings, if we could "generalize" the PCI labelling maybe we can leverage the per node feature to have "better" PCI labels where needed.

@zvonkok
Copy link
Contributor

zvonkok commented Apr 7, 2021

We could also add pci to #464

@marquiz
Copy link
Contributor

marquiz commented Apr 8, 2021

both use the same label format, pci-<vendor-id>, but different class whitelist (network-operator : Network PCI classes, GPU Operator: default).

Hmm, maybe we should extend the default whitelist to contain network devices, too 🤔

@adrianchiris
Copy link
Contributor

Hmm, maybe we should extend the default whitelist to contain network devices, too 🤔

this would solve the immediate problem of running both operators with a single NFD instance.
However this introduces a dependency in common labels format (e.g PCI label format should be the same)

@marquiz
Copy link
Contributor

marquiz commented Apr 8, 2021

We could also add pci to #464

It's not that straightforward as I explained: custom rule currently are only about creating one label per rule. But I was pondering this, too, as it would make everything more flexible. One idea that I just came up with would be adding something like forEach alongside matchOn to the custom rule.

source:
  custom:
  - name: "intel.com/{pciId.class}.present"
    matchOn:
        class: ["02", "03"]
        vendor: ["8086"]
    forEach: pciId

This would currently make most sense for PCI and USB but could also be extended to others (like kconfig) if seen useful.

/remove-kind bug
/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Apr 8, 2021
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 7, 2021
@marquiz
Copy link
Contributor

marquiz commented Jul 7, 2021

I now submitted a draft PR that should solve this issue (#550). It allows templating of the names of custom rules, making it possible to generate a custom label for each matching pci device.
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 7, 2021
@marquiz marquiz added this to the v0.10.0 milestone Aug 20, 2021
@marquiz
Copy link
Contributor

marquiz commented Sep 1, 2021

@kpouget I moved #550. This will cover your use case. Any feedback, e.g. testing or code review is more than welcome 😉

@marquiz
Copy link
Contributor

marquiz commented Nov 26, 2021

@kpouget #550 should have you covered.

An example nfd-worker config for creating "" and "-" labels of all Intel PCI devices on the system:

sources:
  custom:
    - name: "kpouget's pci rule"
      labelsTemplate: |
        {{ range .pci.device -}}
        pci-{{ .device }}.present=true
        pci-{{ .class }}-{{ .device }}.present=true
        {{ end }}
      matchFeatures:
        - feature: pci.device
          matchExpressions:
            vendor: {op: In, value: ["8086"]}

/close

@k8s-ci-robot
Copy link
Contributor

@marquiz: Closing this issue.

In response to this:

@kpouget #550 should have you covered.

An example nfd-worker config for creating "" and "-" labels of all Intel PCI devices on the system:

sources:
 custom:
   - name: "kpouget's pci rule"
     labelsTemplate: |
       {{ range .pci.device -}}
       pci-{{ .device }}.present=true
       pci-{{ .class }}-{{ .device }}.present=true
       {{ end }}
     matchFeatures:
       - feature: pci.device
         matchExpressions:
           vendor: {op: In, value: ["8086"]}

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

7 participants