
Support additional custom source configurations #66

Closed
slintes opened this issue Apr 30, 2021 · 13 comments
Assignees
ArangoGutierrez
Labels
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.
Milestone
v0.4.0

Comments

@slintes

slintes commented Apr 30, 2021

What would you like to be added:
The configuration options of the NFD worker were recently extended: it now supports a /etc/kubernetes/node-feature-discovery/custom.d directory, which can hold additional custom source configurations besides those in the main worker config. Typically this directory will be populated by mounting ConfigMaps. The operator should handle the required volume and mount configuration.
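For illustration, the volume and mount the operator would need to add to the worker DaemonSet per ConfigMap could look roughly like the sketch below; the "custom-config-" volume naming and the per-ConfigMap subdirectory under custom.d are assumptions made for the example, not existing operator code:

```go
// Sketch: build the volume and mount for one custom-config ConfigMap.
// The "custom-config-" naming and the per-ConfigMap subdirectory are
// illustrative assumptions, not the operator's current behaviour.
package sketch

import corev1 "k8s.io/api/core/v1"

func customConfigVolume(cmName string) (corev1.Volume, corev1.VolumeMount) {
	vol := corev1.Volume{
		Name: "custom-config-" + cmName,
		VolumeSource: corev1.VolumeSource{
			ConfigMap: &corev1.ConfigMapVolumeSource{
				LocalObjectReference: corev1.LocalObjectReference{Name: cmName},
			},
		},
	}
	mount := corev1.VolumeMount{
		Name:      vol.Name,
		ReadOnly:  true,
		MountPath: "/etc/kubernetes/node-feature-discovery/custom.d/" + cmName,
	}
	return vol, mount
}
```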

Why is this needed:
The new configuration option allows dynamic configuration by potentially multiple parties. Each party can maintain its own ConfigMap without the need for cross-party agreements. Ideally the operator can watch for those ConfigMaps and reconfigure the worker on the fly by adding and removing the relevant volumes and mounts.

Design considerations
The big question is how to find the relevant ConfigMaps. Some thoughts:

  • Add the ConfigMap names to the NFD CRD. This makes the implementation easy, but it isn't very dynamic and still needs manual work, which introduces the risk that in case of misconfiguration (name mismatch between config and actual ConfigMap, accidental deletion of ConfigMaps, ...?) the NFD workers won't start because a volume mount fails.
  • Let the operator find relevant ConfigMaps and (re)configure the worker DaemonSet dynamically. However, we still need to know which ConfigMaps are relevant. The first restriction is easy: the ConfigMap needs to be in the same namespace as the NFD CR / worker. And then? A very basic check might be to look into the data and try to parse it as a custom source configuration, or at least look for e.g. the "matchOn" string. An alternative might be to require the ConfigMaps to have a certain name, e.g. a "custom-config-" prefix (maybe configurable in the NFD CR), a label, or an annotation.
  • What worries me a bit: for this dynamic solution the operator would need to watch all ConfigMaps in all namespaces. At least I did not find a way to dynamically restrict the watch to the desired namespace(s). Is this a problem (thinking of huge clusters with many ConfigMaps...)?
  • Side note, talking about namespaces: am I correct that the operator watches ALL namespaces for NFD CRs and potentially installs multiple instances of NFD in multiple namespaces? Is that on purpose?

I think my favorite is using a marker label on the ConfigMaps which should be mounted (see the sketch below).
When we agree on a way forward, I volunteer to implement it, in order to finish the work that was started on the NFD worker :)
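A rough sketch of how the marker-label approach could look with controller-runtime (a recent controller-runtime API is assumed; the label key, the map function, and the NFD API import path are hypothetical, not existing operator code):

```go
// Sketch of the marker-label approach, assuming a recent controller-runtime.
// The label key and the NFD API import path below are hypothetical.
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	nfdv1 "github.com/kubernetes-sigs/node-feature-discovery-operator/api/v1" // hypothetical import path
)

// Hypothetical marker label a ConfigMap must carry to be mounted.
const markerLabel = "nfd.kubernetes.io/custom-config"

func addConfigMapWatch(mgr ctrl.Manager, r reconcile.Reconciler) error {
	// Only ConfigMaps carrying the marker label pass the predicate, so the
	// reconciler never sees unrelated ConfigMaps.
	hasMarker, err := predicate.LabelSelectorPredicate(metav1.LabelSelector{
		MatchLabels: map[string]string{markerLabel: "true"},
	})
	if err != nil {
		return err
	}

	// Map a labelled ConfigMap to the NFD CR(s) in the same namespace.
	mapToNFD := func(ctx context.Context, obj client.Object) []reconcile.Request {
		var nfds nfdv1.NodeFeatureDiscoveryList
		if err := mgr.GetClient().List(ctx, &nfds, client.InNamespace(obj.GetNamespace())); err != nil {
			return nil
		}
		reqs := make([]reconcile.Request, 0, len(nfds.Items))
		for _, nfd := range nfds.Items {
			reqs = append(reqs, reconcile.Request{NamespacedName: types.NamespacedName{
				Namespace: nfd.Namespace, Name: nfd.Name,
			}})
		}
		return reqs
	}

	return ctrl.NewControllerManagedBy(mgr).
		For(&nfdv1.NodeFeatureDiscovery{}).
		Watches(&corev1.ConfigMap{},
			handler.EnqueueRequestsFromMapFunc(mapToNFD),
			builder.WithPredicates(hasMarker)).
		Complete(r)
}
```

If caching every ConfigMap in the cluster is a concern, recent controller-runtime versions also allow restricting the manager's cache per object type with a label selector, which would limit what gets cached in huge clusters with many ConfigMaps.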

Related: #53

@slintes added the kind/feature label Apr 30, 2021
@ArangoGutierrez
Contributor

/assign

@ArangoGutierrez added the help wanted, priority/important-soon, and triage/accepted labels Apr 30, 2021
@mythi

mythi commented May 4, 2021

Side note, talking about namespaces: am I correct that the operator watches ALL namespaces for NFD CRs and potentially installs multiple instances of NFD in multiple namespaces? Is that on purpose?

your observation is correct, see #54

@slintes
Author

slintes commented May 19, 2021

in case of misconfiguration (name mismatch between config and actual ConfigMap, accidental deletion of ConfigMaps, ...?) the NFD workers won't start because a volume mount fails.

I learned there is an optional flag for ConfigMap mounts, so this isn't an issue.
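For reference, that is the Optional field on the ConfigMap volume source; a minimal sketch of how it would be set:

```go
// Sketch: with Optional set, a missing ConfigMap no longer prevents the
// worker pod from starting; the mounted directory is simply empty instead.
package sketch

import corev1 "k8s.io/api/core/v1"

func optionalConfigMapSource(cmName string) corev1.VolumeSource {
	optional := true
	return corev1.VolumeSource{
		ConfigMap: &corev1.ConfigMapVolumeSource{
			LocalObjectReference: corev1.LocalObjectReference{Name: cmName},
			Optional:             &optional,
		},
	}
}
```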

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Aug 17, 2021
@marquiz
Contributor

marquiz commented Aug 18, 2021

Let's not close this yet
/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Aug 18, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Nov 16, 2021
@marquiz
Contributor

marquiz commented Nov 16, 2021

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Nov 16, 2021
@marquiz
Contributor

marquiz commented Jan 20, 2022

Hmm, do we still want to keep this open? Especially after NFD v0.10.0 (and #653)? I'd say no.

If we implemented this, it would probably mean a separate CRD for the extra custom configs, which doesn't make much sense after kubernetes-sigs/node-feature-discovery#653: in practice it would be overlapping functionality and more maintenance burden.

Thoughts @slintes @ArangoGutierrez?

@slintes
Author

slintes commented Jan 20, 2022

Thanks for the heads-up.

in practice it would be overlapping functionality and more maintenance burden

Sounds like a good argument to me to close this.

@ArangoGutierrez
Contributor

Let's close this after #119

@marquiz
Contributor

marquiz commented Feb 9, 2022

Let's close this after #119

Agree

@ArangoGutierrez
Contributor

#119 is merged, so we can say this issue has been properly addressed.
/close

@ArangoGutierrez added this to the v0.4.0 milestone Feb 17, 2022
@k8s-ci-robot
Contributor

@ArangoGutierrez: Closing this issue.

In response to this:

#119 is merged, so we can say this issue has been properly addressed.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ArangoGutierrez mentioned this issue Feb 17, 2022