Split NFD into client and server #209
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: marquiz. The full list of commands accepted by this bot can be found here. The pull request process is described here.
I have a POC running with #209, #211 and #189. Still some testing to do.
Thank you for testing these. One major missing piece of functionality (in my backlog) regarding this PR is TLS encryption. That could also be added in a separate PR a bit later, of course, if we want to get this merged soon. Another thing that could be added is a spec template for running the master and worker in the same pod on the nodes, in case somebody does not want to deploy the split architecture. Do you think this would make sense?
Yeah, makes sense. There are folks who run k8s on only one node, and this would help deploy NFD on a single host. This could be a configuration option for the operator, specifying a CR with a special keyword option. I have no concrete idea yet, but something like
OK, I have now added …; I also implemented (mutual) TLS authentication, i.e. both nfd-master and nfd-worker(s) are authenticated. Comments on this are welcome, too.
I wrote a quick update to the documentation. Comments on that are more than welcome 😄
Add a new Makefile target for regenerating these files. Also, add a note that the files are auto-generated, including instructions on how to regenerate them. Rename the mock files, using the defaults provided by the mockery tool, in order to make their generation easier.
Glide is not actively developed anymore, and its documentation recommends migrating to dep. Also, dep is widely used in other k8s projects. Migrating to dep dramatically reduces the size of the populated vendor/ directory, from 75MB down to about 20MB.
Update client-go and related packages to the latest version.
Refactor NFD into a simple server-client system. Labeling is now done by a separate 'nfd-master' server. It is a simple service with a small codebase, designed for easy isolation. The feature discovery part is implemented in an 'nfd-worker' client which sends labeling requests to nfd-master and thus requires no access/permissions to the Kubernetes API itself. Client-server communication is implemented using gRPC. The protocol currently consists of only one request, i.e. the labeling request. The spec templates are converted to the new scheme. The nfd-master server can be deployed using nfd-master.yaml.template, which now also contains the necessary RBAC configuration. NFD workers can be deployed by using nfd-worker-daemonset.yaml.template or nfd-worker-job.yaml.template (most easily used with the label-nodes.sh script). Currently, only nfd-worker supports a config file or options. The (default) NFD config file is renamed to nfd-worker.conf.
Refactor old tests and add tests for new functions. Add a 'test' target to the Makefile.
Makes deployment simpler but "softens" the setup, basically giving nodes the capability to label themselves.
Add support for TLS authentication. When enabled, nfd-worker verifies that nfd-master has a valid certificate, i.e. one that is signed by the given root certificate and whose Common Name (CN) matches the DNS name of the nfd-master service being used. TLS authentication is enabled by specifying --key-file and --cert-file on nfd-master, and --ca-file on nfd-worker.
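The worker-side check described above maps onto Go's standard crypto/tls client configuration. The sketch below is a minimal illustration, assuming the CA file has already been read into memory; newWorkerTLSConfig and selfSignedCAPEM are hypothetical helpers, not NFD's actual code. Note that Go 1.15 and later match ServerName against the certificate's Subject Alternative Names rather than its CN, so a real master certificate would also need the service DNS name as a SAN.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// newWorkerTLSConfig (hypothetical) returns a client-side TLS config that only
// accepts an nfd-master certificate signed by a CA in caPEM (the --ca-file
// contents) and valid for serverName (the nfd-master service DNS name).
func newWorkerTLSConfig(caPEM []byte, serverName string) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid CA certificate found in CA file")
	}
	return &tls.Config{
		RootCAs:    pool,       // trust anchors for verifying nfd-master
		ServerName: serverName, // name the master certificate must be valid for
		MinVersion: tls.VersionTLS12,
	}, nil
}

// selfSignedCAPEM (hypothetical) generates a throwaway self-signed CA
// certificate in PEM form, used here only to demonstrate the config above.
func selfSignedCAPEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "nfd-demo-ca"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	caPEM, err := selfSignedCAPEM()
	if err != nil {
		panic(err)
	}
	cfg, err := newWorkerTLSConfig(caPEM, "nfd-master")
	if err != nil {
		panic(err)
	}
	fmt.Println("verifying nfd-master as:", cfg.ServerName)
}
```

The server name is passed in explicitly rather than derived inside the helper, which also leaves room for overriding the expected name during testing.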
Implement TLS client certificate authentication. It is enabled by specifying --ca-file, --key-file and --cert-file on both the nfd-master and nfd-worker side. When enabled, nfd-master verifies that the client (worker) presents a valid certificate signed by the root certificate (--ca-file). In addition, nfd-master performs authorization based on the Common Name (CN) of the client certificate: the CN must match the node name specified in the labeling request. Assuming that the worker certificates are correctly deployed, this ensures that nfd-worker is only able to label the node it is running on, i.e. it cannot label other nodes.
Command line option for overriding the Common Name (CN) expected from the nfd-master TLS certificate. This can be especially handy in testing/development.
Make NodeName-based authorization of the workers optional (off by default). This makes it possible for all nfd-worker pods in the cluster to use one shared secret, making NFD deployment much easier. However, it also allows an nfd-worker to label nodes other than the one it is running on.
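Making the check optional amounts to gating it behind a boolean option that defaults to off. The sketch below assumes a flag named verify-node-name; the flag name, masterOptions type and helper functions are illustrative assumptions, not confirmed NFD identifiers.

```go
package main

import (
	"flag"
	"fmt"
)

// masterOptions holds the hypothetical nfd-master setting that toggles
// NodeName-based authorization of workers.
type masterOptions struct {
	verifyNodeName bool
}

// parseMasterOptions parses the assumed -verify-node-name flag (default off).
func parseMasterOptions(args []string) (masterOptions, error) {
	fs := flag.NewFlagSet("nfd-master", flag.ContinueOnError)
	var opts masterOptions
	fs.BoolVar(&opts.verifyNodeName, "verify-node-name", false,
		"verify that the worker certificate CN matches the node name in the request")
	err := fs.Parse(args)
	return opts, err
}

// checkNodeName applies the CN == NodeName check only when enabled. With the
// default (disabled), any worker holding a certificate signed by the CA, e.g.
// one shared secret for all pods, may label any node.
func (o masterOptions) checkNodeName(certCN, requestNodeName string) error {
	if !o.verifyNodeName {
		return nil
	}
	if certCN != requestNodeName {
		return fmt.Errorf("certificate CN %q does not match node %q", certCN, requestNodeName)
	}
	return nil
}

func main() {
	def, _ := parseMasterOptions(nil)
	strict, _ := parseMasterOptions([]string{"-verify-node-name"})
	fmt.Println(def.checkNodeName("node-1", "node-2"))           // <nil>: check disabled
	fmt.Println(strict.checkNodeName("node-1", "node-2") != nil) // true
}
```

Keeping the default off trades isolation between nodes for a simpler deployment, which is exactly the trade-off described above.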
Makes solving issues easier when gRPC prints out information, e.g. about TLS authentication problems, on the server (nfd-master) side, too.
Simplifies the code a bit. Also, log NodeName at startup.
Pushed a new rebased version, with one new patch: logging NodeName and reading it only once, at startup.
/lgtm
I have done several tests; this looks good to me.
/lgtm
@marquiz: you cannot LGTM your own PR.
Version 0.4.0

Node-feature-discovery was migrated into a new repository under the kubernetes-sigs organization in GitHub (kubernetes-sigs#175). Related to the migration, the final container image registry/repo hasn't been decided yet (kubernetes-sigs#177); for this release we still use the old repo.

Major changes
- Split NFD into client and server (kubernetes-sigs#209)
- Changes in labels:
  - NFD label namespace was changed to 'feature.node.kubernetes.io/' (kubernetes-sigs#176)
  - 'nfd-' prefix was dropped from all feature labels
  - NFD version label (feature.node.kubernetes.io/node-feature-discovery.version) was replaced by an annotation (nfd.node.kubernetes.io/version)
  - network SRIOV labels were changed (kubernetes-sigs#173):
    - 'network-sriov' -> 'network-sriov.capable'
    - 'network-sriov-configured' -> 'network-sriov.configured'
  - selinux detection was moved to the kernel feature source:
    - 'selinux' -> 'kernel-selinux.enabled'
  - cpuid, pstate and RDT labels were moved under the cpu feature source (kubernetes-sigs#217):
    - 'cpuid-<cpuid flag>' -> 'cpu-cpuid.<cpuid flag>'
    - 'pstate-turbo' -> 'cpu-pstate.turbo'
    - 'rdt-<rdt feature>' -> 'cpu-rdt.<rdt feature>'
- Support for config file (kubernetes-sigs#169), currently with three configurable feature sources, i.e. cpu (kubernetes-sigs#224), kernel (kubernetes-sigs#157) and pci (kubernetes-sigs#168)
- Support for non-binary labels, with arbitrary values other than plain 'true'
- PCI device detection (kubernetes-sigs#168)
- Kernel version detection (kubernetes-sigs#157)
- Kernel config option detection (kubernetes-sigs#146)
- Support for custom feature-detector hooks (kubernetes-sigs#144)
- Support for OS version detection (kubernetes-sigs#149, kubernetes-sigs#211)
- Detection of hardware multithreading, such as Intel Hyper-Threading Technology (kubernetes-sigs#147)
- Arm64 support for CPUID detection (kubernetes-sigs#194)
- Validation of feature label names and values (kubernetes-sigs#199, kubernetes-sigs#219)
- Detection of NVDIMM devices (kubernetes-sigs#214)
- Get labels by reading from file in 'local' source (kubernetes-sigs#228)
- Detection of Intel SST-BF (Speed Select Technology - Base Frequency) (kubernetes-sigs#235)
- Make it possible to create feature labels in a non-default namespace (kubernetes-sigs#231). Currently possible when using the local source (hooks and files).

Miscellaneous
- Template specs converted from json to yaml
- Documentation updates and fixes
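The label renames listed above are mechanical enough to express as a lookup. The sketch below is a hypothetical migration helper (newLabelName is not part of NFD): it takes only the feature-name part of an old label (the old namespace is not covered here), drops the 'nfd-' prefix, applies the renames from these release notes, and prepends the new namespace. Label changes not listed above are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// newNamespace is the label namespace introduced in v0.4.0.
const newNamespace = "feature.node.kubernetes.io/"

// renamedLabels lists whole-name renames from the release notes.
var renamedLabels = map[string]string{
	"network-sriov":            "network-sriov.capable",
	"network-sriov-configured": "network-sriov.configured",
	"selinux":                  "kernel-selinux.enabled",
	"pstate-turbo":             "cpu-pstate.turbo",
}

// renamedPrefixes lists prefix renames: {old prefix, new prefix}.
var renamedPrefixes = [][2]string{
	{"cpuid-", "cpu-cpuid."},
	{"rdt-", "cpu-rdt."},
}

// newLabelName (hypothetical) converts an old feature name, e.g.
// "nfd-cpuid-AVX", into its v0.4.0 label name.
func newLabelName(oldName string) string {
	name := strings.TrimPrefix(oldName, "nfd-")
	if renamed, ok := renamedLabels[name]; ok {
		name = renamed
	} else {
		for _, p := range renamedPrefixes {
			if strings.HasPrefix(name, p[0]) {
				name = p[1] + strings.TrimPrefix(name, p[0])
				break
			}
		}
	}
	return newNamespace + name
}

func main() {
	fmt.Println(newLabelName("nfd-cpuid-AVX"))     // feature.node.kubernetes.io/cpu-cpuid.AVX
	fmt.Println(newLabelName("nfd-network-sriov")) // feature.node.kubernetes.io/network-sriov.capable
}
```

Whole-name renames are checked before prefix renames so that, for example, 'network-sriov-configured' is never mistaken for a longer variant of 'network-sriov'.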
Refactor NFD into a simple server-client system. Labeling is now done by
a separate 'nfd-master' server. It is a simple service with a small
codebase, designed for easy isolation. The feature discovery part is
implemented in an 'nfd-worker' client which sends labeling requests to
nfd-master and thus requires no access/permissions to the Kubernetes API
itself.
Client-server communication is implemented by using gRPC. The protocol
currently consists of only one request, i.e. the labeling request.
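The single-request protocol can be illustrated in-process, with a plain Go interface standing in for the gRPC service. This is a minimal sketch under assumptions: SetLabelsRequest, Labeler, fakeMaster and runWorker are hypothetical names, not the generated gRPC types, and fakeMaster only records labels in memory instead of updating Node objects through the Kubernetes API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// SetLabelsRequest models the one labeling request (hypothetical fields).
type SetLabelsRequest struct {
	NodeName string
	Labels   map[string]string
}

// Labeler is the one-method service that the master exposes to workers.
type Labeler interface {
	SetLabels(req SetLabelsRequest) error
}

// fakeMaster stands in for nfd-master; the real master would update the
// Node object via the Kubernetes API, which only the master can access.
type fakeMaster struct {
	mu    sync.Mutex
	nodes map[string]map[string]string
}

func (m *fakeMaster) SetLabels(req SetLabelsRequest) error {
	if req.NodeName == "" {
		return errors.New("node name missing from request")
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	labels := make(map[string]string, len(req.Labels))
	for k, v := range req.Labels {
		labels[k] = v
	}
	m.nodes[req.NodeName] = labels // replace the node's previous labels
	return nil
}

// runWorker is the worker side: it sends locally discovered features to the
// master and never talks to the Kubernetes API itself.
func runWorker(master Labeler, nodeName string, discovered map[string]string) error {
	return master.SetLabels(SetLabelsRequest{NodeName: nodeName, Labels: discovered})
}

func main() {
	master := &fakeMaster{nodes: map[string]map[string]string{}}
	err := runWorker(master, "node-1", map[string]string{"cpuid-AVX": "true"})
	fmt.Println(err, master.nodes["node-1"])
}
```

Because the worker depends only on the Labeler interface, it needs no Kubernetes credentials; all API permissions stay with the master.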
The spec templates are converted to the new scheme. The nfd-master
server can be deployed using the nfd-master.yaml.template which now also
contains the necessary RBAC configuration. NFD workers can be deployed
by using the nfd-worker-daemonset.yaml.template or
nfd-worker-job.yaml.template (most easily used with the label-nodes.sh
script).
This PR also migrates dependency handling from Glide to dep (#180). This was necessary because Glide was unable to handle the dependencies of the new codebase.
This PR aims at solving #204.
Testing this should be relatively straightforward:
1. Build the container image with make, then tag and push it:
   docker tag <IMAGE_ID> <TARGET_IMAGE> && docker push <TARGET_IMAGE>
2. Point the spec templates at the new image:
   sed s'!quay.io/kubernetes_incubator/node-feature-discovery:v0.3.0!<TARGET_IMAGE>!' -i nfd-*template
3. Deploy nfd-master and the nfd-worker daemonset:
   kubectl apply -f nfd-master.yaml.template && kubectl apply -f nfd-worker-daemonset.yaml.template
TODO: