
Enable Kubernetes objects to be reported just once in a cluster #3274

Merged: 8 commits from kubernetes-tagger into master on Oct 19, 2018

Conversation

bboreham (Collaborator)

This set of changes allows you to configure the probe on each node with Kubernetes probing disabled, and run one extra probe with Kubernetes enabled and processes, containers, etc., disabled. This is quite simple to arrange with one DaemonSet and one Deployment.

Benefits:

  • less impact on the Kubernetes api-server
  • less work on each node gathering and reporting the data in each probe
  • less work in the app to merge N copies of identical nodes
  • less network traffic to send them to the app

Disappointingly, the CPU-usage benefit wasn't huge when I tried it in our staging cluster, but I didn't spend long looking at why.

It has one disruptive change: pods never get tagged with a host ID. When Kubernetes is reported from just one node in the cluster, that probe doesn't know what host ID has been given to any other node, so the rendering code is changed to find the host ID on a child container node instead (a sketch of the idea follows).
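
To illustrate that rendering change, here is a minimal, hypothetical sketch; the Node type, topology keys, and function name are assumptions for illustration, not Scope's actual render API:

package main

import "fmt"

// Node is a stand-in for a report node; Parents maps a topology name
// (e.g. "host", "pod") to the parent node's ID. Illustrative, not Scope's type.
type Node struct {
	ID      string
	Parents map[string]string
}

// hostParentFromContainers infers a pod's host ID from the first of its
// child containers that carries one, since the pod itself no longer does.
func hostParentFromContainers(pod Node, containers []Node) (string, bool) {
	for _, c := range containers {
		if c.Parents["pod"] != pod.ID {
			continue // not a child of this pod
		}
		if hostID, ok := c.Parents["host"]; ok {
			return hostID, true
		}
	}
	return "", false
}

func main() {
	pod := Node{ID: "pod-42", Parents: map[string]string{}}
	containers := []Node{
		{ID: "ctr-1", Parents: map[string]string{"pod": "pod-42", "host": "host-1"}},
	}
	if hostID, ok := hostParentFromContainers(pod, containers); ok {
		fmt.Println("pod runs on", hostID) // prints: pod runs on host-1
	}
}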

Alternatives considered to the above change:

  • Add the Kubernetes node name to each host node, then have the renderer match them up.
  • Use the Kubernetes node name instead of the OS-supplied hostname as the host ID. This seems like a better idea, but is a more disruptive change if people are used to seeing a certain format on screen. Maybe we can separate the internal ID from the on-screen presentation?

2opremio self-requested a review on August 14, 2018
@2opremio (Contributor) commented Aug 15, 2018

"The rendering code is changed to find this on a container node"

I think this is acceptable, and not worse than the other solutions.

@2opremio (Contributor) commented Aug 16, 2018

A few comments:

  1. Shouldn't this PR remove https://github.com/weaveworks/scope/blob/052ff39bf1b1e92f3127aeb4cfbe9cf554d105b9/probe/kubernetes/kubelet.go (and its test file)?

  2. I think it's important to make sure that the apps in Weave Cloud are updated ASAP (as I understand it, k8s rendering will break for probes using this code until the counterpart app is updated).

  3. Did you test backwards compatibility with probes using the hostID?

@@ -320,7 +322,7 @@ func setupFlags(flags *flags) {
 	flag.StringVar(&flags.probe.kubernetesClientConfig.User, "probe.kubernetes.user", "", "The name of the kubeconfig user to use")
 	flag.StringVar(&flags.probe.kubernetesClientConfig.Username, "probe.kubernetes.username", "", "Username for basic authentication to the API server")
 	flag.StringVar(&flags.probe.kubernetesNodeName, "probe.kubernetes.node-name", "", "Name of this node, for filtering pods")
-	flag.UintVar(&flags.probe.kubernetesKubeletPort, "probe.kubernetes.kubelet-port", 10255, "Node-local TCP port for contacting kubelet")
+	flag.UintVar(&flags.probe.kubernetesKubeletPort, "probe.kubernetes.kubelet-port", 10255, "Node-local TCP port for contacting kubelet (zero to disable)")


@2opremio (Contributor)

If I understood properly, after this PR:

  1. The normal probes (daemon sets) will run with --probe.kubernetes-tag=true
  2. The deployment gathering k8s information will run with --probe.kubernetes=true

If that's correct, I think it would be more intuitive to do something like:

  1. Run the daemon set with --probe.kubernetes=true --probe.kubernetes.role=host-agent
  2. Run the deployment with --probe.kubernetes=true --probe.kubernetes.role=api-server-agent

(We could optionally drop the --probe.kubernetes=true since it's redundant.)

I think that exposing whether it tags or not is an implementation detail which may be confusing for the end user. Indicating what the probe is used for would be more useful.

@@ -552,7 +559,7 @@ func (r *Reporter) podTopology(services []Service, deployments []Deployment, dae
 	}
 
 	var localPodUIDs map[string]struct{}
-	if r.nodeName == "" {
+	if r.nodeName == "" && r.kubeletPort != 0 {
 		// We don't know the node name: fall back to obtaining the local pods from kubelet
 		var err error
 		localPodUIDs, err = GetLocalPodUIDs(fmt.Sprintf("127.0.0.1:%d", r.kubeletPort))
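
For context, a self-contained sketch of the guarded fallback this hunk produces; GetLocalPodUIDs is stubbed here (the real one lives in probe/kubernetes/kubelet.go), and with the kubelet port set to zero the probe never dials the kubelet:

package main

import "fmt"

// Stub standing in for the real GetLocalPodUIDs, which asks the local
// kubelet which pod UIDs run on this node.
func GetLocalPodUIDs(addr string) (map[string]struct{}, error) {
	fmt.Println("querying kubelet at", addr)
	return map[string]struct{}{}, nil
}

// localPodUIDs mirrors the guarded fallback: only contact the kubelet when
// the node name is unknown AND the kubelet port has not been zeroed out.
func localPodUIDs(nodeName string, kubeletPort uint) (map[string]struct{}, error) {
	if nodeName == "" && kubeletPort != 0 {
		return GetLocalPodUIDs(fmt.Sprintf("127.0.0.1:%d", kubeletPort))
	}
	return nil, nil // nil means "don't filter to local pods"
}

func main() {
	uids, _ := localPodUIDs("", 0) // kubelet disabled: no network call made
	fmt.Println("filter map:", uids)
}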


bboreham added commits:

  • This enables us to run Kubernetes probing on one node for the whole cluster.
  • This gives us the option of disabling the function
  • So they can be reported centrally, find the pod host ID from the child containers.
@bboreham (Collaborator, Author)

I have rebased and updated the flag settings in line with what @2opremio suggested. Now it is:

  1. Run the daemon set with --probe.kubernetes.role=host
  2. Run the deployment with --probe.kubernetes.role=cluster

Talking directly to kubelet is now disabled in both modes, although the code remains; removing it would be a separate PR. A sketch of how a probe can branch on this flag follows.
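
A hypothetical sketch of branching on such a role flag; the wiring and messages are illustrative, not Scope's actual startup code:

package main

import (
	"flag"
	"fmt"
)

func main() {
	// Mirrors the two modes described above; the default (empty) role
	// leaves Kubernetes probing off entirely.
	role := flag.String("probe.kubernetes.role", "",
		`"host" to tag local containers only, "cluster" to report all Kubernetes objects once`)
	flag.Parse()

	switch *role {
	case "host":
		fmt.Println("DaemonSet mode: tag local containers, don't watch the API server")
	case "cluster":
		fmt.Println("Deployment mode: watch the API server, report cluster-wide objects")
	case "":
		fmt.Println("Kubernetes probing disabled")
	default:
		fmt.Printf("unknown role %q\n", *role)
	}
}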

bboreham added a commit: Now you specify a role instead of controlling the internal behaviour
@bboreham (Collaborator, Author)

I tested this again in our staging cluster; backwards compatibility is fine. However, I hadn't quite got the host-parent propagation right, so I have fixed that.

Impact on our staging cluster was about a 10% reduction in CPU usage by probes. In bigger clusters the net impact should be better.

I think this is good to go now.

@2opremio (Contributor)

Are there follow-up issues/PRs for:

@bboreham (Collaborator, Author)

#3242 is the latter. Yes, we are very keen to update the cloud.weave.works config, but we need to do a Scope release first, which was waiting on a review here.

bboreham merged commit 8cccbb6 into master on Oct 19, 2018
bboreham deleted the kubernetes-tagger branch on Oct 19, 2018
bboreham mentioned this pull request on Nov 22, 2018
bboreham added a commit that referenced this pull request on Feb 13, 2019:

We stop the per-host probes talking to Kubernetes and run an extra Deployment with one more probe process to collect all information for the cluster, which is less resource-intensive overall.

This feature was added at #3274