The `kubeaware-cloudpool-proxy` is a proxy that is placed between a cloudpool and its clients (for example, an autoscaler). In essence, the `kubeaware-cloudpool-proxy` adds Kubernetes-awareness to an existing cloudpool implementation. This Kubernetes-awareness allows worker node scale-downs to be handled with less disruption: instead of simply killing a worker node that appears "random" from the Kubernetes perspective, the proxy takes the current Kubernetes cluster state into account, carefully selects a node, and evacuates its pods prior to terminating the cloud machine.
The `kubeaware-cloudpool-proxy` delegates all cloud-specific actions to its backend cloudpool. In fact, most REST API operations are forwarded to the backend cloudpool as-is. There are two notable exceptions that require the proxy to take action, both of which could lead to a scale-down:

- *set desired size*: if a scale-down is suggested (a `desiredSize` lower than the current pool size), victims need to be carefully selected and gracefully shut down (see below).
- *terminate machine*: only allowed if the machine is a viable scale-down victim, in which case the machine needs to be gracefully shut down (see below).
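To make this decision flow concrete, here is a minimal sketch, not the proxy's actual code: the types and method names (`CloudPool`, `NodeEvacuator`, `Proxy`, `ScaleDownCandidates`, `Drain`) are made up for illustration, and the mapping from a Kubernetes node name to a cloud machine ID is glossed over.

```go
// Illustrative sketch of the kubeaware-cloudpool-proxy's scale-down decision
// on a set-desired-size call. None of these types mirror the real code base.
package proxysketch

import "fmt"

// CloudPool is a minimal stand-in for the backend cloudpool REST client.
type CloudPool interface {
	GetPoolSize() (int, error)
	SetDesiredSize(size int) error
	// TerminateMachine assumes the victim node name can be mapped to a
	// cloud machine ID; that mapping is glossed over here.
	TerminateMachine(machineID string) error
}

// NodeEvacuator is a stand-in for the Kubernetes-aware parts of the proxy.
type NodeEvacuator interface {
	// ScaleDownCandidates returns nodes satisfying all scale-down conditions.
	ScaleDownCandidates() ([]string, error)
	// Drain cordons the node, evicts its pods and deletes it from the cluster.
	Drain(node string) error
}

// Proxy sketches the decision logic sitting in front of the backend cloudpool.
type Proxy struct {
	backend CloudPool
	kube    NodeEvacuator
}

// SetDesiredSize forwards scale-ups as-is, but handles scale-downs by
// carefully selecting and draining victim nodes before termination.
func (p *Proxy) SetDesiredSize(desired int) error {
	current, err := p.backend.GetPoolSize()
	if err != nil {
		return err
	}
	if desired >= current {
		// Scale-up or no-op: forward the request to the backend cloudpool.
		return p.backend.SetDesiredSize(desired)
	}

	numToRemove := current - desired
	candidates, err := p.kube.ScaleDownCandidates()
	if err != nil {
		return err
	}
	if len(candidates) < numToRemove {
		return fmt.Errorf("only %d of %d requested nodes can safely be removed",
			len(candidates), numToRemove)
	}
	for _, victim := range candidates[:numToRemove] {
		if err := p.kube.Drain(victim); err != nil {
			return err
		}
		if err := p.backend.TerminateMachine(victim); err != nil {
			return err
		}
	}
	return nil
}
```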
When a node needs to be removed, the `kubeaware-cloudpool-proxy` communicates with the Kubernetes API server to determine the current cluster state. These interactions are illustrated in the image below.
When asked to scale down, the `kubeaware-cloudpool-proxy` takes down nodes in a controlled manner by:

- Carefully determining which (if any) nodes are candidates for removal. A node qualifies as a scale-down candidate if it satisfies all of the following conditions:
  - the node must not be protected with a `cluster-autoscaler.kubernetes.io/scale-down-disabled` annotation
  - the node must not be a master node (as indicated by it running a pod in the `kube-system` namespace named `kube-apiserver-<host>`, or having a `component` label with value `kube-apiserver`)
  - there must be other remaining non-master nodes that are `Ready` and `Schedulable`
  - the node's pods must be possible to evacuate to the remaining nodes:
    - the sum of pod-requested CPU/memory on the node must not exceed the free capacity on the remaining nodes
    - the node must not have any pods without a controller (such as a deployment/replication controller), since such pods would not be recreated on a different node when evicted
    - the node must not have any pods with (node-)local storage
    - the node must not have pods with a pod disruption budget that would be violated
    - taints on the remaining nodes must not prevent the node's pods from being evacuated (the pods must have matching tolerations in such cases)
    - the node's pods must not have node selectors that prevent them from being moved
    - the node's pods must not have node-affinity constraints that prevent them from being moved
- Selecting the "best" victim node to kill (if at least one candidate was found in the prior step). In this context, the "best" node is typically the least loaded node -- the one with the fewest pods that need to be evacuated to another node.
- If a victim node is found, it needs to be evacuated before it can be killed. This happens as follows:
  - The node is marked unschedulable via a node taint (to avoid new pods being scheduled onto the node).
  - The node is drained: all non-system pods are evicted (and will be rescheduled onto the remaining nodes).
  - The node is deleted from the Kubernetes cluster.
  - Finally, the node is terminated in the cloud through a terminate machine call to the backend cloudpool.
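As a rough illustration of the drain sequence above, the following sketch uses `client-go`. It is not the proxy's actual code: it assumes a recent `client-go` version, the taint key is made up, and details such as eviction retries, waiting for pods to terminate, and pod disruption budget back-off are glossed over.

```go
// Illustrative drain sequence: taint the node, evict its non-system pods,
// then delete the node object from the cluster.
package drainsketch

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// scaleDownTaint is an illustrative taint key, not the one the proxy uses.
const scaleDownTaint = "example.com/scale-down"

// drainNode taints, evicts and finally deletes the given node.
func drainNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	// 1. Taint the node so that no new pods get scheduled onto it.
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	node.Spec.Taints = append(node.Spec.Taints, corev1.Taint{
		Key:    scaleDownTaint,
		Effect: corev1.TaintEffectNoSchedule,
	})
	if _, err := client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{}); err != nil {
		return err
	}

	// 2. Evict all non-system pods running on the node.
	pods, err := client.CoreV1().Pods("").List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		if pod.Namespace == "kube-system" {
			continue // leave system pods alone
		}
		eviction := &policyv1.Eviction{
			ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
		}
		if err := client.CoreV1().Pods(pod.Namespace).EvictV1(ctx, eviction); err != nil {
			return fmt.Errorf("evicting %s/%s: %v", pod.Namespace, pod.Name, err)
		}
	}

	// 3. Remove the node object from the Kubernetes cluster. The cloud machine
	// is then terminated via the backend cloudpool's terminate machine call.
	return client.CoreV1().Nodes().Delete(ctx, nodeName, metav1.DeleteOptions{})
}
```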
`build.sh` builds the binary and runs all tests (`build.sh --help` for build options).
The built binary is placed under `bin/`. The main binary is `kubeaware-cloudpool-proxy`.

Test coverage output is placed under `build/coverage/` and can be viewed as HTML via:

```
go tool cover -html build/coverage/<package>.out
```
The `kubeaware-cloudpool-proxy` requires a JSON-formatted configuration file.
It has the following structure:

```json
{
  "server": {
    "timeout": "60s"
  },
  "apiServer": {
    "url": "https://<host>:<port>",
    "auth": {
      ... authentication mechanism ...
    },
    "timeout": "10s"
  },
  "backend": {
    "url": "http://<host>:<port>",
    "timeout": "300s"
  }
}
```
The authentication part can be specified either with a concrete client certificate/key pair and a CA cert, or via a kubeconfig file.

With a kubeconfig file, the `auth` is specified as follows:

```json
...
"apiServer": {
  "url": "https://<host>:<port>",
  "auth": {
    "kubeConfigPath": "/home/me/.kube/config"
  }
},
...
```
With a specific client cert/key, the `auth` configuration looks as follows:

```json
...
"apiServer": {
  "url": "https://<host>:<port>",
  "auth": {
    "clientCertPath": "/path/to/admin.pem",
    "clientKeyPath": "/path/to/admin-key.pem",
    "caCertPath": "/path/to/ca.pem"
  }
},
...
```
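These two authentication mechanisms map naturally onto `client-go`'s configuration types. The following is a minimal sketch, assuming the proxy uses `client-go`; the `AuthConfig` struct and the `restConfig` function are made up here and do not come from the real code base.

```go
// Illustrative only: turning a kubeconfig path or explicit cert/key/CA paths
// into a client-go rest.Config.
package authsketch

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// AuthConfig mirrors the "auth" section of the configuration file.
type AuthConfig struct {
	KubeConfigPath string `json:"kubeConfigPath"`
	ClientCertPath string `json:"clientCertPath"`
	ClientKeyPath  string `json:"clientKeyPath"`
	CACertPath     string `json:"caCertPath"`
}

// restConfig builds a rest.Config for the API server at apiServerURL.
func restConfig(apiServerURL string, auth AuthConfig) (*rest.Config, error) {
	if auth.KubeConfigPath != "" {
		// A kubeconfig takes precedence; all other auth fields are ignored.
		// The kubeconfig is expected to point at the configured API server URL.
		return clientcmd.BuildConfigFromFlags("", auth.KubeConfigPath)
	}
	// Otherwise, use the explicit client cert/key pair and CA cert.
	return &rest.Config{
		Host: apiServerURL,
		TLSClientConfig: rest.TLSClientConfig{
			CertFile: auth.ClientCertPath,
			KeyFile:  auth.ClientKeyPath,
			CAFile:   auth.CACertPath,
		},
	}, nil
}
```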
The fields carry the following semantics:

- `server`: proxy server settings
  - `timeout`: read timeout on client requests. Default: `60s`.
- `apiServer`: settings for the Kubernetes API server
  - `url`: the base address used to contact the API server. For example, `https://master:6443`.
  - `auth`: client authentication credentials
    - `kubeConfigPath`: a file system path to a kubeconfig file, the type of configuration file used by `kubectl`. When specified, any other auth fields are ignored (as they are all included in the kubeconfig). The kubeconfig must contain cluster credentials for a cluster whose API server has the specified `url`.
    - `clientCertPath`: a file system path to a PEM-encoded API server client/admin cert. Ignored if `kubeConfigPath` is specified.
    - `clientKeyPath`: a file system path to a PEM-encoded API server client/admin key. Ignored if `kubeConfigPath` is specified.
    - `caCertPath`: a file system path to a PEM-encoded CA cert for the API server. Ignored if `kubeConfigPath` is specified.
  - `timeout`: request timeout used when communicating with the API server. Default: `60s`.
- `backend`: settings for communicating with the backend cloudpool that the proxy sits in front of.
  - `url`: the base URL where the cloudpool REST API can be reached. For example, `http://cloudpool:9010`.
  - `timeout`: the connection timeout to use when contacting the backend. Default: `300s`. Note: you may need a fairly generous timeout for the backend, since some cloud provider operations can be quite time-consuming (for example, terminating a machine in Azure).
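As a rough illustration (not the proxy's actual types; all names below are assumptions), the configuration file could be unmarshalled into Go structs along these lines:

```go
// Illustrative sketch of Go structs matching the configuration file layout
// described above; the real code base may model this differently.
package configsketch

import (
	"encoding/json"
	"os"
	"time"
)

// Duration wraps time.Duration so that values like "60s" can be parsed from JSON.
type Duration struct {
	time.Duration
}

// UnmarshalJSON parses a JSON string such as "300s" into a Duration.
func (d *Duration) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	parsed, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	d.Duration = parsed
	return nil
}

// Config mirrors the JSON configuration file structure.
type Config struct {
	Server struct {
		Timeout Duration `json:"timeout"`
	} `json:"server"`
	APIServer struct {
		URL  string `json:"url"`
		Auth struct {
			KubeConfigPath string `json:"kubeConfigPath"`
			ClientCertPath string `json:"clientCertPath"`
			ClientKeyPath  string `json:"clientKeyPath"`
			CACertPath     string `json:"caCertPath"`
		} `json:"auth"`
		Timeout Duration `json:"timeout"`
	} `json:"apiServer"`
	Backend struct {
		URL     string   `json:"url"`
		Timeout Duration `json:"timeout"`
	} `json:"backend"`
}

// Load reads and parses a configuration file from the given path.
func Load(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```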
After building, run the proxy via:

```
./bin/kubeaware-cloudpool-proxy --config-file=<path>
```

To enable a different glog log level, use something like:

```
./bin/kubeaware-cloudpool-proxy --config-file=<path> --v=4
```
To build a docker image, run:

```
./build.sh --docker
```

To run the docker image, run something similar to:

```
docker run --rm -p 8080:8080 \
  -v <config-dir>:/etc/elastisys \
  -v <kubessl-dir>:/etc/kubessl \
  elastisys/kubeaware-cloudpool-proxy:1.0.0 \
  --config-file=/etc/elastisys/config.json --port 8080
```
In this example, `<config-dir>` is a host directory that contains a `config.json` file
for the `kubeaware-cloudpool-proxy`. Furthermore, `<kubessl-dir>` must contain the
PEM-encoded certificate/key/CA files required to talk to the Kubernetes API server.
These cert files are referenced from the `config.json`, which, in this case, could look
something like:
```json
{
  "apiServer": {
    "url": "https://<hostname>",
    "auth": {
      "clientCertPath": "/etc/kubessl/admin.pem",
      "clientKeyPath": "/etc/kubessl/admin-key.pem",
      "caCertPath": "/etc/kubessl/ca.pem"
    }
  },
  "backend": {
    "url": "http://<hostname>:9010",
    "timeout": "10s"
  }
}
```
`dep` is used for dependency management. Make sure it is installed.

To introduce a new dependency, add it to `Gopkg.toml`, edit some piece of
code to import a package from the dependency, and then run:

```
dep ensure
```

to get the right version into the `vendor/` folder.
The regular `go test` command can be used for testing.
To test a certain package, and to see logs (for a certain glog v-level), run something like:

```
go test -v ./pkg/kube -args -v=4 -logtostderr=true
```
For some tests, mock clients are used to fake interactions with "backend services".
More specifically, these interfaces are `KubeClient`, `CloudPoolClient`, and
`NodeScaler`. Should any of these interfaces change, the mocks
need to be recreated (before editing the test code to modify expectations, etc).
This can be achieved via the mockery tool.

- Installing mockery:

  ```
  go get github.com/vektra/mockery/...
  ```

- Generating the mocks:

  ```
  mockery -dir pkg/kube/ -name KubeClient -output pkg/kube/mocks
  mockery -dir pkg/kube/ -name NodeScaler -output pkg/proxy/mocks
  mockery -dir pkg/cloudpool/ -name CloudPoolClient -output pkg/proxy/mocks
  ```

The generated mocks end up in the `mocks/` directory of the respective package (as given by the `-output` flags above); a sketch of how such a mock might be used in a test follows below.
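For illustration only: the method name `ListNodes` and its signature are assumptions, not taken from the real `KubeClient` interface, and the hand-written `fakeKubeClient` merely stands in for what mockery generates (testify-style mocks embedding `mock.Mock`).

```go
// Illustrative sketch of using a mockery-generated (testify-style) mock in a test.
package proxy_test

import (
	"testing"

	"github.com/stretchr/testify/mock"
)

// fakeKubeClient stands in for a mockery-generated mock (e.g. under pkg/kube/mocks).
type fakeKubeClient struct {
	mock.Mock
}

// ListNodes is a hypothetical KubeClient method; substitute the real interface's methods.
func (m *fakeKubeClient) ListNodes() ([]string, error) {
	args := m.Called()
	return args.Get(0).([]string), args.Error(1)
}

func TestScaleDownCandidateSelection(t *testing.T) {
	kubeClient := new(fakeKubeClient)
	// Set up the expected interaction and its canned return values.
	kubeClient.On("ListNodes").Return([]string{"worker-1", "worker-2"}, nil)

	// ... exercise the code under test with kubeClient here ...
	nodes, err := kubeClient.ListNodes()
	if err != nil || len(nodes) != 2 {
		t.Fatalf("unexpected result: %v, %v", nodes, err)
	}

	// Verify that all expected calls were made.
	kubeClient.AssertExpectations(t)
}
```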
In some cases, we would like to see more rapid utilization of newly introduced worker nodes, to make sure that they immediately start accepting a share of the workload. Typically, what we have seen so far is that a new node gets started, but once it is up it tends to be very lightly loaded (if loaded at all). It would be nice to see some pods being pushed over to the new node. Furthermore, it would be useful to make sure that all required docker images are pulled to new nodes as early as possible, to avoid unnecessary delays later when pods are scheduled onto the node.