BKPR components

Logging stack


Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

BKPR uses Fluentd to collect logs from all containers in all namespaces, as well as system logs from the underlying Kubernetes infrastructure (read the Fluentd section for a more detailed explanation of which system logs are ingested). This data is stored in Elasticsearch and can be queried using Kibana.


BKPR uses Elasticsearch as packaged by Bitnami. By default it runs 3 non-root pods under the kubeprod namespace forming an Elasticsearch cluster named elasticsearch-cluster. It is implemented in the file manifests/components/elasticsearch.jsonnet. This manifest defines what an Elasticsearch pod and its nested containers look like:

  • An Elasticsearch node
  • A Prometheus exporter for collecting various metrics about Elasticsearch

Inside the manifest there is also a Kubernetes Service declaration used to allow other components (Kibana and Fluentd) access to the Elasticsearch cluster, and also used by Elasticsearch itself to perform node discovery in the cluster.


Elasticsearch Kubernetes Service uses default Elasticsearch ports:

  • Port 9200/tcp, used for end-user, HTTP-based access
  • Port 9300/tcp, used for internal communication between Elasticsearch nodes within the cluster

To assure durability of the underlying Elasticsearch storage, each pod relies on a Kubernetes PersistentVolume named data-elasticsearch-logging-%i where %i is an index that matches the pod index. By default, each PersistentVolume is allocated 100Gi of storage.
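
The naming scheme above can be illustrated with a short, standalone Jsonnet sketch. It assumes that pod indices start at 0 (as StatefulSet ordinals do) and uses the default values quoted above; the field names in the output are illustrative only and are not part of BKPR.

```jsonnet
// Sketch: derive the PersistentVolume names for a given replica count.
// `replicas` and `storage_gi` mirror the defaults described in the text.
local replicas = 3;
local storage_gi = 100;

{
  // One volume per pod; the %d suffix matches the pod index (assumed to start at 0).
  volumes: ["data-elasticsearch-logging-%d" % i for i in std.range(0, replicas - 1)],
  // Total capacity requested across the whole cluster, in Gi.
  total_storage_gi: replicas * storage_gi,
}
```

With the defaults this evaluates to three volume names (suffixes 0 to 2) and a total of 300Gi, which is worth keeping in mind before raising the replica count.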


The following deployment parameters are supported, tested, and will be honored across upgrades. Any other detail of the configuration may also be overridden, but may change on subsequent releases.

Override pod replicas
// Example kubeprod-manifest.jsonnet with override
// Cluster-specific configuration
(import "../../manifests/platforms/aks.jsonnet") {
    config:: import "kubeprod-autogen.json",
    // Place your overrides here
    elasticsearch+: {
        replicas: 5,
        // min_master_nodes must be a majority of replicas: floor(replicas / 2) + 1
        min_master_nodes: 3,
    },
}
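
Rather than hard-coding min_master_nodes, the majority can be computed from the replica count. The following is a minimal standalone sketch; the quorum helper is hypothetical (it is not part of BKPR) and the snippet only shows the shape of the override body, not a complete kubeprod-manifest.jsonnet.

```jsonnet
// Hypothetical helper: smallest number of nodes that forms a majority.
local quorum(replicas) = std.floor(replicas / 2) + 1;

// Sanity checks for the common cluster sizes.
assert quorum(3) == 2 : "3 replicas need 2 master-eligible nodes";
assert quorum(5) == 3 : "5 replicas need 3 master-eligible nodes";

{
  elasticsearch+: {
    replicas: 5,
    // Derived from the replica count so the two values cannot drift apart.
    min_master_nodes: quorum(self.replicas),
  },
}
```

Deriving the value this way avoids a split-brain risk when the replica count is later changed without revisiting min_master_nodes.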


Fluentd is an open source data collector for a unified logging layer. Fluentd allows you to unify data collection and consumption for better use and understanding of data.


BKPR uses Fluentd as packaged by Bitnami. It is implemented as a Kubernetes DaemonSet named fluentd-es under the kubeprod namespace. This maps to one Fluentd pod per Kubelet.


To have your logs collected by Fluentd and injected into Elasticsearch automatically, the processes running inside your containers just need to write to the standard output and standard error streams in one of the log formats recognized by Fluentd's built-in parsers. Fluentd also allows writing custom parsers when the built-in ones are not sufficient.


Fluentd configuration is split across several configuration files which have been downloaded from upstream by the manifests/fluentd-es-config/ tool. These configuration files are assembled into a Kubernetes ConfigMap and injected into the Fluentd container.

Configuration reloading

Fluentd supports reloading its configuration by gracefully restarting the worker process when it receives the SIGHUP signal. However, BKPR does not currently implement a mechanism to deliver a SIGHUP signal to Fluentd when any of the configuration files (assembled into a Kubernetes ConfigMap) are changed.


Fluentd sends its log stream to the Elasticsearch cluster over TCP. Elasticsearch networking requirements are described in the corresponding Elasticsearch section.


Fluentd uses the Elasticsearch Output Plugin to process system and Docker daemon logs and stream them into Elasticsearch. System logs are collected from the Kubelet's /var/log directory (via HostPath). Docker daemon logs are not collected from the Docker daemons themselves but from the Kubelet's /var/lib/docker/containers directory (via HostPath). Fluentd requires a small amount of local storage on the host machine under /var/log/fluentd-pos/, where it keeps pos files that record the last position it read in each log file.


The following deployment parameters are supported, tested, and will be honored across upgrades. Any other detail of the configuration may also be overridden, but may change on subsequent releases.

Resource requirements

To override pod memory or CPU:

// Example kubeprod-manifest.jsonnet with override
// Cluster-specific configuration
(import "../../manifests/platforms/aks.jsonnet") {
    config:: import "kubeprod-autogen.json",
    // Place your overrides here
    fluentd_es+: {
        daemonset+: {
            spec+: {
                template+: {
                    spec+: {
                        containers_+: {
                            fluentd_es+: {
                                resources: {
                                    limits: {
                                        memory: "600Mi",
                                    },
                                    requests: {
                                        cpu: "200m",
                                        memory: "300Mi",
                                    },
                                },
                            },
                        },
                    },
                },
            },
        },
    },
}


Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack.

Kibana is externally accessible at https://kibana.${dns-zone} where ${dns-zone} is the literal for the DNS zone specified when BKPR was installed.


BKPR uses Kibana as packaged by Bitnami. By default it runs 1 non-root pod named kibana, together with a Kubernetes Ingress resource named kibana-logging which allows end-user access to Kibana from the Internet. BKPR implements automatic DNS name registration for the kibana-logging Ingress resource, based on the DNS suffix specified when installing BKPR, as well as HTTP/S support (see the cert-manager component for automatic management of X.509 certificates via Let's Encrypt).

All these Kubernetes resources live under the kubeprod namespace.


Kibana exposes port 5601/tcp internally to the Kubernetes cluster, but allows external access via HTTP/S by means of the deployed nginx-ingress-controller. Kibana connects to the Elasticsearch cluster via the elasticsearch-logging Kubernetes Service defined in the Elasticsearch manifest.


Kibana is a stateless component and therefore does not have any persistent storage requirements.


Kibana plug-ins

Add-on functionality for Kibana is implemented with plug-in modules. Some known plug-ins for Kibana are listed here. Please ensure the plug-ins you install are compatible with the version of Kibana installed by your BKPR version.

To install additional plug-ins, override the plugins item inside the kibana scope, like this:

(import "../../manifests/platforms/aks.jsonnet") {
    config:: import "kubeprod-autogen.json",
    kibana+: {
        plugins+: {
            "enhanced-table": {
                version: "1.2.0",
                url: "",
            },
            "network_vis": {
                version: "6.7.2-3",
                url: "",
            },
        },
    },
}

Monitoring stack


Prometheus is a popular open-source monitoring system and time series database written in Go. It features a multi-dimensional data model, a flexible query language, efficient time series database and modern alerting approach and integrates aspects all the way from client-side instrumentation to alerting.


BKPR uses Prometheus as packaged by Bitnami. It is implemented as a Kubernetes StatefulSet with just 1 pod named prometheus-0 under the kubeprod namespace.

Prometheus scrapes several elements for relevant data, which is stored as metrics in time series and can be queried using the Prometheus query language from the Prometheus console. The Prometheus console is externally accessible at https://prometheus.${dns-zone}, where ${dns-zone} is the literal for the DNS zone specified when BKPR was installed.


Among the elements scraped by our default Prometheus configuration:

  • API servers
  • Nodes
  • Ingress and Service resources, which are probed using Prometheus Blackbox exporter
  • Pods

Kubernetes Annotations

The following Kubernetes annotations on pods allow a fine control of the scraping process:

  • true to include the pod in the scraping process
  • required if the metrics path is not /metrics
  • required if the pod must be scraped on the indicated port instead of the pod’s declared ports

Adding these annotations to your own pods will cause Prometheus to also collect metrics from your service.

Synthetic Labels

Our default configuration adds two synthetic labels to help with querying data:

  • kubernetes_namespace is the Kubernetes namespace of the pod the metric comes from. This label can be used to distinguish between the same component running in two separate namespaces.
  • kubernetes_pod_name is the name of the pod the metric comes from. This label can be used to distinguish between metrics from different pods of the same Deployment or DaemonSet.


Prometheus configuration is split across the two following files:

  • manifests/components/prometheus-config.jsonnet, which describes the Kubernetes objects that are scraped (e.g. pods, ingresses, nodes, etc.)
  • manifests/components/prometheus.jsonnet, which contains the set of monitoring rules and alerts.

This configuration is assembled into a Kubernetes ConfigMap and injected into the Prometheus container as several YAML configuration files, named basic.yaml, monitoring.yaml and prometheus.yaml.

Configuration reloading

Inside a Prometheus pod there is a container named configmap-reload that watches for updates to the Kubernetes ConfigMap that describes the Prometheus configuration. When this Kubernetes ConfigMap is updated, configmap-reload issues the following HTTP request to Prometheus, which causes a configuration reload: http://localhost:9090/-/reload.


Prometheus Kubernetes Service uses the default port:

  • Port 9090/tcp

To assure persistence of the time series database, each pod relies on a Kubernetes PersistentVolume named data-prometheus-%i where %i is an index that matches the pod index. By default, data is retained for 6 months and each PersistentVolume is allocated 8GiB of storage. In the Overrides section below there are instructions for reconfiguring this.


The following deployment parameters are supported, tested, and will be honored across upgrades. Any other detail of the configuration may also be overridden, but may change on subsequent releases.

Override storage parameters

The following example shows how to override the retention days and storage volume size.

// Example kubeprod-manifest.jsonnet with override
// Cluster-specific configuration
(import "../../manifests/platforms/aks.jsonnet") {
    config:: import "kubeprod-autogen.json",
    // Place your overrides here
    prometheus+: {
        retention_days:: 366,
        storage:: 16384,  // (in Mi)
    },
}

Override for additional rules

The following example shows how to add additional monitoring rules. The default configuration shipped with Prometheus brings in two different groups of rules, namely basic.rules and monitoring.rules, but you can create additional groups if you need to. Next we show how to add an additional monitoring rule:

// Example kubeprod-manifest.jsonnet with override
// Cluster-specific configuration
(import "../../manifests/platforms/aks.jsonnet") {
    config:: import "kubeprod-autogen.json",
    // Place your overrides here
    prometheus+: {
        monitoring_rules+: {
            ElasticsearchDown: {
                expr: "sum(elasticsearch_cluster_health_up) < 2",
                "for": "10m",
                labels: {severity: "critical"},
                annotations: {
                    summary: "Elasticsearch is unhealthy",
                    description: "Elasticsearch cluster quorum is not healthy",
                },
            },
        },
    },
}
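
To make the relationship between the keyed rules object and a Prometheus rule file more concrete, here is a standalone sketch that flattens such an object into the groups/rules list layout used by Prometheus rule files. The rulesToList helper is hypothetical and written only for illustration; BKPR's own manifests may perform this conversion differently.

```jsonnet
// Hypothetical helper: turn { RuleName: {...} } into [{ alert: "RuleName", ... }].
local rulesToList(rules) = [
  { alert: name } + rules[name]
  for name in std.objectFields(rules)
];

// Same rule as in the override example above.
local monitoring_rules = {
  ElasticsearchDown: {
    expr: "sum(elasticsearch_cluster_health_up) < 2",
    "for": "10m",
    labels: { severity: "critical" },
    annotations: {
      summary: "Elasticsearch is unhealthy",
      description: "Elasticsearch cluster quorum is not healthy",
    },
  },
};

{
  // Layout of a Prometheus rule file: a list of named groups, each with a list of rules.
  groups: [
    { name: "monitoring.rules", rules: rulesToList(monitoring_rules) },
  ],
}
```

Keying rules by name, as the override does, lets a later override replace or extend a single rule with the +: operator, which would be awkward with a plain list.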


Alertmanager is an open source component that handles alerts sent by the Prometheus server. It performs deduplication, grouping and delivery to the correct receiver and also takes care of silencing and inhibition of alerts.


It runs on top of Kubernetes as a Kubernetes StatefulSet with just 1 pod named alertmanager-0 under the kubeprod namespace. It is implemented inside the Prometheus manifest.

Alertmanager is accessible from within the cluster via a Kubernetes Service named alertmanager inside the kubeprod namespace.


The Alertmanager manifest reads its configuration from manifests/components/alertmanager-config.jsonnet. Its contents are exposed in the Alertmanager container as a read-only YAML configuration file named /config/config.yml.

The following sections document a very small subset of Alertmanager configuration. Please read Alertmanager's documentation for a detailed description of all Alertmanager configuration options.


A receiver defines the mechanism or protocol used to deliver alerts to a set of recipients. For example:

  • A receiver named email to deliver alerts over e-mail to the primary and secondary on-call e-mail aliases
  • A receiver named pager to deliver alerts to a SkyTel pager
  • A receiver named sms to deliver alerts over SMS to a pre-configured phone number

BKPR configures Alertmanager with a receiver named email in order to deliver alerts over e-mail. However, BKPR's default configuration requires you to specify the list of intended e-mail recipients. Use the following override construct to specify one or multiple e-mail addresses for delivering alerts over e-mail:

// Example kubeprod-manifest.jsonnet with override
// Cluster-specific configuration
(import "../../manifests/platforms/aks.jsonnet") {
    config:: import "kubeprod-autogen.json",
    // Place your overrides here
    prometheus+: {
        am_config+:: {
            receivers_+:: {
                email: {
                    email_configs: [
                        { to: "" },  // first recipient's e-mail address
                        { to: "" },  // second recipient's e-mail address
                    ],
                },
            },
        },
    },
}
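
As a rough illustration of what this override produces, the standalone sketch below converts a keyed receivers_ object into the receivers list that Alertmanager's configuration file expects. The conversion shown here is an assumption about how the keyed form is rendered, and the example.com addresses are placeholders.

```jsonnet
// Keyed form, as used in the override (placeholder addresses).
local receivers_ = {
  email: {
    email_configs: [
      { to: "oncall-primary@example.com" },
      { to: "oncall-secondary@example.com" },
    ],
  },
};

{
  // List form used by Alertmanager: each receiver carries its own `name`.
  receivers: [
    { name: name } + receivers_[name]
    for name in std.objectFields(receivers_)
  ],
}
```

Evaluating this yields a single receiver named email with two email_configs entries, one per recipient.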

Alertmanager listens on port 9093/tcp for client requests. Prometheus is the main client, but nothing prevents you from talking to Alertmanager via its exposed Kubernetes Service.


To assure the resilience of Alertmanager, each pod relies on a Kubernetes PersistentVolume named data-alertmanager-%i where %i is an index that matches the pod index. By default, each PersistentVolume is allocated a default storage of 5GiB. In the Overrides section below there are instructions for reconfiguring this.


The following deployment parameters are supported, tested, and will be honored across upgrades. Any other details of the configuration may also be overridden, but may change on subsequent releases.

Override storage parameters

The following example shows how to override the amount of persistent storage required by Alertmanager:

// Example kubeprod-manifest.jsonnet with override
// Cluster-specific configuration
(import "../../manifests/platforms/aks.jsonnet") {
    config:: import "kubeprod-autogen.json",
    // Place your overrides here
    prometheus+: {
        alertmanager+: {
            storage:: "9Gi",
        },
    },
}


Grafana is an open source metric analytics & visualization suite. It is most commonly used for visualizing time series data for infrastructure and application analytics but many use it in other domains including industrial sensors, home automation, weather and process control.


Grafana runs on top of Kubernetes as a StatefulSet with just 1 non-root pod named grafana-0 under the kubeprod namespace and also provides a Kubernetes Ingress resource named grafana which allows end-user access to Grafana from the Internet. It is implemented inside the Grafana manifest.


Grafana delegates authentication to OAuth2 Proxy. Once authenticated, the user is allowed access to Grafana as an administrator.


Grafana ships with a default configuration that configures Prometheus as a datasource for Grafana. This configuration is implemented by the datasources ConfigMap resource which defines a single datasource named BKPR Prometheus pointing to BKPR's Prometheus instance.


Grafana exposes port 3000/tcp internally to the Kubernetes cluster, but allows external access via HTTP/S by means of the deployed nginx-ingress-controller. Grafana can connect to Prometheus via a datasource that ships with the default configuration.


To assure durability of the underlying Grafana storage, each pod relies on a Kubernetes PersistentVolume named datadir-grafana-%i where %i is an index that matches the pod index. By default, each PersistentVolume is allocated 1Gi of storage. In the Overrides section below there are instructions for reconfiguring this.


The following deployment parameters are supported, tested, and will be honored across upgrades. Any other details of the configuration may also be overridden, but may change on subsequent releases.

Override storage parameters

The following example shows how to override the amount of persistent storage required by Grafana:

// Example kubeprod-manifest.jsonnet with override
// Cluster-specific configuration
(import "../../manifests/platforms/aks.jsonnet") {
    config:: import "kubeprod-autogen.json",
    // Place your overrides here
    grafana+: {
        storage:: "9Gi",
    },
}

Grafana dashboards

Grafana comes with a preset of dashboards that are meant to give an overview of the Kubernetes cluster. This set of dashboards is imported from bitnami-labs/kubernetes-grafana-dashboards and loaded in Grafana via ConfigMaps. Check the Grafana documentation to understand how the Grafana provisioning works.

If you are interested in provisioning your own dashboards, you should extend the dashboard_provider object in order to add different Grafana folders where you can store your dashboards:

dashboards_provider: kube.ConfigMap($.p + "grafana-dashboards-configuration") + $.metadata {
  local this = self,
  dashboard_provider:: {
    // Grafana dashboards configuration
    "kubernetes": {
      folder: "Kubernetes",
      type: "file",
      disableDeletion: false,
      editable: false,
      options: {
        path: utils.path_join(GRAFANA_DASHBOARDS_CONFIG, "kubernetes"),
      },
    },
    "custom": {
      folder: "Custom",
      type: "file",
      disableDeletion: false,
      editable: false,
      options: {
        path: utils.path_join(GRAFANA_DASHBOARDS_CONFIG, "custom"),
      },
    },
  },
  data+: {
    _config:: {
      apiVersion: 1,
      providers: kube.mapToNamedList(this.dashboard_provider),
    },
    "dashboards_provider.yml": kubecfg.manifestYaml(self._config),
  },
},

Furthermore, you will have to create a ConfigMap that imports the dashboard:

custom_dashboards: kube.ConfigMap($.p + "grafana-custom-dashboards") + $.metadata {
  local this = self,
  data+: {
    "custom_dashboard.json": importstr "path/to/custom/dashboard/custom_dashboard.json",
  },
},

And mount that ConfigMap in the desired location inside the Grafana pod.
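
The mount step can be sketched with the standard Kubernetes volumes and volumeMounts fields. This is a standalone illustration only: the value of GRAFANA_DASHBOARDS_CONFIG, the ConfigMap name (which in BKPR also depends on the $.p prefix) and the volume name are all placeholders, and where these fragments attach inside the Grafana manifest depends on your BKPR release.

```jsonnet
// Placeholder values; substitute the ones used by your BKPR release.
local GRAFANA_DASHBOARDS_CONFIG = "/path/to/grafana/dashboards";
local configMapName = "grafana-custom-dashboards";

{
  // Belongs in the pod spec: exposes the ConfigMap as a volume.
  volumes: [
    { name: "custom-dashboards", configMap: { name: configMapName } },
  ],
  // Belongs in the Grafana container: mounts the volume at the path
  // that the "custom" dashboard provider reads from.
  volumeMounts: [
    {
      name: "custom-dashboards",
      mountPath: GRAFANA_DASHBOARDS_CONFIG + "/custom",
      readOnly: true,
    },
  ],
}
```

The mount path must match the path option of the corresponding entry in dashboard_provider, otherwise Grafana will not find the dashboard files.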

Grafana Plugins

Plugins enable users to add support for various types of datasources, panels and apps to Grafana. Check out the official Plugin Repository to discover the available plugins for Grafana.

To install additional plug-ins, override the plugins item inside the grafana scope, like this:

(import "../../manifests/platforms/aks.jsonnet") {
    config:: import "kubeprod-autogen.json",
    grafana+: {
        plugins+: [
            // List the identifiers of the plug-ins to install here
        ],
    },
}

Ingress stack

NGINX Ingress Controller

nginx-ingress is an open source Kubernetes Ingress controller based on NGINX.

An Ingress is a Kubernetes resource that lets you configure an HTTP load balancer for your Kubernetes services. Such a load balancer usually exposes your services to clients outside of your Kubernetes cluster. An Ingress resource supports exposing services and configuring TLS termination for each exposed host name.


It runs on top of Kubernetes and is implemented as a Kubernetes Deployment resource named nginx-ingress-controller inside the kubeprod namespace. A HorizontalPodAutoscaler resource is associated with this Deployment in order to auto-scale the number of nginx-ingress-controller pod replicas based on the incoming load.

It also relies on ExternalDNS to handle registration of Kubernetes Ingress resources in the DNS zone specified when BKPR was installed and cert-manager to request X.509 certificates for Kubernetes Ingress resources in order to provide transparent TLS termination.

The manifests/components/nginx-ingress.jsonnet manifest defines two Kubernetes Services:

  • nginx-ingress-controller, which wraps the NGINX server running as a reverse proxy and the logic to derive its configuration and routing rules from Kubernetes Ingress and Service resources.
  • default-http-backend, which is configured to respond to /healthz requests (liveness/readiness probes) and to return 404 Not Found for any URL that does not match any of the known routing rules.

nginx-ingress-controller is configured to forward any URL that does not match any of the known routing rules to the default-http-backend Service.


No explicit configuration is required by the NGINX Ingress Controller.


The following ports are exposed:

  • The nginx-ingress-controller Service exposes ports:
    • 80/tcp and 443/tcp to service HTTP and HTTP/S requests
    • 10254/tcp for /healthz (liveness/readiness probes) and /metrics (Prometheus) endpoints.
  • The default-http-backend Service exposes port 80/tcp to render a 404 Not Found error page for URLs that do not match any routing rule.

NGINX Ingress Controller currently exposes a /metrics endpoint for Prometheus. Some of the exported metrics are:

  • connections_total
  • requests_total
  • read_bytes_total
  • write_bytes_total
  • request_duration_seconds (histogram)
  • response_duration_seconds (histogram)
  • request_size (histogram)
  • response_size (histogram)

For additional information, read the source code.


NGINX Ingress Controller is a stateless component and therefore does not have any persistent storage requirements.


The following deployment parameters are supported, tested, and will be honored across upgrades. Any other details of the configuration may also be overridden, but may change on subsequent releases.

Override maximum number of replicas

The following example shows how to override the maximum number of replicas for NGINX Ingress Controller:

// Example kubeprod-manifest.jsonnet with override
// Cluster-specific configuration
(import "../../manifests/platforms/aks.jsonnet") {
    config:: import "kubeprod-autogen.json",
    // Place your overrides here
    nginx_ingress+: {
        hpa+: {
            spec+: {
                maxReplicas: 10,
            },
        },
    },
}


cert-manager is a Kubernetes add-on to automate the management and issuance of TLS certificates. It will ensure certificates are valid and up to date periodically, and attempt to renew certificates at an appropriate time before expiry.


The ingress-shim component of cert-manager watches for Kubernetes Ingress resources across the cluster. If it observes an Ingress resource annotated with true, it will ensure a Certificate resource exists with the same name as the Ingress. A Certificate is a namespaced Kubernetes resource that references an Issuer or ClusterIssuer for information on how to obtain the certificate and current spec (commonName, dnsNames, etc.) and status (like last renewal time). cert-manager in BKPR is configured to use Let's Encrypt as the Certificate Authority for TLS certificates.


$ kubectl --namespace=kubeprod get certificates
NAME                 AGE
kibana-logging-tls   20d
prometheus-tls       20d


$ kubectl --namespace=kubeprod describe certificates kibana-logging-tls
Name:         kibana-logging-tls
Namespace:    kubeprod
Labels:       <none>
Annotations:  <none>
API Version:
Kind:         Certificate
  Cluster Name:
  Creation Timestamp:  2018-10-01T10:47:44Z
  Generation:          0
  Owner References:
    API Version:           extensions/v1beta1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Ingress
    Name:                  kibana-logging
    UID:                   5f439d5a-c567-11e8-b84a-0a58ac1f25fb
  Resource Version:        3557
  Self Link:               /apis/
  UID:                     6d529a2f-c567-11e8-b84a-0a58ac1f25fb
      Http 01:
        Ingress Class:  nginx
  Common Name:
  Dns Names:
  Issuer Ref:
    Kind:       ClusterIssuer
    Name:       letsencrypt-prod
  Secret Name:  kibana-logging-tls

(kibana.${dns-zone} will use the actual DNS domain specified in the --dns-zone command-line argument to kubeprod).

Let's Encrypt Environments

Let's Encrypt supports two environments:

  • Production: meant for production deployments. It enforces strict rate limits to prevent abuse, so it is not suitable for testing or for requesting multiple certificates for the same domain in a short period of time.
  • Staging: meant for testing before using the production environment. Its rate limits are more lenient than those of the production environment, and the certificates it issues are not trusted by browsers.
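
The two environments differ only in the ACME server that certificates are requested from. The standalone sketch below maps an environment name to Let's Encrypt's public ACME v2 directory URLs. The key "prod" and the "letsencrypt-" naming scheme are assumptions inferred from the letsencrypt-prod ClusterIssuer shown earlier; the field names are illustrative and not BKPR's actual implementation.

```jsonnet
// Let's Encrypt ACME v2 directory endpoints, keyed by environment name.
local acme_servers = {
  prod: "https://acme-v02.api.letsencrypt.org/directory",
  staging: "https://acme-staging-v02.api.letsencrypt.org/directory",
};

// Environment to select; "staging" matches the override example in this section.
local letsencrypt_environment = "staging";

// Fail early on a misspelled environment name.
assert std.objectHas(acme_servers, letsencrypt_environment) :
       "unknown Let's Encrypt environment: " + letsencrypt_environment;

{
  // Assumed naming scheme, modelled on the letsencrypt-prod ClusterIssuer.
  issuer_name: "letsencrypt-" + letsencrypt_environment,
  acme_server: acme_servers[letsencrypt_environment],
}
```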

cert-manager exposes a Prometheus /metrics endpoint over port 9042/tcp. cert-manager also requires Internet connectivity in order to communicate with Let's Encrypt servers.


Certificates managed by cert-manager are stored as namespaced Kubernetes Certificate resources.


The following deployment parameters are supported, tested, and will be honored across upgrades. Any other detail of the configuration may also be overridden, but may change on subsequent releases.

Override Let's Encrypt Environment

The following example shows how to request the use of Let's Encrypt staging environment:

// Example kubeprod-manifest.jsonnet with override
// Cluster-specific configuration
(import "../../manifests/platforms/aks.jsonnet") {
    config:: import "kubeprod-autogen.json",
    // Place your overrides here
    cert_manager+: {
        letsencrypt_environment:: "staging",
    },
}

OAuth2 Proxy

OAuth2 Proxy is an open source reverse proxy and static file server that uses the underlying platform OAuth2 support to provide authentication for Kubernetes Ingress resources.


It runs on top of Kubernetes and is implemented as a Kubernetes Deployment resource named oauth2-proxy inside the kubeprod namespace. A HorizontalPodAutoscaler resource is associated with this Deployment in order to auto-scale the number of oauth2-proxy pod replicas based on the incoming load.

The manifests/components/oauth2-proxy.jsonnet manifest defines a Secret resource that protects the client ID, client secret and client cookie required by the underlying platform implementation of the OAuth2 protocol. These are populated for you at the root manifest kubeprod-manifest.jsonnet by reading the corresponding entries from the kubeprod-autogen.json file.


No explicit configuration is required by OAuth2 Proxy.


OAuth2 Proxy relies on a Kubernetes Service resource to expose port 4180/tcp, where OAuth2 callbacks are processed.


OAuth2 Proxy is a stateless component and therefore does not have any persistent storage requirements.


The following deployment parameters are supported, tested, and will be honored across upgrades. Any other details of the configuration may also be overridden, but may change on subsequent releases.

Override maximum number of replicas

The following example shows how to override the maximum number of replicas for OAuth2 Proxy:

// Example kubeprod-manifest.jsonnet with override
// Cluster-specific configuration
(import "../../manifests/platforms/aks.jsonnet") {
    config:: import "kubeprod-autogen.json",
    // Place your overrides here
    oauth2_proxy+: {
        hpa+: {
            spec+: {
                maxReplicas: 10,
            },
        },
    },
}


ExternalDNS makes Kubernetes resources discoverable via public DNS servers by controlling DNS records dynamically via Kubernetes resources in a DNS provider-agnostic way.


ExternalDNS runs on top of Kubernetes and is implemented as a Kubernetes Deployment resource named external-dns inside the kubeprod namespace. It retrieves a list of Kubernetes Service and Ingress resources from the Kubernetes API to determine a desired list of DNS records, then ensures that the DNS zone configured when BKPR was installed is updated with this information. It uses the underlying platform's DNS implementation (e.g. Azure DNS, Google CloudDNS, etc.) for its operations.


$ kubectl --namespace=kubeprod get ingress
NAME             HOSTS                                               ADDRESS   PORTS     AGE
kibana-logging,    80, 443   1d
prometheus,   80, 443   1d

ExternalDNS will ensure that the host name of each Ingress resource listed above resolves to the address of that Ingress:

$ nslookup

Non-authoritative answer:

ExternalDNS requires access to the Kubernetes API and a subset of the underlying platform's API in order to configure DNS.


ExternalDNS is a stateless component and therefore does not have any persistent storage requirements.