SCS K8s cluster standardization #181

@garloff

As a DevOps team (= SCS user), I want to be able to create and use clusters on many different SCS-compliant container providers, where all relevant properties are either predefined by the SCS standard or can be controlled by a provider-independent cluster-settings.yaml file.
Relevant properties are those that tend to cause trouble for application deployment, e.g. k8s versions, CNI features, persistent volumes, ingress/load balancers, anti-affinity rules (avoiding placing several k8s nodes on the same host) ...

These properties should either be fixed by SCS (and then of course only evolve slowly over time) or be controllable by the customer (via a standardized, provider-independent cluster-params.yaml). For the controllable properties, we mandate existence and syntax, and we may mandate all or some of the supported options. In any case, the supported options need to be discoverable (and the mechanism for discoverability should include the fixed properties as well).
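A minimal sketch of what "fixed or controllable, and in both cases discoverable" could look like in practice. The catalog format, the property names, and the option values below are hypothetical placeholders for illustration; none of them are taken from an SCS standard.

```python
# Hypothetical property catalog: each entry is either fixed by the standard
# or controllable with a discoverable list of supported options.
CATALOG = {
    "network_policy_support": {"fixed": True},
    "kubernetes_version": {"options": ["1.29", "1.30"]},
    "control_plane_count": {"options": [1, 3, 5]},
}


def discover(catalog):
    """Return a copy of the catalog, covering fixed and controllable properties alike."""
    return {name: dict(spec) for name, spec in catalog.items()}


def validate(settings, catalog):
    """Check user settings against the catalog.

    Returns a list of error strings; an empty list means the settings are valid.
    """
    errors = []
    for key, value in settings.items():
        spec = catalog.get(key)
        if spec is None:
            errors.append(f"unknown property: {key}")
        elif "fixed" in spec:
            errors.append(f"property is fixed by the standard: {key}")
        elif value not in spec["options"]:
            errors.append(f"unsupported option for {key}: {value!r}")
    return errors
```

With such a catalog, a provider-independent settings file would be parsed into a dictionary and passed to `validate` before cluster creation, while `discover` would give tooling the same view of fixed and controllable properties on every provider.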

Note that there is value in standardizing things that are not mandatory, so that providers use the same names and semantics for the same things. (Obviously, optional features may become mandatory for providers in the future if we decide so.)

Hints:

Extensibility: We allow for extensions, but they must be clearly distinguishable from standardized properties.
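One way to keep extensions distinguishable is a reserved key prefix. The sketch below assumes a hypothetical `x-` prefix convention and hypothetical key names; the epic does not define how extensions are to be marked.

```python
EXTENSION_PREFIX = "x-"  # hypothetical convention, not defined by SCS


def split_settings(settings):
    """Separate standardized keys from provider extensions by key prefix."""
    standard, extensions = {}, {}
    for key, value in settings.items():
        target = extensions if key.startswith(EXTENSION_PREFIX) else standard
        target[key] = value
    return standard, extensions
```

A validator could then check the standardized part strictly and pass the extension part through to the provider untouched.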

This epic should list, as issues, the standardization proposals / ADRs that we as the SCS community want to define as relevant for SCS compliance. Some of the proposals might not make it into a v1 of the SCS standard (because they are not ready, are deemed not important enough, or are downgraded to recommendations). The individual proposed properties / ADRs should come with a rationale and with (ideally comprehensive) conformance tests. We want to evolve the reference implementation(s) in parallel to the standardization, but intellectually keep a clear distinction between standards and implementation.

We need to create conformance tests for these properties; it is useful to define standards in terms of tests that must pass. (Test-driven standardization!) Obviously, using existing test suites (such as CNCF/sonobuoy or aqua/kube-bench) and possibly contributing to them is a good way to do this.
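As an illustration of defining a property through a test that must pass, here is a minimal sketch of an anti-affinity check (no two k8s nodes of the same role on the same host). The input format, with `name`, `role`, and `host` keys, is an assumption made for this example; how host identity is actually obtained from a cluster is not specified here.

```python
from collections import defaultdict


def anti_affinity_violations(nodes):
    """Return {(role, host): [node names]} for every host running more than
    one node of the same role. An empty dict means the check passes."""
    placement = defaultdict(list)
    for node in nodes:
        placement[(node["role"], node["host"])].append(node["name"])
    return {key: names for key, names in placement.items() if len(names) > 1}


def check_anti_affinity(nodes):
    """Conformance-style check: raise AssertionError listing the offending nodes."""
    violations = anti_affinity_violations(nodes)
    assert not violations, f"nodes of the same role share a host: {violations}"
```

A real conformance test would gather the node list from the cluster under test and then call `check_anti_affinity` on it.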

Inspiration for the list below:

Individual topics for standardization:

Networking

  • Standardize k8s networking policies (CNI)

    • Description: CNI capabilities / k8s network policy support
    • Current state: Blocked
    • Discussion #211
  • Service type LoadBalancer with externalTrafficPolicy: Local

    • Description: Service type LoadBalancer with externalTrafficPolicy: Local needs to work out of the box
    • Current state: Unknown
    • Discussion #212
  • Ingress Support (OPTIONAL)

    • Description:
    • Current state: Action required
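For the LoadBalancer item above, the kind of workload manifest that would need to work out of the box can be sketched as follows. The helper name, the `app` selector label, and the example values are illustrative assumptions; only the `type: LoadBalancer` and `externalTrafficPolicy: Local` fields come from the item itself.

```python
import json


def loadbalancer_service(name, app_label, port, target_port):
    """Build a Service manifest of type LoadBalancer with externalTrafficPolicy Local."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            # "Local" delivers external traffic only to pods on the node that
            # received it, which preserves the client source IP.
            "externalTrafficPolicy": "Local",
            "selector": {"app": app_label},
            "ports": [{"port": port, "targetPort": target_port, "protocol": "TCP"}],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, e.g. piped into `kubectl apply -f -`.
    print(json.dumps(loadbalancer_service("demo", "demo", 80, 8080), indent=2))
```

A conformance test could apply such a Service on each provider and verify that it receives an external address and that backends see the original client IP.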

Container Registry

  • Container registry feature overview

    • Description: Container registry (OPT-IN): create an overview of needed and desirable features and map OSS solutions against it
    • Current state: Completed
    • Decision Record #263
  • Registry Standard from DR SCS-0212

    • Description: Derive a standard from the DR created in the previous registry issue; split the already existing document into a standard and a Decision Record only concerning the SCS cluster
    • Current state: Waiting for next issue
    • Decision Record | Standard #270
    • Test #662

Meta

  • Supported k8s versions

    • Version: scs-0210-v1
    • Description:
    • Current state: Completed
    • Standard #219
  • K8s version support period

    • Version: scs-0210-v2
    • Description: Include the K8s version support period into the SCS standards
    • Current state: Completed
    • Standard #386
    • Test #505
      • Conformance tests #488
      • Improve Tests !499
      • Restore scs-0210-v1 conformance tests !503
  • KaaS ControlPlane/worker machine flavors

    • Description: ControlPlane and Worker machine flavors and counts (translation from SCS flavors needed for non-SCS IaaS?)
    • Current state: Backlog
    • Discussion #421
  • Cluster management API

    • Description:
    • Current state: Action required

Automation

  • KaaS Cluster Management Gitops Controller

    • Description: GitOps controller for cluster management
    • Current state: Backlog
    • Discussion #419
  • KaaS Gitops/CI tooling

    • Description: Gitops/CI tooling (flux/argo)?? (OPTIONAL)
    • Current state: Doing
    • Discussion #420

Identity Management

  • Understand the requirements towards the IdP Broker to support the container layer

    • Description: Identity federation via OIDC: understand the requirements towards the IdP Broker to support the container layer
    • Current state: Backlog
    • Discussion #194
  • Implement Machine Identities

    • Description: Implement Machine Identities
    • Current state: Backlog
    • Discussion #163
  • KaaS IAM federation with ID broker

    • Description: IAM federation with ID broker (keycloak in our current ref impl)?
    • Current state: Backlog
    • Discussion #417

Logging & Metrics

  • Metrics server support (OPT-OUT)(OPTIONAL)

    • Description:
    • Current state: Backlog
    • Discussion #224
  • Logging/Monitoring/Tracing features? (OPTIONAL)

    • Description:
    • Current state: Backlog
    • Discussion #418

Security & Robustness

  • Forwarding-porting and retesting of upstream intel patchset for SGX and OpenStack

    • Description: Kube API access controls: add ability to limit access to the k8s API (k8s-cluster-api-provider)
    • Current state: Doing
    • Issue #246
  • K8s cluster baseline security setup / K8s cluster hardening

    • Description: Baseline security setups: external CA, protected kubeAPI, security patching for nodes?
    • Current state: Doing
    • Standard #415
    • Standard update #475
  • Move Keycloak onto kubernetes powered runtime on management plane

    • Description: Control plane backup/maintenance, etcd maintenance (k8s-cluster-api-provider)
    • Current state: Backlog
    • Issue #258
  • KaaS Optional Cert-Manager

    • Description: Cert manager (OPTIONAL)
    • Current state: Backlog
    • Discussion #416
  • Distributed K8s nodes to ensure Anti-Affinity

    • Version: scs-0214-v2
    • Description: Anti-affinity policies for k8s nodes (for control-plane and, possibly distinctly, for workers)
    • Current state: Doing
    • Decision Record #226
    • Standard #434
    • Standard v2 #494
    • Test #477
    • Test updates #489
    • Follow-up for stabilization standards/#639
  • KaaS Robustness features

    • Description: Robustness features: Rate limiting kube-api, etcd compaction/defragmentation, etcd backup, CA expiration avoidance, node-problem-detector
    • Current state: Waiting for next issue
    • Standard #414
    • Test #549

Storage

  • Standardize additional storage classes
    • Issue #214
    • Decision Record
    • Standard

Tests

Definition of Done:

  • We have a number of individual standards agreed and have reference implementations ready (or have otherwise created confidence that we can get them ready soon and without any potential blockers). Agreement includes reaching out to relevant communities, potentially also outside of the current SCS universe.
  • We have agreed on the subset of standards that we want to pull into a v1 of SCS-standard k8s platform
  • The included standards have good coverage by conformance tests
  • There is documentation on the standard, with links to individual ADRs

Metadata

Labels

  • Container: Issues or pull requests relevant for Team 2: Container Infra and Tooling
  • SCS is standardized
  • epic: Issues that are spread across multiple sprints
  • longterm: Issues or pull requests that are relevant for longterm support

Projects

Status

Done
