
Container Runtimes solution trade-off


Use cases

The purpose of this trade-off is to evaluate the best runtime for executing containers in the Kubernetes cluster.

Several components compose Kubernetes container management.

(Diagram: K8S → CRI → OCI container management layers)

We will evaluate here the container runtime interface (CRI) implementations as well as the low-level container runtime solutions.

Note: Kubernetes introduced a stable version of RuntimeClass in v1.20, which allows a Pod to select a particular container runtime (see the Kubernetes Runtime Class section below).

Container runtime interfaces (CRI)

Requirements

  • It has to implement Kubernetes CRI
  • It has to support OCI runtime-spec and OCI image-spec
  • It has to be an active and alive project
  • It must be mature enough to be used in a production environment

Analysis

Four solutions are evaluated here as CRI implementations: Containerd, CRI-O, Docker, and PouchContainer.

| CRI | Community | Support |
| --- | --- | --- |
| Containerd | 9.3k stars / 1.8k forks / 384 contributors | graduated from CNCF |
| CRI-O | 3.6k stars / 677 forks / 194 contributors | CNCF incubating project |
| Docker | 61.7k stars / 17.7k forks / 2131 contributors | part of the Moby project |
| PouchContainer | 4.5k stars / 960 forks / 110 contributors | Alibaba |

Containerd

Pros:

  • very mature: it comes from Docker itself and is CNCF graduated
  • officially supported by Kubernetes
  • the default, officially supported runtime on AKS, EKS, GKE, and k3s
  • present in most Kubernetes cluster installations
  • fully supports the OCI runtime-spec and OCI image-spec
  • supports Windows Kubernetes nodes
  • follows a plugin model

CRI-O

Pros:

  • lightweight
  • its releases follow Kubernetes releases
  • dedicated to Kubernetes
  • officially supported by Kubernetes
  • shipped in OpenShift, supported by Prisma Cloud
  • compliant with OCI runtimes and OCI images
  • per-Pod custom configuration via annotations:
    • user namespaces support
    • high-performance mode

Cons:

  • not yet widely supported by managed Kubernetes offerings

Docker

Pros:

  • very mature: it existed before Kubernetes and has powered clusters since the first Kubernetes release
  • officially supported by Kubernetes
  • compliant with OCI runtimes and OCI images
  • integrates containerd with all its features

Cons:

  • deprecated by Kubernetes since v1.20
  • provides lots of features that are unnecessary in a Kubernetes cluster
  • adds an unnecessary layer between the actual runtime (containerd) and the kubelet

PouchContainer

Pros:

  • P2P image distribution
  • compatible with old kernel versions
  • compatible with OCI runtimes and OCI images

Cons:

  • provides features unnecessary in a Kubernetes cluster
  • not officially supported by Kubernetes
  • not an active project: the last commit dates from September 2020

Results

Containerd and CRI-O are the two solutions meeting the requirements stated above. Both are very mature and stable solutions for a production Kubernetes cluster. The choice of container runtime interface neither improves nor impacts the business strategy related to the project.

About CRI-O supporting user namespaces

User namespaces, introduced in Linux kernel 3.8, take container security to another level: the container believes it runs as a privileged user while the kernel remaps it to a less privileged user on the host. Kubernetes has long-standing issues (#127, #2101) related to this feature, but nothing has landed upstream yet. Currently, running containers with user namespaces brings significant challenges and complexity for stateful applications and for mounting shared filesystems. Several Linux kernel patches introduce idmapped mounts for FAT, ext4 and xfs (v5.12) and for btrfs (v5.15), but overlayfs is not supported yet. There is also work in progress in containerd to support idmapped mounts (#5888).
To conclude on this feature: it is a neat security improvement for Pod-to-Pod and Pod-to-node isolation in Kubernetes, but it is still a work in progress across the Linux kernel, Kubernetes, CRI-O, and containerd communities.
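
Despite the caveats above, CRI-O already lets a Pod opt in today through an annotation. A minimal sketch, assuming CRI-O is configured to allow the `io.kubernetes.cri-o.userns-mode` annotation for the runtime handler (the Pod name and image are illustrative placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo          # hypothetical example Pod
  annotations:
    # CRI-O-specific: run this Pod in an automatically allocated user
    # namespace, mapping the container's root to an unprivileged host
    # UID range (here 65536 UIDs wide).
    io.kubernetes.cri-o.userns-mode: "auto:size=65536"
spec:
  containers:
    - name: app
      image: nginx:1.21
```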

Regarding CRI-O high-performance config

This feature allows the administrator to disable CPU load balancing and the CFS quota for latency-sensitive workloads, as sketched below. It does not reflect our needs.
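
For reference, a hedged sketch of how a workload would opt in via CRI-O's high-performance runtime hook annotations. These annotation names are taken from the CRI-O/OpenShift documentation as we understand it; they must be enabled on the CRI-O side, typically together with a static CPU manager policy, and everything else here is illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: latency-sensitive-demo   # hypothetical example Pod
  annotations:
    cpu-load-balancing.crio.io: "disable"  # remove this Pod's CPUs from kernel load balancing
    cpu-quota.crio.io: "disable"           # turn off the CFS quota for this Pod
spec:
  containers:
    - name: app
      image: nginx:1.21
      resources:
        requests:
          cpu: "2"
          memory: 1Gi
        limits:
          cpu: "2"        # Guaranteed QoS, so the CPUs can be pinned
          memory: 1Gi
```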

Conclusion

Containerd's features are sufficient for this project. Moreover, it is the safest choice given its broad adoption. We therefore decided to go with containerd.


Container runtimes

Requirements

  • It has to be compliant with the OCI runtime-spec to work with the container runtime interface (CRI).
  • It has to be open-source.
  • It has to be mature enough and have a solid community.

Methodology

To evaluate each product, we rely on their official pages and on various benchmarks and analyses.

Analysis

Four products meet the requirements stated above:

| OCI Runtime | Performance cost | Security | Community | Support |
| --- | --- | --- | --- | --- |
| crun | very lightweight / can run an app as PID 1 / requires < 1 MB of memory / 50% faster than runc at executing containers | default* | 1.2k stars / 127 forks / 53 contributors | part of the Containers project on GitHub |
| gVisor | syscall overhead / slow networking / bandwidth overhead / IO overhead | default* + system call isolation (only 67 of 350 syscalls sent to the host kernel) | 11.7k stars / 966 forks / 148 contributors | Google |
| Kata Containers | big memory footprint / 100 MB overhead for the virtual machine and guest OS / slow IO (but hardware passthrough is possible) | default* + lightweight VM (hardware simulation) | 1.5k stars / 253 forks / 172 contributors | OpenStack Foundation, 99cloud, AWcloud, Canonical, China Mobile, City Network, CoreOS, Dell/EMC, EasyStack, Fiberhome, Google, Huawei, JD.com, Mirantis, NetApp, Red Hat, SUSE, Tencent, Ucloud, UnitedStack and ZTE |
| runc | standard implemented by most CRIs (the baseline we compare against) | default* | 8.4k stars / 1.6k forks / 275 contributors | Open Container Initiative (OCI) |

*A container's default security is based on the following (see the Pod sketch after this list):

  • isolation by namespaces
  • cgroups to control access to resources
  • limited system calls with seccomp profiles
  • Linux capabilities for privileged access rights
  • Mandatory Access Control (MAC) to restrict object access (AppArmor / SELinux)
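
A minimal sketch of how these defaults surface (and can be tightened) in a Pod spec. The `RuntimeDefault` and `runtime/default` profile names are Kubernetes built-ins; the Pod name, image, and resource values are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-demo   # hypothetical example Pod
  annotations:
    # MAC: apply the runtime's default AppArmor profile (beta annotation)
    container.apparmor.security.beta.kubernetes.io/app: runtime/default
spec:
  containers:
    - name: app
      image: nginx:1.21
      resources:
        limits:             # cgroups: cap CPU and memory usage
          cpu: 500m
          memory: 256Mi
      securityContext:
        allowPrivilegeEscalation: false
        seccompProfile:
          type: RuntimeDefault       # limited system calls (seccomp)
        capabilities:
          drop: ["ALL"]              # drop every Linux capability...
          add: ["NET_BIND_SERVICE"]  # ...and add back only what is needed
```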

crun

Pros:

  • very lightweight footprint
  • faster at executing containers
  • binary 50x smaller than runc's

Cons:

  • seccomp and MAC security are difficult to adjust properly
  • written in C, a more error-prone language

gVisor

Pros:

  • good security
  • raw compute performance as efficient as runc's

Cons:

  • performance loss (networking / IO) due to syscall interception overhead

Kata Containers

Pros:

  • strong security
  • can exploit VM features (like hardware passthrough)
  • good overall performance
  • big and active community

Cons:

  • heavy memory footprint

runc

Pros:

  • the default runtime implemented by most CRIs
  • officially supported by Containerd and CRI-O
  • good community

Cons:

  • seccomp and MAC security are difficult to adjust properly

Results

crun offers a scalability boost, spinning up containers faster than runc does. However, in this project the containers execute Java code in the business workflow, so crun's speed advantage is insignificant compared to the application and JVM startup time.

gVisor and Kata Containers push container security further, protecting the host from possible container breakouts when security is vital for the platform. However, they also come with additional complexity and performance penalties. For this project, such complexity is not necessary and does not reflect our reality.

Therefore, we decided to deploy runc on the Kubernetes nodes and to rely on default container security and Kubernetes policies to ensure cluster security.

runc is a good choice in most cases, as it has already proven its efficiency and stability in many production Kubernetes clusters.


Kubernetes Runtime Class

There is growing interest in using different runtimes within a cluster. Sandboxes are the primary motivator for this right now, with Kata Containers and gVisor looking to integrate with Kubernetes. Other runtime models such as Windows containers or even remote runtimes will also require support in the future. RuntimeClass provides a way to select between different runtimes configured in the cluster and surface their properties (both to the cluster and the user).

Since v1.20, Kubernetes implements a stable version of RuntimeClass. Users can define the container runtime for a Pod with a field in the Pod/Deployment definition, as sketched below.
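
A sketch, assuming a gVisor handler named `runsc` has already been configured in the CRI (the object names are illustrative):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor       # referenced by Pods below
handler: runsc       # must match a runtime handler configured in containerd or CRI-O
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-app
spec:
  runtimeClassName: gvisor   # run this Pod with the gVisor runtime
  containers:
    - name: app
      image: nginx:1.21
```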

v1.16 introduced, in beta, the possibility to set scheduling constraints to ensure that Pods running with a RuntimeClass get scheduled to nodes that support it; a sketch follows.
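
A hedged sketch of those constraints (the `runtime: kata` node label is hypothetical; matching nodes must be labeled accordingly):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
scheduling:
  nodeSelector:
    runtime: kata   # Pods using this class are only scheduled to nodes carrying this label
```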
