Container runtime solution trade-off
The purpose of this trade-off is to evaluate the best runtime to execute containers in the Kubernetes cluster.
Several components compose Kubernetes container management.
We will evaluate here container runtime interfaces (CRI) as well as the container runtime solutions.
Note: Kubernetes introduced a stable version of Runtime Class in v1.20 that allows a pod to select a particular container runtime.
To be considered, the container runtime interface solution must meet the following requirements:
- It has to implement the Kubernetes CRI
- It has to support OCI runtime-spec and OCI image-spec
- It has to be an active and alive project
- It must be mature enough to be used in a production environment
Four solutions will be evaluated here as container runtime interfaces: Containerd, CRI-O, Docker, and PouchContainer.
CRI | Community | Support |
---|---|---|
Containerd | 9.3k stars / 1.8k forks / 384 contributors | graduated from CNCF |
CRI-O | 3.6k stars / 677 forks / 194 contributors | CNCF incubating project |
Docker | 61.7k stars / 17.7k forks / 2131 contributors | Part of Moby project |
PouchContainer | 4.5k stars / 960 forks / 110 contributors | Alibaba |
Containerd
Pros:
- very mature: it comes from Docker itself and has graduated from the CNCF
- officially supported by Kubernetes
- default and officially supported by AKS, EKS, GKE, k3s
- present in most Kubernetes cluster installations
- fully supports OCI runtime-spec and OCI image-spec
- supports Windows Kubernetes nodes
- follows a plugin model
CRI-O
Pros:
- lightweight
- its releases follow Kubernetes releases
- dedicated to Kubernetes
- officially supported by Kubernetes
- shipped in OpenShift, supported by Prisma Cloud
- compliant with OCI runtimes and OCI images
- supports per-Pod custom configuration via annotations
- supports user namespaces
- high-performance mode
Cons:
- not widely officially supported yet
Docker
Pros:
- very mature: it existed before Kubernetes and has powered clusters since Kubernetes' first release
- officially supported by Kubernetes
- compliant with OCI runtimes and OCI images
- integrates containerd with all its features
Cons:
- deprecated by Kubernetes since v1.20
- provides lots of features unnecessary in a Kubernetes cluster
- adds an unnecessary layer between the runtime and the kubelet
PouchContainer
Pros:
- P2P image distribution
- compatible with old kernel versions
- compatible with OCI runtimes and OCI images
Cons:
- provides features unnecessary in a Kubernetes cluster
- not officially supported by Kubernetes
- not an active project: the last commit dates from September 2020
Containerd and CRI-O are the two solutions meeting the requirements stated above. They are both very mature and stable solutions for a production Kubernetes cluster. The container runtime interface choice doesn't improve or impact the business strategy related to the project.
About CRI-O supporting user namespaces
This namespace, introduced in Linux kernel 3.8, brings container security to another level. It makes the container believe it runs as a privileged user while remapping that user to a less-privileged one on the host.
Kubernetes has long-standing issues (#127, #2101) related to this feature, but nothing has landed upstream yet.
Currently, running containers with user namespaces brings significant challenges and complexity for stateful applications and for mounting shared filesystems. Several patches to the Linux kernel introduce idmapped mounts for FAT, ext4, and XFS (v5.12) and for btrfs (v5.15), but overlayfs is not supported yet. There is also work in progress in containerd to support idmapped mounts (#5888).
To conclude on this feature: it is a neat security improvement for Pod-to-Pod and Pod-to-node isolation in Kubernetes, but it is still a work in progress across the Linux kernel, Kubernetes, CRI-O, and containerd communities.
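As an illustration, CRI-O exposes this per Pod through its `io.kubernetes.cri-o.userns-mode` annotation. A hedged sketch (the Pod, image, and namespace size are hypothetical, and CRI-O only honors the annotation for runtime handlers that explicitly allow it in its configuration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo                          # hypothetical Pod name
  annotations:
    # Ask CRI-O to allocate a user namespace for this Pod automatically,
    # remapping 65536 host UIDs/GIDs into it
    io.kubernetes.cri-o.userns-mode: "auto:size=65536"
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
```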
About the CRI-O high-performance mode
This feature allows the admin to disable CPU load balancing and the CFS quota for latency-sensitive workloads. It does not reflect our needs.
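For completeness, a hedged sketch of how a latency-sensitive Pod would opt in, using CRI-O's documented `cpu-load-balancing.crio.io` and `cpu-quota.crio.io` annotations (the Pod itself is hypothetical, and CRI-O only honors these annotations for runtime handlers that explicitly allow them):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: latency-sensitive-demo              # hypothetical Pod name
  annotations:
    cpu-load-balancing.crio.io: "disable"   # disable kernel CPU load balancing for this Pod
    cpu-quota.crio.io: "disable"            # disable the CFS quota for this Pod
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest  # placeholder image
    resources:
      requests:
        cpu: "2"
      limits:
        cpu: "2"                            # equal request/limit: Guaranteed QoS, needed for CPU pinning
```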
Containerd's features are sufficient for this project. Moreover, it is the safest choice given its broad adoption. Thus we decided to go with containerd.
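For reference, a minimal containerd CRI configuration declaring runc as the default runtime looks roughly like this. This is a sketch of an `/etc/containerd/config.toml` excerpt; the section names follow the containerd 1.5.x config format and should be checked against the deployed version:

```toml
# /etc/containerd/config.toml (excerpt, config format version 2)
version = 2

[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "runc"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true   # use the systemd cgroup driver, matching the kubelet's setting
```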
- Who's Running My Pods? A Deep Dive into the K8s Container Runtime Interface by Phil Estes, November 2018
- Kubernetes deprecates dockershim, December 2020
- Improving Kubernetes and container security with user namespaces by Alban Crequy, December 2020
- Introduction and Deep Dive Into Containerd by Kohei Tokunaga & Akihiro Suda, May 2021
- CRI-O: The Runtime Control Room by Sascha Grunert, SUSE, Peter Hunt, Urvashi Mohnani, Mrunal Patel, December 2020
To be considered, the OCI runtime must meet the following requirements:
- It has to be compliant with the OCI runtime-spec to work with the container runtime interface (CRI).
- It has to be open-source.
- It has to be mature enough and have a solid community.
To evaluate each product, we rely on its official pages and on various benchmarks and analyses.
Four products meet the requirements stated above:
OCI Runtime | Performance cost | Security | Community | Support |
---|---|---|---|---|
crun | very lightweight / can run an app as PID 1 / requires < 1 MB of memory / 50% faster than runc at executing containers | default* | 1.2k stars / 127 forks / 53 contributors | Part of the Containers project on GitHub |
gVisor | syscall overhead / slow networking / bandwidth overhead / IO overhead | default* + system call isolation / only 67 of ~350 syscalls are forwarded to the host kernel | 11.7k stars / 966 forks / 148 contributors | |
Kata Containers | big memory footprint / ~100 MB overhead for the virtual machine and guest OS / slow IO (but hardware passthrough is possible) | default* + isolation in a lightweight VM (hardware virtualization) | 1.5k stars / 253 forks / 172 contributors | OpenStack Foundation, 99cloud, AWcloud, Canonical, China Mobile, City Network, CoreOS, Dell/EMC, EasyStack, Fiberhome, Google, Huawei, JD.com, Mirantis, NetApp, Red Hat, SUSE, Tencent, Ucloud, UnitedStack and ZTE. |
runc | standard implemented by most CRI (the one we compare against) | default* | 8.4k stars / 1.6k forks / 275 contributors | Open Container Initiative (OCI) |
*A container default security is based on the following:
- isolation by namespaces
- cgroups to control resources access
- limited system calls with seccomp profiles
- Linux Capabilities for privilege access rights.
- Mandatory Access Control (MAC) to restrict objects access (AppArmor / SELinux)
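These defaults can be tightened per workload in Kubernetes. A hedged sketch, using fields from the core v1 `securityContext` API (the Pod and image names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-demo                         # placeholder Pod name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest    # placeholder image
    securityContext:
      runAsNonRoot: true                      # refuse to start the container as UID 0
      allowPrivilegeEscalation: false         # block setuid-style privilege escalation
      capabilities:
        drop: ["ALL"]                         # drop all Linux capabilities
      seccompProfile:
        type: RuntimeDefault                  # apply the container runtime's default seccomp profile
```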
crun
Pros:
- very lightweight footprint
- faster to execute containers
- binary 50x smaller than runc
Cons:
- seccomp and MAC security is difficult to adjust properly
- written in C, a language more error-prone than Go
gVisor
Pros:
- good security
- raw compute performance as efficient as runc's
Cons:
- performance loss (networking / IO) due to syscall overhead
Kata Containers
Pros:
- strong security
- can exploit VM features (like hardware passthrough)
- good overall performance
- big and active community
Cons:
- heavy memory footprint
runc
Pros:
- the default implemented in most CRI
- officially supported by Containerd and CRI-O
- good community
Cons:
- seccomp and MAC security is difficult to adjust properly
crun offers a scalability boost, spinning up containers faster than runc does. However, in this project the containers will execute Java code in the business workflow, so crun's speed advantage is insignificant compared to the application and JVM speed.
gVisor and Kata Containers push container security further by protecting the host from possible container breakouts when security is vital for the platform. However, they also come with additional complexity and performance drawbacks. For this project, such complexity is not necessary and does not reflect our reality.
Therefore, we decided to deploy runc on the Kubernetes nodes and rely on default container security and Kubernetes policies to ensure cluster security. runc is a good choice in most cases, as it has already proven itself in many production Kubernetes clusters for its efficiency and stability.
- A Comprehensive Container Runtime Comparison by Evan Baker, July 2020
- Performance Evaluation of Container Runtimes by Lennart Espe, Anshul Jindal, Vladimir Podolskiy and Michael Gerndt, May 2020
- The True Cost of Containing: A gVisor Case Study by Ethan G. Young, Pengfei Zhu, Tyler Caraza-Harter, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, July 2019
- Kata Containers and gVisor: a Quantitative Comparison by Xu Wang and Fupan Li, December 2018
- gVisor performance guide
- An introduction to crun, a fast and low-memory footprint container runtime by Dan Walsh, Valentin Rothberg, Giuseppe Scrivano, August 2020
There is growing interest in using different runtimes within a cluster. Sandboxes are the primary motivator for this right now, with Kata containers and gVisor looking to integrate with Kubernetes. Other runtime models such as Windows containers or even remote runtimes will also require support in the future. RuntimeClass provides a way to select between different runtimes configured in the cluster and surface their properties (both to the cluster & the user).
Since v1.20, Kubernetes implements a stable version of RuntimeClass. Users can select the container runtime for a Pod with a field in the Pod or Deployment definition.
v1.16 introduced, in beta, the possibility to set scheduling constraints to ensure that Pods running with a RuntimeClass get scheduled to nodes that support it.
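As a sketch of how this looks in practice (the RuntimeClass name, node label, and Pod are illustrative; the `handler` must match a runtime configured in containerd or CRI-O on the target nodes):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: native                  # illustrative RuntimeClass name
handler: runc                   # must match a runtime handler configured in the CRI
scheduling:
  nodeSelector:
    runtime/runc: "true"        # illustrative label: only schedule on nodes providing this runtime
---
apiVersion: v1
kind: Pod
metadata:
  name: demo                    # placeholder Pod name
spec:
  runtimeClassName: native      # select the RuntimeClass defined above
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
```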