This repository has been archived by the owner on May 12, 2021. It is now read-only.

kata containers memory footprint #295

Closed
bergwolf opened this issue May 9, 2018 · 8 comments · Fixed by #296

@bergwolf (Member) commented May 9, 2018

The memory footprint of Kata Containers is pretty high, even inside the guest. Compare with what runv+hyperstart shows for a 256MB busybox container:

```
/ # free -m
             total       used       free     shared    buffers     cached
Mem:           235         33        202         18          0         19
-/+ buffers/cache:         13        222
Swap:            0          0          0
```

A Kata container, on the other hand, leaves less than 120MB of free memory for the container app:

```
/ # free -m
             total       used       free     shared    buffers     cached
Mem:           195         82        113         26          0         26
-/+ buffers/cache:         55        139
Swap:            0          0          0
```

There are two places we need to look into:

  1. Total memory: both cases specify 256MB in the QEMU arguments, but the Kata guest only sees around 200MB of total memory, as also confirmed by the agent log (system-memory="200572 kB").
  2. Used memory: this is likely the cost of C vs. Go. IOW, the Go runtime costs around 50MB more than a plain C agent. I guess that's a price we have to pay for having a Go agent, but I wonder if there is still some way to reduce the agent memory footprint (see the sketch at the end of this comment)?

@sboeuf @WeiZhang555 @amshinde @devimc @egernst @laijs any thoughts?
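
As a reference point for item 2, here is a minimal sketch (illustrative only, not kata-agent code) of how a Go process running inside the guest could report both the memory the Go runtime holds and its resident set size from /proc/self/status:

```go
// memreport.go - illustrative only; assumes a Linux guest with procfs mounted.
package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
	"strings"
)

// vmRSS returns the VmRSS line from /proc/self/status, e.g. "1936 kB".
func vmRSS() (string, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return "", err
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for s.Scan() {
		if strings.HasPrefix(s.Text(), "VmRSS:") {
			return strings.TrimSpace(strings.TrimPrefix(s.Text(), "VmRSS:")), nil
		}
	}
	return "", s.Err()
}

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// Sys is the total memory the Go runtime has obtained from the OS;
	// HeapAlloc is what the program currently has live on the heap.
	fmt.Printf("go runtime: Sys=%d KiB, HeapAlloc=%d KiB\n", m.Sys/1024, m.HeapAlloc/1024)

	if rss, err := vmRSS(); err == nil {
		fmt.Printf("process VmRSS: %s\n", rss)
	}
}
```

The gap between the reported VmRSS and HeapAlloc gives a rough idea of how much of the footprint is the Go runtime itself rather than the agent's live data.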

@egernst (Member) commented May 9, 2018

Thanks for opening this - it's something I'm actively looking at, as is @devimc. @mcastelino - FYI.

A couple of things which came to mind when I started looking:

  1. Agent size, and whether we can reduce it. I would like to understand how much of this comes from gRPC, as well as from the general switch from C to Go. @devimc did a quick experiment of moving from gRPC to ttRPC, with limited results in footprint savings. Having said this, I was only seeing 1936 KB associated with the kata-agent RSS (checked via ps inside the guest with a console enabled).
  2. Whether there's anything newly enabled, or the memory/CPU settings for the VM itself. This is a WIP.
  3. While it doesn't help with the memory available to the container in the guest, KSM provides a much better story when used (I saw a ~75% reduction).

Can you check the footprint of kata-agent inside the guest? @mcastelino FYI.

@gnawux (Member) commented May 9, 2018

And does this mean the Kata kernel consumes more memory?

@egernst (Member) commented May 9, 2018

I think we need to continue doing measurements of each piece. And this highlights the need for the density CI to be merged. I'm working on making sure this happens ASAP.

devimc pushed a commit to devimc/kata-runtime that referenced this issue May 9, 2018

There is a relation between the maximum number of vCPUs and the memory
footprint: if the QEMU maxcpus option and the kernel nr_cpus cmdline
argument are large, the memory footprint is large as well. This only
happens when CPU hotplug support is enabled in the kernel, probably
because the kernel needs to allocate resources to watch every socket
waiting for a CPU to be connected (ACPI event).

For example

```
+---------------+-------------------------+
|               | Memory Footprint (KB)   |
+---------------+-------------------------+
| NR_CPUS=240   | 186501                  |
+---------------+-------------------------+
| NR_CPUS=8     | 110684                  |
+---------------+-------------------------+
```

In order not to affect CPU hotplug, and to allow users to have containers
with the same number of CPUs as the host, this patch mitigates the large
memory footprint by using the actual number of physical CPUs as the
maximum number of vCPUs for each container.

Before this patch, a container with 256MB of RAM:

```
              total        used        free      shared  buff/cache   available
Mem:           195M         40M        113M         26M         41M        112M
Swap:            0B          0B          0B
```

With this patch:

```
              total        used        free      shared  buff/cache   available
Mem:           236M         11M        188M         26M         36M        186M
Swap:            0B          0B          0B
```

fixes kata-containers#295

Signed-off-by: Julio Montes <julio.montes@intel.com>
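
To make the change concrete, here is a minimal sketch of the idea in Go (illustrative only, not the actual kata-runtime code): cap QEMU's maxcpus, and the matching kernel nr_cpus, at the host's physical CPU count instead of a fixed NR_CPUS=240. runtime.NumCPU() stands in for however the runtime actually probes the host.

```go
// maxvcpus.go - illustrative sketch of capping guest maxcpus at the host CPU count.
package main

import (
	"fmt"
	"runtime"
)

// smpArg builds a QEMU -smp value: boot with bootVCPUs and allow hotplug
// only up to the host's CPU count, so the guest kernel does not reserve
// hotplug bookkeeping for CPUs that can never appear.
func smpArg(bootVCPUs int) string {
	maxVCPUs := runtime.NumCPU()
	if bootVCPUs > maxVCPUs {
		bootVCPUs = maxVCPUs
	}
	return fmt.Sprintf("%d,maxcpus=%d", bootVCPUs, maxVCPUs)
}

func main() {
	// On an 8-core host this prints "-smp 1,maxcpus=8"; the kernel cmdline
	// would get a matching nr_cpus=8.
	fmt.Println("-smp " + smpArg(1))
}
```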
@devimc self-assigned this May 9, 2018
@bergwolf (Member Author)

@gnawux @egernst, to be clear, I was using the same kernel and the same qemu binary to run the comparison. The only differences are:

  • initrd image: the kata initrd image is slightly bigger due to extra libraries in alpine (6.7MB vs. 8.5MB)
  • qemu command lines
  • agent inside the guest

One consequence of the large memory footprint is that we cannot boot a 128MB guest with kata (OOM killed), whereas runv can launch a 64MB guest.

@egernst (Member) commented May 10, 2018

@bergwolf what was the base image used before in the initrd? From what I saw, the footprint of the agent wasn't that significant, but I'd be curious to see the data from your side too. I understand the consequence of this -- KSM doesn't help us with OOM inside the guest! I saw the same result, with 128MB falling over.

@egernst (Member) commented May 10, 2018

FYI: the CI has some density measurements per PR now: see http://kata-jenkins-ci.westus2.cloudapp.azure.com/job/kata-containers-runtime-density-PR/

The first test was just the standard baseline. You can see @devimc's patch as job #2, which happens to have a 100MB smaller footprint on the host. Of course, YMMV depending on the number of cores you have on the system...

@bergwolf (Member Author)

@egernst I was using alpine for the initrd image. And with #296, here is what I'm seeing on a 128MB guest:
w/ kata + initrd

```
/ # free -m
             total       used       free     shared    buffers     cached
Mem:           110         47         62         26          0         26
```

w/ kata + rootfs image

```
/ # free -m
             total       used       free     shared    buffers     cached
Mem:           110         37         72          4          2          4
```

w/ runv

```
/ # free -m
             total       used       free     shared    buffers     cached
Mem:           110         33         77         18          0         19
```

Still slightly larger but much better now.

The large difference (~15MB) may largely be a result of the hyperstart vs. kata-agent binary sizes, since the initramfs is uncompressed entirely into guest RAM.

```
-rwxr-xr-x 1 root root 18M May  9 20:51 kata-agent
-rwxrwxr-x 1 bergwolf bergwolf 570K May  4 08:36 hyperstart
```

After stripping the kata-agent binary, here is what I'm getting with kata + initrd:

```
$ ll kata-agent
-rwxr-xr-x 1 root root 11M May 10 12:44 kata-agent
/ # free -m
             total       used       free     shared    buffers     cached
Mem:           110         41         69         19          0         20
```

So the difference could be 5-8MB, depending on whether an initrd or a rootfs image is used.

@egernst (Member) commented May 10, 2018

Much better. We're still investigating to see if there's more "low hanging fruit" we can address.

devimc pushed a commit to devimc/kata-runtime that referenced this issue May 11, 2018

There is a relation between the maximum number of vCPUs and the memory
footprint: if the QEMU maxcpus option and the kernel nr_cpus cmdline
argument are large, the memory footprint is large as well. This only
happens when CPU hotplug support is enabled in the kernel, probably
because the kernel needs to allocate resources to watch every socket
waiting for a CPU to be connected (ACPI event).

For example

```
+---------------+-------------------------+
|               | Memory Footprint (KB)   |
+---------------+-------------------------+
| NR_CPUS=240   | 186501                  |
+---------------+-------------------------+
| NR_CPUS=8     | 110684                  |
+---------------+-------------------------+
```

In order not to affect CPU hotplug, and to allow users to have containers
with the same number of CPUs as the host, this patch mitigates the large
memory footprint by using the actual number of physical CPUs as the
maximum number of vCPUs for each container if `default_maxvcpus` is <= 0
in the runtime configuration file; otherwise `default_maxvcpus` is used
as the maximum number of vCPUs.

Before this patch, a container with 256MB of RAM:

```
              total        used        free      shared  buff/cache   available
Mem:           195M         40M        113M         26M         41M        112M
Swap:            0B          0B          0B
```

With this patch:

```
              total        used        free      shared  buff/cache   available
Mem:           236M         11M        188M         26M         36M        186M
Swap:            0B          0B          0B
```

fixes kata-containers#295

Signed-off-by: Julio Montes <julio.montes@intel.com>
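
A small sketch of the configuration fallback added in this revision, assuming a hypothetical defaultMaxVCPUs value parsed from the runtime configuration file (illustrative only, not the actual configuration code):

```go
// maxvcpus_config.go - illustrative only.
package main

import (
	"fmt"
	"runtime"
)

// maxVCPUs honors default_maxvcpus when it is set to a positive value and
// otherwise falls back to the host's physical CPU count.
func maxVCPUs(defaultMaxVCPUs int) int {
	if defaultMaxVCPUs > 0 {
		return defaultMaxVCPUs
	}
	return runtime.NumCPU()
}

func main() {
	fmt.Println(maxVCPUs(0), maxVCPUs(4)) // e.g. "8 4" on an 8-core host
}
```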
zklei pushed a commit to zklei/runtime that referenced this issue Jun 13, 2019