Memory load includes the file cache on Docker Linux #49058

Closed
robertdavidsmith opened this issue Mar 3, 2021 · 13 comments · Fixed by #51738
Labels
area-GC-coreclr tenet-performance Performance related issue
Milestone
6.0.0

Comments

@robertdavidsmith

Description

We have a complex C# application that makes heavy use of memory mapped files.

In Docker it spends most of its time at the container memory limit, but this is only because of the page cache, so it shouldn't be a problem. Our app isn't failing due to lack of memory. However, it runs at half the speed on Linux compared to Windows, and G2 garbage collections are far more frequent on Linux.

We suspected this was because the process garbage collects too aggressively, wrongly thinking it is about to run out of memory (when in fact it is only harmlessly hitting the limit because of the page cache). We ran an experiment to test this theory: we allocated and wrote to a large chunk of memory outside the container (swap is off, so this stays in RAM). This stops the C# app from hitting the container memory limit, because the page cache is now constrained by the physical RAM on the machine rather than by the cgroup limit. Allocating the large chunk outside the container doubles the speed of our C# app and reduces G2 collections by 8x!

This looks similar to #13371 but for the cgroup code path.

I tried working around the issue by setting System.GC.HighMemoryPercent to 100, but this makes no difference, probably because HighMemoryPercent is capped at 99 in the code (search for min (99, highmem_th_from_config) in https://raw.githubusercontent.com/dotnet/runtime/main/src/coreclr/gc/gc.cpp).
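
To see the effect directly from managed code, here is a minimal monitoring sketch (not our actual app; it just polls GC.GetGCMemoryInfo() once a second) that prints the memory load the runtime thinks it is under next to the high-memory threshold and the G2 collection count:

// Minimal monitoring sketch (not our production app). On Docker Linux,
// MemoryLoadBytes is the value this issue is about: it tracks the cgroup's
// usage, which includes the page cache.
using System;
using System.Threading;

class GcLoadMonitor
{
    static void Main()
    {
        while (true)
        {
            GCMemoryInfo info = GC.GetGCMemoryInfo();
            Console.WriteLine(
                $"MemoryLoad: {info.MemoryLoadBytes / (1024 * 1024)} MB, " +
                $"HighMemoryLoadThreshold: {info.HighMemoryLoadThresholdBytes / (1024 * 1024)} MB, " +
                $"TotalAvailable: {info.TotalAvailableMemoryBytes / (1024 * 1024)} MB, " +
                $"HeapSize: {info.HeapSizeBytes / (1024 * 1024)} MB, " +
                $"G2Collections: {GC.CollectionCount(2)}");
            Thread.Sleep(1000);
        }
    }
}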

Configuration

This effect has been observed using both the .NET Core 3.1.3 and 5.0.0 runtimes.

$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
$ uname -a
Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ docker -v
Docker version 18.06.1-ce, build e68fc7a

Regression?

This has always been the case, as far as I know.

Data

Typical example:
~700GiB Memory free at start: 10421 seconds, GC counts: G0=111932, G1=29224, G2=812
27GiB Memory free at start: 5592 seconds, GC Counts: G0=110433, G1=29013, G2=98

(Container memory limit is 36GiB, machine has 768GiB in total)

Analysis

Possibly an issue in https://github.com/dotnet/runtime/blob/main/src/coreclr/gc/unix/cgroup.cpp?

@robertdavidsmith robertdavidsmith added the tenet-performance Performance related issue label Mar 3, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-GC-coreclr untriaged New issue has not been triaged by the area owner labels Mar 3, 2021
@ghost

ghost commented Mar 3, 2021

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

@mangod9
Member

mangod9 commented Mar 3, 2021

Hey @robertdavidsmith, is there a standalone repro you could share for this issue?

@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Mar 3, 2021
@mangod9 mangod9 added this to the 6.0.0 milestone Mar 3, 2021
@robertdavidsmith
Author

Not currently; the app is closed-source and relies on our internal infrastructure.

If the problem isn't obvious, I can try to create one. It looks like https://github.com/dotnet/runtime/blob/main/src/coreclr/gc/unix/cgroup.cpp is including the file cache in the memory load, though.

@mangod9
Member

mangod9 commented Mar 3, 2021

Ok sure, we will take a look.

@marcin-krystianc
Contributor

Hi @mangod9, I'm working on this issue with @robertdavidsmith, and I've prepared a standalone repro for it.

I'm using Docker on Windows, v3.2.2, with the Hyper-V backend, a 4 GB memory limit and a 1 GB swap limit. I've also tested this with Docker on Linux with no swap, and my results were very similar.

  • Check out and build the GcTesting app:
    docker build -t gctesting .
  • Execute the GcTesting app:
    docker run -i -t --env COMPlus_gcServer=0 --memory=3gb --name=gctesting gctesting --memorypressuretask=true --memorypressurerate=100mb --allocationunitsize=160Kb --minimummemoryusage=0mb
  • Every second it prints GC stats. Wait a few minutes for the GC-Rate numbers to stabilise; in my case I get about 1920 GC/min:
...
Elapsed:103s, GC-Rate:1920, Gen012:3296,3296,3296, Total:2.91 MB, GcAllocated:10.07 GB, HeapSize:430.40 KB, MemoryLoad:0 Bytes, Committed:792.00 KB, Available:2.25 GB, HighMemoryLoadThreshold:2.70 GB, CGroupUsageInBytes:28.18 MB ...
Elapsed:104s, GC-Rate:1919, Gen012:3328,3328,3328, Total:1.81 MB, GcAllocated:10.16 GB, HeapSize:461.23 KB, MemoryLoad:0 Bytes, Committed:792.00 KB, Available:2.25 GB, HighMemoryLoadThreshold:2.70 GB, CGroupUsageInBytes:27.06 MB ...
Elapsed:105s, GC-Rate:1920, Gen012:3361,3361,3361, Total:2.44 MB, GcAllocated:10.26 GB, HeapSize:492.09 KB, MemoryLoad:0 Bytes, Committed:792.00 KB, Available:2.25 GB, HighMemoryLoadThreshold:2.70 GB, CGroupUsageInBytes:27.81 MB ...
...
  • Start an interactive bash shell on the running container.
     docker exec -i -t gctesting /bin/bash
  • Run htop to see memory stats:
    [screenshot: screen-1]
  • Start another interactive bash shell on the container.
  • Run another instance of the test app inside the same container. This time it performs only I/O operations, so it fills up the file cache (a simplified sketch of the idea follows these steps):
    dotnet GcTesting.dll --memorypressuretask=false --filepressuretask=true --filepressuresize=4gb
  • Wait a few more minutes to get new GC-Rate numbers; in my case I get about 9210 GC/min (almost a 5x increase):
    [screenshot: screen-2]
...
Elapsed:231s, GC-Rate:9216, Gen012:16296,16296,6956, Total:4.45 MB, GcAllocated:22.67 GB, HeapSize:4.32 MB, MemoryLoad:3.00 GB, Committed:4.46 MB, Available:2.25 GB, HighMemoryLoadThreshold:2.70 GB, CGroupUsageInBytes:3.00 GB ...
...
  • Kill the I/O instance; the GC-Rate will not go down, because the I/O cache stays in use:
    [screenshot: screen-3]
...
Elapsed:298s, GC-Rate:9083, Gen012:26454,26454,8679, Total:1.42 MB, GcAllocated:29.21 GB, HeapSize:637.42 KB, MemoryLoad:2.97 GB, Committed:920.00 KB, Available:2.25 GB, HighMemoryLoadThreshold:2.70 GB, CGroupUsageInBytes:2.98 GB ...
...
  • To get the GC-Rate back to the initial level we need to purge the I/O cache, so we run the GcTesting command again:
    dotnet GcTesting.dll --memorypressuretask=false --unmanagedmemorypressuretask=true --minimumunmanagedmemoryusage=2gb --unmanagedmemorypressurerate=0mb
    [screenshot: screen-4]
  • Kill the I/O-cache-purging instance. This time the GC-Rate goes back to the initial level (about 1920/min).
    [screenshot: screen-5]
...
Elapsed:432s, GC-Rate:1920, Gen012:37548,37548,12608, Total:3.94 MB, GcAllocated:42.31 GB, HeapSize:1.02 MB, MemoryLoad:983.04 MB, Committed:1.21 MB, Available:2.25 GB, HighMemoryLoadThreshold:2.70 GB, CGroupUsageInBytes:999.05 MB ...
...
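
For reference, the file-pressure task is conceptually just bulk file I/O that grows the kernel page cache, and with it the cgroup's usage. A simplified sketch of the idea (not the exact code in the repo):

// Simplified sketch of the "file pressure" idea (not the exact GcTesting code):
// write a large file, then keep re-reading it so the kernel page cache, and
// therefore the cgroup's memory usage, grows towards the container limit.
using System;
using System.IO;

class FilePressure
{
    static void Main()
    {
        const long targetBytes = 4L * 1024 * 1024 * 1024; // roughly --filepressuresize=4gb
        var buffer = new byte[1024 * 1024];
        new Random().NextBytes(buffer);
        string path = Path.Combine(Path.GetTempPath(), "filepressure.bin");

        using var fs = new FileStream(path, FileMode.Create, FileAccess.ReadWrite);
        for (long written = 0; written < targetBytes; written += buffer.Length)
            fs.Write(buffer, 0, buffer.Length);
        fs.Flush();

        // Keep re-reading so the cached pages stay in use.
        while (true)
        {
            fs.Seek(0, SeekOrigin.Begin);
            while (fs.Read(buffer, 0, buffer.Length) > 0) { }
        }
    }
}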

@mangod9
Member

mangod9 commented Mar 26, 2021

@tmds @janvorli in case they have any thoughts on this.

@robertdavidsmith
Author

robertdavidsmith commented Mar 26, 2021

If you want to reproduce the crazy "performance doubles when the machine is low on memory" result, you can do the following.

This was tested on a Linux machine with 8 GB RAM and swap off; the memory settings will need to be different if you have a different amount of RAM.

Open four console windows and type the following:

Window 1 (GC tester; memorypressurerate is set very high so the limiting factor is dotnet performance, not this setting):

git clone https://github.com/robertdavidsmith/runtime_49058 # This must be the fork under robertdavidsmith, not marcin-krystianc
cd runtime_49058/GcTesting
docker build -t gctesting .
docker run -it --rm --env COMPlus_gcServer=0 --memory=5000mb --name=gctesting gctesting --memorypressurerate=100000mb --allocationunitsize=80Kb --minimummemoryusage=450mb --filepressuretask=false

Window 2 (Hammer the filesystem inside the same container):

docker exec -i -t gctesting dotnet GcTesting.dll --memorypressuretask=false --filepressuresize=6500mb

Window 3 (log stats every 5 s):

cd /sys/fs/cgroup/memory/docker
while [ 1 ]; do echo -------------------- && cat */memory.usage_in_bytes && cat */memory.stat | grep total_inactive_file || true && free -m && sleep 5; done

Window 4 (waste memory):

git clone https://github.com/robertdavidsmith/wastemem
cd wastemem
gcc -o wastemem wastemem.c
./wastemem 2048 100000000 3600

The GC tester in Window 1 logs various stats every second, including BlockAllocationsPerformed, which is the number of allocations performed in the last second. When memorypressurerate is large, BlockAllocationsPerformed is effectively a measure of performance.

When the memory waster isn't running, BlockAllocationsPerformed is around 3000 blocks per second.
When the memory waster is running, BlockAllocationsPerformed is around 7000 blocks per second.

Putting the machine under memory pressure more than doubles performance!

Looking at Window 3, you can see memory.usage_in_bytes for the container.

When wastemem isn't running, memory.usage_in_bytes equals the container memory limit (although a lot of this is only file cache; see total_inactive_file in Window 3).

When wastemem starts, the OS reclaims memory from the file cache, so the container is no longer at the memory limit. This more than doubles performance, as the GC collects less aggressively when it's not at the container limit. Also note that G2 collections per second roughly halve.
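
For completeness, wastemem just allocates a chunk of memory and keeps touching it so it stays resident. A rough managed equivalent (the real wastemem is a small C program; the argument handling here is simplified and only illustrative):

// Rough managed equivalent of the memory waster (a simplified sketch, not the
// real wastemem source). Allocate roughly N MB and keep touching every page so
// it stays resident; with swap off, the kernel has to reclaim page cache
// instead, which pulls the container back below its memory limit.
using System;
using System.Collections.Generic;

class WasteMem
{
    static void Main(string[] args)
    {
        int megabytes = args.Length > 0 ? int.Parse(args[0]) : 2048;

        var blocks = new List<byte[]>();
        for (int i = 0; i < megabytes; i++)
            blocks.Add(new byte[1024 * 1024]);

        while (true)
        {
            // Touch one byte per 4 KB page in every block.
            foreach (var block in blocks)
                for (int offset = 0; offset < block.Length; offset += 4096)
                    block[offset] = 1;
        }
    }
}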

@janvorli
Member

From the evidence provided above, it seems to me that we should really exclude the caches from the reported used memory.

@tmds
Member

tmds commented Apr 6, 2021

@janvorli maybe we can explore using the memory-usage events that come from the cgroup instead of calculating usage ourselves?

I can have a look into this. I'd start by looking at the cgroupv2 memory.events file (https://www.kernel.org/doc/Documentation/cgroup-v2.txt) and share my findings here.

@tmds
Member

tmds commented Apr 8, 2021

I can have a look into this. I'd start by looking at the cgroupv2 memory.events file (https://www.kernel.org/doc/Documentation/cgroup-v2.txt) and share my findings here.

I tried out some things, but it didn't work as I hoped. The memory.events file only generates an interesting event when someone sets the memory level in the memory.high file. By default it is at max, and the event happens at OOM, which is too late to trigger a GC.

From the evidence provided above, it seems to me that we should really exclude the caches from the reported used memory.

If we exclude the cache from usage, it probably makes sense to also exclude it from the max memory when we calculate load: load = (used - cache) / (max - cache).

For cgroup v2 we may be able to calculate something from memory.stat. A few of its fields relate to pages that are reclaimable under memory pressure. I'm not sure what we should add up and subtract.

@robertdavidsmith you may be able to work around the issue using fadvise with FADV_DONTNEED. Calling it once or twice before you close a file may get the file removed from the page cache. There is no managed API for it though, so you need to P/Invoke. The managed API lets you fadvise via FileOptions.RandomAccess and FileOptions.SequentialScan, but I don't think those affect the file cache.
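
Something along these lines should work (an untested sketch: the libc call is plain posix_fadvise, error handling is omitted, and flushing before the advise is my assumption about what is needed for dirty pages):

// Untested sketch of the P/Invoke workaround: tell the kernel the file's
// cached pages are no longer needed before closing it.
using System.IO;
using System.Runtime.InteropServices;

static class Fadvise
{
    private const int POSIX_FADV_DONTNEED = 4; // value on Linux

    [DllImport("libc")]
    private static extern int posix_fadvise(int fd, long offset, long length, int advice);

    // Drop the whole file from the page cache (offset 0, length 0 = entire file).
    public static void DropFromCache(FileStream fs)
    {
        fs.Flush(flushToDisk: true); // dirty pages can't be dropped until written back
        int fd = (int)fs.SafeFileHandle.DangerousGetHandle();
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    }
}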

@robertdavidsmith
Author

robertdavidsmith commented Apr 9, 2021

Regarding cgroup v1 (I haven’t looked at v2):

I think a good starting point would be to use load = (used - total_inactive_file) / max, where used and max are the existing values and total_inactive_file comes from memory.stat.

I'm not sure whether total_active_file should be subtracted as well; it might be safer to just subtract total_inactive_file for now, as that's the smaller change. It's also what docker stats does (see https://docs.docker.com/engine/reference/commandline/stats/#extended-description).

If you don't want to read a third "file", note that the existing value of used (which comes from the "file" memory.usage_in_bytes) is roughly equal to total_cache + total_rss + total_swap from memory.stat. So an alternative way to calculate the memory load could be (total_cache + total_rss + total_swap - total_inactive_file) / max, with the first four values coming from memory.stat.
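
To make the first formula concrete, here is a rough illustration against the cgroup v1 files (the paths are the usual mount points seen inside a container; this is only a sketch, not proposed runtime code):

// Rough illustration of the proposed cgroup v1 calculation:
// load = (usage_in_bytes - total_inactive_file) / limit_in_bytes.
using System.IO;
using System.Linq;

static class CgroupV1MemoryLoad
{
    private const string Root = "/sys/fs/cgroup/memory";

    public static double Compute()
    {
        long usage = ReadLong(Path.Combine(Root, "memory.usage_in_bytes"));
        long limit = ReadLong(Path.Combine(Root, "memory.limit_in_bytes"));

        // memory.stat has one "name value" pair per line; total_inactive_file
        // is the inactive file-backed page cache the kernel can easily reclaim.
        long inactiveFile = long.Parse(
            File.ReadLines(Path.Combine(Root, "memory.stat"))
                .First(line => line.StartsWith("total_inactive_file "))
                .Split(' ')[1]);

        return (double)(usage - inactiveFile) / limit;
    }

    private static long ReadLong(string path) => long.Parse(File.ReadAllText(path).Trim());
}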

That FADV_DONTNEED trick would probably work in many cases, but our closed-source program uses the same set of files for most of the time it's running, so it isn't really suitable in our case.

Note bug report #50414, which suggests the GC is sometimes too relaxed. That bug may get worse if we relax the GC further.

@tmds
Member

tmds commented Apr 13, 2021

it might be safer to just subtract total_inactive_file for now

I can't say whether this only includes reclaimable pages, or whether unevictable pages may also be counted here.
To be safe, we could subtract unevictable from total_inactive_file.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 23, 2021
@marcin-krystianc
Contributor

@janvorli @robertdavidsmith @tmds I've opened a PR with my proposal for excluding the page cache from the memory load.

As far as I know, on Windows the page cache is not counted as used memory, but modified/dirty memory is. So I think we should use the value reported in usage_in_bytes as before and then subtract the clean pages of the page cache.

I wanted to avoid reading both the memory.usage_in_bytes and memory.stat files, but I don't think it is possible to calculate the value of usage_in_bytes from the values in memory.stat. For example, I tried total_cache + total_rss, but that is always lower than usage_in_bytes.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 6, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jun 5, 2021