[Execution State] Linux page cache is holding 425+ GB RAM, so make it drop checkpoint files to free up 294-394GB RAM #2261

Closed
fxamacker opened this issue Apr 6, 2022 · 1 comment · Fixed by #2280
Assignees
Labels
Execution Cadence Execution Team Performance

Comments

@fxamacker
Member

fxamacker commented Apr 6, 2022

Problem

On execution nodes, Linux eventually holds 425+ GB RAM in its file cache (shown as buff/cache in top). This is caused by Linux automatically caching files we read or write.

Among other problems, Grafana's EN Memory Usage chart (and other tools) doesn't exclude the Linux page cache, which obscures Go's memory usage patterns. For example, Grafana doesn't show that operational memory dropped by 250+ GB as a result of PR #1944.
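
To see how much of the reported memory is page cache rather than process memory, the numbers behind top's buff/cache column can be read from /proc/meminfo. The sketch below is illustrative only: the function names are made up for this example, and summing Buffers, Cached and SReclaimable is an approximation of what procps tools report, not what Grafana's chart computes.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value}.

    Most values are in kB; a few fields (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def page_cache_kib(fields):
    """Approximate buff/cache: Buffers + Cached + SReclaimable (assumption)."""
    return (
        fields.get("Buffers", 0)
        + fields.get("Cached", 0)
        + fields.get("SReclaimable", 0)
    )


def used_excluding_cache_kib(fields):
    """Memory in use once free memory and the page cache are subtracted."""
    return fields["MemTotal"] - fields["MemFree"] - page_cache_kib(fields)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    gib = 1024 * 1024  # kB per GiB
    print(f"buff/cache: {page_cache_kib(info) / gib:.1f} GiB")
    print(f"used excluding cache: {used_excluding_cache_kib(info) / gib:.1f} GiB")
```

Plotting the second number next to the raw usage would make a drop in Go's own memory visible even while the page cache stays large.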

UPDATE: As of June 15 (mainnet18 spork) checkpoint files are 98GB.

Reading and writing large (98GB) files can cause Linux to keep them cached even after the program exits. For example, checkpointing includes:

  • reading 98 GB from old checkpoint file
  • writing 98 GB to a new checkpoint file

This 196GB growth in the file cache after each checkpointing is cumulative, and Linux can end up automatically caching 3-5 checkpoint files in memory.
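
The arithmetic behind these figures can be written out as a small sketch. The 98 GB file size and the 3-5 file range come from the text above; the function names are hypothetical, and the model assumes each cached checkpoint file is held in full.

```python
CHECKPOINT_GB = 98  # checkpoint file size as of the mainnet18 spork (from the issue)


def cache_growth_per_checkpoint_gb(file_gb):
    """One checkpointing reads the old file and writes a new one of similar size."""
    return 2 * file_gb


def cached_total_gb(file_gb, files_cached):
    """Page cache held if `files_cached` whole checkpoint files stay cached."""
    return file_gb * files_cached


if __name__ == "__main__":
    print("growth per checkpointing:", cache_growth_per_checkpoint_gb(CHECKPOINT_GB), "GB")
    for n in range(3, 6):
        print(n, "files cached:", cached_total_gb(CHECKPOINT_GB, n), "GB")
```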

Updates epic #1744

The Proposed Solution

  • Avoid clearing out the entire file system cache.
  • Drop the new checkpoint file (that was created) from the cache
  • Drop the old checkpoint file (that was read) from the cache
  • Optionally, also do this for WAL files
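
One way to drop a single file from the page cache without touching the rest of the cache is posix_fadvise with POSIX_FADV_DONTNEED. The sketch below shows the idea in Python and is not a description of how the fix was implemented; drop_file_from_page_cache is a hypothetical helper name. It assumes Linux, where the kernel treats the advice as a hint and may keep pages that are dirty or mapped elsewhere.

```python
import os
import sys


def drop_file_from_page_cache(path):
    """Ask the kernel to evict one file's cached pages (Linux, best effort)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Dirty pages cannot be evicted, so flush pending writes first.
        os.fsync(fd)
        # offset=0 and length=0 cover the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Usage: pass the old and new checkpoint files (and optionally WAL files).
    for checkpoint_path in sys.argv[1:]:
        drop_file_from_page_cache(checkpoint_path)
```

The file contents are not modified; only the cached copy in RAM is released, so a later read goes back to disk.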

Proof of concept

On benchnet (using 53GB files), checkpoint creation began with the OS file cache at around 2 GB. Once checkpoint file loading and creation begins, OS cache use can peak at 106GB and then remain at about 105GB after the benchmark program exits.

  1. Run checkpoint.00003464 creation benchmark.
    OS file cache will be around 105 GB after benchmark program exits.

  2. Run dd if=checkpoint.00003464 iflag=nocache count=0 (these params won't modify files).
    OS file cache will immediately drop by the checkpoint file size (around 53GB).

Outside of benchnet, @zhangchiqing confirmed that using the dd command on 3 checkpoint files also reduced the memory used by the OS cache by the combined file sizes.
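
To repeat the dd step over several checkpoint files, the command can be wrapped in a small script. This is a sketch that assumes GNU dd is on the PATH; the helper names are made up for this example, and the dd arguments are exactly those from step 2.

```python
import subprocess
import sys


def dd_nocache_argv(path):
    """Build the dd command from step 2; count=0 means no data is copied."""
    return ["dd", f"if={path}", "iflag=nocache", "count=0"]


def drop_with_dd(paths):
    """Run the dd command once per file, failing loudly if dd reports an error."""
    for path in paths:
        # dd prints transfer statistics on stderr; they are not useful here.
        subprocess.run(dd_nocache_argv(path), check=True, stderr=subprocess.DEVNULL)


if __name__ == "__main__":
    # Example: python drop_cache.py checkpoint.00003464
    drop_with_dd(sys.argv[1:])
```

Comparing buff/cache in top before and after the run should show a drop of roughly the combined file sizes, as in the results above.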

Caveats

  • This is primarily aimed at having Grafana, etc. show expected memory use (to avoid hunting for nonexistent leaks, etc.)
  • May need to look into special considerations when running inside a container.

@fxamacker changed the title from "[Execution State] Drop checkpoint files from OS file cache to free up 50-100GB RAM after checkpoint creation" to "[Execution State] Drop checkpoint files from OS file cache to free up 105GB RAM after checkpoint creation" on Apr 6, 2022
@fxamacker self-assigned this on Apr 15, 2022
@fxamacker changed the title from "[Execution State] Drop checkpoint files from OS file cache to free up 105GB RAM after checkpoint creation" to "[Execution State] Linux file cache is holding 425+ GB RAM, so make it drop checkpoint files to free up 132-264+GB RAM" on Apr 18, 2022
@fxamacker changed the title from "[Execution State] Linux file cache is holding 425+ GB RAM, so make it drop checkpoint files to free up 132-264+GB RAM" to "[Execution State] Linux file cache is holding 425+ GB RAM, so make it drop checkpoint files to free up 198-264+GB RAM" on Apr 18, 2022

@fxamacker
Member Author

Closed by #2280 on April 18, 2022.

@fxamacker changed the title from "[Execution State] Linux file cache is holding 425+ GB RAM, so make it drop checkpoint files to free up 198-264+GB RAM" to "[Execution State] Linux page cache is holding 425+ GB RAM, so make it drop checkpoint files to free up 294-394GB RAM" on Jun 16, 2022
@fxamacker added the Execution Cadence Execution Team label on Jul 15, 2022