Prevent Linux Page Cache from hoarding 294GB - 394GB RAM on Execution Nodes #2336

Merged: fxamacker merged 6 commits into master from fxamacker/drop-file-cache-using-sys on Apr 26, 2022


@fxamacker (Member) commented Apr 21, 2022

Description

Linux automatically caches file data in its page cache when files are read or written, so this PR uses posix_fadvise(2) to advise Linux to evict some very large files from the page cache after they are used (98GB each as of June 15, 2022).

Prior to this PR, the Linux page cache grew by 196GB after each checkpointing (+98GB from reading the old checkpoint file and another +98GB from creating the new one). After just two checkpointings, the page cache could grow by nearly 400GB, which caused various problems, such as obfuscating the Grafana EN Memory Usage chart.
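The growth figures above are simple arithmetic. The back-of-the-envelope sketch below reproduces them; the function and constant names are hypothetical, and it assumes nothing is ever evicted and that each checkpointing touches exactly one old and one new checkpoint file.

```go
package main

import "fmt"

// checkpointFileGB is the per-file size quoted above (98GB as of June 15, 2022).
const checkpointFileGB = 98

// cacheGrowthGB (hypothetical) estimates page cache growth in GB when no
// checkpoint file is ever evicted: each checkpointing reads the old file and
// writes a new one, so the cache gains two files' worth of pages.
func cacheGrowthGB(fileGB, checkpointings int) int {
	const filesPerCheckpointing = 2 // old file read + new file written
	return fileGB * filesPerCheckpointing * checkpointings
}

func main() {
	for n := 1; n <= 2; n++ {
		fmt.Printf("after %d checkpointing(s): +%dGB\n", n, cacheGrowthGB(checkpointFileGB, n))
	}
}
```

One checkpointing gives 196GB and two give 392GB, the "nearly 400GB" mentioned above.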

This PR replaces PR #2281, which had no effect because it required a binary (/bin/dd) that we didn't want to deploy. Unlike PR #2291, this PR doesn't require /bin/dd.

Updates Epic #1744, because the Grafana EN Memory Usage chart was previously obfuscating the impact of ledger and checkpoint memory optimizations (e.g. PR #1944).

Changes:

  • At startup, advise Linux to evict all checkpoint files (up to 5 x 98GB) from Linux page cache.
  • Eliminate logs about dd not being found in /usr/bin and /bin
  • Use the sys/unix package to advise Linux to evict from the page cache:
    • existing checkpoint file after it is loaded
    • new checkpoint file after it is created

Next Steps

A separate PR will advise Linux to evict segments of checkpoint files while they are in use, which will provide more fine-grained control, as described in the Other Approaches section of PR #2280.

Use sys/unix package to advise Linux to evict from cache:

* existing checkpoint file after it is loaded
* new checkpoint file after it is created

This prevents 132+ GB from accumulating in Linux page cache
after each checkpointing.
At startup, advise Linux to evict all checkpoint files from
Linux page cache.
ledger/complete/wal/syncrename.go (outdated, resolved)
logger.Info().Msgf("run %q to drop file from OS file cache", cmd.String())
// evictFileFromLinuxPageCache advises Linux to evict a file from Linux page cache.
// A use case is when a new checkpoint is loaded or created, Linux may cache big
// checkpoint files in memory until evictFileFromLinuxPageCache causes them to be
Member

Suggested change
// checkpoint files in memory until evictFileFromLinuxPageCache causes them to be
// checkpoint files in memory until evictFileFromLinuxPageCache is called to force them to be

Member Author

evictFileFromLinuxPageCache calls posix_fadvise under the hood, and the posix_fadvise docs say the advice is not binding; it merely constitutes an expectation on behalf of the application.

In benchnet tests, the file was always removed from the Linux page cache after evictFileFromLinuxPageCache was called.

@codecov-commenter

Codecov Report

Merging #2336 (15cdac5) into master (ba1f157) will decrease coverage by 0.06%.
The diff coverage is 44.58%.

@@            Coverage Diff             @@
##           master    #2336      +/-   ##
==========================================
- Coverage   57.47%   57.40%   -0.07%     
==========================================
  Files         645      646       +1     
  Lines       38428    38634     +206     
==========================================
+ Hits        22085    22179      +94     
- Misses      13530    13623      +93     
- Partials     2813     2832      +19     
Flag Coverage Δ
unittests 57.40% <44.58%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
engine/common/rpc/convert/convert.go 16.07% <0.00%> (+2.13%) ⬆️
storage/badger/operation/transaction_results.go 0.00% <0.00%> (ø)
storage/badger/transaction_results.go 62.60% <31.42%> (-12.40%) ⬇️
ledger/complete/wal/syncrename.go 25.80% <33.33%> (ø)
engine/access/rpc/backend/backend_transactions.go 52.86% <43.92%> (-4.50%) ⬇️
ledger/complete/wal/fadvise_linux.go 50.00% <50.00%> (ø)
ledger/complete/wal/checkpointer.go 60.59% <51.21%> (-1.45%) ⬇️
engine/execution/rpc/engine.go 53.22% <65.78%> (+2.27%) ⬆️
fvm/handler/contract.go 75.32% <0.00%> (-2.60%) ⬇️
... and 2 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4516276...15cdac5.

@fxamacker fxamacker merged commit b341963 into master Apr 26, 2022
@fxamacker fxamacker deleted the fxamacker/drop-file-cache-using-sys branch April 26, 2022 20:43
@fxamacker fxamacker changed the title from "Advise Linux to drop files from page cache (it was holding 425 GB of which 198+ GB were 3 checkpoint files)" to "Drop large files from Linux page cache without external cmd to prevent cumulative cache growth in 148GB increments" May 16, 2022
@fxamacker fxamacker changed the title from "Drop large files from Linux page cache without external cmd to prevent cumulative cache growth in 148GB increments" to "Prevent Linux Page Cache from hoarding 294GB - 394GB RAM on Execution Nodes" Jun 16, 2022
@fxamacker fxamacker added the Execution Cadence Execution Team label Jul 14, 2022
5 participants