[ws-daemon] Add page-cache reclaim #8139
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##             main    #8139      +/-   ##
==========================================
+ Coverage   10.03%   15.15%    +5.12%
==========================================
  Files          24       45       +21
  Lines        1944     4375     +2431
==========================================
+ Hits          195      663      +468
- Misses       1742     3655     +1913
- Partials        7       57       +50
```
```go
if cache > uint64(float64(limit)*0.15) {
	err := ioutil.WriteFile(filepath.Join(memCgroupPath, "memory.force_empty"), []byte("1"), 0644)
	if err != nil {
		return nil, xerrors.Errorf("cannot read memory.force_empty: %v", err)
```
This should be `cannot write`, I think.
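That is, something along these lines (illustrative, not a committed fix):

```go
return nil, xerrors.Errorf("cannot write memory.force_empty: %v", err)
```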
```go
p, err := strconv.ParseInt(s, 10, 64)
if err != nil {
	return 0, xerrors.Errorf("cannot parse memory.limit_in_bytes: %v", err)
```
Can you also add the original value we are trying to parse here?
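For example, quoting the parsed string in the error (one possible shape, not necessarily the author's exact change):

```go
return 0, xerrors.Errorf("cannot parse memory.limit_in_bytes %q: %v", s, err)
```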
Description

This PR introduces a page-cache reclaim mechanism for workspace cgroups. The kubelet computes memory pressure from the cgroup memory controller, which includes page caches. This means that when a workspace does a lot of file IO, we're likely to overestimate the memory pressure and evict pods when we should not, or fail the scheduling process.

With this change, we regularly check the page-cache use of each workspace on the node. If caches consume more than 15% of the cgroup memory limit (1.8GiB), we trigger a page-cache reclaim by writing to memory.force_empty. This should keep the cgroup memory closer to the actual memory consumption, at the expense of file IO and kernel CPU time.
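A minimal sketch of the mechanism, assuming cgroup v1; the package and helper names (reclaimPageCache, readCacheBytes, readLimitBytes) are illustrative, not the PR's actual code:

```go
package cgroups

import (
	"io/ioutil"
	"path/filepath"
	"strconv"
	"strings"

	"golang.org/x/xerrors"
)

// reclaimPageCache checks a workspace cgroup's page-cache use and, once it
// exceeds 15% of the cgroup's memory limit, asks the kernel to drop the
// caches by writing to memory.force_empty (cgroup v1).
func reclaimPageCache(memCgroupPath string) error {
	cache, err := readCacheBytes(memCgroupPath)
	if err != nil {
		return err
	}
	limit, err := readLimitBytes(memCgroupPath)
	if err != nil {
		return err
	}
	if cache > uint64(float64(limit)*0.15) {
		err = ioutil.WriteFile(filepath.Join(memCgroupPath, "memory.force_empty"), []byte("1"), 0644)
		if err != nil {
			return xerrors.Errorf("cannot write memory.force_empty: %v", err)
		}
	}
	return nil
}

// readCacheBytes extracts the "cache" counter from memory.stat.
func readCacheBytes(memCgroupPath string) (uint64, error) {
	raw, err := ioutil.ReadFile(filepath.Join(memCgroupPath, "memory.stat"))
	if err != nil {
		return 0, xerrors.Errorf("cannot read memory.stat: %v", err)
	}
	for _, line := range strings.Split(string(raw), "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 && fields[0] == "cache" {
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	return 0, xerrors.Errorf("no cache entry in memory.stat")
}

// readLimitBytes reads the cgroup's memory limit in bytes.
func readLimitBytes(memCgroupPath string) (uint64, error) {
	raw, err := ioutil.ReadFile(filepath.Join(memCgroupPath, "memory.limit_in_bytes"))
	if err != nil {
		return 0, err
	}
	s := strings.TrimSpace(string(raw))
	p, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, xerrors.Errorf("cannot parse memory.limit_in_bytes %q: %v", s, err)
	}
	return uint64(p), nil
}
```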
Related Issue(s)

Fixes #7969
How to test
Make sure your workspace has a memory limit set, e.g. by inspecting/modifying ws-manager's config. Beware: on core-dev we currently don't set workspace resource limits.
Sample the workspace's page-cache use and log it to log.txt, then, in a new terminal, generate heavy file IO by tar-ing a large directory; a sketch of possible commands follows below.
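The PR description elides the exact commands; a hedged sketch, assuming a cgroup v1 mount and a placeholder workspace cgroup path:

```sh
# Terminal 1: sample the cgroup's page-cache counter once per second.
# <workspace-cgroup> is a placeholder for the workspace's memory cgroup.
while true; do
  grep '^cache ' /sys/fs/cgroup/memory/<workspace-cgroup>/memory.stat >> log.txt
  sleep 1
done

# Terminal 2: generate heavy file IO to fill the page cache.
tar cf /tmp/workspace.tar /workspace
```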
Once the tar is done, look at the entries in log.txt. They should climb for a few seconds (15 at most) and drop sharply afterwards; then they'll climb again. You would expect something like this behaviour:

Release Notes