
ZFS does not honor "echo 3 > drop_cache" #12810

Closed
shodanshok opened this issue Nov 30, 2021 · 4 comments
Labels
Component: Memory Management (kernel memory management) · Status: Understood (the root cause of the issue is known) · Type: Defect (incorrect behavior, e.g. crash, hang)

Comments

@shodanshok
Contributor

System information

Type Version/Name
Distribution Name CentOS
Distribution Version 7
Kernel Version 3.10.0-1160.45.1.el7.x86_64
Architecture x86_64
OpenZFS Version 2.0.6-1

Describe the problem you're observing

ZFS does not drop its caches when the kernel is asked to do so via echo 3 > /proc/sys/vm/drop_caches

Practical example:

[root@localhost ~]# arc_summary -p 1

------------------------------------------------------------------------
ZFS Subsystem Report                            Tue Nov 30 23:43:46 2021
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                1.26M
        Mutex Misses:                           133
        Evict Skips:                            21.43k

ARC Size:                               96.47%  30.04   GiB
        Target Size: (Adaptive)         96.88%  30.17   GiB
        Min Size (Hard Limit):          6.25%   1.95    GiB
        Max Size (High Water):          16:1    31.14   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       1.22%   367.89  MiB
        Frequently Used Cache Size:     98.78%  29.05   GiB
        Metadata Size (Hard Limit):     75.00%  23.35   GiB
        Metadata Size:                  3.76%   898.85  MiB
        Dnode Size (Hard Limit):        10.00%  2.34    GiB
        Dnode Size:                     0.44%   10.41   MiB

ARC Hash Breakdown:
        Elements Max:                           9.69M
        Elements Current:               53.51%  5.18M
        Collisions:                             39.58M
        Chain Max:                              10
        Chains:                                 1.07M

[root@localhost ~]# echo 3 > /proc/sys/vm/drop_caches
[root@localhost ~]# arc_summary -p 1

------------------------------------------------------------------------
ZFS Subsystem Report                            Tue Nov 30 23:45:20 2021
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                1.26M
        Mutex Misses:                           133
        Evict Skips:                            21.43k

ARC Size:                               95.48%  29.73   GiB
        Target Size: (Adaptive)         95.57%  29.76   GiB
        Min Size (Hard Limit):          6.25%   1.95    GiB
        Max Size (High Water):          16:1    31.14   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       0.91%   270.48  MiB
        Frequently Used Cache Size:     99.09%  28.84   GiB
        Metadata Size (Hard Limit):     75.00%  23.35   GiB
        Metadata Size:                  3.80%   909.33  MiB
        Dnode Size (Hard Limit):        10.00%  2.34    GiB
        Dnode Size:                     0.44%   10.45   MiB

ARC Hash Breakdown:
        Elements Max:                           9.69M
        Elements Current:               53.50%  5.18M
        Collisions:                             39.58M
        Chain Max:                              10
        Chains:                                 1.07M

Note how little the first, pre-drop arc_summary output differs from the second, post-drop one. It almost seems as if echo 3 > /proc/sys/vm/drop_caches now only runs a single iteration of the ARC shrinker.

The very same test works flawlessly on the ZFS 0.7 and 0.8 releases. Not being able to release memory in an emergency when using ZFS 2.0.x can be a significant issue.

Describe how to reproduce the problem

Bring some data into the ARC and then try to release it via echo 3 > /proc/sys/vm/drop_caches
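
For example, a minimal sketch of the reproducer, assuming a dataset mounted at /tank (the path and file name are illustrative, not taken from this report):

[root@localhost ~]# dd if=/tank/bigfile of=/dev/null bs=1M     # read a large file to warm the ARC
[root@localhost ~]# arc_summary -p 1 | grep "ARC Size:"        # note the current ARC size
[root@localhost ~]# echo 3 > /proc/sys/vm/drop_caches          # ask the kernel to drop all reclaimable caches
[root@localhost ~]# arc_summary -p 1 | grep "ARC Size:"        # on the affected versions the ARC barely shrinks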

Include any warning/errors/backtraces from the system logs

@shodanshok shodanshok added the Type: Defect Incorrect behavior (e.g. crash, hang) label Nov 30, 2021
@behlendorf
Contributor

This is likely a consequence of the ARC changes made in PR #10600 (commit 3442c2a) which limited how much of the ARC could be reclaimed in a single pass. In part, the motivation here was to prevent the case where too much was reclaimed from the ARC when memory was low. This could lead to a poor hit rate and large latency spikes while the reclaim was in progress.

Interestingly, I wasn't able to easily reproduce this problem with OpenZFS 2.1 with the RHEL 4.18.0-305.25.1 kernel. However, I see you're using the 3.10.0-1160.45.1 which may behave a bit differently (there were significant kernel changes in this area). If you're game, I can suggest a workaround which may work for you and I'd be interested to know the results.

  • Set the zfs_arc_shrinker_limit=0 module option. This will disable the ARC reclaim limiting I mentioned above, which should allow echo 3 > /proc/sys/vm/drop_caches to reclaim everything (see the sketch after the man-page excerpt below).
     zfs_arc_shrinker_limit=10000 (int)
             This is a limit on how many pages the ARC shrinker makes available for eviction in response to one page
             allocation attempt.  Note that in practice, the kernel's shrinker can ask us to evict up to about four
             times this for one allocation attempt.

             The default limit of 10000 (in practice, 160MB per allocation attempt with 4kB pages) limits the amount of
             time spent attempting to reclaim ARC memory to less than 100ms per allocation attempt, even with a small
             average compressed block size of ~8kB.

             The parameter can be set to 0 (zero) to disable the limit, and only applies on Linux.
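
For reference, a minimal sketch of applying the workaround; the /sys/module and /etc/modprobe.d paths are the standard Linux module-parameter mechanisms and are assumptions here, not commands quoted from this thread:

[root@localhost ~]# echo 0 > /sys/module/zfs/parameters/zfs_arc_shrinker_limit                 # runtime change, reverts on module reload
[root@localhost ~]# echo "options zfs zfs_arc_shrinker_limit=0" >> /etc/modprobe.d/zfs.conf    # persist the setting across reboots
[root@localhost ~]# echo 3 > /proc/sys/vm/drop_caches                                          # retry the cache drop
[root@localhost ~]# arc_summary -p 1                                                           # the ARC should now shrink toward its minimum size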

@behlendorf behlendorf added the Component: Memory Management kernel memory management label Dec 1, 2021
@behlendorf
Contributor

Related PR #12228.

@shodanshok
Contributor Author

shodanshok commented Dec 1, 2021

Your suggested solution (zfs_arc_shrinker_limit=0) worked:

[root@localhost parameters]# echo 0 > zfs_arc_shrinker_limit
[root@localhost parameters]# echo 3 > /proc/sys/vm/drop_caches
[root@localhost parameters]# arc_summary -p 1

------------------------------------------------------------------------
ZFS Subsystem Report                            Wed Dec 01 01:27:12 2021
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                1.99M
        Mutex Misses:                           145
        Evict Skips:                            21.43k

ARC Size:                               4.32%   1.35    GiB
        Target Size: (Adaptive)         6.25%   1.95    GiB
        Min Size (Hard Limit):          6.25%   1.95    GiB
        Max Size (High Water):          16:1    31.14   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       6.61%   59.40   MiB
        Frequently Used Cache Size:     93.39%  838.69  MiB
        Metadata Size (Hard Limit):     75.00%  23.35   GiB
        Metadata Size:                  2.35%   562.08  MiB
        Dnode Size (Hard Limit):        10.00%  2.34    GiB
        Dnode Size:                     0.41%   9.69    MiB

Thank you for the quick feedback and for pointing me to the relevant commit.

@stale

stale bot commented Dec 3, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Dec 3, 2022
@stale stale bot closed this as completed Mar 19, 2023
@behlendorf behlendorf added Status: Understood The root cause of the issue is known and removed Status: Stale No recent activity for issue labels Mar 28, 2023