Linux: Set spl_kmem_cache_slab_limit when page size !4K #12152
Conversation
Wouldn't it be better to make this a …
@AttilaFueloep 4k, 16k, and 64k pages are all supported for aarch64, but a specific page size needs to be selected when compiling the kernel. So there shouldn't be an issue setting this at compile time.
Yes, but what happens if you compile the module on a box with a 4K page sized kernel and then load it on a box with a 64k kernel? Then there is a discrepancy between the compile-time and runtime page sizes. Or am I missing something here?
I think we should at least detect page size mismatches and refuse to load the module if so.
I believe this should already be the case even without additional changes. Though I'm not having much luck finding any documentation to confirm this.
@omarkilani thanks for the testing. I opted for setting the cutoff at the page size because the kernel's slab implementation(s) should allocate a single page per slab. As long as the object size is less than the page size I'd expect the kernel's slab to be both faster and more space efficient since it's highly optimized. Your testing seems to back that up. Where that story changes a bit is when allocations start requiring multiple pages.
Yeah, I couldn't find any documentation either, and even grepping the Linux and kmod sources didn't bring up anything revealing. Unfortunately I don't have a box supporting multiple page sizes to test this on, so I'll take this for granted; it seems to be quite a corner case anyway. As a side note, not directly related to your change, the mixed usage of …
The mixed usage comes from the fact that … In the platform-specific code (…
I see, thanks for the thorough explanation. No reason to change anything then.
FYI that isn't true for caches with entries >=256 bytes, at least on PAGESIZE=4K systems:
Given the possibility of fragmentation, this aspect of the kernel's slab implementation seems like a bad design decision to me. |
@ahrens yes, upon re-reading what I wrote I clearly left out more than a few important qualifiers. Looking at this again, I wonder if it wouldn't be preferable to simply set the cutoff at 16k for all architectures. Based on @omarkilani's testing any performance impact seems negligible. Thoughts?
This is probably a dumb question, but how does one run the …? Googling …
@omarkilani …
I spent most of the day rebuilding kernels and dependencies to get …
I did some more tests with … In both cases nothing above …
I ran … IMHO, 16k is probably fine for a default setting if you're worried about fragmentation. I'm not sure it's the "optimal" setting for all workloads, but nothing bad's going to happen and people can just up the limit for Postgres or whatever. Anything's better than …
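For anyone who does want to raise the limit for a particular workload as suggested above, the tunable can be set at module load time. A sketch, assuming the standard modprobe.d mechanism; the value 65536 is an example only, not a recommendation:

```
# /etc/modprobe.d/zfs.conf
# Raise the SPL slab cutoff (in bytes) so larger objects stay on the
# kernel's slab allocator. Example value only -- tune per workload.
options spl spl_kmem_cache_slab_limit=65536
```

On a running system the current value can be read back from /sys/module/spl/parameters/spl_kmem_cache_slab_limit.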
Change looks good. Thanks for fixing the issue.
For small objects the kernel's slab implementation is very fast and space efficient. However, as the allocation size increases to require multiple pages performance suffers. The SPL kmem cache allocator was designed to better handle these large allocation sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers prefer to use the kernel's slab allocator for small objects and the custom SPL kmem cache allocator for larger objects.

This logic was effectively disabled for all architectures using a non-4K page size, which caused all kmem caches to only use the SPL implementation. Functionally this is fine, but the SPL code which calculates the target number of objects per slab does not take into account that __vmalloc() always returns page-aligned memory. This can result in a massive amount of wasted space when allocating tiny objects on a platform using large pages (64k).

To resolve this issue we set the spl_kmem_cache_slab_limit cutoff to PAGE_SIZE on systems using larger pages. Since 16,384 bytes was experimentally determined to yield the best performance on 4K page systems this is used as the cutoff. This means on 4K page systems there is no functional change.

This particular change does not attempt to update the logic used to calculate the optimal number of pages per slab. This remains an issue which should be addressed in a future change.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#11429
Closes openzfs#11574
Closes openzfs#12150
I've gone ahead and updated the PR to always use a 16K cutoff. This keeps the behavior the same across platforms and should help mitigate any issues we might otherwise end up seeing with fragmentation.
For small objects the kernel's slab implementation is very fast and space efficient. However, as the allocation size increases to require multiple pages performance suffers. The SPL kmem cache allocator was designed to better handle these large allocation sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers prefer to use the kernel's slab allocator for small objects and the custom SPL kmem cache allocator for larger objects.

This logic was effectively disabled for all architectures using a non-4K page size, which caused all kmem caches to only use the SPL implementation. Functionally this is fine, but the SPL code which calculates the target number of objects per slab does not take into account that __vmalloc() always returns page-aligned memory. This can result in a massive amount of wasted space when allocating tiny objects on a platform using large pages (64k).

To resolve this issue we set the spl_kmem_cache_slab_limit cutoff to 16K for all architectures.

This particular change does not attempt to update the logic used to calculate the optimal number of pages per slab. This remains an issue which should be addressed in a future change.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#12152
Closes openzfs#11429
Closes openzfs#11574
Closes openzfs#12150
Motivation and Context
It was observed on systems using a 64k page size that running zpool scrub could exhaust memory and trigger the OOM killer. This should never be possible.
Issues #11429 #11574 #12150
Description
For small objects the kernel's slab implementation is very fast and
space efficient. However, as the allocation size increases to
require multiple pages performance suffers. The SPL kmem cache
allocator was designed to better handle these large allocation
sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers
prefer to use the kernel's slab allocator for small objects and
the custom SPL kmem cache allocator for larger objects.
This logic was effectively disabled for all architectures using
a non-4K page size which caused all kmem caches to only use the
SPL implementation. Functionally this is fine, but the SPL code
which calculates the target number of objects per-slab does not
take into account that __vmalloc() always returns page-aligned
memory. This can result in a massive amount of wasted space when
allocating tiny objects on a platform using large pages (64k).
To resolve this issue we set the spl_kmem_cache_slab_limit cutoff
to PAGE_SIZE on systems using larger pages. Since 16,384 bytes
was experimentally determined to yield the best performance on
4K page systems this is used as the cutoff. This means on 4K
page systems there is no functional change.
This particular change does not attempt to update the logic used
to calculate the optimal number of pages per slab. This remains
an issue which should be addressed in a future change.
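The wasted-space problem described above can be made concrete with a small back-of-the-envelope model. The numbers below are illustrative assumptions, not values taken from the SPL's actual sizing code: if a per-slab object count chosen with 4k pages in mind is reused unchanged on a 64k-page system, and __vmalloc() hands back whole page-aligned pages, most of each slab goes unused.

```python
def slab_waste(page_size: int, obj_size: int, objs_per_slab: int) -> float:
    """Fraction of a page-aligned slab left unused (illustrative model)."""
    used = objs_per_slab * obj_size
    assert used <= page_size, "slab would spill onto another page"
    return (page_size - used) / page_size

# 32 objects of 120 bytes nearly fill a 4k page...
print(slab_waste(4096, 120, 32))    # 0.0625 -> ~6% unused
# ...but leave most of a 64k page empty if the count is not rescaled.
print(slab_waste(65536, 120, 32))   # ~0.94 -> ~94% unused
```

This is why the fix targets the spl_kmem_cache_slab_limit cutoff rather than the page size itself: below the cutoff the kernel's slab packs small objects efficiently, and the SPL's per-slab sizing logic (left for a future change) never comes into play.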
How Has This Been Tested?
Locally compiled but it has not yet been tested on a system
using a large page size.