Remove `usable_size` APIs #17
Comments
When not using the returned Do you think we should use |
With these changes it would be easier to write a wrapper implementation, because fewer methods would have to be overridden. |
jemalloc has a function that does this ( Some allocators can query the usable size, by doing a separate FFI call. Just because that can be done does not mean that one should do it, e.g., some of this functionality in some allocators is for "debugging only" and not very fast. Still, you could implement Even when doing the FFI call, as mentioned in the OP, if the FFI function is read only, e.g., because it has the When implementing this functionality in Rust, e.g., by just returning |
I find The problem with using a tuple is that their layout is unspecified, and |
I experimented around a bit with removing Removing Removing I neither like the name Summary (what I'd prefer):
If this is plausible, we may close this in favor of a new issue Rename A matter of taste, but I don't like tuple structs and would prefer |
The main use case of AFAIK the only other situation in which it is useful is for debugging / introspection purposes, but this use case is in general better covered by allocator-specific APIs, which allow querying much more information than what "usable_size" is able to provide. Allocators can expose these APIs already, and we do so for jemalloc in jemallocator and in the jemalloc-ctl crates. The question is more of whether it is worth it to have this API in the
How is this any different from calling In practice, most allocators do not really care about the allocation size internally. The first thing they do is compute the size class, and then just use that. The worst case is when you have a memory page where you only use 1 byte, and that requires zeroing the whole page, but that only happens for "large" allocations, and the main optimization that
I wonder whether that's better or worse than just

    let a = alloc(layout)?;
    Self { ptr: a.0, size: a.1 }

would become

    let a = alloc(layout)?;
    Self { ptr: a.0, size: if let Some(sz) = a.1 { sz } else { layout.size() } }

where we have an extra branch. Is this worth it? I don't know, seems to add little value, since the allocator could have just set I think that if the point of the
Something like that would LGTM. |
This is also the guarantee, that
Yes, as only the allocator knows of the actual implementation and size hints. Additionally,
Basically the same answer as above:
As I have requested 10 bytes, this is fine.
As we design a trait for the general case, I would not rely on those statements. Rust is also used in embedded contexts, where zeroing memory may be more expensive, depending on the allocator. "Most allocators" is not enough for a trait in a systems language IMO.
This is the reason I did an edit to the post and marked it as outdated 🙂 |
Sure, but I do not see why we would need a method that provides this guarantee. We can just say that one is only allowed to pass all methods that need the Layout of an already existing allocation Layouts with the same alignment, and sizes in range In fact, Consider an allocator with 2 size classes, 8 bytes, 16 bytes, and large allocations. Then when requesting 10 bytes, the allocator will compute quickly that it fits in the 16 bytes size class. However, for This information can be interesting, and the introspection methods of the allocators allow fetching the whole size class hierarchy and computing it, but that's often very slow (e.g. glibc's If instead we just say that all methods modifying an allocation and taking its Layout work for Layouts in range
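The rule proposed above (same alignment, size within a range) can be sketched as a small check. This is only an illustration: the `Block` record and the `fits` helper are hypothetical names, and the range is assumed to run from the originally requested size up to the usable size the allocator reported, which is how the size-class example in the comment reads.

```rust
use std::alloc::Layout;

/// Hypothetical record of a live allocation: what was requested and the
/// upper bound the allocator reported (its size class in the example).
pub struct Block {
    pub align: usize,
    pub requested: usize,
    pub usable: usize,
}

/// Returns true if `layout` may be passed to methods operating on `block`.
/// Assumption: same alignment, and a size in `requested..=usable`.
pub fn fits(block: &Block, layout: Layout) -> bool {
    layout.align() == block.align
        && (block.requested..=block.usable).contains(&layout.size())
}
```

With size classes of 8 and 16 bytes, a 10-byte request lands in the 16-byte class, so under this assumption any layout with a size from 10 to 16 and the same alignment would be accepted for that block.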
The default implementation of For all allocators I know for which one could override So... we can just provide default implementations of One other aspect to consider is that if |
Thanks for pointing this out, this is indeed a weird behavior! Another strange behavior is combining this with However, I don't like to assume that every allocator can trivially return an excess, so I don't like to drop
I see three ways to solve this problem:
I think the latter is the best of both worlds. I'll play around with dropping Generally, a potential performance loss is not an option. If we find one use case where performance drops, there will be a bunch of others. By the way, even if it turns out we drop the excess API, #21 gets even more unlikely, but this is another topic. |
The problem is that the assumption is true: every allocator can trivially compute the correct excess with zero overhead and in O(1) space and time by just returning An example would be an "accurate" implementation of
This is incorrect. |
https://internals.rust-lang.org/t/pre-rfc-changing-the-alloc-trait/7487 claims:
|
Sorry, I don't get your point. Sure, every allocator can return
The compiler cannot optimize out the branch for |
That's fixed with rust-lang/rust#58327 which is in FCP with disposition merge. |
Yes, but you cannot assume that every allocator which uses FFI uses |
If your allocator is written in Rust code, you don't need that. That already works today. Those attributes only help in the only case in which this does not work, which is if your excess computation is an |
Why do you assume every allocator is written in Rust or |
The only thing I assume is that computing usable size is a side-effect free operation, which is the case for all allocators that we currently support. |
I don't mean to offend you, but this is a false assumption IMO.
Could you elaborate on this? Are there allocators we won't support? |
Basically, you are arguing that I disagree that this is useful:
You are claiming that there exist allocators for which:
AFAICT this is not true for any of the widely-used Sure, I can imagine that an |
I might be missing something. jemalloc’s |
@SimonSapin jemalloc jemalloc (also) and tcmalloc have a From the other widely-used allocators, the other similar API is |
Just to get this right, do you argue only against |
I argue that we should remove The returned upper bound on the size only needs to be correct - it does not need to be accurate, which allows just returning the requested size for those allocators that do not support it, or for which only a slow computation exists (e.g. intended for debugging, but not as a hot path). This allows allocators that can provide an accurate allocator size quickly to do better, and doesn't make users avoid This fully solves the telemetry issue (which is IMO what adds the most value), because the allocator always knows the sizes that the user actually wanted to allocate (e.g., there is no simple way to call This also avoids useless This does not prevent allocators that have a very slow The argument that The argument that this would make allocators with a slow size computation that mutates global state unnecessarily slow for things like The argument that this complicates the implementation of the Alloc trait for users is debatable. This change cuts the allocator API in half (due to the removal of the |
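As a rough sketch of the "correct but not necessarily accurate" upper bound argued for above: an allocator with no fast size query can report the requested size. The `SizedAlloc` trait and `SystemSized` wrapper are hypothetical names invented for this illustration and are not the actual `Alloc` trait; only `std::alloc::System`, `GlobalAlloc`, and `Layout` are real standard-library items.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::ptr::NonNull;

/// Hypothetical trait shape: allocation returns the pointer together with
/// an upper bound on the usable size. Not the real `Alloc` trait.
pub trait SizedAlloc {
    /// Safety: `layout` must have a non-zero size.
    unsafe fn alloc(&self, layout: Layout) -> Option<(NonNull<u8>, usize)>;
    /// Safety: `ptr` must come from `alloc` on `self` with the same `layout`.
    unsafe fn dealloc(&self, ptr: NonNull<u8>, layout: Layout);
}

/// Wrapper over the system allocator, which offers no portable size query.
pub struct SystemSized;

impl SizedAlloc for SystemSized {
    unsafe fn alloc(&self, layout: Layout) -> Option<(NonNull<u8>, usize)> {
        // SAFETY: the caller guarantees a non-zero-sized layout.
        let raw = unsafe { System.alloc(layout) };
        let ptr = NonNull::new(raw)?;
        // The requested size is always a correct bound, even if the block
        // is really larger; computing it needs no extra call.
        Some((ptr, layout.size()))
    }

    unsafe fn dealloc(&self, ptr: NonNull<u8>, layout: Layout) {
        // SAFETY: the caller guarantees `ptr` and `layout` match `alloc`.
        unsafe { System.dealloc(ptr.as_ptr(), layout) }
    }
}
```

An allocator that can compute the real size class cheaply would return that value instead, without changing the calling code.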
I'd like to come back to this issue. With #14 merged, and #13 would be the next logical step, the I did a test:

    unsafe fn main_excess() -> Result<NonNull<i32>, AllocErr> {
        let allocation = Global.alloc_excess(Layout::new::<i32>())?;
        Ok(allocation.0.cast())
    }

    unsafe fn main() -> Result<NonNull<i32>, AllocErr> {
        let allocation = Global.alloc(Layout::new::<i32>())?;
        Ok(allocation.cast())
    }

Both snippets result in

    mov edi, 4
    mov esi, 4
    jmp qword ptr [rip + __rust_alloc@GOTPCREL]

So replacing Another thing:
|
This is a comparison of how this could be implemented: rust-lang/rust@cd5441f...TimDiekmann:excess |
Remove `usable_size` APIs

This removes the usable size APIs:

- remove `usable_size` (obv)
- change return type of allocating methods to include the allocated size
- remove `_excess` API

r? @Amanieu

closes rust-lang/wg-allocators#17
A use-case for

    #[derive(MallocSizeOf)]
    struct Foo {
        bar: HashMap<String, Vec<u8>>,
        baz: Arc<Mutex<Db>>,
    }

    fn mem_used_by(foo: &Foo) -> usize {
        foo.malloc_size_of()
    }

How would one implement this now? |
Such functionality is inherently allocator-specific. If you know what allocator your code is using then you can call directly into e.g. jemalloc to get the size of an allocation. Alternatively you could create an extension trait on |
That's the problem: if it's implemented in a library, nothing stops users from defining whatever global allocator they want. |
The issue here only applies to local allocators. |
The `Alloc::usable_size` method is, in general, used like this:

This has two big downsides.

The main issue is that `usable_size` makes it impossible to obtain meaningful telemetry information about which allocations the program actually wants to perform, resulting in a program that is essentially "un-tuneable" from the allocator perspective.

For example, if one were to dump allocation statistics with jemalloc for a program that uses `usable_size`, this program would appear to be using the allocator's size classes perfectly, independently of how bad these are.

The other issue is that code that uses `usable_size` computes the allocation size classes twice, once in the `usable_size` call, and once in the call to `alloc` (`alloc` just gets a layout, it then needs to find the size class appropriate for the layout, which is exactly what `usable_size` already did, but `alloc` does not know).

So IMO we should remove `usable_size`, and instead, make the allocation functions always return the `usable_size`, such that code like the above is now written as:

This makes sure that allocators are always able to print accurate statistics of the requested allocations, which can then be used to tune allocator performance for particular applications, and allows allocators to perform the size class computation only once (e.g. jemalloc's `smallocx`). Most ABIs do support two return registers in their calling convention, such that a pointer and a size can always be returned on most platforms without incurring any extra costs (e.g. like having to pass this via memory).

Allocators that do not support size class computation (e.g. glibc's `malloc`) can just return the requested layout size in Rust code, which can be inlined. If the usable size of the return is not used, inlining can remove its computation for allocators that do need to perform an extra call to obtain it, as long as that function is `readonly` (e.g. jemalloc's `mallocx` + `nallocx`).

This means that all `Alloc::..._excess` functions can be removed, since the normal functions do their job by default.

closes #13
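The two usage patterns the issue text contrasts (query `usable_size` and then allocate, versus allocate and read the size back) might look roughly like the sketch below. It is a simplified model, not the original snippets from the issue: `ToyAlloc`, its two size classes, and the `reserve_old` / `reserve_new` helpers are hypothetical, sizes stand in for full `Layout` values, and no memory is actually handed out. The point is only to show what the allocator's telemetry sees in each case.

```rust
use std::cell::RefCell;

/// Hypothetical toy allocator with two size classes (8 and 16 bytes).
/// It only records the sizes callers ask for; no memory is handed out.
pub struct ToyAlloc {
    /// Request sizes as seen by the allocator, i.e. its telemetry.
    pub requested: RefCell<Vec<usize>>,
}

impl ToyAlloc {
    pub fn new() -> Self {
        ToyAlloc { requested: RefCell::new(Vec::new()) }
    }

    /// Rounds a request up to its size class (larger sizes are kept as-is).
    fn size_class(size: usize) -> usize {
        match size {
            0..=8 => 8,
            9..=16 => 16,
            large => large,
        }
    }

    /// Old-style query: how much would a request of `size` bytes occupy?
    pub fn usable_size(&self, size: usize) -> usize {
        Self::size_class(size)
    }

    /// Allocation entry point: logs the request and returns the usable size.
    pub fn alloc(&self, size: usize) -> usize {
        self.requested.borrow_mut().push(size);
        Self::size_class(size)
    }
}

/// Old pattern: query the usable size first, then request exactly that.
pub fn reserve_old(a: &ToyAlloc, wanted: usize) -> usize {
    let usable = a.usable_size(wanted); // first size-class computation
    a.alloc(usable) // second one; the allocator never sees `wanted`
}

/// Proposed pattern: request what is needed and read the size back.
pub fn reserve_new(a: &ToyAlloc, wanted: usize) -> usize {
    a.alloc(wanted)
}
```

For a 10-byte request both helpers end up with 16 usable bytes, but the old pattern logs a request of 16 (a perfect size-class fit) while the new one logs the 10 bytes the program actually wanted.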