There's no way to get the managed object size in memory #24200
There is an internal mechanism
@alden-menzalgy would you be interested in writing up a formal API proposal for this?
@joperezr yup, I'm interested.
@danmosemsft you're right, that's exactly what I'm talking about.
Yes, here is a very good example of how an API proposal should look: https://github.com/dotnet/corefx/issues/271. You can also find more info about our review process in general here https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/api-review-process.md in case you need it.
Public API to detect the size of a managed object.
As mentioned in this article, we don't have such an API to get the size at runtime. So far, if we want to do so, we have to use some tools or techniques (not at runtime), such as:
One very good example of that need: when we build a customized cache and want to keep a maximum size limit under control. Developers around the world are still asking and requesting such an API.
How we can implement such a thing
The good news is that we already have this mechanism implemented internally by the SizedReference type.
All we have to do is expose it as a public API.
Proposed API
I suggest adding a class under System.Runtime as a wrapper around SizedReference, or just an extension method on object.
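For illustration, a rough sketch of what such a wrapper or extension could look like; the class and member names below are placeholders assumed for the sake of the sketch, not an existing runtime API:

```csharp
using System;

namespace System.Runtime
{
    // Hypothetical public wrapper over the internal sized-reference mechanism.
    // Shape and names are illustrative only.
    public sealed class ManagedObjectSize : IDisposable
    {
        public ManagedObjectSize(object target) { /* would allocate a sized-ref handle */ }

        // Approximate inclusive size in bytes, as last computed by the GC.
        public long ApproximateSize => throw new NotImplementedException("sketch only");

        public void Dispose() { /* would free the handle */ }
    }

    // Or, alternatively, an extension method on object:
    public static class ManagedObjectSizeExtensions
    {
        public static long GetApproximateSize(this object obj) =>
            throw new NotImplementedException("sketch only");
    }
}
```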
|
@joperezr thank you for your help, please find the proposal above.
The current internal SizedReference API has the following characteristics:
Are you happy with these performance characteristics?
This wrapper cannot be built using the current SizedReference. The current SizedReference design requires you to create the handle first, then you have to wait for a Gen2 GC to happen, and then you can see the approximate size at the time of the last Gen2 GC.
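To make that timing constraint concrete, a minimal sketch of the workflow being described, assuming the hypothetical ManagedObjectSize wrapper from the earlier sketch existed (SizedReference itself is internal, and its value is only refreshed by a gen2 GC):

```csharp
// Illustration only: ManagedObjectSize is the hypothetical wrapper sketched above,
// and cacheRoot is a placeholder for whatever object graph you want to measure.
var sizedRef = new ManagedObjectSize(cacheRoot);         // 1. create the handle first
GC.Collect(2, GCCollectionMode.Forced, blocking: true);  // 2. a gen2 GC has to run...
long approxBytes = sizedRef.ApproximateSize;             // 3. ...and the value reflects
                                                         //    the last gen2 GC only
```

In other words, a synchronous "give me the size of this object right now" call cannot be layered directly on top of the existing mechanism.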
@KKhurin cc
We only update the size for sized ref handles in gen2 GC because gen2 GCs already need to traverse these object graphs anyway, which means it adds no overhead aside from having to write the size into the data associated with these handles. In Server GC there is a caveat because Server GC has multiple GC threads, and these handles are supposed to get the inclusive size, i.e., as long as something is in the object graph for the handle it's counted, even if it's also referred to by other sources like the stack (so we have all threads finish traversing the sized ref handles before we start with other sources). Things to think about for the usage scenario for this -
There are ways to not have the GC do this - you could do this via debugging APIs (ICorDebugProcess5::GetTypeLayout and related), profiling APIs (similar to the debugging ones) or reflection (you could get the fields of an object and calculate the size of each). Not to say these are easy to do or don't have their own set of problems - for example, if you use the debugging/profiling APIs you can't attach another live debugger/profiler to the same process.
There hasn't been any progress on this issue for a couple of years, including no follow-up to the questions posed about whether the significant implications of the proposal still make it useful. As such, I'm going to close this. Thanks.
So, FWIW, this looks to be a space that Roslyn might get some significant benefit from. The use case, unsurprisingly, is exactly the one mentioned at the start:
We would like to have a cache that can keep the max size of things under control. However, we'd like to avoid having to build a system whereby we have to determine that size ourselves. First, it's non-trivial to determine that. Second, it could very easily change in the future as types change and things get refactored. Keeping the computations in check could be challenging. That said, we're in the speculative design space right now, so I wouldn't want this to be done unless we were really certain this was the design we were going with and that this issue would be of serious help to us.
+1. I don't see any real reasons not to expose this and it would be useful for several areas of caching in one of my apps. It's a separate API, but a dangerous
Given the issues outlined by Maoni and Jan, do we still consider an implementation based on SizedReference? An alternative route would be a calculation based on fields and their sizes (along with the MethodTable ref and the object header), like the way SOS is already doing this: https://github.com/dotnet/diagnostics/blob/master/src/SOS/Strike/sos.cpp#L170. And a bit of a side question: should we count the ObjHeader while we are calculating object size? I mean, it has an... interesting position in the layout, and both answers to the question above could lead to some pitfalls.
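As a concrete, back-of-the-envelope illustration of that breakdown, assuming the commonly cited 64-bit layout of an 8-byte object header followed by an 8-byte MethodTable pointer and then the instance fields:

```csharp
class Point
{
    public double X; // 8 bytes
    public double Y; // 8 bytes
}

// Rough 64-bit arithmetic for one Point instance:
//   8 (object header) + 8 (MethodTable pointer) + 8 (X) + 8 (Y) = 32 bytes
// Whether those first 8 header bytes "belong" to the object is exactly the
// ObjHeader question raised above, and padding/alignment rules can change the
// total for other field mixes.
```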
I'd like to present an alternative API proposal.
Summary of implementation approaches
As outlined in the comments above, we currently have a couple of ways to get the size of an object, each with its own drawbacks:
Implementation
In this proposal I'd like to advocate for a reflection-based approach. While it's not ideal (for reasons outlined in the previous section), it has its own merits too: it is transparent for the end user, more accessible for maintenance, and provides educational insight into the inner workings of the CLR for the curious (since it's just C# code). Summarizing, object size calculation would be done as the sum of the sizes of all declared fields in the object's class and in its parent classes, accounting for the sizes of the object header and method table pointer. Exclusive size calculations treat references to other heap objects as pointer-width fields. Inclusive size calculations add up the sizes of referenced objects recursively (a simplified sketch of what this could look like follows this proposal).
Public API proposal
Testability
As discussed in the implementation section of this proposal, automated testing should be our main way to keep the size calculation implementation in check with the CLR implementation. To keep testing accurate we could utilize the debugging API mentioned by @Maoni0, since it's acceptable to deny external debugging during the testing phase of the build pipeline for this feature. Alternatively, we could define the sizes of test objects as constants (extracting them from a debugger, for example) and compare results to them, but that way we fail at keeping the size calculation implementation in check with CLR implementations.
Updates
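Since the proposal's actual API listing isn't shown here, the following is only an illustrative sketch of the shape being discussed: GetExclusiveSize is the name later comments refer to, GetInclusiveSize is an assumed counterpart, and the field-walking logic is a simplified approximation (it ignores padding/alignment, arrays, strings, and cycles):

```csharp
using System;
using System.Reflection;

public static class ObjectSizeExtensions
{
    // Exclusive size ~= object header + method table pointer + instance fields,
    // where reference-typed fields count as one pointer each.
    public static long GetExclusiveSize(this object obj)
    {
        if (obj == null) throw new ArgumentNullException(nameof(obj));

        long size = 2 * IntPtr.Size;            // header + method table pointer (assumed)
        size += FieldsSize(obj.GetType());
        return Math.Max(size, 3 * IntPtr.Size); // assumed minimum object size
    }

    // Inclusive size would additionally walk reference-typed fields recursively,
    // tracking visited objects to handle cycles; omitted here for brevity.
    public static long GetInclusiveSize(this object obj) =>
        throw new NotImplementedException("sketch only");

    private static long FieldsSize(Type type)
    {
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public |
                                   BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
        long total = 0;

        // Walk the type and its base types so inherited fields are counted too.
        for (Type t = type; t != null && t != typeof(object); t = t.BaseType)
        {
            foreach (FieldInfo field in t.GetFields(flags))
            {
                Type ft = field.FieldType;
                if (!ft.IsValueType)
                    total += IntPtr.Size;       // reference field: just the pointer
                else if (ft.IsPrimitive || ft.IsEnum)
                    total += PrimitiveSize(ft);
                else
                    total += FieldsSize(ft);    // nested struct: recurse into its fields
            }
        }
        return total;
    }

    private static long PrimitiveSize(Type t)
    {
        if (t.IsEnum) t = Enum.GetUnderlyingType(t);
        if (t == typeof(bool) || t == typeof(byte) || t == typeof(sbyte)) return 1;
        if (t == typeof(char) || t == typeof(short) || t == typeof(ushort)) return 2;
        if (t == typeof(int) || t == typeof(uint) || t == typeof(float)) return 4;
        if (t == typeof(long) || t == typeof(ulong) || t == typeof(double)) return 8;
        return IntPtr.Size;                     // IntPtr, UIntPtr and anything unexpected
    }
}
```

Usage in this sketch would simply be `long bytes = myObject.GetExclusiveSize();`.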
My 2 cents: our company would be using this for cache size limits, and my understanding from reading this issue is that that's what most would use it for. Not having a way of calculating object size recursively would mean everyone would have to write their own code to loop through an object's fields and call this GetExclusiveSize method for every one of those fields. Because strings are objects, would this also mean strings wouldn't be included in this size calculation? It's good to have an exclusive way of calculating object size, but the API should have a way to calculate the object's size plus the size of all its referenced objects, and what they reference, and so on.
@wanton7 Since most of the groundwork for size calculation would be done for exclusive size anyway, I don't see any issues exposing an inclusive size API. Let me update the proposal.
@leorik reflection makes AOT much harder (or outright impossible where an interpreter is not allowed) or makes the app noticeably slower (reflection is known to be slow in general), and this should also be taken into account as a drawback.
Another minus: from my limited understanding of reflection, you need to box struct fields when using it. I would really like it if the method that calculates an object's size didn't do any heap allocations.
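For what it's worth, the boxing concern mostly applies when field values have to be read (e.g. to follow references for an inclusive size); an exclusive-size calculation only needs field metadata, which can also be cached per type. A small illustration (MyStruct is just a stand-in type, not from the proposal):

```csharp
using System;
using System.Reflection;

struct MyStruct { public int A; public Guid B; }

static class BoxingIllustration
{
    static void Inspect(object obj)
    {
        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        foreach (FieldInfo f in obj.GetType().GetFields(flags))
        {
            Type fieldType = f.FieldType;  // metadata only: no boxing of the field's value
            // f.GetValue(obj) would be needed to traverse into referenced objects,
            // and it boxes value-type fields (one allocation per struct field read).
        }
    }
}
```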
While I like the API, reflection blocks AOT, so unfortunately this API would be useless for me and at least one other person I know who would be using it.
This is not necessary to ship in 5.0. Moving milestone.
Any updates? It would be really useful to use in our app; otherwise we need to traverse the object graph ourselves and estimate the exclusive sizes based on the platform we're running on.
Based on these requests the problem doesn't seem to be object size per se, but rather a collection that has an upper memory limit and, when that limit is hit, objects are collected. Some sort of smart cache that one can place objects in, and when the GC detects that the memory limit is close, it can look in the collection and collect objects that have a large inclusive memory footprint. Is that accurate, or does the memory that an arbitrary object references through its graph matter for another reason?
It's largely from the current API of the .NET MemoryCache. It's completely on the developer to set a size limit for the cache, and to determine a size for each cached item. This is currently infeasible to do for objects. An alternative is storing serialized data instead of the raw objects. The serialized data would have a definite size, but has runtime overhead when reading and writing to the cache.
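For context, a minimal sketch of what that looks like today with Microsoft.Extensions.Caching.Memory (assuming that is the cache meant here): SizeLimit and Size are abstract units the caller has to invent, and EstimateSize below is a hypothetical helper every application currently has to hand-roll.

```csharp
using Microsoft.Extensions.Caching.Memory;

static class CacheSizingToday
{
    static readonly MemoryCache Cache = new MemoryCache(new MemoryCacheOptions
    {
        SizeLimit = 100_000_000   // abstract "units", not bytes; the cache doesn't care
    });

    static void Add(string key, object value)
    {
        // When SizeLimit is set, every entry must declare a Size or Set() throws.
        Cache.Set(key, value, new MemoryCacheEntryOptions { Size = EstimateSize(value) });
    }

    // Hypothetical placeholder: today this is per-application guesswork,
    // which is exactly the gap the requested API would fill.
    static long EstimateSize(object value) => 1;
}
```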
I stumbled across this issue while porting a .NET application to .NET Core, recognizing that System.Runtime.Caching.MemoryCache does not provide the approximate size anymore. I thought to give it a shot and implemented a library that provides an in-application "ObjSize"-like function. For those interested, it can be found here. I have written some unit tests that basically run ClrMD (on the test process itself) to compare what it would report as size to what my library calculates. However, test coverage is far from "complete" regarding possible types. Most ideas, and partly the code, have been taken from the runtime itself. And while I think I have learned quite a few things about the topic, I must admit that I mostly work on evidence-based results (i.e. if ClrMD reports the same size as my library, I'm good so far). So if you even think about adopting this code/library for your own purposes, keep this in mind!
This code depends on undocumented internal implementation details of the runtime. These implementation details are subject to change and differ between .NET runtimes. This code only works for the current CoreCLR version; it is not guaranteed to work anywhere else.
Yes, thank you. I'm aware of that. It is meant as a "prototype", really, for how an in-process solution could look, and to show how a unit test using a "debugger" to validate results (which was discussed in a comment above) could look. Anyway, I will make this clearer in the readme tomorrow. If it's not asking too much, can you elaborate on how it is possibly more dependent on internals or undocumented(?) behavior than, say, what ClrMD does? I can fully acknowledge that the only robust way this could be pulled off would have to be a solution from MS (that is updated when internals change), or if the CLR provided the necessary APIs to build upon. I still thought an attempt at it provided some merit.
Yes, it is similar. The current version of ClrMD is not expected to work on future runtime versions. ClrMD typically requires updates for every new runtime version or variant.
I would love to see something like this. In my case, I'm working on a stateful distributed processing framework. Many real-world pipelines will be bottlenecked by memory - think of e.g. lots of distributed table joins, where each operator needs to hold the full state of each input table in memory. Now I'd love to have some way of displaying the approximate state size of each operation on each cluster, as that would be an extremely easy-to-observe piece of information for anyone designing a new pipeline to see where their major bottlenecks lie. For this use case, it would be particularly useful to be able to get inclusive sizes - and ideally, incur relatively little computation overhead (which speaks slightly against e.g. a reflection-based approach). Re-using GC info from gen2 collections would work very well here. Having said that, any implementation could likely be made to work, even if there is a non-insignificant computation overhead. In short: it would be amazing to have official support for this type of API. Being able to fetch inclusive memory sizes would be very beneficial. Other implementation/API details are slightly less critical to me personally.
@fabianoliver I totally agree!
@alden-menzalgy commented on Sun Nov 05 2017
Hi,
There is no way to get the actual (or even an approximate) size of a cached object in memory.
The case is, we have 100+ customized caches for different purposes to accelerate our platform, and for each cache we have
Whenever we insert some object to be cached, we should detect its size to re-calculate the aforementioned cache properties.
I know that's a complex issue and depends on many factors. So far we have some workarounds, but none of them is official and we are afraid they could change in any minor or major release.
Could you add an API to get the object size in memory? Or at least the type size, and then we can add the object-specific data length.
Related Topics
Workaround 1
What Microsoft says about this issue
@JanEggers commented on Sat Nov 11 2017
You can use structs for your cache items and Marshal.SizeOf.
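A minimal example of that workaround; note it reports the marshaled size of a struct layout, not a managed heap size, and it doesn't help for classes or for items holding references:

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct CacheItem
{
    public long Id;      // 8 bytes
    public double Value; // 8 bytes
}

class Demo
{
    static void Main()
    {
        // Marshaled size of the struct layout: 16 bytes here.
        Console.WriteLine(Marshal.SizeOf<CacheItem>());
    }
}
```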
@alden-menzalgy commented on Mon Nov 13 2017
It requires changing the structure that we've built the application with.
It may be considered a workaround, but not a permanent solution.