
There's no way to get the managed object size in memory #24200

Open
danmoseley opened this issue Nov 20, 2017 · 32 comments
Labels
api-needs-work (API needs work before it is approved; it is NOT ready for implementation), area-System.Runtime, help wanted (up-for-grabs: good issue for external contributors)
Milestone
Future

@danmoseley
Member

Public API to detect the size of the managed object.

As mentioned in this article, the runtime has no such API to get an object's size.

Today, if we want to do this, our only options are external tools or techniques (outside the runtime), such as:

  • using dotMemory;
  • using WinDBG and adding up the object's fields one by one;
  • measuring GC memory before and after creating the objects (a rough sketch of this approach follows the list).
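
For illustration, here is a minimal sketch of the GC-based workaround (the SizeProbe helper name is made up; the result is only a rough estimate and assumes nothing else allocates while it runs):

using System;

static class SizeProbe
{
    // Compare total managed memory before and after the allocation.
    // Only an approximation: concurrent allocations skew the result.
    public static long MeasureAllocation(Func<object> factory)
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);
        object obj = factory();
        long after = GC.GetTotalMemory(forceFullCollection: false);
        GC.KeepAlive(obj); // keep the object alive past the second reading
        return after - before;
    }
}

// usage: long approxSize = SizeProbe.MeasureAllocation(() => new byte[1024]);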

A good example of the need is a custom cache where we want to keep the maximum size limit under control.

Developers keep asking for such an API.

How we could implement this

The good news is that this mechanism is already implemented internally by the SizedReference type.

internal class SizedReference : IDisposable
{	
	//some code here

	public SizedReference(Object target) { ... }
	
	public Object Target { get {...} }

	public Int64 ApproximateSize { get { ... } }
	
	public void Dispose() {}
}

All we would need to do is expose it as a public API.

Proposed API

I suggest adding a class under System.Runtime as a wrapper around SizedReference,

or simply an extension method on object.

/// <summary>
/// Gets the approximate size, in bytes, allocated for this object in memory.
/// </summary>
/// <param name="obj">The object whose size you want to know.</param>
/// <returns>Size in bytes.</returns>
public static long GetSize(this object obj)
{
	// use the SizedReference type internally to get the approximate size
}


@alden-menzalgy commented on Sun Nov 05 2017

Hi,

There is no way to get the actual (or even an approximate) size of a cached object in memory.

Our case: we have 100+ custom caches for different purposes to accelerate our platform, and for each cache we have

  • Cache MaxSize
  • Cache ConsumedSize
  • Cache RemainingSize

Whenever we insert an object into a cache, we need to determine its size to recalculate the cache properties above.

I know this is a complex issue that depends on many factors. We have some workarounds, but none of them is official, and we're afraid they could break in any minor or major release.

Could you add an API to get an object's size in memory, or at least a type's size so that we can add the object-specific data length ourselves?

Related Topics

Workaround 1
What Microsoft says about this issue


@JanEggers commented on Sat Nov 11 2017

You can use structs for your cache items and Marshal.SizeOf.
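
As a sketch of that suggestion (assuming a blittable struct for the cache item; note that Marshal.SizeOf reports the marshaled, unmanaged size, which can differ from the managed layout and does not work for types containing object references):

using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct CacheItem
{
    public long Key;
    public int Length;
    public double Score;
}

// Size of the struct as it would be marshaled to unmanaged memory.
int itemSize = Marshal.SizeOf<CacheItem>();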


@alden-menzalgy commented on Mon Nov 13 2017

That requires changing the structures the application was built around.
It may count as a workaround, but it is not a permanent solution.

@danmoseley
Member Author

There is an internal mechanism, SizedReference, in Desktop, and the linked issue proposes porting it to .NET Core and making it public. That seems like what you need, assuming you want the transitive size (which is what I assume it returned).

@joperezr
Member

@alden-menzalgy would you be interested in writing up a formal API proposal for this?

@almez

almez commented Nov 22, 2017

@joperezr yup, I'm interested.
Is there any procedure I should follow before starting on the proposal?

@almez

almez commented Nov 22, 2017

@danmosemsft you're right, that's exactly what I'm talking about.

@joperezr
Member

Yes, here is a very good example of how an API proposal should look: https://github.com/dotnet/corefx/issues/271. You can also find more info about our review process in general here https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/api-review-process.md in case you need it.

@almez

almez commented Nov 23, 2017

Public API to detect the size of the managed object.

As mentioned in this article, the runtime has no such API to get an object's size.

Today, if we want to do this, our only options are external tools or techniques (outside the runtime), such as:

  • using dotMemory;
  • using WinDBG and adding up the object's fields one by one;
  • measuring GC memory before and after creating the objects.

A good example of the need is a custom cache where we want to keep the maximum size limit under control.

Developers keep asking for such an API.

How we could implement this

The good news is that this mechanism is already implemented internally by the SizedReference type.

internal class SizedReference : IDisposable
{	
	//some code here

	public SizedReference(Object target) { ... }
	
	public Object Target { get {...} }

	public Int64 ApproximateSize { get { ... } }
	
	public void Dispose() {}
}

All we would need to do is expose it as a public API.

Proposed API

I suggest adding a class under System.Runtime as a wrapper around SizedReference,

or simply an extension method on object.

/// <summary>
/// Gets the approximate size, in bytes, allocated for this object in memory.
/// </summary>
/// <param name="obj">The object whose size you want to know.</param>
/// <returns>Size in bytes.</returns>
public static long GetSize(this object obj)
{
	// use the SizedReference type internally to get the approximate size
}

@almez

almez commented Nov 23, 2017

@joperezr thank you for your help; please find the proposal above.

@jkotas
Member

jkotas commented Nov 23, 2017

The current internal SizedReference API has the following characteristics:

  • The size is accurate only after Gen2 GC.
  • These handles are expensive. Basically, the GC has to traverse the whole object graph for each to compute the size.

Are you happy with these performance characteristics?

I suggest adding a class under System.Runtime as a wrapper around SizedReference

This wrapper cannot be built on the current SizedReference. The current SizedReference design requires you to create the handle first, then wait for a Gen2 GC to happen, and only then can you see the approximate size as of the last Gen2 GC.
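
To make that contract concrete, driving the internal type through reflection on .NET Framework would look roughly like this (illustrative only: the type name System.SizedReference, its members, and the myCachedObject variable are assumptions based on the Framework implementation, and none of this exists on .NET Core):

using System;
using System.Reflection;

// Assumed internal type name; not a public or supported API.
Type sizedRefType = typeof(object).Assembly.GetType("System.SizedReference", true);

object sizedRef = Activator.CreateInstance(
    sizedRefType,
    BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic,
    null,
    new object[] { myCachedObject }, // hypothetical object being measured
    null);

GC.Collect(2); // the size is only (re)computed as part of a Gen2 collection

long approximateSize = (long)sizedRefType
    .GetProperty("ApproximateSize", BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
    .GetValue(sizedRef);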

@jkotas
Member

jkotas commented Nov 23, 2017

cc @Maoni0 @swgillespie

@danmoseley
Member Author

@KKhurin cc

@Maoni0
Member

Maoni0 commented Nov 28, 2017

We only update the size for sized-ref handles during gen2 GCs because gen2 GCs already need to traverse these object graphs anyway, which means it adds no overhead aside from having to write the size into the data associated with these handles. In Server GC there is a caveat: Server GC has multiple GC threads, and since these handles were supposed to get the inclusive size, i.e., as long as something is in the object graph for the handle it's counted, even if it's also referred to by other sources like the stack, we have all threads finish traversing the sized-ref handles before we start with other sources.

Things to think about for the usage scenario for this -

  • Do you want inclusive size or exclusive size? The asp.net folks originally wanted inclusive size; now we are discussing using the exclusive size instead based on their usage scenario.

  • Does this need to be accurate? Right now we do this in gen2 because it's cheap (gen2 already needs to do this anyway). If you want to get this size in gen0/1 GCs, there are also complications of an object being alive because an older-generation object is holding onto it. And if you want to get the size while the user threads are running, the graph can be modified while you are getting the size.

There are ways to not have the GC do this - you could do it via debugging APIs (ICorDebugProcess5::GetTypeLayout and related), profiling APIs (similar to the debugging ones) or reflection (you could get the fields of an object and calculate the size of each). That's not to say these are easy to do or don't have their own set of problems - for example, if you use the debugging/profiling APIs you can't attach another live debugger/profiler to the same process.

@stephentoub
Member

There hasn't been any progress on this issue for a couple of years, including no follow-up to the questions posed about whether the significant implications of the proposal still make it useful. As such, I'm going to close this. Thanks.

@msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits added this to the 5.0 milestone Jan 31, 2020
@CyrusNajmabadi
Member

So, FWIW, this looks to be a space that Roslyn might get some significant benefit from. The use case, unsurprisingly, is exactly the one mentioned at the start:

A good example of the need is a custom cache where we want to keep the maximum size limit under control.

We would like to have a cache that can keep the max size of things under control. However, we'd like to avoid having to build a system whereby we have to determine that size ourselves. First, it's non-trivial to determine that. Second, it could very easily change in the future as types change and things get refactored. Keeping the computations in check could be challenging.

That said, we're in the speculative design space right now, so I wouldn't want this to be done unless we were really certain this was the design we were going with and that this issue would be of serious help to us.

@john-h-k
Contributor

+1. I don't see any real reason not to expose this, and it would be useful for several areas of caching in one of my apps.

It's a separate API, but a (dangerous) ZeroObject or something similar in runtime helpers that safely zeroes the entire memory of an object would also be useful for caching.

@leorik

leorik commented Jun 5, 2020

Given the issues outlined by Maoni and Jan, are we still considering an implementation based on SizedReference?

An alternative route would be a calculation based on the fields and their sizes (along with the MethodTable reference and object header), the way SOS already does it: https://github.com/dotnet/diagnostics/blob/master/src/SOS/Strike/sos.cpp#L170.

And a bit of a side question: should we count the ObjHeader while calculating the object size? It has an... interesting position in the layout, and either answer to the question above could lead to some pitfalls.

@leorik

leorik commented Jun 15, 2020

I'd like to present an alternative API proposal.
I apologize in advance if I'm using user mentions too liberally: I wasn't able to find anything about mention etiquette in the community or contribution guidelines, and since the mentioned people participated in the discussion earlier, I judged it warranted.

Summary of implementation approaches

As outlined in the comments above, we currently have a couple of ways to get the size of an object, each with its own drawbacks:

  1. Expose the internal SizedReference, as proposed by @danmosemsft in this comment. The drawbacks of this approach were described by @jkotas in this comment: it's either inaccurate (because it requires an object-graph traversal to be accurate) or expensive (if we force the object-graph traversal).
  2. Use the debugger API to get the object size, as proposed by @Maoni0 here. The drawbacks were outlined in the same comment: by using the debugger API we would prevent a real debugger from attaching to code that uses our API.
  3. Use reflection to enumerate type members, as proposed by @Maoni0. The obvious drawback of this approach is that we have to calculate member sizes based on assumptions about object layout (e.g., based on observed behavior, the CLR currently aligns structs when embedding them in objects), and these calculations are divorced from the actual runtime implementation.

Implementation

In this proposal I'd like to advocate for the reflection-based approach. While it's not ideal (for the reasons outlined in the previous section), it has its own merits: it is transparent to the end user, easier to maintain, and provides educational insight into the inner workings of the CLR for the curious (since it's just C# code).
The outlined drawbacks could be mitigated by automated unit testing.
Besides, SOS already uses a calculation-based approach. If the calculation drawbacks are acceptable for the debugger, I don't see why they would be show-stoppers for library code.

In summary, the object size would be calculated as the sum of the sizes of all fields declared in the object's class and in its parent classes, plus the object header and method table pointer. Exclusive size calculations treat references to other heap objects as pointer-width fields; inclusive size calculations recursively add the sizes of the referenced objects. A naive sketch of the exclusive calculation is shown below.
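
The following is only a rough illustration of that exclusive-size calculation (the NaiveObjectSize helper is hypothetical; it deliberately ignores field packing, arrays, and strings, so it is not the real layout algorithm):

using System;
using System.Reflection;

public static class NaiveObjectSize
{
    // Exclusive size sketch: object header + method table pointer + declared
    // instance fields summed over the inheritance chain. Reference-type fields
    // count as one pointer; value-type fields are summed recursively.
    public static int GetExclusiveSize(object obj)
    {
        if (obj is null) throw new ArgumentNullException(nameof(obj));

        int size = 2 * IntPtr.Size; // object header + method table pointer
        for (Type t = obj.GetType(); t != null; t = t.BaseType)
        {
            foreach (FieldInfo field in t.GetFields(
                BindingFlags.Instance | BindingFlags.Public |
                BindingFlags.NonPublic | BindingFlags.DeclaredOnly))
            {
                size += SizeOfFieldType(field.FieldType);
            }
        }
        return Math.Max(size, 3 * IntPtr.Size); // minimum heap object size
    }

    private static int SizeOfFieldType(Type type)
    {
        if (!type.IsValueType) return IntPtr.Size; // references are pointer-width

        if (type.IsEnum) type = Enum.GetUnderlyingType(type);
        switch (Type.GetTypeCode(type))
        {
            case TypeCode.Boolean:
            case TypeCode.Byte:
            case TypeCode.SByte: return 1;
            case TypeCode.Char:
            case TypeCode.Int16:
            case TypeCode.UInt16: return 2;
            case TypeCode.Int32:
            case TypeCode.UInt32:
            case TypeCode.Single: return 4;
            case TypeCode.Int64:
            case TypeCode.UInt64:
            case TypeCode.Double: return 8;
            case TypeCode.Decimal: return 16;
        }

        // Nested struct: sum its instance fields (ignores packing/alignment).
        int size = 0;
        foreach (FieldInfo f in type.GetFields(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
        {
            size += SizeOfFieldType(f.FieldType);
        }
        return size == 0 ? 1 : size; // an empty struct still occupies 1 byte
    }
}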

Public API proposal

namespace System.Reflection
{
    /// <summary>
    ///     Extension methods for reflection-based calculation of runtime memory footprint of managed objects.
    /// </summary>
    public static class ObjectSizeExtensions
    {
        /// <summary>
        ///     Calculates approximate memory footprint of object itself, not accounting for sizes of referenced objects. 
        /// </summary>
        /// <param name="obj">Object to calculate size of. </param>
        /// <exception cref="System.ArgumentNullException">If provided object is null. </exception>
        /// <returns>Approximate size of managed object. </returns>
        public static int GetExclusiveSize(this object obj);
		
        /// <summary>
        ///     Calculates approximate memory footprint of object and its reference graph, recursively adding up sizes of referenced objects.
        /// </summary>
        /// <param name="obj">Object to calculate size of. </param>
        /// <exception cref="System.ArgumentNullException">If provided object is null. </exception>
        /// <returns>Approximate size of managed object and its reference graph. </returns>
        public static int GetInclusiveSize(this object obj);
    }
}

Testability

As discussed in the implementation section of this proposal, automated testing should be our main way of keeping the size calculation in sync with the CLR implementation. To keep the tests accurate we could utilize the debugging API mentioned by @Maoni0, since it is acceptable to forgo external debugging during the testing phase of the build pipeline for this feature.
Here my qualifications end, and I'd welcome more informed suggestions on how we should implement this. One option I see: we could host the managed test project inside a native test project and try to attach a debugger to it programmatically.

Alternatively, we could define the sizes of the test objects as constants (extracting them from a debugger, for example) and compare the results against them, but that way we would fail to keep the size calculation in sync with the CLR implementation.

Updates

  • Added proposal for inclusive size API

@wanton7

wanton7 commented Jun 15, 2020

My 2 cents: our company would use this for cache size limits, and from reading this issue my understanding is that's what most would use it for. Not having a way to calculate object size recursively would mean everyone has to write their own code to loop through an object's fields and call this GetExclusiveSize method for every one of them. Since strings are objects, would that also mean strings wouldn't be included in the size calculation?

It's good to have an exclusive way of calculating object size, but the API should also have a way to calculate an object's size plus the size of all the objects it references, what they reference, and so on.

@leorik

leorik commented Jun 15, 2020

@wanton7 Since most of the groundwork for the size calculation would be done for the exclusive size anyway, I don't see any issue with exposing an inclusive size API. Let me update the proposal.

@BreyerW

BreyerW commented Jun 15, 2020

@leorik reflection makes AOT much harder (or outright impossible where an interpreter is not allowed) or makes the app noticeably slower (reflection is known to be slow in general); this should also be taken into account as a drawback.

@wanton7

wanton7 commented Jun 15, 2020

Another minus, from my limited understanding of reflection: you need to box struct fields when using reflection. I would really like it if the method that calculates object size didn't do any heap allocations.

@john-h-k
Contributor

john-h-k commented Jun 15, 2020

In this proposal I'd like to advocate for reflection-based approach.

While I like the API, reflection blocks AOT, so unfortunately this approach would be useless for me and for at least one other person I know who would be using this API.

@danmoseley modified the milestones: 5.0.0, Future Jul 31, 2020
@danmoseley
Member Author

This is not necessary to ship 5.0. Moving milestone.

@videokojot

Any updates? This would be really useful in our app; otherwise we need to traverse the object graph ourselves and estimate the exclusive sizes based on the platform we're running on.

@AaronRobinsonMSFT
Member

Based on these requests, the problem doesn't seem to be object size per se, but rather a collection that has an upper memory limit: when that limit is hit, objects are evicted. Some sort of smart cache that one can place objects in, and when the GC detects that the memory limit is close, it can look in the collection and collect objects that have a large inclusive memory footprint.

Is that accurate, or does the memory that an arbitrary object references through its graph matter for another reason?

@cjstevenson

It's largely driven by the current API of the .NET MemoryCache.

It's completely on the developer to set a size limit for the cache and to determine a size for each cached item. This is currently infeasible to do for objects.

An alternative is storing serialized data instead of the raw objects. The serialized data has a definite size, but there is runtime overhead reading from and writing to the cache.
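
For illustration, a minimal sketch of that serialize-and-measure approach (assuming Microsoft.Extensions.Caching.Memory and System.Text.Json; the entry "size" is the serialized payload length, not the in-memory footprint, which is exactly the trade-off described above):

using System.Text.Json;
using Microsoft.Extensions.Caching.Memory;

var cache = new MemoryCache(new MemoryCacheOptions
{
    SizeLimit = 64 * 1024 * 1024 // interpreted here as bytes of serialized payload
});

void CacheSerialized<T>(string key, T value)
{
    // Store the serialized bytes; their length gives a definite, if indirect, size.
    byte[] payload = JsonSerializer.SerializeToUtf8Bytes(value);
    cache.Set(key, payload, new MemoryCacheEntryOptions { Size = payload.Length });
}

T ReadBack<T>(string key) =>
    cache.TryGetValue(key, out byte[] payload)
        ? JsonSerializer.Deserialize<T>(payload)
        : default;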

@cklutz

cklutz commented Jan 10, 2023

I stumbled across this issue while porting a .NET application to .NET Core, after noticing that System.Runtime.Caching.MemoryCache no longer provides the approximate size.

I thought I'd give it a shot and implemented a library that provides an in-application "ObjSize"-like function. For those interested, it can be found here.

I have written some unit tests that basically run ClrMD (on the test process itself) to compare what it would report as the size against what my library calculates. However, test coverage is far from "complete" regarding possible types.

Most of the ideas, and some of the code, have been taken from the runtime itself. And while I think I have learned quite a few things about the topic, I must admit that I mostly work from evidence-based results (i.e., if ClrMD reports the same size as my library, I'm good so far). So if you even think about adopting this code/library for your own purposes, keep this in mind!
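
To make the ClrMD comparison idea concrete, such a check might look roughly like this (a sketch only, assuming the ClrMD 2.x API shape in Microsoft.Diagnostics.Runtime; the exact member names should be verified against the version in use):

using System;
using System.Linq;
using Microsoft.Diagnostics.Runtime;

// Snapshot the current process and ask ClrMD for the size of every instance
// of a given type, to compare against what an in-process library computes.
static ulong[] GetReportedSizes(string typeName)
{
    using DataTarget target = DataTarget.CreateSnapshotAndAttach(Environment.ProcessId);
    using ClrRuntime runtime = target.ClrVersions[0].CreateRuntime();

    return runtime.Heap.EnumerateObjects()
        .Where(o => o.Type?.Name == typeName)
        .Select(o => o.Size)
        .ToArray();
}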

@jkotas
Member

jkotas commented Jan 10, 2023

if you even think about adopting this code/library for your own purposes, keep this in mind!

This code depends on undocumented internal implementation details of the runtime. These implementation details are subject to change and differ between .NET runtimes. This code only works for the current CoreCLR version; it is not guaranteed to work anywhere else.

@cklutz

cklutz commented Jan 10, 2023

Yes, thank you, I'm aware of that. It is really meant as a "prototype", to show how an in-process solution could look.

And to show how a unit test that uses a "debugger" to validate results (which was discussed in a comment above) could look.

Anyway, I will make this clearer in the readme tomorrow.

If it's not asking too much, can you elaborate on how it is any more dependent on internals or undocumented(?) behavior than, say, what ClrMD does?

I fully acknowledge that the only robust way this could be pulled off would be a solution from MS (that is updated when internals change), or if the CLR provided the necessary APIs to build upon. I still thought an attempt at it had some merit.

@jkotas
Member

jkotas commented Jan 10, 2023

can you elaborate how it is possibly more dependent on internals or undocumented(?) behavior than, say, what ClrMD does?

Yes, it is similar. The current version of ClrMD is not expected to work on future runtime versions. ClrMD typically requires updates for every new runtime version or variant.

@fabianoliver

fabianoliver commented Nov 27, 2023

I would love to see something like this.

In my case, I'm working on a stateful distributed processing framework. Many real-world pipelines will be bottlenecked by memory: think of, e.g., lots of distributed table joins, where each operator needs to hold the full state of each input table in memory.

Now I'd love to have some way of displaying the approximate state size of each operation on each cluster, as that would be an extremely easy-to-observe piece of information for anyone designing a new pipeline who wants to see where their major bottlenecks lie.
Sure, you can do that with a proper memory dump, but realistically, consolidating different dumps from different nodes, trying to figure out which state belongs to which operation and so forth... that's not very suitable for quick iteration while whipping up new pipelines.

For this use case, it would be particularly useful to be able to get inclusive sizes, and ideally to incur relatively little computation overhead (which speaks slightly against, e.g., a reflection-based approach). Reusing GC info from gen2 collections would work very well here. Having said that, any implementation could likely be made to work, even if there is a non-trivial computation overhead.

In short: it would be amazing to have official support for this type of API. Being able to fetch inclusive memory sizes would be very beneficial; other implementation/API details are slightly less critical to me personally.

@Hotkey

Hotkey commented Jan 14, 2024

@fabianoliver I totally agree!
