should `Base.summarysize` include alignment? #32881
Didn't we at some point explicitly state in the docs that it's a good-effort rough estimate and should not be relied upon? Or perhaps that was just a comment from Jameson on GitHub that never got incorporated into the docs. In any case, it seems we can improve the estimate from good-effort to best-effort in some places, and for the rest we should document that it will never be 100% precise.
This made me think of a different, more precise way to implement this functionality. Instead of using heuristics to try to guess which memory belongs to which object, track an ordered set of non-overlapping, non-adjacent intervals of memory that are in use. When a new piece of memory is added to the data structure, it is either merged with one or more existing intervals that it touches, or it is inserted as a new, isolated interval. At the end, the total memory reached is the sum of the lengths of the intervals. This gives the precise amount of memory reachable from any set of objects. It also handles the recursion aspect: you recur into references only if they point into memory that is not already covered by some interval.
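A minimal sketch of that interval idea (Python used purely for illustration; the helper names `add_interval` and `total_bytes` are invented for this sketch, not part of Julia):

```python
def add_interval(intervals, start, length):
    """Merge [start, start+length) into a sorted list of disjoint intervals.

    Returns True if any part of the range was not already covered, i.e. the
    caller should recur into references found in this memory.
    """
    end = start + length
    merged_start, merged_end = start, end
    # was the whole range already covered by an existing interval?
    covered = any(s <= start and end <= e for s, e in intervals)
    keep = []
    for s, e in intervals:
        if e < start or s > end:                 # strictly disjoint, non-adjacent
            keep.append((s, e))
        else:                                    # touches or overlaps: merge it
            merged_start = min(merged_start, s)
            merged_end = max(merged_end, e)
    keep.append((merged_start, merged_end))
    keep.sort()
    intervals[:] = keep
    return not covered

def total_bytes(intervals):
    # total reachable memory = sum of the interval lengths
    return sum(e - s for s, e in intervals)
```

Adjacent ranges are merged as well (`e < start` rather than `e <= start`), matching the "non-adjacent" invariant above, so two back-to-back allocations collapse into one interval.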
Yes, it's a rough estimate and shouldn't be relied on for anything serious. There are some real reasons for that, for example that whether immutable objects are copied is subject to change. But we can always improve it to be as accurate as reasonably possible. Another fundamental issue is that we can't really follow the user's mental model of object ownership. For example, say X points to A and B, and Y points to B and C. 3 objects are reachable from each of X and Y, but there are only 5 objects total. And perhaps B "belongs" to X in some semantic sense, while Y's reference is incidental. A somewhat real example is:
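The shared-reference point can be illustrated with a toy object graph (Python used for illustration; the `graph` dictionary and `reachable` helper are invented for this sketch):

```python
# The object graph from the comment: X -> {A, B}, Y -> {B, C}.
graph = {"X": ["A", "B"], "Y": ["B", "C"], "A": [], "B": [], "C": []}

def reachable(graph, *roots):
    # collect the unique objects reachable from the given roots
    seen, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return seen

len(reachable(graph, "X"))       # 3 objects reachable from X
len(reachable(graph, "Y"))       # 3 objects reachable from Y
len(reachable(graph, "X", "Y"))  # only 5 in total, because B is shared
```

Summing per-root sizes double-counts B, which is exactly why no single number can match every user's notion of "ownership".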
We can tell you how big the representation of the object is.
Well, it would be useful to have a function that computes the size based on total reachability of data objects at least (types, functions, and modules I understand would be hard to count). Today it is quite hard to figure out how much memory your object ultimately uses, and how much memory pressure could be saved by, for example, some memoization or caching. I find the total number of allocated bytes reported by `@time` a better estimate than `summarysize`.
I would have sort of expected that `Int` was built in and constant (and thus has size 0). But if it is of size 176, I would have expected the computation to at least be fast.
Yes, that is expected due to counting unique objects as you said, and also because the summarysize code is reused for all types and not specialized. Computing it very quickly would not be worth the compilation overhead.
Yes, that number should be quite accurate, but it measures something totally different, including garbage objects. Modules seem to be a different case. But I'm not arguing --- we should make these improvements to summarysize.
- 0-field mutable structs take 1 word
- include alignment in object sizes
- take uniqueness into account for Strings
- include union selector bytes for Arrays
Mostly fixed by #32886, but repurposing this issue to discuss whether alignment/GC overhead should be included.
`Base.summarysize` seems inconsistent, and it feels like the return value cannot be relied upon. The documentation also says different things than the actual implementation does. It says it will compute the memory used by unique objects.
However, strings are not checked for uniqueness (I would expect it to deduplicate references that are identical, not content that is identical):
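The two notions of uniqueness give different totals, which a toy dedup helper makes concrete (Python used for illustration; `unique_bytes` and its `key` parameter are invented for this sketch):

```python
import sys

def unique_bytes(objs, key):
    # sum object sizes, deduplicating by the chosen notion of "same object"
    seen, total = set(), 0
    for obj in objs:
        k = key(obj)
        if k not in seen:
            seen.add(k)
            total += sys.getsizeof(obj)
    return total

n = 100
a = "x" * n
b = "x" * n          # identical content, but a distinct object at runtime

by_identity = unique_bytes([a, b], key=id)    # counts both copies
by_content = unique_bytes([a, b], key=hash)   # counts the content only once
```

Deduplicating by reference identity (`id`) counts both copies; deduplicating by content counts the bytes once, so the two policies report different sizes for the same data.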
Then it says it will compute the memory used by objects. As far as I know, memory allocations are aligned, so I would expect all sizes to be at least multiples of 4, yet the string "a" has a size of 9:
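If alignment were included, reported sizes would be rounded up to the allocator's granularity; a sketch of that rounding (Python for illustration; the 16-byte granularity is an assumption here, since Julia's GC pool sizes vary):

```python
def aligned_size(nbytes, alignment=16):
    # round up to the next multiple of the (assumed) allocation granularity
    return (nbytes + alignment - 1) // alignment * alignment

aligned_size(9)   # a 9-byte string would be reported as 16
```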
Then there is the fact that `sizeof()` is used to calculate the size of each field, and of array elements. An array with 100 elements whose element type is a union of 3 different types, all with `sizeof == 0`, will not consume 0 memory:
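Julia stores arrays of isbits-Union element type as a data payload plus one type-selector byte per element, which is why such an array is never zero-sized; a back-of-the-envelope sketch (Python for illustration; the helper name is invented):

```python
def union_array_bytes(length, elsize):
    # data payload plus one type-selector byte per element,
    # as in Julia's isbits-Union array layout
    return length * elsize + length

union_array_bytes(100, 0)  # 100 bytes, not 0: the selector bytes remain
```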
Wouldn't it make sense to compute this in a C function, where the user cannot overload `sizeof` and the like, and actually return the size that the data consumes in memory (including overhead)? The garbage collector should know the sizes of all objects, after all. Such a function would give the user a much better idea of how much memory an object consumes.
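The shape of that proposal — ask the runtime for each object's allocated size and deduplicate by identity while traversing references — can be sketched with Python's own runtime hooks (`sys.getsizeof` and `gc.get_referents`; the function name is invented, and this is an analogy, not Julia's implementation):

```python
import sys
import gc

def total_reachable_size(*roots):
    # Sum runtime-reported sizes of all objects reachable from the roots,
    # counting each object once by identity (no user-overridable size hooks).
    seen = set()
    stack = list(roots)
    total = 0
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        total += sys.getsizeof(obj)          # interpreter-reported size, incl. overhead
        stack.extend(gc.get_referents(obj))  # follow references iteratively
    return total
```

Because deduplication is by identity, a structure shared between two roots is charged only once when both roots are measured together.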
I tested the code on two Julia versions.