What does sizeof return? #54007

Open
LilithHafner opened this issue Apr 9, 2024 · 9 comments
Labels
docs This change adds or pertains to documentation

Comments

@LilithHafner
Member

From the docstring, for values that are not a DataType, sizeof returns

the size, in bytes, of object obj

There are multiple ways of measuring size:

sizeof ∘ typeof returns the immediate size of an object (i.e. how many bytes the object itself takes up)
Base.summarysize returns the total amount of memory referenced by an object.

sizeof(x) seems a little different and more domain- or semantics-specific. The default is Core.sizeof, which is not documented but appears to be sizeof ∘ typeof.
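
A minimal sketch of how these measures can differ on one value. `Wrapper` is a hypothetical type introduced only for illustration; the comparison assumes the default `sizeof` fallback applies to it.

```julia
# Hypothetical wrapper type, used only for illustration.
struct Wrapper
    data::Vector{Float64}
end

w = Wrapper(rand(100))

immediate = sizeof(typeof(w))     # size of the struct itself: a single reference
generic   = sizeof(w)             # no specialized method, so this matches the struct size
payload   = sizeof(w.data)        # Array reports its element data: 100 * sizeof(Float64)
total     = Base.summarysize(w)   # follows the reference into the array

println((immediate, generic, payload, total))
```

The first two numbers agree, the third depends only on the array's elements, and the last is larger than all of them because it counts everything reachable from `w`.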

Some examples

julia> sizeof(falses(100))
16

julia> sizeof(fill(false, 100))
100

julia> sizeof("abc")
3

julia> sizeof("aβc")
4

julia> sizeof(['a', 'b', 'c'])
12

julia> sizeof(view(['a', 'b', 'c'], 1:2))
8

julia> sizeof(SubString("aβc", 2))
3

I'd like the answer to this question to appear in help?> sizeof.

LilithHafner added the docs label on Apr 9, 2024
@mbauman
Member

mbauman commented Apr 9, 2024

The whole docstring is this:

sizeof(T::DataType)
sizeof(obj)

Size, in bytes, of the canonical binary representation of the given DataType T, if any.
Or the size, in bytes, of object obj if it is not a DataType.

Which seems a little confused. The two descriptions here actually seem backwards. In describing the size of a type, we shouldn't need any of the words "canonical," "binary," or "representation," should we? It's just the number of bytes needed for the struct or primitive type.

It's when we're talking about the generic behaviors of arbitrary objects where the "canonical binary representation" seems to be both relevant and necessary. To be concrete, I think of sizeof as x->length(sprint(write, x)).

Cf. #12791 (comment)
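
A small sketch of the "bytes that `write` produces" reading of `sizeof`. `written_bytes` is a hypothetical helper, and it uses the byte count returned by `write` rather than `length(sprint(write, x))`, because `length` on the resulting string counts characters, not bytes.

```julia
# Hypothetical helper: number of bytes `write` reports having written.
written_bytes(x) = write(IOBuffer(), x)

samples = (1.0, 0x01, "aβc", UInt8[1, 2, 3], ['a', 'b', 'c'])
for x in samples
    println(repr(x), " => sizeof = ", sizeof(x), ", written = ", written_bytes(x))
end

# `length` counts characters, so it undercounts bytes for non-ASCII text.
s = sprint(write, "aβc")
println((length(s), ncodeunits(s)))
```

For these samples the two numbers agree; that is an observation about these particular types, not a guarantee for every type.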

@LilithHafner
Member Author

julia> using Serialization

julia> sizeofs(x) = sizeof(x), length(sprint(serialize, x)), write(devnull, x)
sizeofs (generic function with 1 method)

julia> sizeofs(falses(100))
(16, 61, 16)

julia> sizeofs(fill(false, 100))
(100, 16, 100)

julia> sizeofs("abc")
(3, 13, 3)

julia> sizeofs("aβc")
(4, 13, 4)

julia> sizeofs(['a', 'b', 'c'])
(12, 24, 12)

julia> sizeofs(view(['a', 'b', 'c'], 1:2))
(8, 99, 8)

julia> sizeofs(SubString("aβc", 2))
(3, 37, 3)

@mbauman
Member

mbauman commented Apr 9, 2024

I'm not sure what you're trying to say there, but yes, I initially wrote serialize and then edited to write. It's write's docstring that mirrors the "canonical binary representation" language. But if you look further down in #12791 you can see that it wasn't 100% clear back then, either.

Good thing the help for sizeof and write do not mention each other.

:)

@Seelengrab
Contributor

To me, sizeof is "the number of contiguous bytes the object takes up in memory, without following pointers". That may or may not match write, if you consider e.g. padding!
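
To illustrate the padding point, here is a sketch with a hypothetical struct `Padded`. The exact struct size depends on the platform's alignment rules, so the code only compares it against the sum of the field sizes.

```julia
# Hypothetical struct chosen so that padding is needed after the first field.
struct Padded
    a::UInt8
    b::UInt64
end

p = Padded(UInt8(1), UInt64(2))

field_bytes = sum(sizeof, fieldtypes(Padded))   # 1 + 8, no padding counted

io = IOBuffer()
fieldwise = write(io, p.a) + write(io, p.b)     # bytes written field by field

println((sizeof(p), field_bytes, fieldwise))
```

The in-memory size exceeds the nine bytes of field data, so a serializer that writes fields one at a time produces fewer bytes than `sizeof` reports.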

@oscardssmith
Member

oscardssmith commented Apr 9, 2024

I will say that this isn't just a user-level question. @vtjnash and I spent at least an hour trying to figure out what sizeof and Core.sizeof were supposed to return for Memory, and for Array now that it is written on top of Memory. (I don't remember what we ended up going with, and I'm only ~60% sure we chose something reasonable.)

@Seelengrab
Contributor

Seelengrab commented Apr 9, 2024

My intuition tells me that x -> sizeof(x) == sizeof(typeof(x)) should always be true, but seems like that's not the case for Array anymore. So is that true for Core.sizeof then, since it's not documented? That's quite unexpected; why is there a difference between the two in the first place?

@oscardssmith
Member

oscardssmith commented Apr 9, 2024

Yeah, that would have been very sensible. Unfortunately, sizeof(rand(100)) has returned 800 since at least 1.0, and I think probably since around 0.1, so we decided that changing that behavior would have been too breaking.

@mbauman
Member

mbauman commented Apr 9, 2024

that x -> sizeof(x) == sizeof(typeof(x)) should always be true, but seems like that's not the case for Array anymore

That's never been the case for Array, but Array is a case where it didn't use to have a Julia-visible struct backing it — you couldn't ask for sizeof(Vector{Int}) before 1.11.
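
A hedged sketch of that difference. The version guard reflects the statement above that `sizeof(Vector{Int})` could not be asked for before 1.11; no particular value for the struct size is assumed.

```julia
v = rand(100)
data_bytes = sizeof(v)             # bytes of element data: 100 * sizeof(Float64)
println(data_bytes)

# Assumption: from Julia 1.11 on, Array has a Julia-visible struct layout,
# so sizeof(typeof(v)) is defined; earlier versions throw for this call.
if VERSION >= v"1.11"
    struct_bytes = sizeof(typeof(v))   # size of the Array struct, not of its data
    println(struct_bytes)
end
```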

(edit: oh, sorry for the duplicate content; I had a stale page and didn't see Oscar's response until I posted)

@Seelengrab
Contributor

Ok, so are there any other exceptions to the intuition I posted above, other than Array? That is, is the contract of sizeof:

  • The number of bytes (with padding) the object takes up in memory, without recursion into fields
  • For Arrays, the number of bytes needed to reference all of the stored objects. (Note: this doesn't include the reserved capacity!)
  • For Strings, the number of bytes needed to store the contents of that String in memory. (Note: this doesn't include the stored length & possible trailing null)
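
The Array and String bullets above can be checked with a short sketch; the values used are arbitrary.

```julia
v = Int[]
sizehint!(v, 1_000)               # reserve capacity without adding elements
empty_bytes = sizeof(v)           # reserved capacity is not counted
println(empty_bytes)

push!(v, 1, 2, 3)
println(sizeof(v) == length(v) * sizeof(Int))

s = "aβc"
println((sizeof(s), ncodeunits(s), length(s)))   # bytes, code units, characters
```

For the String, `sizeof` matches `ncodeunits` rather than `length`, since β occupies two bytes in UTF-8.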

FWIW, I don't think we can avoid mentioning very low-level details here. The meaning of sizeof is intrinsically linked to how objects are stored in Julia. Without those details, the docstring isn't useful when you need to know exactly what sizeof means.
