Something is wrong with arrays of weird-length primitives #26026
Is it because `Vector` has a layout that matches the C layout, which requires alignments to be a power of 2, so odd element lengths get padded?
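Python's `ctypes` module lays out structures according to the platform's C ABI, so it can illustrate the kind of padding this comment is asking about. C has no 3-byte integer, so the sketch below uses a hypothetical `ThreeBytes` struct (a 16-bit field plus an 8-bit field) as a stand-in. The expected values assume a mainstream platform where `c_uint16` is 2-byte aligned.

```python
import ctypes

# Hypothetical stand-in for a 3-byte value: 2 + 1 bytes of payload.
class ThreeBytes(ctypes.Structure):
    _fields_ = [("lo", ctypes.c_uint16), ("hi", ctypes.c_uint8)]

print(ctypes.sizeof(ThreeBytes))       # 4: 3 bytes of payload + 1 byte of tail padding
print(ctypes.alignment(ThreeBytes))    # 2: the largest member alignment
print(ctypes.sizeof(ThreeBytes * 10))  # 40: arrays use the padded size as their stride
```

In C layout the size is rounded up to a multiple of the type's alignment (2 here), and consecutive array elements are spaced by that padded size.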
Yes, something definitely gets confused about alignment.
So we see that `Int24` requires an alignment of 4 bytes and has a size of 3 bytes. This means it should occupy 4 bytes in isolation and in arrays, but only 3 bytes when packed into a struct. Since weird-size primitives are ultimately only good for alignment/size hacking of structs, I'd kind of prefer more explicit placement control for structs (optional syntax to specify exactly what goes where) and deprecation of …
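The two rules in this comment can be written down as a small layout calculator. This is a minimal sketch, not Julia's actual codegen: it assumes the size and alignment reported above for `Int24` (size 3, alignment 4), and the function names `align_up`, `array_stride` and `struct_layout` are hypothetical. A struct field is placed at the next offset that satisfies its alignment but occupies only its raw size, while array elements are spaced by the size rounded up to the alignment.

```python
def align_up(offset: int, align: int) -> int:
    """Round offset up to the next multiple of align (align must be a power of two)."""
    return (offset + align - 1) & ~(align - 1)

def array_stride(size: int, align: int) -> int:
    """Bytes between consecutive array elements: size rounded up to the alignment."""
    return align_up(size, align)

def struct_layout(fields):
    """Lay out (name, size, align) fields in order.

    Each field starts at the next offset satisfying its alignment but occupies
    only its raw size, so a later field may use the tail bytes. The total size
    is rounded up to the largest field alignment.
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size, align in fields:
        offset = align_up(offset, align)
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    return offsets, align_up(offset, max_align)

INT24 = (3, 4)  # assumed: size 3 bytes, alignment 4 bytes
INT8 = (1, 1)

print(array_stride(*INT24))                              # 4
print(struct_layout([("x", *INT24), ("y", *INT8)]))      # ({'x': 0, 'y': 3}, 4)
```

Putting the 1-byte field first would instead push `x` to offset 4 and grow the struct to 8 bytes, so field order matters for this kind of size hacking.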
Why is deprecation of … necessary?
It is not necessary, but it might be convenient. I don't know how to bootstrap it otherwise; use builtins? Just don't export … I mean: what can users possibly do with …? The only example I see is `foo2` given above: the struct wastes no padding and allows aligned access to `x`. Do you see other advantages? The disadvantage of `primitive type` is complexity in codegen: as long as user-definable primitive types of weird length exist, codegen needs to cope with weird cases where the size in isolation (4 bytes) differs from the size in structs (3 bytes, but 4-byte-aligned). The simplest possible fix would be to just not export the creation of primitive types; then codegen only has to deal correctly with the primitive types that actually exist in Base ("it's a feature, not a bug"). As an apology to bit-fiddlers, one could allow more low-level struct definitions; I think these could replace all current uses of user-defined primitive types.
Chips generally have loads for power-of-two sizes, load power-of-two-aligned data faster than unaligned data, and in some cases require alignment, so members of composites (in all languages, not just Julia) tend to be aligned by default. Some languages do allow optional packing on unaligned boundaries, but for those performance reasons it's never the default. So any composite that is shareable with C/C++/Fortran must be aligned, and AFAICT Julia does not support packed composites anyway. Alignment is rounded up to the next power of two. So in a composite of `Int8`s each one is on a power-of-two boundary (1), and in a composite an `Int24` is padded to a power-of-two boundary (4). Note that although an `Int24` object has a size of 3, it will always be allocated power-of-two-aligned space on the heap or stack, and so will the next object in memory, so your `Int24` still uses (at least) 4 bytes; it's just that you can't see the padding. As for complicating codegen, as things are it's always aligned, so it's actually simpler since it matches the chips.
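The contrast between default alignment and opt-in packing can be seen with Python's `ctypes`, which follows the platform C ABI and supports packing through the `_pack_` class attribute. The struct names `Natural` and `Packed` are hypothetical, and the expected offsets assume a mainstream platform where `c_uint32` is 4-byte aligned.

```python
import ctypes

class Natural(ctypes.Structure):
    # Default C layout: b is moved to the next 4-byte boundary after a.
    _fields_ = [("a", ctypes.c_uint8), ("b", ctypes.c_uint32)]

class Packed(ctypes.Structure):
    # Opt-in packing: no padding, so b sits at an unaligned offset.
    _pack_ = 1
    _fields_ = [("a", ctypes.c_uint8), ("b", ctypes.c_uint32)]

print(Natural.b.offset, ctypes.sizeof(Natural))  # 4 8
print(Packed.b.offset, ctypes.sizeof(Packed))    # 1 5
```

The packed version saves 3 bytes but places `b` on an unaligned boundary, which is the performance trade-off described above.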
Discussion on https://discourse.julialang.org/t/odd-byte-length-primitive-types-and-reinterpret/9025/3.
Reproduce:
So we see that `arrayref` behaves as if `Int24` had size 4 (three bytes plus one byte of padding). The same happens for `pointerref`. Of course, that means we read out-of-bounds memory and corrupt memory on writes.
Tested on both 0.6.2 and current master (0.7).
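The failure mode can be modeled in a few lines. This is a hypothetical sketch, not Julia's implementation: it assumes the array buffer stores elements back to back at 3 bytes each while `arrayref`/`pointerref` index with a 4-byte stride, as the report suggests. In the real bug the stray reads and writes silently touch neighboring memory; the model raises `IndexError` instead so the overrun is visible.

```python
ELSIZE = 3   # assumed: bytes each Int24 element occupies in the buffer
STRIDE = 4   # assumed: stride used when indexing (3 bytes + 1 padding)

def store_packed(values):
    """Hypothetical helper: store each value as 3 identical bytes, back to back."""
    return bytearray(b for v in values for b in (v, v, v))

def read_elem(buf, i, stride):
    """Read element i assuming the given stride; flag reads past the buffer end."""
    start = i * stride
    chunk = bytes(buf[start:start + ELSIZE])
    if len(chunk) < ELSIZE:
        raise IndexError(f"element {i} at byte {start} runs past the {len(buf)}-byte buffer")
    return chunk

buf = store_packed(range(10))     # 30 bytes
print(read_elem(buf, 1, ELSIZE))  # b'\x01\x01\x01' (correct element)
print(read_elem(buf, 1, STRIDE))  # b'\x01\x01\x02' (straddles elements 1 and 2)
try:
    read_elem(buf, 9, STRIDE)     # starts at byte 36 of a 30-byte buffer
except IndexError as err:
    print(err)
```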