Proposal: usize definition should be refined #5185
Comments
Counter proposal:
The maximum across all the address spaces. This way we can also keep the ptr-to-usize (and usize-to-ptr) relationship (with the help of the |
I thought about that, and it has one problem: It will waste a lot of space. This has two advantages: Otherwise I would waste 50% of my memory with zero-padded bytes by storing pointers where I could have used a type only half the size. |
Added some changes and updates to the original proposal |
Since Zig supports arbitrarily sized integer types, each OS could define the bit length of their virtual memory system. This would reduce the well known 'pointer bloat' in the executable. |
would you have to change the default typing rules on certain arithmetic events? like would the following operations make sense? usize + usize OK |
No. I don't think this is something that is such a huge error source that it would be a benefit more than a hassle. Subtracting (and thus) adding values of |
Is there any defense for The purpose we're trying to imply is "size of a pointer" and "size of a datablock", so I would simply lean toward This would keep the number of types the same, while increasing functionality and clarity. |
There are OSes that treat pointers as signed. Solaris maybe? That's not a big market :-) |
Real Life Use Case? 🤔 A pointer being treated as signed or not will not affect pointer arithmetic. The only possible situation for signed pointers is some sort of special bit value, but aside from null, what would it be? Zig doesn't even support non-0 null. In OS's with signed pointers, we can just explicitly cast to the appropriate signed type to shuffle the |
Disclaimer: I don't think the following is a well thought out idea as it stands, but my brain's completely pulling out the stops and I would feel remiss to not write it. What if we dropped the idea of an intrinsic platform-specific pointer size and memory size altogether, and allowed platforms to define their memory spaces? Something like this.
```zig
const avr = @import("avr");
const avr_code_pointer_size = @PointerSize(avr.code_memoryspace);
const avr_code_memory_size = @MemorySize(avr.code_memoryspace);
const avr_data_pointer_size = @PointerSize(avr.data_memoryspace);
const avr_data_memory_size = @MemorySize(avr.data_memoryspace);
```
User-defined names are obviously a no-go for cross-platform code, but any code that uses multiple memoryspaces wouldn't be cross-platform anyway, and at worst, converting a snippet to a single memoryspace architecture would be a find-replace. Feels a bit like vkDevice. Zig intends to be very specific about allocations. This takes it one level deeper. |
There is sometimes the need to do pointer/type erasure. Primary use case would be anything OS-relevant (as in you're coding an OS). That's where you work a lot with pointers-as-numbers instead of actual memory slices.
Yes! Memory/pointer distances. You cannot express an object size delta with I like the idea of |
If one memory space could allocate 2^16-1 bytes and the other 2^24-1, using a 24 bit value for both could waste memory more often than not, depending on which address space is used most often? |
(Having heard the rationale for
|
Re: In practice, for some targets, it will be important for performance to use register-sized integers. But there are plenty of tools for tracking down performance problems, and Zig makes it easy to modify code in those locations to use faster integers. Tracking down a rare integer overflow that only happens in production on one platform is far more difficult. There might still be an argument for |
As you command: #7693 |
C has become much more explicit about address handling lately. If you have two objects and compare their pointers, you are not guaranteed the result you would expect from a single, uniform address space. IIRC, it is implementation specific or maybe even UB. Addresses on many platforms do not behave like integers. They either have very specific wrapping behavior (x86-64 with its requirements on the upper address bits) or have non-unique representations (16-bit DOS with segments and offsets), etc. Here are a few examples:
Pointer arithmetic is hard to get right. @floopfloopfloopfloopfloop's proposal comes the closest on this, IMHO. Personally I think that Sorry for the rant |
Specifically re. A64, those top bits aren't wasted -- they're used for pointer verification on newer versions. I think this is actually a plus: you could only get a pointer which is invalid in this way by casting a random integer, or going wayyy out of bounds, and the machine should yell at you if you do that. Re. pointer comparison, if you want to compare pointers you can compare pointers -- IMO the compiler should be aware of memory segmentation and know how to check if pointers are really equal. No need to drag integers into it. |
Oh, they are not wasted! The original plan was to use those for tagged pointers as is often done in Smalltalk, Lua etc. More recently they are also used for memory check tags or whatever they are calling it these days. By contrast AMD decided to go the other way and make it more painful to do anything with "unused" address bits. I think the jury is still out about which one of these options was the better idea.
Think about walking a linked list looking for an element. Every single time you move to a new node, you need to change your new pointer into normalized form for comparison. Whether that is fabricating a 32-bit pointer from the segment and offset on a 16-bit x86 CPU or masking off bits on Arm CPUs, you'll need to do that. As I just recently caught up to the discussion on Discord where @MasterQ32 gave a very good example of the AVR series and the "joys" of asymmetric Harvard architectures, I tried to expand on the ideas he had: I wrote up more in a comment in issue #653. |
If we really don't want the top bits to be available, we can simply define Perhaps I wasn't clear: I think that pointer comparison should not be integer comparison of pointers; Zig has a stronger type system than C, we can distinguish pointers from integers. So, when comparing pointers, the compiler will know the platform and how to convert pointers into normal form or mask them off to compare them -- the user won't have to do that manually. This may mean that equality comparison becomes multiple machine instructions, but we're not aiming to be a macro assembler. |
C functions that return pointers often use "negative pointers" for error conditions, for example |
If Zig aims to support diverse platforms, including some yet-unknown future platforms, no false promise should be made that all pointers are equal and can be converted back-and-forth to an integer. usize is exactly such a promise. I strongly favor the idea by @floopfloopfloopfloopfloop that platform headers should provide some function like |
Slicing off the bits that we can ignore (e.g. bits 48:64 on x64) is a bad idea. Primarily because the "size" of a memory address can change at runtime. For example, if you've enabled paging but haven't set |
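To make the x64 point concrete, here is a minimal C sketch of a canonical-address check in which the address width is a runtime parameter rather than a constant. The function name `is_canonical` and the parameter `va_bits` are made up for this illustration; 48 and 57 are the virtual address widths of 4-level and 5-level paging.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: an x86-64 address is canonical when every bit from
 * bit (va_bits - 1) up to bit 63 has the same value.
 * va_bits is the number of implemented virtual address bits. */
int is_canonical(uint64_t addr, unsigned va_bits) {
    assert(va_bits >= 1 && va_bits <= 64);
    /* Keep only the top implemented bit and everything above it. */
    uint64_t upper = addr >> (va_bits - 1);
    uint64_t all_ones = UINT64_MAX >> (va_bits - 1);
    return upper == 0 || upper == all_ones;
}
```

The same 64-bit value can be invalid with 48 address bits and valid with 57, so a type that permanently drops the upper bits has to pick one width ahead of time.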
The C language also has the See https://stackoverflow.com/questions/1464174/size-t-vs-uintptr-t for more details, especially the answer by Alex Martelli: https://stackoverflow.com/a/1464194. If #1738 is approved, |
I came across the definition of usize, which is currently defined as "unsigned pointer sized integer", and a question arose: Size of what pointer? Function pointer? Pointer to constant data? Pointer to mutable data?
For most platforms, the answer is simple: There is only one address space.
But as Zig tries to target all platforms, we should bear in mind that this is not true for all platforms.
Case Study:
Zig supports AVR at the moment which has two memory spaces:
Both memory spaces have different addressing modes which can be used with the Z register, which is a 16 bit register. Thus, we could conclude that the pointer size is 16 bit. But the AVR instruction set also has a RAMPZ register that is prepended to the Z register to extend the memory space to 24 bit. Some modern AVRs have more than 128k ROM (e.g. Mega2560). This means that the effective pointer size is 24 bit.
The same problem arises when targeting the 8086 CPU with segmentation. The actual pointer is a 20 bit value that is calculated by combining two 16 bit values (segment + offset).
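As a rough illustration of the 8086 case, the C sketch below computes the 20 bit address from a segment/offset pair and shows that different pairs can name the same byte. The type `far_ptr` and the three function names are invented for this example and are not part of any compiler or of Zig.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical real-mode "far pointer": two 16 bit halves. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} far_ptr;

/* Physical address = segment * 16 + offset, wrapped to the
 * 20 address lines of the 8086. */
uint32_t far_to_linear(far_ptr p) {
    return (((uint32_t)p.segment << 4) + p.offset) & 0xFFFFFu;
}

/* Two far pointers refer to the same byte when their linear
 * addresses match, even if segment and offset both differ. */
int far_ptr_equal(far_ptr a, far_ptr b) {
    return far_to_linear(a) == far_to_linear(b);
}

/* One possible normal form: keep the offset below 16. */
far_ptr far_normalize(far_ptr p) {
    uint32_t linear = far_to_linear(p);
    far_ptr n = { (uint16_t)(linear >> 4), (uint16_t)(linear & 0xFu) };
    assert(far_to_linear(n) == linear);
    return n;
}
```

For instance, 1234:0005 and 1000:2345 both resolve to linear address 0x12345, so comparing the raw 32 bit patterns would report them as different pointers.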
Problem:
usize communicates that it stores the size of something, not the address. Right now, usize can contain values larger than the biggest (continuously) addressable object in the language and it takes up more space than needed.
C has two distinct types for that reason:
size_t (can store the size of an addressable object)
uintptr_t (can store any pointer)
AVR-GCC solves the problem of 24 bit pointers by ignoring it and creates shims for functions that are linked beyond the 128k boundary. Data beyond the 64k boundary cannot be addressed and afaik LLVM has the same restriction. I don't think Zig should ignore such platform specifics; it should be able to represent them correctly.
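The split between the two C types can be sketched as follows. The helper names `round_trip` and `span_bytes` are made up for this example. Note that uintptr_t is an optional type in the C standard, so the sketch assumes a platform that provides it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pointer -> integer -> pointer. uintptr_t is the integer type C
 * provides for this round trip (assumed to exist on this platform). */
void *round_trip(void *p) {
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;
}

/* Size in bytes of the range [first, last) inside one object.
 * The result is a size, so it is held in a size_t, not a uintptr_t. */
size_t span_bytes(const char *first, const char *last) {
    assert(first <= last);
    return (size_t)(last - first);
}
```

Nothing in the C standard requires the two types to have the same width, which is the property the proposal wants Zig to be able to express.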
Proposal:
Redefine usize to mean "can store the size of any object or array" and introduce a new type upointer that is a pointer-sized integer. The same applies to isize and ipointer.
It should also be discussed whether a upointer will have a guaranteed unique representation or may be ambiguous ("storing a linear address or storing segment + descriptor").
Changes that should be made as well:
@ptrToInt should now return upointer, and @intToPtr should accept upointer, instead of usize
@sizeOf will still return usize
Pro:
Con:
Example:
Note:
I'm not quite sure about all of this yet, as this is a very special case that only affects some platforms, whereas most platforms don't have the "object size is not pointer size" restriction.
Resources:
Edit: Included answer to the question of @LemonBoy, added pro/con discussion, added example