Pointer Reform #770
Comments
If possible I would like to have some other symbol/keyword for:
This is because ^ isn't ergonomic on all keyboard layouts. On Windows, with some international/non-English keyboard layouts, you have to type it twice to get ^^ and then remove the extra one every time you write it. See this superuser question |
That's good to know. Do you have a suggestion for what other symbol to use? |
To be honest, I have only a few possibly reasonable ideas, as most of the commonly used or reasonable sigils are already used for something in Zig.
|
Is the new array syntax flexible like multiplication? Can I do these?
Or is it restricted to
|
@Hejsil great questions. I think that new array syntax is no good because of this. But we have to make something different than [N]T to distinguish from pointers. |
Perhaps I expect people will mistakenly declare their structs with |
I agree with @Ilariel, my keyboard is one of these (Portuguese) - If you press the

I also liked @Ilariel's 2nd and 3rd suggestions. I don't dislike the idea of a ref/ptr keyword, but I find &?&&?T more readable than ref ? ref ref ? T for types.

I think I'd prefer to keep the & as a reference type and instead use a builtin or an operator like

As for the array syntax, would
@thejoshwolfe made a good point about
It communicates that you're creating a value type which is the result of putting N units of that value type together. If you allow
|
We can't use `**`:
const State = 256**u8; // formerly known as [256]u8
const States = 4 ** State; // is this [1024]u8 or [4][256]u8? |
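The ambiguity in the snippet above can be made concrete with the status quo spelling, where the nesting is explicit in the type. This is a minimal sketch, assuming a recent Zig release; the names `Nested` and `Flat` are made up for illustration.

```zig
const std = @import("std");

// Status quo spelling: the nesting is part of the type.
const State = [256]u8;
const Nested = [4]State; // [4][256]u8
const Flat = [1024]u8;

test "nested and flat arrays are distinct types of equal size" {
    // Same number of bytes, but not the same type.
    try std.testing.expect(Nested != Flat);
    try std.testing.expect(@sizeOf(Nested) == @sizeOf(Flat));
    // Indexing Nested yields a whole State, indexing Flat yields a byte.
    try std.testing.expect(std.meta.Child(Nested) == State);
    try std.testing.expect(std.meta.Child(Flat) == u8);
}
```

Both types occupy the same number of bytes, but only the explicit spelling says which element type you get when indexing, which is the information `4 ** State` leaves open.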
I thought both the
Though I can see how this isn't ideal.

EDIT: Just consolidating my 2 cents on the pointer-to discussion after a bit of thought. In C you have

In Zig, the 'pointer to exactly one thing' is closer to a C++ reference than a C pointer, so it would make sense to stay close to their interface and use a builtin @addressof function. |
So that it is documented and can be discussed here, we came up with a possible solution on IRC:
Advantages:
Disadvantages:
So if we went this route we'd need a new dereferencing operator. Personally I favor that anyway since prefix
One thought I had, and I realize this is a bit strange but hear me out, is postfix
so all we'd be doing is extending this property to non-structs, really.
Advantages:
Disadvantages:
So then we'd need new range operators too. Other options for deref:
|
Why not
Instead of this concept, introduce
Zig's grammar depends on knowing if we are at the end of an expression or in the middle of an operator. This means we can't have postfix operators that are identical to infix operators. Here's an example of the ambiguity using ^:
// this is ambiguous
const a = b^(1);
// b is a pointer to a function (or double pointer to a function),
// which is being called and given the parameter 1.
const a = (b^)(1);
// b is some integer being xor'ed with 1.
const a = b ^ 1;

We absolutely cannot have ambiguity between infix and postfix operators. This means

There is no problem with ambiguity of infix and prefix operators though, such as with
This actually does not suffer too horribly from the above ambiguity concern, because
I'm not too concerned about prefix vs postfix, since this language (and pretty much every other language) has operator precedences that will get weird sometimes like this. Instead of moving operators around, I would propose requiring parentheses sometimes, if the confusion is bad enough. See #114. The biggest problem I have with |
Ignoring the syntax, I am really happy with the idea of separating pointer-to-1 from pointer-to-unknown-quantity and so on. (Sad that caret doesn't work easily.) |
We could have a postfix
It doesn't imply indexing, no ambiguities with range syntax, analogous to fields in structs: get value at field vs get value at address. |
I don't think the postfix chaining argument, e.g. I think |
(I'm writing this all up in a GitHub issue, but this is intended to go into the documentation somewhere.)

Note: In the following discussion, there is sometimes a distinction between the length of an array and the

This proposal does not introduce any new tokens, which means for example that

Types
Here

Dereferencing
For a pointer

This operator is implied in the following contexts:
These implicit dereferences do not apply to an expression that is the result of applying one of these implicit dereference rules. For example,

Pointer Arithmetic
For a pointer
The following infix operators are allowed, but are not precisely defined here. Informally, these operations are defined similar to C, where
Pointer subtraction is also allowed in some cases. Given
Implicit Casting
TODO |
@thejoshwolfe, you had slightly different syntax a couple of days ago. At the risk of bike shedding, it seems like the
So if If you want to make pointers to single objects a supported kind of thing in the language, perhaps make them act like transparent references:
C++ has moved to this separation of references and pointers (even though we all know that under the hood a reference is syntactic sugar around a pointer!) and it makes a lot of code cleaner. Think about functions that take or return references. This way, there is no dereference at all for an alias/reference. Then you can use

On a more frivolous note, here are some other ideas for the various kinds of arrays.
The first one is identical to a C99 dynamic array, the second to a normal C array and the third can be used for C strings. So then you get:
Apart from the frivolity, I really like the idea of having a pointer to one object (otherwise known as a reference or alias in other languages) and a pointer on which arithmetic can be done. This is really nice! |
@thejoshwolfe the proposal looks great, though I almost thought the

While we're discussing different kinds of arrays, what do you think of an enum array? It has a length equal to the member count of the enum and can only be indexed with an enum value.
You can get close to this with status quo Zig by specifying the tag type and casting
But then if you change the number of elements, the backing type of the enum or override the values, you need to change a lot of code. And you could still access it with arbitrary integers, so if at any point the index into the array was hardcoded, it would have to be found. An enum array basically becomes a comptime-checked map! It could be approximately implemented in userland with something like this if we had a memberIndex built-in or something:
|
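As a rough illustration of the enum array idea discussed above, here is a userland sketch. It assumes a recent Zig release and an enum with default tag values, so the tag value can stand in for the member index (the hypothetical memberIndex built-in mentioned in the comment is not used). `EnumArray`, `get`, `set`, and `Color` are names invented for this example.

```zig
const std = @import("std");

// Hypothetical userland helper, not an existing API.
// Assumption: E uses default tag values (0, 1, 2, ...), so the tag value
// is also the member index. An enum with overridden values would need a
// real member-index lookup instead.
fn EnumArray(comptime E: type, comptime V: type) type {
    return struct {
        const Self = @This();

        // One slot per enum member.
        values: [std.meta.fields(E).len]V,

        fn get(self: Self, key: E) V {
            return self.values[@intFromEnum(key)];
        }

        fn set(self: *Self, key: E, value: V) void {
            self.values[@intFromEnum(key)] = value;
        }
    };
}

const Color = enum { red, green, blue };

test "an enum array is indexed only with enum values" {
    var counts = EnumArray(Color, u32){ .values = .{ 0, 0, 0 } };
    counts.set(.green, 7);
    try std.testing.expect(counts.get(.green) == 7);
    try std.testing.expect(counts.get(.red) == 0);
}
```

Because `get` and `set` only accept `Color`, a hardcoded integer index no longer compiles, and adding a member to the enum changes the array length with it.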
I broke that out into its own issue: #793 |
Does the null/0 have to be at index N in that case? C strings are stored in a fixed-length array, but the string length can vary; it is not necessarily equal to the array size. The same applies to null-terminated C arrays. |
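To illustrate the question, here is a small sketch, assuming a recent Zig release, of a C-style string stored in a larger fixed-length buffer. It uses `std.mem.sliceTo` to find the first 0 byte; the buffer size and contents are arbitrary choices for the example.

```zig
const std = @import("std");

test "the terminator can sit before the end of the buffer" {
    // An 8-byte buffer, zero-filled, holding the 2-byte string "hi".
    var buf = [_]u8{0} ** 8;
    buf[0] = 'h';
    buf[1] = 'i';

    // Slice up to (not including) the first 0 byte.
    const text = std.mem.sliceTo(&buf, 0);

    try std.testing.expect(buf.len == 8);
    try std.testing.expect(text.len == 2);
    try std.testing.expectEqualStrings("hi", text);
}
```

The array length and the string length differ here, which is the case the comment is asking about: a type that promises a null/0 at exactly `ptr[N]` describes the buffer, not the string inside it.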
I usually wouldn't comment when I don't have competence in the area, in this case pointers, but I feel compelled to share my abstract thoughts. I hope one of these brainstorming ideas could either be or lead to useful ideas:
T is also where you could put your brackets if you need to declare the type as an array Speaking of arrays, something like |
See #770 To help automatically translate code, see the zig-fmt-pointer-reform-2 branch. This will convert all & into *. Due to the syntax ambiguity (which is why we are making this change), even address-of & will turn into *, so you'll have to manually fix these instances. You will be guaranteed to get compile errors for them - expected 'type', found 'foo'
See #770 Currently it does not have any different behavior than `*` but it is now recommended to use `[*]` for unknown length pointers to be future-proof. Instead of [ * ] being separate tokens as the proposal suggested, this commit implements `[*]` as a single token.
|
I just pushed 96164ce which disables indexing for single-item pointers and enables pointer arithmetic for unknown length pointers. |
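For reference, here is a short sketch of how that split looks in a recent Zig release, written with the `*` and `[*]` spellings from the commits above. The helper names `increment` and `third` are hypothetical.

```zig
const std = @import("std");

// Single-item pointer: dereference with .*; indexing it is a compile error.
fn increment(p: *u8) void {
    p.* += 1;
}

// Unknown-length pointer: indexing and pointer arithmetic are allowed,
// but the caller is responsible for staying in bounds.
fn third(base: [*]const u8) u8 {
    return (base + 2)[0];
}

test "single-item vs unknown-length pointers" {
    var array = [_]u8{ 10, 20, 30, 40 };

    increment(&array[0]);
    try std.testing.expect(array[0] == 11);

    // A pointer to the whole array coerces to an unknown-length pointer.
    try std.testing.expect(third(&array) == 30);
}
```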
EDIT
`&`: only used for address-of, no longer designates a pointer type. Necessary because of #588
`^`: pointer to exactly 1 thing.
`[*]`: pointer to a block of memory of unknown length
`[*]null`: pointer to block of memory, null-terminated (or 0 terminated for integers). proposal: type for null terminated pointer #265
`[]`: pointer to a block of memory with runtime known length. status quo slices.
`[]null`: pointer to a block of memory with runtime known length, with a null/0 at ptr[len]
`[N]`: pointer to a block of memory with comptime known length
`[N]null`: pointer to a block of memory with comptime known length, and a null/0 at ptr[N]

All of them support pointer indexing and slicing except `^`. Only `[*]` supports pointer arithmetic. All of them implicitly cast to `[*]`. `[]null` and `[N]null` implicitly cast to `[*]null`. `&ptr[x]` and `&foo` always give a `^`. `ptr[x..y]` with comptime known x and y gives a `[N]`. `array[x..]` gives a `[N]`.

new array syntax

Now it is clear whether you should do `&array` or `&array[0]`. Don't use `&array`. If you want a `[N]T`, e.g. a pointer with comptime known length, use `array[0..]`. If the function wants to access more than one element, you'll do this. Otherwise, `&array[0]` will give `^T`, which would trigger a compile error if the array was length 0, and only this element can be accessed via this pointer.

This paves the way for #733
See also #386
See also #568
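The `&array[0]` versus `array[0..]` distinction can be checked in code. This sketch assumes a recent Zig release, where the single-item pointer is spelled `*T` rather than the `^T` proposed here, and a pointer with comptime known length is spelled `*[N]T`. It is an illustration of the idea, not the syntax of this proposal.

```zig
const std = @import("std");

test "element pointer vs pointer with comptime known length" {
    var array = [_]u8{ 1, 2, 3, 4 };
    array[3] = 5;

    // &array[0] points at exactly one element.
    try std.testing.expect(@TypeOf(&array[0]) == *u8);

    // array[0..] keeps the comptime known length in the pointer type.
    try std.testing.expect(@TypeOf(array[0..]) == *[4]u8);

    // Every element is reachable through the length-carrying pointer.
    const all = array[0..];
    try std.testing.expect(all.len == 4);
    try std.testing.expect(all[3] == 5);
}
```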