Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constness inconsistencies in the standard library #9814

Open
jcmoyer opened this issue Sep 21, 2021 · 11 comments
Open

Constness inconsistencies in the standard library #9814

jcmoyer opened this issue Sep 21, 2021 · 11 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@jcmoyer
Copy link
Contributor

jcmoyer commented Sep 21, 2021

Reading the standard library is, in my opinion, the best way to learn Zig currently. However, I have noticed that a couple similar APIs differ in how self is passed to methods. This has caused me significant confusion when it comes to designing my own APIs, because it's not clear whether this is a semantic choice, a practical one, or neither.

In my mind there are two types of state:

  • Physical: the actual bitwise representation of an object
  • Logical: observable properties of an object

Zig's notion of const only enforces physical immutability; to achieve the latter you must design it into your API.

Consider:

  • std.fs.Dir.close(self: *Dir): closes the fd and sets self.* = undefined. This physically and logically mutates self.
  • std.fs.File.close(self: File): closes the fd and does nothing else. This is physically immutable, but logically mutates self.

Both of these functions do essentially the same thing: the fd is closed and the object goes into an unusable state afterwards, which implies mutation. File.close does not communicate this to the caller, but Dir.close does. Should it? One might argue that for a well-designed API, if any observable property of an object changes, you should pass *T, and T/*const T otherwise. It signals an intent to callers and can reduce the likelihood of API misuse. For example, you can freely pass around Files and know that they won't suddenly be closed in a far away part of your codebase (though I admit you cannot guarantee this in Zig.)

Designing around logical state in Zig is hard. For one, we cannot ensure that a Dir or File isn't copied, so it doesn't matter whether we have a *File or a File: closing the fd closes it for all Files that share that fd. Another problem is that there's no way to prevent consumers of an API from inspecting changes in object state, since all struct fields are public. Since we cannot make guarantees about logical state, it's reasonable to concede that it's simply more practical to pass T even for operations that logically mutate an object.

Regardless of which approach is correct, the current state of things causes some differences in usability. Dir cannot be closed as easily:

fn processDir(d: Dir) void {
    d.close(); // can't do this, because d is immutable
}
fn processDirPtr(d: *Dir) void {
    d.close(); // this is fine, close() is meant to work on *Dir
}
fn processDirConstPtr(d: *const Dir) void {
    d.close(); // can't do this for the same reason as processDir
}

fn processFile(f: File) void {
    f.close(); // this is fine, because close() works on immutable Files
}
fn processFilePtr(f: *File) void {
    f.close(); // this is fine, mutable values are always usable in immutable contexts
}
fn processFileConstPtr(f: *const File) void {
    f.close(); // this is fine for the same reason as processFile
}

I have not scanned the stdlib to find every example of this, however I have noticed that ArrayListAligned and ArrayListAlignedUnmanaged have the same by-value/by-pointer difference in their deinit methods.

EDIT: Also consider also the issue of const propagation. Imagine that you have a type that contains an ArrayList, and you depend on ArrayList.deinit() taking itself by-value (so your type's deinit() method also takes itself by-value). It is entirely reasonable that at some point, you may want to change your type to use ArrayListUnmanaged internally. Suddenly, your deinit() method also needs to take self by-mutable-pointer. This propagates through your codebase; everywhere you're calling MyType.deinit() you need to make sure you have a non-const MyType. Types that contain MyType and deinit by-value will now also need to deinit by-mutable-pointer, which propagates to types containing those types and so on. You can see how this can quickly become a maintenance nightmare.

Should these APIs be consistent in the way they take self, and if so, which style is preferred?

@andrewrk andrewrk added the standard library This issue involves writing Zig code for the standard library. label Nov 20, 2021
@andrewrk andrewrk added this to the 0.10.0 milestone Nov 20, 2021
@iacore
Copy link
Contributor

iacore commented Dec 23, 2021

Related to #1629

@marler8997
Copy link
Contributor

marler8997 commented Dec 23, 2021

I made a proposal that deinit(would also apply to close) never invalidate here: #6322 but it was rejected. So the current answer to your question is that definit/close should always invalidate, so File.close is currently using the wrong convention.

However I'm not convinced this is the right course of action yet. My argument for this is that allowing types to be const makes code simpler. It avoids the problems with const propgation, reduces cognitive load on mutable state and makes it more clear what the object is doing without having to read its source, etc. I argue these make the code more "readable". The counter-argument is that by requiring all instances to be mutable, the code can always invalidate during cleanup and will catch more bugs at runtime. So what's more important, code readability or always having these runtime checks? I also argue that in 90% of cases cleanup is done inside a defer block, which makes the invalidation much less useful since you can't access the data anyway. In the cases where invalidation is useful, developers still have the option to make a mutable version that does invalidate. The argument against this is that this requires 2 ways to cleanup an object, for example you would need both a close and a closeConst which adds complexity.

Andrew has decided that losing the ability to use const is less important than gaining the ability to invalidate all instances of an object. However, this would also imply that some of the std library needs to change, for example the Allocator interface would need to modify the free function to take a mutable reference to a slice. This would mean you could no longer do this:

const slice = allocator.alloc(u8, 100);
defer allocator.free(slice);

You would now have to do this:

var slice = allocator.alloc(u8, 100);
defer allocator.free(&slice);

I've asked Andrew to clarify if I understand his reasoning and whether this implication is consistent with it: #10213 (comment)

@marler8997
Copy link
Contributor

marler8997 commented Dec 23, 2021

Quick follow up. The ArrayList type is different from File and Dir because it pretty much needs to be mutable in order to work. File and Dir are handles to state that is kept in the kernel, the handles themselves never change. So I have no problem with ArrayList using self: *ArrayList in its cleanup unlike File and Dir. I consider the issue of const propogation when it comes to ArrayList a non-issue, since including an ArrayList inside a const object is almost certainly a bug. I suppose this is consistent with what @SpexGuy said about it making sense for some types to require mutability and others not.

@jcmoyer
Copy link
Contributor Author

jcmoyer commented Dec 27, 2021

@marler8997 I've been following those issues as I've seen them pop up, and I am in agreement with Andrew on File/Dir in particular, though maybe not for the same reasons. It seems there is a more fundamental disagreement about whether or not a type that has a pointer (or more generally any sort of handle) to an external resource should be able to mutate that resource when it itself is const. When I brought this up on IRC a couple people seemed to not understand why you shouldn't be able to do this.

My argument for this is that allowing types to be const makes code simpler. It avoids the problems with const propgation, reduces cognitive load on mutable state and makes it more clear what the object is doing without having to read its source, etc. I argue these make the code more "readable".

I agree with you in theory, but zig currently does not make strong guarantees that const actually means const and you have to read the source anyways to be absolutely sure. As soon as you involve pointers, all bets are off: suddenly a method taking self: T can change program state observable through self. You can even create very poorly behaved types that break the guarantees const does make, for example:

const T = struct {
    counter: u32,
    ptr: *u32,
    fn create(self: *T) void {
        self.counter = 0;
        self.ptr = &self.counter;
    }
    fn bump(self: T) u32 {
        self.ptr.* += 1;
        return self.ptr.*;
    }
};

var x: T = undefined;
T.create(&x);
const y = x;
assert(y.counter == 0);
y.bump(); // uh oh this was supposed to be const
assert(y.counter == 1);

While I don't think this is a particularly motivating example yet, it will be very easy if #2765 is implemented since you'll be able to skip var and go straight to const.

Even without such a contrived example, I would still make the argument that types like ArrayList should not allow you to mutate their elements through a const ArrayList for reasons outlined in the original post.

I also argue that in 90% of cases cleanup is done inside a defer block, which makes the invalidation much less useful since you can't access the data anyway.

Pointer aliasing does allow you to observe the invalidation elsewhere, something I've run into while writing multithreaded code. But this is probably a good point otherwise, since programs generally shouldn't be designed in a way that allows you to access a destroyed resource.

The argument against this is that this requires 2 ways to cleanup an object, for example you would need both a close and a closeConst which adds complexity.

I strongly agree. This is something that should be avoided as much as possible IMO. There are already types that make this sort of distinction, e.g. BoundedArray has two slicing methods (ironically because self constness propagates to struct members, and in this case the slice isn't behind a pointer)

The ArrayList type is different from File and Dir because it pretty much needs to be mutable in order to work.

This is true for most ArrayList operations, but deinit in particular doesn't technically require the ArrayList to be mutable, because Allocator will happily free through a const pointer to a buffer. The only reason you'd make it mutable is to invalidate the ArrayList, which I think is a good thing, but it's only done for one of the two ArrayList implementations.

File and Dir are handles to state that is kept in the kernel, the handles themselves never change.

True, but my issue is that methods taking const self allow observable state to change. I would make the argument that reading from a file should be done through *File because the operation advances a position somewhere, so it is technically a mutating operation.

I consider the issue of const propogation when it comes to ArrayList a non-issue, since including an ArrayList inside a const object is almost certainly a bug.

Maybe. But I'm more generally interested in what (if any) semantic meaning const is intended to have.

@iacore
Copy link
Contributor

iacore commented Dec 27, 2021

The semantic meaning of const is Zig is the memory region of an object will not change. Every "thing" in Zig has a size (@sizeOf), and that's the size of its representation in memory.

@marler8997
Copy link
Contributor

marler8997 commented Dec 27, 2021

I disagree with the idea that a handle should be mutable because the state that the handle represents is mutable. We don't do this with pointers so why do we do this with handles? This would make sense if const was transitive in Zig, maybe Zig should consider that but having used languages with (Dlang) and without I prefer without (it's more explicit). Extending this idea would also mean that all array indices that you use to index mutable arrays should also be mutable. This seems ridiculous to me.

var i: usize = 10;
array[i] = 100;

Can you imagine writing a function that takes an index to an array and taking a mutable pointer to it, not because you are writing to the index but because the array is mutable?

fn modifySlice(slice: []u8, index: *usize) void {
    slice[index.*] = 100;
}

Furthermore, regardless of whether a handle represents a "mutable object", the mutability of the handle itself has a semantic difference. Having a single handle value that can change is very different from one that can't, and that mutability objectively adds to the cognitve burden when working with that value. In Zig const is not transitive, using a convention to pretend it is only makes code more confusing in my experience.

@jcmoyer
Copy link
Contributor Author

jcmoyer commented Dec 28, 2021

The semantic meaning of const is Zig is the memory region of an object will not change. Every "thing" in Zig has a size (@sizeof), and that's the size of its representation in memory.

Does this meaning also extend to C APIs that use opaque handles? I've seen it done both ways, e.g. std.c.fread takes *FILE like C, but src/codegen/llvm/bindings.zig has a bunch of examples of functions that definitely mutate the memory region behind the pointer yet take *const @This(). The constness of self here also differs from the original C API.

I disagree with the idea that a handle should be mutable because the state that the handle represents is mutable. We don't do this with pointers so why do we do this with handles?

The reason I opened this issue is because there are cases where it appears zig does this with pointers, but perhaps I read too much into it and there's not anything more to the inconsistency than "sometimes we invalidate objects when destroyed, sometimes we don't."

Extending this idea would also mean that all array indices that you use to index mutable arrays should also be mutable. This seems ridiculous to me.

I draw the line at handles that manage a resource. An index is just an offset into a memory resource managed by a pointer, so it doesn't need to be mutable itself.

In Zig const is not transitive, using a convention to pretend it is only makes code more confusing in my experience.

Like I mentioned above, I'm more confused by the fact that I can have an apparently "const" File, call a function that says it takes a const File, and after the function returns, the File's position has advanced 100 bytes. That is not const in the way that I understand it, and IMO it is it way harder to reason about what my program is doing when this can happen anywhere I have a const object.

@marler8997
Copy link
Contributor

marler8997 commented Dec 28, 2021

I'm not sure how you can draw a line between array indices to arrays of objects and handles in this regard. It would be like drawing a line between an "array index" and an "integer", array indices are integers. Array indices are one of the most common form of handle implementation, it's what linux uses for file descriptors and what Andrew has suggested people use when replacing pointers with DOD-style array indices.

In fact I argue that handles, pointers and array indices are all equivalent in that they are all values that represent objects that "live elsewhere". The argument being made here is that if the object these values represent is mutable, then the value should also be mutable. The problems this causes are somewhat mitigated when you consider a handle that lives inside a struct, because you can modify the self parameter to be mutable without needing to modify the call syntax (i.e. file.deinit()). However, this becomes less reasonable when you consider allocator.free taking a mutable reference to a slice (still makes some sense because it allows invalidation), but turns ridiculous when you consider my modifySlice function in my previous comment (serves no function except to conform to your criteria that values representing mutable objects should also be mutable). However, all these cases satisfy the same condition that they are values representing mutable objects. If that's the only justification needed to require the value be mutable, then we have to apply this to all cases equally.

This being said, array indices, pointers and handles are obviously not identical in all respects. It's true that they are all values representing objects that live elsewhere, but they also have differences. So what I'm looking for is, what reason(s) that are unique to handles (don't apply to pointers and array indices) justify requiring that handles be mutable if the underlying object has mutable state (like a file handle)?

@jcmoyer
Copy link
Contributor Author

jcmoyer commented Dec 29, 2021

In fact I argue that handles, pointers and array indices are all equivalent in that they are all values that represent objects that "live elsewhere".

I generally agree with this statement, but it is a somewhat simplified model. I would take it even further and make the claim that handles are the abstraction over pointers, indices, etc.

First of all let me define what I mean when I use these terms:

  • A handle is some identifier that allows you to access a resource when paired with some context.
    • Examples: pointer to dynamically allocated memory, file descriptor, thread handle
  • A resource is anything that requires initialization and deinitialization. This isn't strictly important for the purposes of this discussion, but things that have a clear start- and end-of-life tend to be managed through handles.
    • Examples: dynamically allocated memory (malloc/free), File (open/close), Thread (spawn/join)

Pointers are handles, file descriptors are handles, etc. But the key difference is that pointers are implicitly paired with their context (memory) and once you have one, you can access the resource the pointer refers to freely. Other handles do not have this property. For example an array index is useless without a base pointer, a file descriptor is useless without an OS coordinating access to the filesystem, and so on.

Because pointers can be freely accessed, it makes sense to put a const qualifier on the pointee to control the level of access to the memory region beyond the pointer. With other handles, this is more difficult. Sometimes you can put a const qualifier on the resource itself (like with an array or slice), or maybe you have some abstraction over the resource as is the case with File where your only option is to mark mutating methods somehow. This is subtly different from your suggestion that, in my model, indices be necessarily passed by pointer - doing that does not actually restrict access to the underlying resource, which is the goal.

Anyways bringing this back from the abstract and into Zig-land... I admit there is not a clean 1:1 mapping of this idea. File carries a handle while simultaneously abstracting over the OS-specific context. It is similar to pointers the way I described above: the pairing is implicit, because the abstraction has conflated the two. But unlike pointers, there is currently no way to actually restrict access to the actual resource a File represents. I think ideally there should be some way to accomplish that. Hopefully this explains my position a bit better.

@Hejsil
Copy link
Contributor

Hejsil commented Jan 5, 2022

I'm also in the camp of deinit and the like should not take a mutable pointer. It disallows you creating some complicated structure in a function and then once you are done modifying it, you return it, and the caller can now make it const, indicating that "No, this is no longer gonna change, plz thanks".

However, that doesn't actually prevent us from invalidating in deinit, if we wonna be a little unsafe about it :)

fn deinit(self: *const @This()) void {
    self.allocator.free(self.items);
    discardConst(self).* = undefined;
}

This nearly gets us the best of both words. Nearly all structures, that you can deinit have to be stored in memory you are allowed to mutate, so discarding const is "actually safe™". I can only think of one scenario where this is actually a problem:

const stdout: fs.File = ...;

fn main() void {
    // Do all the io
   
    // I am now done with all the io. I can close stdout to communicate with the receiving process that I am done.
    stdout.close();

    // Do the rest
}

Here, if stdout.close invalidates with discardConst we probably crash the program, but this also shows that with close being mutable you cannot do this either (but at least you get a compiler error).

@andrewrk
Copy link
Member

andrewrk commented Jan 5, 2022

I'm not entirely confident either way and I'm willing to walk back the current convention, which is leaning towards mutable self pointer for deinit. I do think it makes a lot of sense in a lot of cases but I can see the value in e.g. std lib File being immutable.

Note that for File in particular, we might want mutability for event loop purposes but that remains to be seen for certain.

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

5 participants