Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove fooAbsolute functions from std.fs #16736

Closed
squeek502 opened this issue Aug 8, 2023 · 11 comments
Closed

Remove fooAbsolute functions from std.fs #16736

squeek502 opened this issue Aug 8, 2023 · 11 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@squeek502
Copy link
Collaborator

squeek502 commented Aug 8, 2023

This is a proposal regarding the Absolute suffixed functions in the std.fs namespace, see https://ziglang.org/documentation/master/std/#A;std:fs

First, here's the 'soft' version of this proposal:

Remove any fooAbsolute functions that are purely a wrapper around Dir.foo

Many of the fooAbsolute functions are purely wrappers around the relevant Dir function. For example, here's openDirAbsolute:

zig/lib/std/fs.zig

Lines 2711 to 2714 in c6e2e1a

pub fn openDirAbsolute(absolute_path: []const u8, flags: Dir.OpenDirOptions) File.OpenError!Dir {
assert(path.isAbsolute(absolute_path));
return cwd().openDir(absolute_path, flags);
}

The only thing these functions add are assertions that the path is absolute, but those can easily be done by the user at the callsite instead.

Note also that pretty much all Absolute functions have poor test coverage to the point that some will give a compile error if anyone tried to actually use them (e.g. on Windows anything that tries to call the no-longer-existing os.windows.cStrToWin32PrefixedFileW function [these functions have never worked since their introduction in #5879 which ended up removing cStrToWin32PrefixedFileW before it was merged but left some calls of it around])

I believe the above should be fairly uncontroversial.

Now, for the real proposal:

Remove all fooAbsolute functions from std.fs

This is something of a counter-proposal to #7540

This proposal has two parts:

  1. All Dir APIs should be able to handle both relative and absolute paths (this is already the case AFAIK)
  2. All non-Dir APIs should be removed in favor of Dir-based APIs

This is something that std.fs has been moving towards slowly, but it's never been formalized and so the current std.fs is a bit confused (and can be confusing for those new to Zig, since the most libc-like APIs are the fooAbsolute ones). Removing the Absolute APIs would allow Zig to communicate the design of the fs API more clearly and lead more people to write code/libraries that get the benefits of the Dir-based API for free.

Why Dir-based for everything?

My understanding of the reasoning for the Dir-based design is the following:

1. For "better general protection against Time Of Check, Time Of Use bugs across all Zig codebases"

From #3813, where things being more Dir-based was initiated:

It should now be clear the pattern that the standard library file system API is approaching. The API encourages programmers to take advantage of directory traversal via directory handles, for better general protection against Time Of Check, Time Of Use bugs across all Zig codebases.

Here's an example:

pub fn main() !void {
    var c_dir = try std.fs.cwd().makeOpenPath("a/b/c", .{});
    defer c_dir.close();

    // Imagine this happens from some other thread or process
    try std.fs.cwd().rename("a", "a_moved");

    // Because we use `c_dir` here, this will still work
    // and end up writing to the path `a_moved/b/c/some_file`
    try c_dir.writeFile("some_file", "hello");
}

With a non-Dir-based API, the above might instead be written as something like:

pub fn main() !void {
    try std.fs.makePath("a/b/c", .{});

    // Imagine this happens from some other thread or process
    try std.fs.rename("a", "a_moved");

    // This will now fail, since the `a` directory has
    // been moved but we assume it hasn't been.
    try std.fs.writeFile("a/b/c/some_file", "hello");
}

2. For better cross-platform compatibility by default

(my understanding is that this is mostly just a nice side-benefit, and wasn't originally a reason for the fs API design)

Platforms like WASI don't have the concept of an absolute path, so any API without an explicit Dir would exclude that code from being able to target WASI (or require a decent amount of compatibility code to infer a Dir from any given path when targeting WASI). From https://github.com/WebAssembly/wasi-filesystem:

WASI filesystems' APIs are passed a directory handle along with a path, and the path is looked up relative to the given handle, and sandboxed to be resolved within that directory

With the Zig filesystem API being designed to encourage the same thing (every filesystem function having an associated Dir), Zig code will be able to essentially target WASI by default without needing to do some of the contortions that something like wasi-libc has to do to support the 'normal' path-based APIs.

Note: WASI also doesn't have the concept of a cwd. Currently, std.fs.cwd() when targeting WASI will treat the first preopen as the cwd by default, but that behavior can be changed by supplying a custom wasiCwd function via a std_options struct in the root source file.


If the strong version of this proposal is accepted, then the only top-level functions in std.fs that would remain would be:

  • atomicSymLink, although it seems very possible this could/should be made into a Dir function
  • cwd
  • getAppDataDir
  • openSelfExe
  • rename/renameW/renameZ (these use Dirs)
  • selfExeDirPath/selfExeDirPathAlloc
  • selfExePath/selfExePathAlloc/selfExePathW

See #16743 for how removing the Absolute suffixed std.fs functions would affect the Zig codebase.

EDIT: In making the changes in that PR, I came across another reason why the Absolute versions aren't very useful: the Absolute terminology is overloaded and it's hard/impossible to tell why the Absolute variant is being used--is it being used just because the programmer knows the path happens to be absolute, or is it being used because the programmer knows the path must be absolute for the code to work correctly? Removing the Absolute variation will remove this ambiguity since the programmer will be forced to make their intentions more explicit if they want to assert/check that a given path is absolute.

@rohlem
Copy link
Contributor

rohlem commented Aug 8, 2023

In my eyes the most useful path forward would be 3 variants: One which supports any path (checks isAbsolute to decide on systems where that is necessary), one -Relative, and one -Absolute (being disabled / a compile error on systems which don't support it, like WASI).
For "simple wrappers with assertions" it's true that users could write those wrappers, however I don't think std is meant to have the minimal possible surface area, and if we save many programs an extra call to assert I see that as a benefit.
Also, the fact that some of those wrappers error on Windows, because they intend to use a missing function for additional logic, makes me think that at least those instances must do more than just this single assertion, making them even more valuable to their users (which otherwise have to figure it out themselves).

I do generally agree with discouraging absolute path operations; every -Absolute function should include a doc comment mentioning benefits of relative / handle-based operations, but I think there'll be a couple of use cases for -Absolute operations.

@squeek502
Copy link
Collaborator Author

squeek502 commented Aug 8, 2023

Just to be clear:

The isAbsolute asserts in the existing Absolute suffixed functions are only there to discourage their use in the general case, and there is no use-case that they uniquely serve since most of them are just a wrapper over std.fs.cwd().foo(...) (so removing them entirely wouldn't break any use cases).

I could be wrong, but I believe there is no inherent benefit to splitting functions into absolute-only/relative-only variations. If you have a particular use-case in mind, or an implementation that a relative-only/absolute-only function could take advantage of, then please provide it.

The Dir functions can handle any path type, and that's the whole impetus behind this proposal. Right now this fact is obfuscated by the existence of the Absolute functions, and I think removing them would make it much clearer that the Dir API is the way to go for any path type.

The idea here is that something like this (which works as intended with status quo Zig):

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa.deinit() == .ok);
    const allocator = gpa.allocator();

    const args = try std.process.argsAlloc(allocator);
    defer std.process.argsFree(allocator, args);

    if (args.len < 2) return error.MissingArgs;

    const filename = args[1];
    const contents = try std.fs.cwd().readFileAlloc(allocator, filename, std.math.maxInt(usize));
    defer allocator.free(contents);

    std.debug.print("contents len: {}\ncontents:\n{s}\n", .{ contents.len, contents });
}

will work for any target and any path type provided by the user:

C:\Users\Ryan\Programming\Zig\tmp> fsargs.exe hello.txt
contents len: 5
contents:
hello

C:\Users\Ryan\Programming\Zig\tmp> fsargs.exe C:\Users\Ryan\Programming\Zig\tmp\hello.txt
contents len: 5
contents:
hello

C:\Users\Ryan\Programming\Zig\tmp> fsargs.exe \\localhost\C$\Users\Ryan\Programming\Zig\tmp\hello.txt
contents len: 5
contents:
hello

And this is even true for WASI:

C:\Users\Ryan\Programming\Zig\tmp> zig build-exe fsargs.zig -target wasm32-wasi

C:\Users\Ryan\Programming\Zig\tmp> wasmtime fsargs.wasm --dir=. -- hello.txt
contents len: 5
contents:
hello

Having relative-only/absolute-only functions would just make it easier to write a version of this code that doesn't work for some targets for no discernible benefit (but, again, I could be wrong about that).

@rohlem
Copy link
Contributor

rohlem commented Aug 8, 2023

checks isAbsolute on systems where that is necessary

There are no such systems AFAIK.

I thought there was the point of WASI only understanding handle-relative paths.
The -Absolute variants could be @compileError on WASI, while the code path in the universal function can @panic.
Additionally if we spot an absolute path we can pass null/undefined for the handle argument, which seems safer than passing a handle that we hope won't be used because the path is supposed to be seen as absolute by the OS.
But I get if that is not seen as worth complicating the API.

I believe there is no inherent benefit to splitting functions into absolute-only/relative-only variations. If you have [use-case or example implementation in favor], then please provide it.

The Dir functions can handle any path type, and that's the whole impetus behind this proposal. [...] I think removing [-Absolute variants] would make it much clearer that the Dir API is the way to go for any path type.

I'm not particularly well-versed in the topic, and it seems like you thought about this more than I have. My 2c regardless:

The only benefit I would see is easier explicitness (+ corresponding bug safety), providing the isAbsolute-assertion for users.
I'm not really speaking from experience or anything, but if I want my application to only use paths relative to the handles I provide, I'd feel that better expressed by using functions that assert that is what's happening, rather than functions that would also work for absolute paths.
Not saying it's a realistic scenario, but an off-by-one error turns a/foo into /foo, which the -Relative function would catch but universal function would silently do something different with.
So that's why to me the two explicit variants seem worth the couple extra lines.

(And I still don't quite understand the comment/usage of/about cStrToWin32PrefixedFileW - was that always a no-op? Or is it some necessary logic that users have to figure out for absolute-path-symlinking on Windows?)

@squeek502
Copy link
Collaborator Author

squeek502 commented Aug 8, 2023

which seems safer than passing a handle that we hope won't be used because the path is supposed to be absolute.
But I get if that is not seen as worth complicating the API.

This wouldn't actually gain us anything I don't think, since the underlying syscalls already handle this type of stuff for us. For example, openat:

If the pathname given in pathname is relative, then it is interpreted relative to the directory referred to by the file descriptor dirfd

If pathname is absolute, then dirfd is ignored.


if I want my application to only use paths relative to the handles I provide

Then you are free to add a !std.fs.path.isAbsolute assert into your code. The point I was trying to make about that was that this is inherently a user decision, and doesn't really have any bearing on the implementation, so providing Absolute/Relative versions that do nothing besides an assertion that you could just as easily do yourself doesn't seem like a good reason to have Absolute/Relative versions.

EDIT: And 'absolute paths are not supported when targeting WASI' is not actually fully true in practical terms IIUC. As mentioned in the OP, when linking wasi-libc, absolute paths passed to libc functions will be attempted to be resolved into a dir + relative path combination by wasi-libc.

And I still don't quite understand the comment/usage of/about cStrToWin32PrefixedFileW - was that always a no-op? Or is it some necessary logic that users have to figure out for absolute-path-symlinking on Windows?

The only point there was showing that the Absolute versions are not well used/tested (since a fully broken function has existed on Windows for years without detection), but it's not very strong evidence. More compelling evidence would be to fully remove all usages of the Absolute suffixed functions from the Zig codebase (and there's not that many), which I intend to do, to show what this proposal would actually mean in terms of impact.

@marler8997
Copy link
Contributor

Here's another idea that might be worth consideration. Adding std.fs.root() which returns a directory that only works on absolute paths. The *Absolute functions could be removed in favor of this root directory instance and applications would still have a way to express their intention if they explicitly want to limit themselves to absolute paths.

Such an interface also lends itself to extensibility, namely, I could imagine adding a std_option that allows the root module to provide a "root" for platforms that dont' support it like WASI.

@Vexu Vexu added standard library This issue involves writing Zig code for the standard library. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. labels Aug 8, 2023
@Vexu Vexu added this to the 0.13.0 milestone Aug 8, 2023
@squeek502
Copy link
Collaborator Author

squeek502 commented Aug 8, 2023

@marler8997 I'm still struggling to see why

// will panic/cause UB if path is not absolute
var file = try std.fs.openFileAbsolute(path, .{});

or

// will ??? if path is not absolute
var file = try root.openFile(path, .{});

would be preferable to

// handle non-absolute paths in whatever way makes sense for your program
// if there's some reason for that to be necessary
std.debug.assert(std.fs.path.isAbsolute(path));
// or
if (!std.fs.path.isAbsolute(path)) return error.NonAbsolutePath;

var file = try std.fs.cwd().openFile(path);

am I missing something that makes requiring absolute paths a common use case?

But besides that, there's no singular root on e.g. Windows (C:\, D:\ \\server\share\, etc are all possible roots, none of which are guaranteed to even exist).

@squeek502
Copy link
Collaborator Author

squeek502 commented Aug 8, 2023

See #16743 for how removing the Absolute suffixed std.fs functions would affect the Zig codebase.

EDIT: In making the changes in that PR, I came across another reason why the Absolute versions aren't very useful: the Absolute terminology is overloaded and it's hard/impossible to tell why the Absolute variant is being used--is it being used just because the programmer happens to know the path is absolute, or is it being used because the programmer knows the path must be absolute for the code to work correctly? Removing the Absolute variants will remove this ambiguity since the programmer will be forced to make their intentions more explicit if they want to assert/check that a given path is absolute.

@zimmi
Copy link
Contributor

zimmi commented Aug 11, 2023

Comment from the peanut gallery: Encouraging the explicit passing of directory handles is consistent with passing allocators and could have many of the same advantages: better testability (i.e. passing handles to in memory dirs), test dir implementations with file descriptor leak detection, nudging library developers toward portability, etc.

@squeek502
Copy link
Collaborator Author

squeek502 commented Aug 12, 2023

nudging library developers toward portability

I think this might actually be one of the biggest benefits of going all in on the Dir-based API. I went into that a bit here: #7540 (comment)

In hindsight, that comment isn't technically correct as the WASI implementation could try to infer a Dir from a path (as mentioned in the OP, wasi-libc does this), but by encouraging passing a Dir around (similar to Allocator, as you mentioned), libraries will essentially become portable-by-default, and someone that wants to target WASI will be able to use whatever libraries they want without needing it to specifically support targeting WASI.

@matu3ba
Copy link
Contributor

matu3ba commented Aug 16, 2023

Comment from the peanut gallery: Encouraging the explicit passing of directory handles is consistent with passing allocators and could have many of the same advantages: better testability (i.e. passing handles to in memory dirs), test dir implementations with file descriptor leak detection, nudging library developers toward portability, etc.

I disagree insofar, as this will leak directory handles into parallel spawned processes on xor force a full mutex to restrict process spawn xor close all unwanted ones (communicating via environment or stdin) with unnecessary expensive syscalls for every close.

This might be no problem in the Zig unit test system, since only 1 child is spawned. However, users wanting to spawn children of children might not want to have those drawbacks.

@squeek502
Copy link
Collaborator Author

Closing. The reasoning can be found here:

#16743 (comment)

While I think the API might benefit from the removal of Absolute functions in terms of how understandable it is / "one obvious way of doing things" / cross-platform-by-default, there is a use case for the Absolute functions when it's known that the path is absolute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

6 participants