Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove absolute path support in Dir.open functions #7540

Closed
marler8997 opened this issue Dec 24, 2020 · 15 comments
Closed

Remove absolute path support in Dir.open functions #7540

marler8997 opened this issue Dec 24, 2020 · 15 comments
Labels
enhancement Solving this issue will likely involve adding new logic or components to the codebase. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@marler8997
Copy link
Contributor

marler8997 commented Dec 24, 2020

Dir functions like openFile and openDir accept relative paths, but they also accept absolute paths. The implementation could be made simpler if they only worked with relative paths, and the interface would arguably be simpler/easier to understand. I'm not sure what benefit we get by making them also work with absolute paths. By removing absolute path support in these functions, the only thing we would lose is open functions that can work with relative and absolute paths, but these methods could be implemented as "free functions" without a Dir reference (if we wanted to support this mechanism). I propose removing support for absolute paths in the Dir functions.

@Vexu Vexu added enhancement Solving this issue will likely involve adding new logic or components to the codebase. standard library This issue involves writing Zig code for the standard library. labels Dec 24, 2020
@Vexu Vexu added this to the 0.8.0 milestone Dec 24, 2020
@squeek502
Copy link
Collaborator

squeek502 commented Dec 25, 2020

This goes against my understanding of the design of the Dir-based fs library. As far as I can tell (but this is not documented anywhere, which is a problem imo and has been a point of confusion for a lot of people), the reason for having everything go through Dir is that it allows for that one API to be maximally cross-platform--even for platforms that don't have the concept of an absolute path (e.g. WASM). That is, the point of Dir is not meant to distinguish between handling 'relative' and 'absolute,' but rather serves as a necessary common-denominator for the desired level of cross-platform conformance.

For example, if Dir functions only worked for relative paths, then someone trying to compile the same code for both WASM and Windows would have to have some special glue code to handle the case of absolute paths on Windows (i.e. if they were writing a library and couldn't be sure whether the paths they will be given were relative or absolute). With the current design, though, you can just use Dir.openFile and it will work anywhere (excluding #4659 which would be great to get resolved, but is tangential to this).

As mentioned before, this is all rather counter-intuitive (if I'm even right about the reasoning, which I may not be), and it would be great to get some documentation added that clearly lays out why std.fs is designed the way it is, and specifies that using fs.Dir for all paths is the intended way of doing things.

@Vexu Vexu added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Dec 25, 2020
@marler8997
Copy link
Contributor Author

marler8997 commented Dec 25, 2020

@squeek502 I don't understand why code would have to be different? Can you provide an example? If your application works on WASM, then that implies it uses relative path (since WASM doesn't support absolute paths), which would mean you would be using the Dir functions. In the case where the application wants to work with either relative or absolute paths, then my suggestion in the description is to provide "free functions" for this that don't contain an unused Dir parameter. These free functions would be arguably less confusing than supporting absolute paths in the Dir functions. Here's an example of why it is confusing.

pub fn example(out_name: []const u8) !void {
    const dir = try openDir("out");
    defer dir.close();
    const f = dir.openFile(out_name);

    // code that uses f
}

example("/tmp/foo");

In this example. the intention of the example function is to open a file relative to the out directory. As it exists, this intention is circumvented if the user provides an absolute path. So the example function is written incorrectly. To be correct, it would need to check that the out_name parameter it receives is not an absolute path, however, this requirement of the client to use the API correctly is unclear, because the code looks like it is creating a file relative to the dir path it is calling openFile on, otherwise, why would it have a dir argument in the first place? This means that whenever the codes' intention is to open a reletive file by calling a method on Dir, it MUST check whether the path is absolute to be correct. This violates the principle of making the "correct" way to do things the default/easy way. One must go out of their way to write correct code using this interface.

In the case where it is the intention to support absolute and relative paths, then there is no need for a Dir instance:

pub fn example(out_filename: []const u8) !void {
    const f = try openFile(out_filename);

    // code that uses a and b
}

example("/tmp/foo");

With the proposed API change, the first example that is written incorrectly would result in a runtime error which indicates that the path given must be an relative path. This means we are failing earlier, by failing as soon as the function is called rather than failing because an assumption that the file will be created underneath the directory is violated.

@semarie
Copy link
Contributor

semarie commented Dec 26, 2020

technically, it is how POSIX openat(3) function behave when an absolute path is passed. for quoting POSIX description:

The openat() function shall be equivalent to the open() function except in the case where path specifies a relative path.

So for someone coming from POSIX, the current behaviour of Dir-based fs is coherent.

@squeek502
Copy link
Collaborator

squeek502 commented Dec 26, 2020

@marler8997 My understanding is that both of your examples would not be able to work when targeting WASI, since a Dir is always required. That is, there is no way to (correctly) do anything fs-related without an existing Dir when targeting WASI (as I understand it, there's no CWD, you're not guaranteed to even have a preopen dir, and even if you do have a preopen dir, it's incorrect to have to guess at which preopen is intended).

I could very well be wrong about this, though, so please correct me if I am.

@marler8997
Copy link
Contributor Author

@semarie that's a fair point, it looks like that is the case for the ...at() functions on Posix. However, it looks like there have been multiple proposals for flags to modify the ...at() functions to disable this ability. This article goes through the history of it: https://lwn.net/Articles/723057/ , however, it doesn't look like any proposal has been accepted yet, though, some maintainers including Linus have said they see the usefulness for the flag. It just seems like the details and patch have not been finalized. I personally see use cases for both mechanisms, for me I would love to see a way to support both.

@sqeek502 Sorry I'm not understanding what you're trying to say. This issue is proposing that we remove absolute path support in the Dir methods, so if WASI never supported absolute paths in the first place then this proposal would have no affect on that platform.

@squeek502
Copy link
Collaborator

squeek502 commented Dec 26, 2020

@marler8997 I think what I'm trying to get across is that (as far as I'm aware):

these methods could be implemented as "free functions" without a Dir reference

is not possible if the goal is to be able to target all platforms (including WASI) with the same code.

So, the options as I see them are:

  • Make the API less confusing, but also drop WASI support from the 'universal' API (the proposed free functions would presumably become the 'default', but they couldn't be used for WASI)
  • Keep the slightly confusing Dir-for-everything API in order to keep WASI support in the 'universal' API (but add docs letting people know that Dir-for-everything is the intended 'default')

@marler8997
Copy link
Contributor Author

These "free functions" would still work on WASI. The "free functions" I described accept either absolute or relative paths, it handles both. That just means on WASI if an absolute path is passed in, it sounds like the WASI syscall would fail, just like it would today. This behavior doesn't change with this proposal.

@squeek502
Copy link
Collaborator

squeek502 commented Dec 26, 2020

A relative path on its own is not good enough for WASI, though. It's more similar to URL resolution: you can't resolve a relative link like "path/to/something" on its own--you also need to know the URL of the current page. In WASI, a relative path is similarly contextless--you need a Dir to know what it's meant to be relative to.

As always, I may be wrong. My understanding only comes from writing some fs tests for Zig and making sure they pass when targeting WASI, so my familiarity with how WASI actually works is pretty minimal. Do you have something in mind for how to implement a Dir-less function that will work for WASI?

@marler8997
Copy link
Contributor Author

@sqeek502 I think we're getting off into the weeds here, like I said this proposal should have no affect on WASI. If WASI requires a Dir reference to work, then it will still requires a Dir reference to work regardless of whether or not the Dir methods support absolute paths. If I'm missing something, maybe you can clear it up by providing a code example? Can you provide an example that works on WASI today, but then wouldn't work with this proposal?

@squeek502
Copy link
Collaborator

@marler8997

Sure, here's an example. With the current API design, if you were to write a function that takes a path to a file that should be parsed, then the current API is designed to encourage you to write it something like this (although it is not as obvious that this is the case as it should be):

const std = @import("std");
const builtin = @import("builtin");
const fs = std.fs;

fn parseFile(dir: fs.Dir, path: []const u8) !ParsedStruct {
    const file = try dir.openFile(path, .{});
    defer file.close();

    var parsed = ParsedStruct{};

    // parse
    var buffer: [1]u8 = undefined;
    const bytes_read = try file.readAll(&buffer);
    parsed.empty = bytes_read == 0;

    return parsed;
}

const ParsedStruct = struct {
    empty: bool = false,
};

test "empty" {
    var tmp = std.testing.tmpDir(.{});
    defer tmp.cleanup();

    var file = try tmp.dir.createFile("test", .{});
    file.close();

    const parsed = try parseFile(tmp.dir, "test");
    std.testing.expect(parsed.empty);
}

test "not empty" {
    var tmp = std.testing.tmpDir(.{});
    defer tmp.cleanup();

    try tmp.dir.writeFile("test", "something");

    const parsed = try parseFile(tmp.dir, "test");
    std.testing.expect(!parsed.empty);
}

test "absolute path" {
    if (builtin.os.tag == .wasi) return error.SkipZigTest;

    var tmp = std.testing.tmpDir(.{});
    defer tmp.cleanup();

    try tmp.dir.writeFile("test", "something");
    var absolute_path = try tmp.dir.realpathAlloc(std.testing.allocator, "test");
    defer std.testing.allocator.free(absolute_path);

    const parsed = try parseFile(tmp.dir, absolute_path);
    std.testing.expect(!parsed.empty);
}

This will allow it to work on any target (with both relative and absolute paths where absolute paths are supported):

$ zig test 7540.zig
All 3 tests passed.

$ zig test -target wasm32-wasi --test-cmd wasmtime --test-cmd --dir=. --test-cmd-bin 7540.zig
2 passed; 1 skipped.

If, instead, Dir-less free functions existed, then the more obvious way to write that function would be the same as your example code above:

fn parseFile(path: []const u8) !ParsedStruct {
    const file = try fs.openAnyFile(path, .{});
    defer file.close();

    // ...
}

But this then makes that function no longer able to target WASI (see my previous comments).

So, my concern with the proposal is that the existence of the proposed free functions would change what type of code people are encouraged to write, and that the use of the free functions would (unintentionally) exclude targeting certain platforms (WASI in this case). This would in turn reduce the re-usability of libraries across all the target platforms (you'd have to intentionally 'support' WASI whereas now you basically get WASI support for free).

@marler8997
Copy link
Contributor Author

Hmm that's interesting. So if I understand you correctly, in order for any code that works with the filesystem to work with WASI, it has to pair every filename with a Dir reference as well? How does the WASI application retrieve a Dir reference in the first place?

@squeek502
Copy link
Collaborator

Preopens, see std/fs/wasi.zig. std.testing.tmpDir uses getCwdOrWasiPreopen to get the "." preopen combined with passing --dir=. to wasmtime in build.zig to setup the "." preopen.

Note: preopens are not guaranteed to exist--running my previous example without the --test-cmd --dir=. fails with:

$ zig test -target wasm32-wasi --test-cmd wasmtime --test-cmd-bin 7540.zig
1/3 test "empty"... unable to make tmp dir for testing: didn't find '.' in the preopensError: failed to run main module `zig-cache/o/3cffa09a31ef3ca94e7f3b9359db30fc/test.wasm`

Caused by:
    0: failed to invoke command default
    1: wasm trap: unreachable
       wasm backtrace:
           0:  0x675 - <unknown>!std.os.abort
           1:  0x531 - <unknown>!std.builtin.default_panic
           2: 0x8079 - <unknown>!std.testing.getCwdOrWasiPreopen
           3: 0x5db9 - <unknown>!std.testing.tmpDir
           4: 0x59ed - <unknown>!test "empty"
           5: 0xa778 - <unknown>!std.special.main
           6: 0x9e1c - <unknown>!_start
       note: run with `WASMTIME_BACKTRACE_DETAILS=1` environment variable to display more information
       
error: the following test command failed with exit code 134:
wasmtime zig-cache/o/3cffa09a31ef3ca94e7f3b9359db30fc/test.wasm

@marler8997
Copy link
Contributor Author

@squeek502 I don't know much about WASI, but I would think that the free functions could use some sort of default Dir reference (i.e. like getCwdOrWasiPreopen). In any case, the alternative is that in order to work with WASI, every single file path would need to be paired with a Dir reference. Are you saying that ALL code that deals with filenames should be modified to pair all these path names with Dir references for the sake of working with WASI? For example, does this mean we need to modify every function in existence that deals with filenames to also take a Dir reference alongside of it? Is this the convention we want to establish in Zig?

@squeek502
Copy link
Collaborator

squeek502 commented Dec 28, 2020

Is this the convention we want to establish in Zig?

As I understand it, yes, this was/is the intention, although I may be wrong about maximizing-cross-platform-ness being the original goal. From #3813 when File.openRead (which was more like a Dir-less free function) was deprecated in favor of Dir.openFile:

It should now be clear the pattern that the standard library file system API is approaching. The API encourages programmers to take advantage of directory traversal via directory handles, for better general protection against Time Of Check, Time Of Use bugs across all Zig codebases.

@sedwardsmarsh
Copy link

For the sake of convenience, I agree that there should be a separate function like openFileRelative() that doesn't need a Dir argument.

Passing in the Dir to openFile() is confusing to a beginner Zigling like myself.

My suggestion is adding another function to std.fs, named openFileRelative() which doesn't require a Dir argument. Then std.fs.Dir.openFile() doesn't need to be modified.

@andrewrk andrewrk closed this as not planned Won't fix, can't repro, duplicate, stale Dec 8, 2024
@andrewrk andrewrk modified the milestones: 0.16.0, 0.14.0 Dec 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Solving this issue will likely involve adding new logic or components to the codebase. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

6 participants