Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

io/fs: add ReadLinkFS interface #49580

Open
zombiezen opened this issue Nov 14, 2021 · 98 comments
Open

io/fs: add ReadLinkFS interface #49580

zombiezen opened this issue Nov 14, 2021 · 98 comments
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. Proposal Proposal-Accepted
Milestone

Comments

@zombiezen
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.17.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

Walked a directory with fs.WalkDir and encountered a symlink that I wanted to read.

What did you expect to see?

A function fs.ReadLink that behaves like os.Readlink, but operates on an fs.FS. Design sketch:

package fs

// ReadLink returns the destination of the named symbolic link.
//
// If fsys does not implement ReadLinkFS, then ReadLink returns an error.
func ReadLink(fsys FS, name string) (string, error)

// ReadLinkFS is the interface implemented by a file system that supports symbolic links.
type ReadLinkFS interface {
  FS

  // ReadLink returns the destination of the named symbolic link.
  ReadLink(name string) (string, error)
}

I would also want the file system returned by os.DirFS to have an implementation that calls os.Readlink. IIUC archive/zip.Reader would probably also benefit from an implementation.

An open question in my mind is whether the returned destination should be a slash-separated path or kept as-is. I think for consistency it probably should convert to a slash-separated path, but I'm not sure if this has problems on Windows.

What did you see instead?

No such API exists.

Other details

I have bandwidth to contribute an implementation of this, but I understand we're in the freeze and the earliest this could go in is Go 1.19.

This is somewhat related to #45470, but I'm not proposing changing any existing semantics, just adding a new method.

@gopherbot gopherbot added this to the Proposal milestone Nov 14, 2021
@bcmills
Copy link
Contributor

bcmills commented Nov 15, 2021

An open question in my mind is whether the returned destination should be a slash-separated path or kept as-is.

I would say it should be slash-separated and also relative to the same FS: links to absolute paths should be made relative, and links to paths above the FS root (or on an entirely different volume) should be rejected.

@rsc
Copy link
Contributor

rsc commented Dec 1, 2021

Rewriting the link is tricky and not rewriting the link is also tricky. It's unclear to me what we should do here, if anything.

@rsc
Copy link
Contributor

rsc commented Dec 1, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@zombiezen
Copy link
Contributor Author

As a data point, for the application I was writing, rewriting the link to be relative is effectively what I did anyway. I wanted to create a zip archive of an on-disk directory, so absolute paths were rewritten to be relative to the DirFS root. Returning an error if the path could not be represented was acceptable.

@rsc
Copy link
Contributor

rsc commented Dec 8, 2021

So the API for ReadLink is that it must return a path relative to the link, and if it can't do that, then it returns an error?
It seems like most symlinks are absolute, though, and most DirFS(foo) will not use foo = "/", so that will make most symlinks result in errors?

I'm trying to understand how useful this will be in practice, to balance against the cost. Will it be useful in practice? Or will people just be frustrated that 99% of symlinks aren't usable?

@bcmills
Copy link
Contributor

bcmills commented Dec 8, 2021

So the API for ReadLink is that it must return a path relative to the link, and if it can't do that, then it returns an error?

Yes.

It seems like most symlinks are absolute, though, and most DirFS(foo) will not use foo = "/", so that will make most symlinks result in errors?

I would expect the ReadLink method to internally transform absolute link paths to relative ones (for example, by using filepath.Rel and then checking that the returned path does not begin with ../.) That would allow symlinks to succeed when they are to other locations under the FS root — they would only fail if they are absolute symlinks that would otherwise refer to locations completely outside the FS tree.

So, for example, os.DirFS("/") on Unix would be able to resolve symlinks anywhere on the filesystem, even if those symlinks are absolute.

@hherman1
Copy link

hherman1 commented Dec 8, 2021

I would expect the ReadLink method to internally transform absolute link paths to relative ones (for example, by using filepath.Rel and then checking that the returned path does not begin with ../.) That would allow symlinks to succeed when they are to other locations under the FS root — they would only fail if they are absolute symlinks that would otherwise refer to locations completely outside the FS tree.

Does this extend to non local-disk filesystems?

@zombiezen
Copy link
Contributor Author

@rsc:

So the API for ReadLink is that it must return a path relative to the link, and if it can't do that, then it returns an error?

That was sufficient for my application. The root of my DirFS could contain multiple projects (think a GOPATH-like setup), so symlinks would usually resolve within the io/fs filesystem.

It seems like most symlinks are absolute, though, [...]

I'm not 100% convinced of that, but I don't have evidence to dispute your claim.

FWIW the cases that I was concerned with in a DevTools context:

  • Symlinks within the same directory. A common case is multiple aliases for a single on-disk program, like Busybox.
  • IIRC old versions of Git would use symlinks to represent symbolic references like HEAD instead of a file.

I'm trying to understand how useful this will be in practice, to balance against the cost. Will it be useful in practice? Or will people just be frustrated that 99% of symlinks aren't usable?

Agreed, I think weighing this tradeoff is the trickiest part of this proposal.

Relative links are IMO the most useful for a consumer of this API. I could imagine an implementation of ReadLink also returning absolute paths in the case where the returned path is above the root io/fs directory, but this might add complexity when the paths aren't slash-separated. However, I tend to agree with @bcmills that being strict about this is probably better.

You're probably already considering this, but it's just FS.ReadLink that would be picky about the link target. The implicit interface of DirFS.Open is to follow symlinks (again, #45470 tracks spelling out that behavior for other filesystems), and symlinks are already visible in directory listings.


@hherman1:

Does this extend to non local-disk filesystems?

I'm proposing the slash-separated relative path restriction would extend to non-local-disk filesystems, yes. How each implementation meets this contract is up to the individual filesystem.

@rsc
Copy link
Contributor

rsc commented Dec 15, 2021

@zombiezen, thanks for the use cases. It seems fairly unobtrusive to add ReadLinkFS and fs.ReadLink, so the cost seems low and the benefit > 0.

@bcmills:

I would expect the ReadLink method to internally transform absolute link paths to relative ones (for example, by using filepath.Rel and then checking that the returned path does not begin with ../.) That would allow symlinks to succeed when they are to other locations under the FS root — they would only fail if they are absolute symlinks that would otherwise refer to locations completely outside the FS tree.

So, for example, os.DirFS("/") on Unix would be able to resolve symlinks anywhere on the filesystem, even if those symlinks are absolute.

Rewriting symlinks may run into problems. I've been burned enough that I'm a bit wary about that. Should we start with just erroring out on the absolute ones?

@bcmills
Copy link
Contributor

bcmills commented Dec 15, 2021

I think it would be fine to start by erroring out on absolute links, and perhaps define a specific error (or error type) for symlinks that refer to locations outside of the FS.

From the perspective of fs.ReadLinkFS, the requirement would be that every returned path is relative to the passed-in name and below the FS root. (It would be up to the specific FS implementation to decide whether to achieve that by rewriting absolute links or rejecting them.)

@rsc
Copy link
Contributor

rsc commented Jan 5, 2022

OK, so it sounds like we agree on ReadLink and ReadLinkFS but with the restriction that the returned link must be relative, and absolute symlinks return errors instead. Do I have that right?

@zombiezen
Copy link
Contributor Author

IIUC the full constraint for the returned link is relative and underneath the root of FS. Otherwise, yes, that matches my understanding.

@rsc
Copy link
Contributor

rsc commented Jan 12, 2022

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Jan 19, 2022

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: io/fs: add ReadLinkFS interface io/fs: add ReadLinkFS interface Jan 19, 2022
@rsc rsc modified the milestones: Proposal, Backlog Jan 19, 2022
@zombiezen
Copy link
Contributor Author

Excellent! I'm working on a CL now.

Implementation question: I would very much like to be able to use filepath.Clean and filepath.Rel in the os.DirFS implementation, but that would introduce a cyclic dependency AFAICT. What's the best way to use those functions?

@ianlancetaylor
Copy link
Contributor

In principle you could move the functions into an internal package imported by both the os package and the path/filepath package. But try to avoid that if you can. Better to keep the os package focused on the simplest possible approach. In particular I don't see why the os package would need to use Clean.

DavidGamba added a commit to DavidGamba/dgtools that referenced this issue Jan 20, 2022
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/385534 mentions this issue: io/fs: add ReadLinkFS interface

@kaniini
Copy link

kaniini commented Feb 21, 2022

I have use-cases which require the ability to have an absolute path returned, namely, taking an fs.FS and taring it up (tons of symlinks to /bin/busybox), would it be possible to make it so that an absolute path can be returned if opted into?

@qmuntal
Copy link
Contributor

qmuntal commented Oct 4, 2024

Sorry for being late to the party, I've just returned from a long paternity leave. My only concern is the interface methods docs, IMO they are too Unix specific. Windows supports other types of links that are not symbolic links. For example, on Windows, os.ReadLink supports symbolic links and mount points, and os.Lstat follows any kind of link (also known as reparse tag name surrogate).

My suggestion would be to remove the word symbolic from the docs.

@mvdan
Copy link
Member

mvdan commented Oct 4, 2024

Swapping "symbolic link" for "link" could make the docs a bit ambiguous with hard links though: https://pkg.go.dev/os#Link

Does Windows use any term for ReadLink and Lstat other than "link"? The OS in general does seem to have the notion of "hard links" as well as "symbolic links", which this proposal is mainly aimed at: https://learn.microsoft.com/en-us/windows/win32/fileio/hard-links-and-junctions

@qmuntal
Copy link
Contributor

qmuntal commented Oct 4, 2024

Does Windows use any term for ReadLink and Lstat other than "link"?

Could we use "soft link"? It is not a well established term on Windows, but it appears in Microsoft's docs and it should work well on Unix too. We could mention symlinks, and then clarify that other types of soft links are also supported, such as Windows mount points.

@zombiezen
Copy link
Contributor Author

@aclements SGTM. I can draft a new version of the CL within the next week or so.

@zombiezen
Copy link
Contributor Author

I have a new patchset of https://go.dev/cl/385534 ready for review that incorporates the discussed changes.

@aclements
Copy link
Member

I'm happy to use the term "soft link". It's a bit tricky to document in detail because this is an interface. It might not be backed by any sort of OS file system at all, so the exact meaning is kind of left up to each implementation of the interface. I'm open to suggestions, though.

@aclements
Copy link
Member

Based on the discussion above, this proposal seems like a likely accept.

The proposal is:

package fs

type ReadLinkFS interface {
        FS

        // ReadLink returns the destination of the named soft link.
        ReadLink(name string) (string, error)

        // Lstat returns a FileInfo describing the file without following any soft links.
        // If there is an error, it should be of type *PathError.
        Lstat(name string) (FileInfo, error)
}

along with toplevel

func ReadLink(fsys FS, name string) (string, error)
func Lstat(fsys FS, name string) (FileInfo, error)

There are no guarantees or requirements about what ReadLink returns.

We can continue to hash out what the exact documentation should look like to cover both Unix and Windows.

(This time for sure!)

@aclements aclements moved this from Active to Likely Accept in Proposals Oct 23, 2024
@zhangyoufu
Copy link
Contributor

zhangyoufu commented Oct 24, 2024

On Windows, can the current interface distinguish between Junction and Symlink, which are two different types?
See also https://www.2brightsparks.com/resources/articles/ntfs-hard-links-junctions-and-symbolic-links.html

The return value of ReadLink lacks a rigorous definition. According to this wiki page from ntfs-3g, Junctions may also contain volume id. If the target volume does not exist, what should ReadLink return? A failure or NT Object Manager path beginning with \\?\?

@mvdan
Copy link
Member

mvdan commented Oct 24, 2024

@zhangyoufu It seems to me like the same questions apply to the documented behavior for https://pkg.go.dev/os#Readlink. Does os.Readlink behave OK for you in those scenarios?

@zhangyoufu
Copy link
Contributor

zhangyoufu commented Oct 25, 2024

@mvdan Yes. Current behavior of os.Readlink meet my expectation under Windows.

junction (mountpoint) => \\?\Volume{c54466ef-0000-0000-0000-100000000000}\
junction => C:\junction-target
symlinkd => symlinkd-target
symlink => symlink-target

Go standard library did a good job handling/testing all these cases.

@mvdan
Copy link
Member

mvdan commented Oct 25, 2024

@zhangyoufu given that os.Readlink doesn't document any specific Windows behavior, and that it already behaves the way that works for you, I suppose we can do the same for fs.ReadLinkFS.ReadLink. I'm not sure that anything in particular needs to change in the proposed API.

@qmuntal
Copy link
Contributor

qmuntal commented Oct 25, 2024

I'm not sure that anything in particular needs to change in the proposed API.

Agree. io/fs interfaces and documentation should avoid mentioning concret implementations details and special cases. If the os implementation needs more documentation (which I think it does), then it should be added to os.DirFS.

@zhangyoufu
Copy link
Contributor

zhangyoufu commented Oct 25, 2024

During my test on Windows, FileInfo returned by Lstat looks strange:

  • we can Readlink() on junction, while FileInfo.Mode() returns ModeIrregular rather than ModeSymlink
  • junction (incl. volume mountpoint) and symlinkd has FILE_ATTRIBUTE_DIRECTORY from win32 API, but FileInfo.IsDir() simply returns false

Here is my test result:

mountpoint => \\?\Volume{c54466ef-0000-0000-0000-100000000000}\, modeSymlink=false, modeIrregular=true, isDir=false, fileAttributeDir=true
junction => C:\junction-target, modeSymlink=false, modeIrregular=true, isDir=false, fileAttributeDir=true
symlinkd => X:\symlinkd-target, modeSymlink=true, modeIrregular=false, isDir=false, fileAttributeDir=true
symlink => X:\symlink-target, modeSymlink=true, modeIrregular=false, isDir=false, fileAttributeDir=false

To make io/fs interface portable, robust and implementation-neutral, it may be necessary to clarify:

  • If Readlink() succeed on a file, do we require ModeSymlink on its FileInfo.Mode?
  • Is it allowed to have both ModeDirectory and ModeSymlink?
  • Should we expose a race-free way to perform both ReadLink and Lstat? (on Linux it's possible to open using O_PATH | O_NOFOLLOW, then calls fstat and readlinkat)

Test Code:

test.bat
@echo off
cd /D "%~dp0"

for /f "tokens=*" %%i in ('mountvol %SystemDrive% /L') do set VolumeName=%%i

REM prepare
md mountpoint
mountvol mountpoint %VolumeName%
mklink /J junction %SystemDrive%\junction-target
mklink /D symlinkd X:\symlinkd-target
mklink symlink X:\symlink-target

pause
go run main.go
pause

REM cleanup
mountvol mountpoint /D
rd mountpoint
rd junction
rd symlinkd
del symlink
main.go
package main

import (
    "fmt"
    "io/fs"
    "os"
    "reflect"
    "syscall"
)

func main() {
    for _, source := range []string{"mountpoint", "junction", "symlinkd", "symlink"} {
        target, err := os.Readlink(source)
        if err != nil {
            fmt.Println(err)
            continue
        }
        fi, err := os.Lstat(source)
        if err != nil {
            fmt.Println(err)
            continue
        }
        modeSymlink := fi.Mode() & fs.ModeSymlink != 0
        modeIrregular := fi.Mode() & fs.ModeIrregular != 0
        fileAttributeDir := reflect.ValueOf(fi).Elem().FieldByName("FileAttributes").Uint() & syscall.FILE_ATTRIBUTE_DIRECTORY != 0
        fmt.Printf("%s => %s, modeSymlink=%v, modeIrregular=%v, isDir=%v, fileAttributeDir=%v\n", source, target, modeSymlink, modeIrregular, fi.IsDir(), fileAttributeDir)
    }
}

@qmuntal
Copy link
Contributor

qmuntal commented Oct 29, 2024

we can Readlink() on junction, while FileInfo.Mode() returns ModeIrregular rather than ModeSymlink

This is the expected behavior since Go 1.23 (see #61893).

junction (incl. volume mountpoint) and symlinkd has FILE_ATTRIBUTE_DIRECTORY from win32 API, but FileInfo.IsDir() simply returns false

This is also the expected behavior. I don't know the exact reason to not set os.ModeDir for symbolic links and mount points, but it has the nice benefit of avoiding unintentionally following a mount point when traversing a directory, which can lead to infinite recursion.

If Readlink() succeed on a file, do we require ModeSymlink on its FileInfo.Mode?

I wouldn't limit ReadLinkFS.Readlink to just symlinks. Some implementations might support other types of "soft links" (e.g. Windows mount points).

Should we expose a race-free way to perform both ReadLink and Lstat? (on Linux it's possible to open using O_PATH | O_NOFOLLOW, then calls fstat and readlinkat)

This is very related to #67002. It probably deserves a separate proposal.

@zhangyoufu
Copy link
Contributor

zhangyoufu commented Oct 29, 2024

I wouldn't limit ReadLinkFS.Readlink to just symlinks. Some implementations might support other types of "soft links" (e.g. Windows mount points).

IEEE Std 1003.1 defined readlink(2), the API, as read the contents of a symbolic link, and readlink(1), the CLI utility, as display the contents of a symbolic link. It also defined the term link (in context of file hierarchy), is either a hard link or a symbolic link.

I understand that Go may not follow aforementioned spec/standard. But I would be surprised if Go define Readlink to also work on links that is neither hardlink, nor symlink.

IMHO, Windows junctions (incl. mount points) are also symlinks.

@aclements
Copy link
Member

I think none of this affects the ReadLinkFS interface directly, though it certainly affects implementations. It seems like we should improve the documentation on os.Readlink, but that can be done separately.

@aclements aclements moved this from Likely Accept to Accepted in Proposals Oct 30, 2024
@aclements
Copy link
Member

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.

The proposal is:

package fs

type ReadLinkFS interface {
        FS

        // ReadLink returns the destination of the named soft link.
        ReadLink(name string) (string, error)

        // Lstat returns a FileInfo describing the file without following any soft links.
        // If there is an error, it should be of type *PathError.
        Lstat(name string) (FileInfo, error)
}

along with toplevel

func ReadLink(fsys FS, name string) (string, error)
func Lstat(fsys FS, name string) (FileInfo, error)

There are no guarantees or requirements about what ReadLink returns.

We can continue to hash out what the exact documentation should look like to cover both Unix and Windows.

@aclements aclements changed the title proposal: io/fs: add ReadLinkFS interface io/fs: add ReadLinkFS interface Oct 30, 2024
@aclements aclements modified the milestones: Proposal, Backlog Oct 30, 2024
aibor added a commit to aibor/virtrun that referenced this issue Nov 17, 2024
In preparation for golang/go#49580, this
commit adds an own simplified implementation of ReadLinkFS. It adds
helpers to extend a simple fstest.MapFS into a ReadLinkFS.
@qmuntal
Copy link
Contributor

qmuntal commented Nov 21, 2024

Code freeze is today and CL 385534 hasn't been properly reviewed yet. Any chance to put this into the fast line so it can land in Go 1.24?

@dmitshur dmitshur modified the milestones: Backlog, Go1.24 Nov 21, 2024
@mvdan
Copy link
Member

mvdan commented Nov 22, 2024

One last minute comment came in before the submit happened, which only affects fstest and looks pretty easy to fix. I assume we're still OK for 1.24?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. Proposal Proposal-Accepted
Projects
Status: Accepted
Development

No branches or pull requests