Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for std::fs::canonicalize #27706

Closed
alexcrichton opened this issue Aug 12, 2015 · 20 comments · Fixed by #29254
Closed

Tracking issue for std::fs::canonicalize #27706

alexcrichton opened this issue Aug 12, 2015 · 20 comments · Fixed by #29254
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Milestone

Comments

@alexcrichton
Copy link
Member

This is a tracking issue for the unstable fs_canonicalize feature in the standard library. There are a number of reasons it is not currently stable:

  • The implementation on Unix isn't technically memory safe.
  • The implementation on Windows returns some quite odd paths which aren't always accepted by all tools.
  • The guarantees about what "canonical" means haven't been well documented and audited to see what it means to do across platforms.

Stabilizing this should probably also consider C++'s implementation as well!

@alexcrichton alexcrichton added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. B-unstable Blocker: Implemented in the nightly compiler and unstable. A-io labels Aug 12, 2015
@nagisa
Copy link
Member

nagisa commented Aug 12, 2015

The implementation on Windows returns some quite odd paths which aren't always accepted by all tools.

Sounds like issue in the tools rather than pathes or canonicalisation algorithm.

@alexcrichton
Copy link
Member Author

Nominating for stabilization or deprecation in 1.5, I don't think this is likely to change any time soon so we should just make a decision. I believe the memory unsafety on Unix has been resolved in #27930, Windows returning weird things is fine, and I'm ok just documenting what syscalls this is calling.

@retep998
Copy link
Member

This API is getting the canonical path for something, aka the one true path for that something. There may be multiple valid paths to get to something, due to symbolic links and such, but there is only one canonical path. Also, returning a verbatim path on Windows is the right thing to do, because the path is a valid NT path, and with the \\?\ prefix it is also a valid win32 path, but if the \\?\ prefix is removed, then it might no longer be a valid win32 path due to win32 having a few more restrictions on paths than NT.

However I do wish that there was a way to get the canonical path of a file directly from a File and not just a Path.

@cuviper
Copy link
Member

cuviper commented Sep 21, 2015

@retep998 There's no such thing as a canonical path for a File in a unix-style filesystem. Once you're down to a file descriptor, there's no telling what Path corresponds without combing through the filesystem for a matching inode number. Even then, with hard links there may be multiple paths reaching the same underlying file, none more canonical than any other. Or an open file may have no path at all if it was unlinked.

EDIT: I guess Linux does have the original path symlinks in /proc/self/fd/, and BSDs can mount fdesc for something similar. But I doubt Rust should try to expose this.

@retep998
Copy link
Member

Oh, that's a good point about hard links. I shall investigate!

@alexcrichton
Copy link
Member Author

This issue is now entering its cycle-long FCP for stabilization in 1.5

@alexcrichton alexcrichton added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed I-nominated labels Sep 24, 2015
@Diggsey
Copy link
Contributor

Diggsey commented Sep 29, 2015

The windows version of this function currently only works for files: it will fail for directories. This differs significantly from the unix implementation, which uses realpath.

This should work for both files and directories on all platforms. It should also be documented that it will only work for paths that exist.

@retep998
Copy link
Member

Here are findings:

GetFullPathNameW does not deal with symbolic links at all whatsoever. Thus it's only use for us is to get the current directory for a specific drive letter when we're normalizing drive relative paths, when we eventually one day add a normalization method. Since it is bound by MAX_PATH and doesn't hit the filesystem we can implement all the rest of the logic ourselves.

GetFinalPathNameByHandleW does handle all manner of reparse points, but it does not handle hard links. Therefore the documentation should clearly state this behavior.

If you want to know whether two paths point to the same file and account for hard links, you should use GetFileInformationByHandleEx with FileIdInfo, which provides a unique identifier for each file. Files which are hard links to each other will have the same identifier.

@retep998
Copy link
Member

Also, would we want to expose canonicalize on File itself since that is effectively how the implementation works on Windows?

@alexcrichton
Copy link
Member Author

Thanks for the findings and quick fixes @retep998 and @Diggsey!

I'm not currently aware of a great canonicalize implementation for File on Unix, so we may want to hold off on that for now. If it's not easy to do on Linux then the best way forward would be a Windows-specific extension trait for File, but that may not be worth it quite yet.

@retep998
Copy link
Member

Also a question to decide on. If we have C:\foo\bar.txt for example, where bar.txt is a symbolic link to somewhere else, should canonicalize(r"C:\foo\bar.txt") return the canonical path to the symbolic link bar.txt or to the final thing that bar.txt points to?

@Diggsey
Copy link
Contributor

Diggsey commented Sep 29, 2015

Definitely the latter: canonicalization should (at least by default) resolve all symbolic links. To do otherwise, and resolve some but not the last one, is just asking for trouble.

@alexcrichton
Copy link
Member Author

We probably shouldn't try to invent too many Rust-specific semantics for this kind of function, instead it may be worth taking a look at what Unix and Windows print out and then go from there.

@Diggsey
Copy link
Contributor

Diggsey commented Sep 29, 2015

Windows will always follow symlinks unless the application indicates otherwise by passing in a special flag to CreateFile. I assume linux does something similar.

@steveklabnik steveklabnik added this to the 1.5 milestone Oct 1, 2015
@alexcrichton
Copy link
Member Author

The libs team discussed this during triage today and the decision was to stabilize.

alexcrichton added a commit to alexcrichton/rust that referenced this issue Oct 25, 2015
This commit stabilizes and deprecates library APIs whose FCP has closed in the
last cycle, specifically:

Stabilized APIs:

* `fs::canonicalize`
* `Path::{metadata, symlink_metadata, canonicalize, read_link, read_dir, exists,
   is_file, is_dir}` - all moved to inherent methods from the `PathExt` trait.
* `Formatter::fill`
* `Formatter::width`
* `Formatter::precision`
* `Formatter::sign_plus`
* `Formatter::sign_minus`
* `Formatter::alternate`
* `Formatter::sign_aware_zero_pad`
* `string::ParseError`
* `Utf8Error::valid_up_to`
* `Iterator::{cmp, partial_cmp, eq, ne, lt, le, gt, ge}`
* `<[T]>::split_{first,last}{,_mut}`
* `Condvar::wait_timeout` - note that `wait_timeout_ms` is not yet deprecated
  but will be once 1.5 is released.
* `str::{R,}MatchIndices`
* `str::{r,}match_indices`
* `char::from_u32_unchecked`
* `VecDeque::insert`
* `VecDeque::shrink_to_fit`
* `VecDeque::as_slices`
* `VecDeque::as_mut_slices`
* `VecDeque::swap_remove_front` - (renamed from `swap_front_remove`)
* `VecDeque::swap_remove_back` - (renamed from `swap_back_remove`)
* `Vec::resize`
* `str::slice_mut_unchecked`
* `FileTypeExt`
* `FileTypeExt::{is_block_device, is_char_device, is_fifo, is_socket}`
* `BinaryHeap::from` - `from_vec` deprecated in favor of this
* `BinaryHeap::into_vec` - plus a `Into` impl
* `BinaryHeap::into_sorted_vec`

Deprecated APIs

* `slice::ref_slice`
* `slice::mut_ref_slice`
* `iter::{range_inclusive, RangeInclusive}`
* `std::dynamic_lib`

Closes rust-lang#27706
Closes rust-lang#27725
cc rust-lang#27726 (align not stabilized yet)
Closes rust-lang#27734
Closes rust-lang#27737
Closes rust-lang#27742
Closes rust-lang#27743
Closes rust-lang#27772
Closes rust-lang#27774
Closes rust-lang#27777
Closes rust-lang#27781
cc rust-lang#27788 (a few remaining methods though)
Closes rust-lang#27790
Closes rust-lang#27793
Closes rust-lang#27796
Closes rust-lang#27810
cc rust-lang#28147 (not all parts stabilized)
bors added a commit that referenced this issue Oct 25, 2015
This commit stabilizes and deprecates library APIs whose FCP has closed in the
last cycle, specifically:

Stabilized APIs:

* `fs::canonicalize`
* `Path::{metadata, symlink_metadata, canonicalize, read_link, read_dir, exists,
   is_file, is_dir}` - all moved to inherent methods from the `PathExt` trait.
* `Formatter::fill`
* `Formatter::width`
* `Formatter::precision`
* `Formatter::sign_plus`
* `Formatter::sign_minus`
* `Formatter::alternate`
* `Formatter::sign_aware_zero_pad`
* `string::ParseError`
* `Utf8Error::valid_up_to`
* `Iterator::{cmp, partial_cmp, eq, ne, lt, le, gt, ge}`
* `<[T]>::split_{first,last}{,_mut}`
* `Condvar::wait_timeout` - note that `wait_timeout_ms` is not yet deprecated
  but will be once 1.5 is released.
* `str::{R,}MatchIndices`
* `str::{r,}match_indices`
* `char::from_u32_unchecked`
* `VecDeque::insert`
* `VecDeque::shrink_to_fit`
* `VecDeque::as_slices`
* `VecDeque::as_mut_slices`
* `VecDeque::swap_remove_front` - (renamed from `swap_front_remove`)
* `VecDeque::swap_remove_back` - (renamed from `swap_back_remove`)
* `Vec::resize`
* `str::slice_mut_unchecked`
* `FileTypeExt`
* `FileTypeExt::{is_block_device, is_char_device, is_fifo, is_socket}`
* `BinaryHeap::from` - `from_vec` deprecated in favor of this
* `BinaryHeap::into_vec` - plus a `Into` impl
* `BinaryHeap::into_sorted_vec`

Deprecated APIs

* `slice::ref_slice`
* `slice::mut_ref_slice`
* `iter::{range_inclusive, RangeInclusive}`
* `std::dynamic_lib`

Closes #27706
Closes #27725
cc #27726 (align not stabilized yet)
Closes #27734
Closes #27737
Closes #27742
Closes #27743
Closes #27772
Closes #27774
Closes #27777
Closes #27781
cc #27788 (a few remaining methods though)
Closes #27790
Closes #27793
Closes #27796
Closes #27810
cc #28147 (not all parts stabilized)
@dpc
Copy link
Contributor

dpc commented Nov 14, 2015

Docs say this is still unstable: https://doc.rust-lang.org/stable/std/fs/fn.canonicalize.html

@nagisa
Copy link
Member

nagisa commented Nov 14, 2015

These are stable docs. Afair stable is still 1.4.
On Nov 14, 2015 8:01 PM, "Dawid Ciężarkiewicz" notifications@github.com
wrote:

Docs say this is still unstable:
https://doc.rust-lang.org/stable/std/fs/fn.canonicalize.html


Reply to this email directly or view it on GitHub
#27706 (comment).

@steveklabnik
Copy link
Member

Yes, @nagisa is correct. They'll be stable in the next release.

@dpc
Copy link
Contributor

dpc commented Nov 14, 2015

My mistake, I was unaware how this process works. I thought if something is stable, it's already in stable release.

@steveklabnik
Copy link
Member

Check out http://blog.rust-lang.org/2014/10/30/Stability.html, which explains the train model for release processes. Everything lands on nightly, then makes its way through beta and then finally into a release. We're on step 2 of that process for this particular feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants