-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
readlink on Windows cannot read app exec links #82015
Comments
The IO_REPARSE_TAG_APPEXECLINK was introduced for aliases to installed UWP apps that require activation before they can be executed. Currently these are in an unusual place as far as Python support goes - stat() fails where lstat() succeeds, but the lstat() result doesn't have the right flags to make is_link() return True. It's not clear whether we *should* treat these as regular symlinks, given there are a number of practical differences, but since we can enable it I'm going to post a PR anyway. |
Added an update with a new stat field (st_reparse_tag) so that we can tell them apart, which also means I can add a test for readlink. |
Symlinks are specially interpreted by the file API, I/O manager, and network redirector. So I think they should remain a separate category. readlink() and is_link() should be reserved for POSIX symlinks, i.e. only IO_REPARSE_TAG_SYMLINK. These app-exec reparse points are not name surrogates (i.e. 0x2000_0000 set in the tag), and they lack an associated filter driver that handles them in normal file operations. I/O syscalls that try to reparse the path fail with STATUS_IO_REPARSE_TAG_NOT_HANDLED. It's also messy to conflate different things in general for the sake of a particular case. For example, if code sees the app-exec link as a symlink because ntpath.is_link() is true, it may try to copy it via os.readlink() and os.symlink(). This creates a different type of reparse point, if the user even has the symlink privilege, which may not behave the same as the original app-exec link in a CreateProcessW call. (I don't know what the end goal is for the Windows team in terms of allowing app executables under "%ProgramFiles%\WindowsApps" to be executed directly.) How about adding a separate function such as nt.read_reparse_point() that's able to read reparse points that are documented (most types are opaque) and can be interpreted as a string, or a tuple of strings? The internal implementation could be shared with os.readlink. The current behavior of follow_symlinks in stat() is problematic. It should be limited to symlinks. So another idea that I think could prove generally useful is to extend stat() and lstat() with a follow_reparse_points=True parameter. If false, it would imply follow_symlinks is false. Explicitly passing follow_symlinks=True with follow_reparse_points=False would be an error. With follow_symlinks=False and follow_reparse_points=True, it would only open a symlink and follow all other types of reparse points. The stat result would gain an st_reparse_tag field (already added by your PR), which would be non-zero whenever a reparse point is opened. |
If we support reading junctions, this should be using the substitute name (with \??\ replaced by \\?\) instead of the print name. For both symlinks and junctions, [MS-FSCC] 2.1.2 states that the print name SHOULD be informative, not that it MUST be, where SHOULD and MUST are defined by RFC2119. The print name can be non-informative for some reason (e.g. maybe to use the whole 16 KiB buffer for the target path.) For symlinks that's not normally observed, since CreateSymbolicLinkW is conservative. For junctions, applications and frameworks use a low-level IOCTL to set them, and they're not necessarily bothering to set a print name. For example, PowerShell omits it:
This mount point works fine despite the lack of a print name. |
Thanks Eryk for your valuable response :)
I'm okay with that reasoning. Honestly, the only real problem I've seen is in applications that use a reimplementation of spawn() rather than the UCRT's version (which handles the reparse point jsut fine).
Also true, and likely to be a cause of more problems. But not ours to fix :)
Maybe it can just return bytes and let the caller do the parsing?
This sounds like a good option to me, too. So that would suggest that Modules/posixmodule.c in win32_xstat_impl would verify both FILE_ATTRIBUTE_REPARSE_POINT and IO_REPARSE_TAG_SYMLINK? And if the tag is different it'll return information about the reparse point rather than the target?
I agree this can stay. We don't need the rest of the logic here - callers who care to follow non-symlink reparse points can use the new nt.read_reparse_point() function to do it. |
I looked into this spawn problem. It's due to Cygwin's spawnve, which calls NtOpenFile to open the file, and then memory-maps it and reads the image header [1]. For an app-exec link, this fails with STATUS_IO_REPARSE_TAG_NOT_HANDLED, which is translated to Windows ERROR_CANT_ACCESS_FILE (1920). In turn, Cygwin translates this error to its to default errno value, EACCES. (The Windows CRT uses EINVAL as the default.)
It could parse out as much as possible and return a struct sequence:
Only the ReparseTag and DataBuffer fields would always be present. ReparseGuid would be present for non-Microsoft tags (i.e. bit 31 is unset). The result could maybe have one or more fields for the app-exec-link type.
Maybe have two non-overlapping options, follow_symlinks and follow_nonlinks. The latter might be more sensible as a single option for lstat(), since it never follows symlinks. Then if either follow_symlinks or follow_nonlinks is false, stat has to open the reparse point and get the tag to determine whether it should reopen with reparsing enabled. The straight-forward way to get the tag is GetFileInformationByHandleEx: FileAttributeTagInfo. This should succeed if the file system supports reparse points. But still check for FILE_ATTRIBUTE_REPARSE_POINT in the attributes before trusting the tag value. If the call fails, assume it's not a reparse point and proceed to GetFileInformationByHandleW. Another tag to look for is IO_REPARSE_TAG_AF_UNIX (0x8000_0023). It's relatively new, so I haven't used it yet. I suppose it should be mapped S_IFSOCK in the stat result. |
Great, that's roughly what I suspected. Unfortunately, I've been told that looking into Cygwin's (GPL'd) code myself is going to cost me two weeks of dev work ("cooling off" period), so someone else will have to help them fix it.
TBH, I feel like that's more work than is worth us doing for something that will be relatively rarely used, will live in the stdlib, and is obviously something that will become outdated as Microsoft adds new reparse points. If we return the parsed data as opaque bytes then someone can write a simple PyPI package to extract the contents. (Presumably the reparse tag will have come from stat() already.)
I read this suggestion the first time and I think it would send the message that we are more capable than we really are :) In theory, we can't follow any reparse point that isn't documented as being followable and provides the target name is a stable, documented manner. The appexec links don't do this (I just looked at the returned buffer), so we really should just not follow them. They exist solely so that CreateProcess internally returns a unique error code that can be handled without impacting regular process start, which means we *don't* want to follow them. Again, if someone writes the PyPI package to parse all known reparse points, they can do that. Now, directory junctions are far more interesting. My gut feel is that we should treat them the same as symlinks (with respect to stat vs. lstat) for consistency, but I'll gladly defer to you (Eryk) for the edge cases that may introduce. --- My plan for PR 15231 is:
|
GetFinalPathName() does this conversion for us, any reason not to use that? (GetFullPathName() doesn't seem to recognize the \??\ prefix and will prepend the current drive) |
Latest PR update uses GetFinalPathName to resolve SubstituteName, returning it unmodified on failure (e.g. symlink where the target file no longer exists). Replacing "\??\" with "\\?\" in place is trivial though, as we start with a mutable buffer. I'm just not clear that it's as simple as that, though. |
Junctions (NT 5) and symlinks (NT 6) are stable. So if os.read_reparse_point only returns the unparsed bytes, maybe add os.read_junction as well. I know other projects overload this on POSIX readlink. They're both name-surrogate reparse points, but they have different constraints and behavior. The I/O manager tries to make a junction behave something like a hard link to a directory, with the addition of being able to link across local volumes. This is in turn relates to how it evaluates relative symbolic links. For example, if "C:/Junction" and "C:/Symlink" both target r"\\?\C:\Temp1\Temp2", and there's a relative symlink "C:/Temp1/Temp2/foo_link" that targets r"..\foo", then "C:/Junction/foo_link" references "C:/foo" but "C:/Symlink/foo_link" references "C:/Temp1/foo". Another difference is with remote filesystems. SMB special cases symlinks to have the server send the reparse request over the wire to be evaluated on the client side. (Refer to [MS-SMB2] 2.2.2.1 Symbolic Link Error Response, and the subsequent section about client-side handling of this error.) So an absolute symlink on the server that targets r"\\?\C:\Windows" actually references the client's "C:/Windows" directory, whereas the same junction target would reference the server's "C:/Windows" directory. The symlink evaluation will succeed only if the client's R2L (remote to local) policy allows it. Symlinks can also target remote devices, depending on the L2R and R2R policy settings. Junctions are restricted to local devices.
To follow a reparse point, we're just calling CreateFileW the normal way, without FILE_FLAG_OPEN_REPARSE_POINT. The Windows API also does this (usually via NtOpenFile, but this has a similar FILE_OPEN_REPARSE_POINT option) for tags it doesn't handle. That's why MoveFileExW (os.rename and os.replace) fails on one of these app-exec links. In some cases, it adds a third open attempt if the reparse point isn't handled. This is important for DeleteFileW (os.remove) and RemoveDirectoryW (os.rmdir) because we should be able to delete a bad reparse point.
I know, so a regular stat() will fail. I think for an honest result, stat() should fail for a reparse point that can't be handled. Scripts can use stat(path, follow_nonlinks=False) or stat(path, follow_reparse_points=False), or however this eventually gets parameterized to force opening all reparse points.
Junctions are their own thing. They're mount points that behave like Unix volume mounts (in Windows, target the root directory of a volume device named by its "Volume{...}" name) or Unix bind mounts (in Windows, target arbitrary directories on any local volume; in Linux it's a mount created with --bind or FUSE bindfs). Bind-like junctions are also similar to DOS subst drives (e.g. "W:" -> "C:/Windows") and UNC shares. These are all mount points of one sort or another. OTOH, the base device names such as "//?/C:" and "//?/Volume{...}", without a specified root directory, are aliases (object symlinks) for an NT device such as r"\Device\HarddiskVolume2". These paths open the volume itself, not the mounted filesystem, so they're not like Unix mount points. They're like Unix '/dev/sda1' device paths, except in Unix, devices don't have their own namespaces, so it would be nonsense to open "/dev/sda1/". RemoveDirectoryW for a volume mount is special cased to call DeleteVolumeMountPointW, which notifies the mount-point manager. It won't do this for a junction that targets the same volume root directory via the DOS drive-letter name -- or any other device alias for that matter (e.g. Windows 10 creates "\\?\BootPartition" as an alternative named for the system "C:" drive). So bind-like mounts are different from volume mounts, but both are different from symlinks. |
If the path starts with "\\??\\" we can just change the first question mark to a backslash. For a symlink, fail at this step if Flags includes SYMLINK_FLAG_RELATIVE, which would be inconsistent data. "\\??\\" is the NT prefix for the caller's local (logon session) device directory. This implicitly shadows the global device directory, "\\Global??". We don't use NT's real "\\Device" directory in Windows, but it's available as "//?/GlobalRoot/Device" or "//?/Global/GlobalRoot/Device". "\\??\\" shouldn't be used directly in Windows programming, since GetFullPathNameW (often implicitly called) doesn't recognize it, but some API functions will pass it along, even though they should really fail the call. It will work until it doesn't, and by then we could have a right mess. If a path starts with exactly "\\\\?\\" (backslash only), Windows simply copies it verbatim, except for changing the prefix to "\\??\\". Other Windows device prefixes where the domain is "." or "?" can use any mix of forward slash and backslash because they get normalized first (e.g. "//?\\" -> "\\\\?\\"). Only an initial "\\\\?\\" bypasses the normalization step. |
I think we'll want bpo-9949 merged as well, so that ntpath.realpath() does its job. Certainly the tests would benefit from it. Until then, I think it makes sense for os.readlink() to handle the prefix and _getfinalpathname() call, but leave nt.readlink() as returning the raw value. |
os.readlink() shouldn't resolve the final path or realpath(). It should simply return the link substitute path, which starts with "\\\\?\\". I'm wary of trying to return it without the prefix. We would need a function that's shared with the proposed implementation of realpath() to determine whether the given path (not the final path) is safe to return as a normal DOS or UNC path. --- BTW, I looked into how CreateFileW is supporting "\\??\\". It turns out at some point they changed RtlpDosPathNameToRelativeNtPathName to look for both "\\\\?\\" and "\\??\\" when determining whether to skip normalizing the path. I think that's a bad idea since the Windows API doesn't consistently support the NT prefix. ReactOS is supposed to target NT 5.2 (Server 2003), and it doesn't allow the NT prefix here. Refer to its implementation of RtlpDosPathNameToRelativeNtPathName_Ustr [1]. It only looks for RtlpWin32NtRootSlash ("\\\\?\\"), not RtlpDosDevicesPrefix ("\\??\\"). So I suppose Windows changed this some time between Vista and Windows 10. |
Me too, but suddenly adding "\\?\" to the paths breaks a lot of assumptions.
My idea was to GetFinalPathName(path[4:])[4:] and if that fails, don't strip the prefix. Which is obviously not be perfect, but since we're not going to add a check for the LongPathsEnabled flag (let alone any of the other edge cases), we can't easily figure out whether it's safe to return manually. I really want a fix for this in 3.8, or else os.stat(sys.executable) may fail, but I don't think changing the result of readlink() is okay at this stage. Maybe I'll leave that out for now and just take the st_reparse_tag and stat() changes? |
I agree, but Python can support this without handling junctions as symlinks or limiting the reparse points that we can follow. There are reparse points for offline storage and projected file systems (e.g. VFS for Git) that should normally be traversed, and to that I would add junctions. stat() in Unix always opens a mounted directory, not the underlying directory of a mount point. Windows Python should try to be consistent with Unix where doing so is reasonable. Given the only option here is follow_symlinks, then the first CreateFileW call in win32_xstat_impl should only open a reparse point if follow_symlinks is false. In this case, if it happens to open a reparse point that's not a symlink, it should try to reopen with reparsing enabled. In either case, if reparsing fails because a reparse point isn't handled (ERROR_CANT_ACCESS_FILE), try to open the reparse point itself. The latter would be the second open attempt if follow_symlinks is true and the third open attempt if follow_symlinks is false. If a reparse point isn't handled, then there's nothing in principle that stat() could ever follow. Given that it's impractical to fail in this case, considering how much code would have to be modified, then the above compromise should suffice. I checked RtlNtStatusToDosError over the range 0xC000_0000 - 0xC0ED_FFFF. (In ntstatus.h, FACILITY_MAXIMUM_VALUE is 0xED, so there's no point in checking facilities 0x0EF-0xFFF.) Only STATUS_IO_REPARSE_TAG_NOT_HANDLED maps to ERROR_CANT_ACCESS_FILE, so we don't have to worry about unrelated failures using this error code. This is separate from an invalid reparse point (ERROR_INVALID_REPARSE_DATA) or a reparse point that can't be resolved (ERROR_CANT_RESOLVE_FILENAME), which should still be errors that fail the call. |
The unwritten assumption has been that readlink() is reading symlinks that get created by CreateSymbolicLinkW, which sets the print name as the DOS path that's passed to the call. In this case, readlink() can rely on the print name being the intended DOS path. I raised a concern in the case of reading junctions. There's no high-level API to create junctions, so we can't assume the print name is reliable. PowerShell's new-item doesn't even set a print name for junctions. That symlinks also are valid without a print name (in principle; I haven't come across it practice) lends additional weight to always using the substitute name. Even if we have the DOS path, resolving paths manually is still complicated if it's a relative symlink with a reserved name (DOS device; trailing dots or spaces) as the final component or if it's a long path. Reparsing a relative symlink in the kernel doesn't reserve such names and there's no MAX_PATH limit in the kernel. So using readlink() is tricky. Fortunately realpath() in Windows doesn't require a resolve loop based on readlink(). The kernel almost always knows the final path of an opened file, and we can walk the components from the end until we find one that exists.
An existing file named "spam" would be a false positive for a link that targets "spam.". The internal CreateFileW call would open "spam". Also, symlinks allow remote paths, and this doesn't handle "\\\\?\\UNC" paths. More generally, a link target doesn't have to exist, so being able to access it shouldn't be a factor. I see it's also returning the result from _getfinalpathname. readlink() doesn't resolve a final, solid path. It just returns the contents of a link, which can be a relative or absolute path. In the proposed implementation of realpath() that I helped on for bpo-14094 (I wasn't aware of the previous work in bpo-9949), there's an _extended_to_normal function that tries to return a normal path if possible. The length of the normal path has to be less than MAX_PATH, and _getfullpathname should return the path unchanged. GetFullPathNameW is just rule-based processing in user mode; it doesn't touch the file system. I wish we could remove the MAX_PATH limit in this case. I think at startup we should try to call RtlAreLongPathsEnabled, even though it's not documented, and set a sys flag to indicate whether we can use long paths. Also, support a -X option and an environment variable to override automatic detection. |
This sounds like reasonable logic. I'll take a look at updating my PR.
I totally forgot about this issue and the PR (but it looks like the contributor did too). I just posted PR 15287 today with the tests from the patch on bpo-9949 and it's looking okay - certainly an improvement over the current behaviour. But it doesn't have the change to readlink() that would add the \\?\ prefix, which means it doesn't answer that question.
The problem is that we have to remove the limit in any case where the resulting path might be used, which is what we're already trying to encourage by supporting long paths. Perhaps the best we can do is an additional test where we GetFinalPathName, strip the prefix, reopen the file, GetFinalPathName again and if they match then return it without the prefix. That should handle the both long path settings as transparently as we can. |
I tried this and the downside (other than hitting the filesystem a few more times) is that any unresolvable path is going to come back with the prefix - e.g. symlink cycles and dangling links. Maybe that's okay? The paths are going to be as valid as they can get (that is, unusable :) ) and it at least means that long paths and deliberately unnormalized paths. Posted an update to PR 15287 with that change. |
And I just posted an update to PR 15231 with essentially a rewrite of stat() on Windows. Should be better than it was :) |
Maybe it's better to ignore the MAX_PATH limit and let processes fail hard if they lack long-path support. A known and expected exception is better than unpredictable behavior (see the next paragraph for an example). That leaves the problem of a final component that's a reserved name, i.e. a DOS device name or a name with trailing dots or spaces. We have no choice but to return this case as an extended path. The intersection of this problem with SetCurrentDirectoryW (os.chdir) troubles me. Without long-path support, the current-directory buffer in the process parameters is hard limited to MAX_PATH, and passing SetCurrentDirectoryW an extended path can't work around this. Fair enough. But it still accepts a device path as the current directory, even though the docs do not explicitly allow it, and the implementation assumes it's disallowed. The combination is an ugly bug: >>> os.chdir('//./C:/Temp')
>>> os.getcwd()
'\\\\.\\C:\\Temp'
>>> os.path._getfullpathname('/spam/eggs')
'\\\\spam\\eggs'
>>> os.chdir('//?/C:/Temp')
>>> os.getcwd()
'\\\\?\\C:\\Temp'
>>> os.path._getfullpathname('/spam/eggs')
'\\\\spam\\eggs' In order to resolve a rooted path such as "/spam/eggs", the runtime library needs to be able to figure out the current drive from the current directory. It checks for a UNC path and otherwise assumes it's a DOS drive, since it's assuming device paths aren't allowed. It ends up assuming the current directory is a DOS drive and grabs the first two characters as the drive name, which is "\\\\". Then when joining the rooted path to this 'drive', the initial slash or backslash of the rooted path gets collapsed into the preceding backslash. The result is at best a broken path, and at worst an unrelated UNC path that exists. I think os.chdir should raise an exception when passed a device path. In explanation, we can point to the documentation of SetCurrentDirectoryW, which explicitly states the following:
I assume you're talking about realpath() here, toward the end where we're working with a solid path, or rather where we have at least the beginning part of the path as a solid path, up to the first component that's inaccessible. For the problem of reserved names, GetFullPathNameW is all we need. This doesn't address the MAX_PATH issue. But that either works or not. It's a user-mode issue. There's nothing to resolve in the kernel. If the path is too long, then CreateFileW will fail at RtlDosPathNameToRelativeNtPathName_U_WithStatus with STATUS_NAME_TOO_LONG, before making a single system call. |
Yes, and so are you :) Let's move that discussion to bpo-9949 and/or PR 15287.
When the OS starts returning an error code for this case, we can start raising an exception. It might be worth reporting these cases though, as you're right that they don't seem to be handled correctly. |
[Quoting from the PR comments]
I get your argument that junctions are not symlinks, but I disagree that we should try this hard to emulate POSIX semantics rather than letting the OS do its thing. We aren't reparsing anything ourselves (in the PR) - if the OS is configured to do something different because of a reparse point, we're simply going to respect that instead of trying to work around it. A user who has created a directory junction likely wants it to behave as if the directory is actually in that location. Similarly, a user who has created a directory symlink likely wants it to behave as if it were in that location. Powershell treats both the same for high-level operations - the LinkType attribute is the only way to tell them apart (mirrored in the st_reparse_tag field).
The premise here is not true - we've opened the reparse point to get the file attributes. The only reason we look at the reparse tag at all is to raise an error if the user requested traversal and despite that, we ended up at a link, and I'm becoming less convinced that should be an error anyway (this is different from nt.readlink() and ntpath.realpath(), of course, where we want to read the link and return where it points). nt.stat() is trying to read the file attributes, and if they are not accessible then raising is the correct behaviour, so I don't see why we should try any harder than the OS here. |
Unless your point is that we should _always_ traverse junctions? In which case we have a traverse 'upgrade' scenario (calls to lstat() become calls to stat() when we find out it's a junction). Again, not sure why we'd want to hide the ability to manipulate the junction itself from Python users, except to emulate POSIX. And I'd imagine anyone using lstat() is doing it deliberately to manipulate the link and would prefer we didn't force them to add Windows-specific code that's even more complex. |
If we've opened the reparse point to test IO_REPARSE_TAG_SYMLINK, and that's not the case, then we need to reopen with reparsing enabled. This is exactly what Windows API functions do in order to implement particular behavior for just symlinks or just mountpoints. For example, if we've opened an HSM reparse point, we must reopen to let the file-system filter driver implement its semantics to replace the reparse point with the real file from auxiliary storage and complete the request. That is the stat() result I want when I say stat(filename, follow_symlinks=False) or lstat(filename), because this file is not a symlink. It's implicitly just the file to end users -- despite whatever backend tricks are being played in the kernel to implement other behavior such as HSM. Conflating this with a symlink is not right. Lies catch up with us. We can't copy it as link via os.symlink and os.readlink, and it doesn't get treated like a symlink in API functions. If you want to add an "open reparse point" parameter, that would make sense. It's of some use to get the tag and implement particular behavior for types of reparse points, and particularly for name surrogates, which includes mount points (junctions). As to mount points, yes, I do think we should always traverse them. Please see my extended comment and the follow-up example on GitHub.
A mount point is not a link. ismount() and islink() can never both be true. Also, a POSIX symlink can never be a directory, which is why we make stat() pretend directory symlinks aren't directories. If the user wants a link, they can use a symlink that's created by os.symlink, mklink, new-item -type SymbolicLink, etc. |
Okay, I get it now. So we _do_ want to "upgrade" lstat() to stat() when it's not a symlink.
I don't want to add any parameters - I want to have predictable and reasonable default behaviour. os.readlink() already exists for "open reparse point" behaviour. The discussion is only about what os.lstat() returns when you pass in a path to a junction.
I'm still not convinced that this is what we want to do. I don't have a true Linux machine handy to try it out (Python 3.6 and 3.7 on WSL behave exactly like the semantics I'm proposing, but that may just be because it's the Windows kernel below it).
ismount() is currently not true for junctions. And I can't find any reference that says that POSIX symlinks can't point to directories, nor any evidence that we suppress symlink-to-directory creation or resolution in Python (also tested on WSL).. |
Except this bug came about because we want to _downgrade_ stat() to lstat() when it's an appexeclink, because the whole point of those is to use them without following them (and yeah, most operations are going to fail, but they'd fail against the target file too). So we have this logic: def xstat(path, traverse):
f = open(path, flags | (0 if traverse else OPEN_REPARSE_POINT))
if !f:
# Special case for appexeclink
if traverse and ERROR_CANT_OPEN_FILE:
st = xstat(path, !traverse)
if st.reparse_tag == APPEXECLINC:
return st
raise ERROR_CANT_OPEN_FILE
# Handle "likely" errors
if ERROR_ACCESS_DENIED or SHARING_VIOLATION:
st = read_from_dir(os.path.split(path))
else:
st = read_from_file(f)
# Always make the OS resolve "unknown" reparse points
ALLOWED_TO_TRAVERSE = {SYMLINK, MOUNT_POINT}
if !traverse and st.reparse_tag not in ALLOWED_TO_TRAVERSE:
return xstat(path, !traverse)
return st And the open question is just whether MOUNT_POINT should be in that set near the end. I believe it should, since the alternative is to force all Python developers to write special Windows-only code to handle directory junctions. |
But that's only required if lstat() traverses junctions, which I'm not proposing to do.
Yeah, ultimately it looks like it'll be up to me, which is why I want as much of your input as I can get before having to make a call :) I'm leaning towards "if the OS follows it by default, stat() will follow it", and (given some of your later comments), "when ERROR_CANT_ACCESS_FILE occurs on a reparse point (is that the only scenario?) behave like lstat()". (And I carefully phrased that to not imply that Python is going to partially follow a link chain for you if the eventual end result is not valid. e.g. stat() on a symlink to an appexeclink will return the symlink, because ERROR_CANT_ACCESS_FILE was the result. Then the suggestion will be "if stat() returns a link, use os.path.realpath() if you want to resolve it as far as possible".)
I don't think it's that complicated:
This isn't difficult to handle specifically if we want to, though, since the stat result now includes the reparse tag. And you're right, we probably have to handle it.
I assumed it was intentional as well, though it'll almost certainly be based on pragmatically getting things to work (e.g. the '/mnt/c/Document and Settings' junction... though now I try it that those don't actually work...)
I think the intent is that it's mounted the root of another file system. There was discussion of just using the reparse tag in bpo-9035, but it looks like from msg138197 that returning True for regular directory links was not the intent (despite the tests in that message being insufficient).
Ah, the future possibilities of getting that one field added in this PR :) Let's save other enhancements for another PR - especially when they won't be impacted by this one.
That's easily fixed.
This is easy - just remove the "call again with traverse=TRUE" - and provided we don't set S_IFLNK in this case then we are being "correct", right? (Nobody has yet defined whether S_IFLNK is required for st_reparse_tag to be set.)
nt.lstat().st_reparse_tag and the constants in the stat module are sufficient for this, right?
This entire discussion is because POSIX can't fully describe Windows filesystems :) The two goals need to be making the "most consistent" behaviour the default, while enabling special cases to be specialised. So I care less about the correct information being reported than it being acted upon in the correct way - if that means lying about junctions and mount points being links and not directories (since POSIX apparently doesn't support setting both flags?), then let's lie about it. lstat().st_reparse_tag will reveal whether it's a junction, and ismount() whether it's linking to a different device. |
That was a long set of replies, so here's my summary proposed behaviour: (Where "links" are the generic term for the set that includes "reparse point", "symlink", "mount point", "junction", etc.) os.lstat(path): returns attributes for path without traversing any kind of links os.stat(path): traverses any links supported by the OS and returns attributes for the final file
os.exists(path): now returns True for links that exist but the OS reports are not followable - previously returned False (links that are followable but the target does not exist continue to return False) os.path.is_dir(path): now returns False for directory links where the target does not exist os.unlink(path): unchanged (still removes the junction, not the contents) os.scandir(path): DirEntry.is_symlink() and DirEntry.is_dir() will now both be True for junctions (this was always the case for symlinks to directories). DirEntry.stat(follow_symlinks=False).st_reparse_tag == stat.IO_REPARSE_TAG_MOUNT_POINT is how to specifically detect a junction. shutil.copytree(path): Unchanged. (requires a minor fix to continue to recursively copy through junctions (using above test), but not symlinks.) shutil.rmtree(path): Will now remove a junction rather than recursively deleting its contents (net improvement, IMHO) And as I said at the end of the long post, my main goals are:
The most impactful change would seem to be the one to DirEntry, in that users may now treat junction points more like symlinks than real directories depending on how they've set up their checks. But for the other benefits - particularly the more reliable exists() checks - I think this is worth it overall. |
Why group all reparse points under the banner of 'link'? If we have a typical HSM reparse point (the vast majority of allocated tags), then all operations, such as delete and rename, act on the file itself, not simply the reparse point. We should be able to delete or rename a link without affecting the target. In this case, there's also no chance that the reparse point is a surrogate for another path on the system, so code that walks paths doesn't have to worry about loops with regard to these reparse points. The only practical use case I can think of for detecting/opening this type of reparse point is backup software that should avoid triggering an HSM recall. For example: https://www.ibm.com/support/knowledgecenter/en/SSEQVQ_8.1.0/client/r_opt_hsmreparsetag.html As I've previously suggested (and this is the last time because I'm becoming a broken record), lstat() should at least be restricted to opening only name-surrogate reparse points that are supposed to be like links in that they target another path in the system. Plus it also has to open unhandled reparse points. Personally, I'm only comfortable with opening it up to name surrogates if islink() and readlink() are still limited to just Unix-like symlinks that we can create via symlink(). Nothing changes there. It's just a restriction of how lstat() currently works. The addition of the reparse tag in the stat result enables special handling of non-symlink surrogates.
Everyone else who relies on islink(), readlink(), and symlink() to copy symlinks isn't special casing their code to look for junctions or anything else we lump under the banner of islink(). They could code defensively if readlink() fails for a 'link' that we can't read. But that leaves the problem of readlink() succeeding for a junction. That can causes problems if the target is passed to os.symlink(), which changes the link from a hard name grafting to a soft name grafting. Why would we need to read the target of a junction? It's not needed for realpath() in Windows. We should only have to resolve symlinks. For example:
We don't have to resolve this as "Z:/bar/baz/spam/eggs", and doing so may even be wrong for someone using it to manually resolve a relative symlink. "C:/Mount/junction/spam/eggs" is a solid path. In Unix it would not be resolved by realpath(). A solid path is needed to figure out how to create a relative symlink, or how to manually resolve one for a given path. For example, if "foo_link" in "C:/Mount/junction/spam/eggs" targets "../../../foo", this refers to "C:/Mount/foo". On the other hand, if the junction mount point were replaced by a soft symlink, then "C:/Mount/symlink/spam/eggs" is not a solid path. "foo_link" is instead evaluated over the target path: "Z:/bar/baz/spam/eggs/foo_link", so the link resolves to "Z:/bar/foo". IMO, S_IFLNK need not be set for anything other than Unix-like symbolic links. We would just need to document that on Windows, lstat opens any link-like reparse point that indicates it targets another path on the system, plus any reparse point that's not handled, but that islink() is only true for actual Unix symlinks that can be created via os.symlink() and read via os.readlink(). This preserves how islink() and readlink() currently work, while still leaving the door open to fix misbehavior in particular cases. Code, including our own code, that needs to look for the broader Windows category of "name surrogate" can examine the reparse tag. For convenience we can provide issurrogate() that checks lstat(filename).st_reparse_tag & 0x2000_0000. This can be true for directories. Also, a surrogate doesn't have to behave like a Unix "soft" symlink, i.e. it applies to "hard" mount points. In Unix, issurrogate() could just be an alias for islink() since Unix provides only one type of name surrogate. Currently the name surrogate category includes the following tags:
IO_REPARSE_TAG_OSR_SAMPLE is used by OSR sample code in their Windows driver curriculum, so that one is unlikely to be seen in practice. I don't know anything about the other non-Microsoft tags. NVidia's UnionFS looks interesting. Using reparse points to merge file systems is probably not the most efficient way to handle that problem, but I'm sure the devil is in the details there.
Whatever we're calling a link should be capable of being deleted via os.unlink. If we apply S_IFLNK, then it won't have S_IFDIR (at least how POSIX code expects it), and unlink should work on it. The current state of affairs in which unlink/remove works on a junction, which is reported by stat() as a directory, is inconsistent. It's not specified to remove directories, so nothing that it can remove should be a directory.
I'd like for it to remove all name-surrogate directories like CMD's |
The security on compatibility junctions denies everyone read-data (list) access, but in Windows they can still be traversed (e.g. "C:/Documents and Settings/Public") because execute (traverse) access isn't denied, and even if it were denied, standard Windows users have SeChangeNotifyPrivilege to bypass traverse checking. I created a test junction named "eggs" that targets a directory named "spam" that has "spam/foo" subdirectory. I set the junction's security to match that of "Documents and Settings". In WSL, trying to traverse it to stat the "foo" subdirectory failed with EACCES, just as with "Documents and Settings/Public". After removing the entry that denies read-data access, it worked fine. There's no problem traversing "spam" directly if I set the same security on it that denies everyone read-data access; it only prevents listing the directory. It seems that in order to evaluate an NT junction, drvfs tries to open it with read-data access. I don't see why it would have to do that. |
Because for the purposes of the list of changes beneath it, there wasn't any difference (e.g. "traverses any links supported by the OS" is more meaningful to most people, even though both of us would understand it to mean "traverses any traversable reparse points supported by the OS"). I'm not redefining them all to be the same thing, just establishing the terminology for what immediately followed.
Apologies for causing the repetition. Let me summarise what I believe you're suggesting as an alternate flow (bearing in mind that only os.stat() and os.lstat() are affected here): os.lstat:
os.stat:
If you can fill in the gaps, that will help me understand exactly what you're proposing.
Right, but is that because they deliberately want the junction to be treated like a file? Or because they want it to be treated like the directory is really right there? os.rmdir() already does special things to behave like a junction rather than the real directory, and the islink/readlink/symlink process is going to be problematic on Windows since most users can't create symlinks. That code simply isn't going to be portable. But code that is using stat() and expecting to get the real directory ought to work, just as code using lstat() and expecting to get the link if it's been linked somehow ought to work.
I think I understand your reasoning here now, sorry for it taking so long.
I'm not adding new API, even for internal use. This is edge case enough that os.lstat() is fine for it.
I'm proposing to fix the inconsistency by fixing the flags. Your proposal is to fix the inconsistency by generating a new error in unlink()? (Just clarifying.)
Currently Windows shutil.rmtree traverses into junctions and deletes everything, though it then succeeds to delete the junction. With my change, rmtree() directly on a junction now raises (could be fixed?) but rmtree on a directory containing a junction will remove the junction without touching the target directory. So I think we're both happy about this one. |
Here's the requested overview for the case where name-surrogate reparse points are handled as winlinks (Windows links), but S_IFLNK is reserved for IO_REPARSE_TAG_SYMLINK. I took the time this afternoon to write it up in C, which hopefully is clearer than my prose. It handles all CreateFileW failures inline, but uses a recursive call to traverse a non-link. No reparse tag values are hard coded in win32_xstat_impl. I extended it to support devices that aren't file systems, such as "con", disk volumes, and raw disks. This enhancement was requested on another issue, but it may as well get updated in this issue if win32_xstat_impl is getting overhauled. Some of these devices are already supported by fstat(). The latter can be extended similarly to support volume devices and raw disk devices. I opted to fail the call in the unhandled-tag case if it opens a link that should be traversed. Either it's an unhandled link, which is an unacceptable condition (i.e. it should be a sink, not a link), or it's a link or sequence of links to an unhandled reparse point. Returning the link reparse-point data is not what the caller wants from a successful stat(). --- static int
win32_xstat_impl(const wchar_t *path, struct _Py_stat_struct *result,
BOOL traverse)
{
HANDLE hFile;
BY_HANDLE_FILE_INFORMATION fileInfo;
FILE_ATTRIBUTE_TAG_INFO tagInfo = { 0 };
DWORD fileType, error;
BOOL isUnhandledTag = FALSE;
int retval = 0;
DWORD access = FILE_READ_ATTRIBUTES;
DWORD flags = FILE_FLAG_BACKUP_SEMANTICS; /* Allow opening directories. */
if (!traverse) {
flags |= FILE_FLAG_OPEN_REPARSE_POINT;
}
hFile = CreateFileW(path, access, 0, NULL, OPEN_EXISTING, flags, NULL);
if (hFile == INVALID_HANDLE_VALUE) {
/* Either the path doesn't exist, or the caller lacks access. */
error = GetLastError();
switch (error) {
case ERROR_ACCESS_DENIED: /* Cannot sync or read attributes. */
case ERROR_SHARING_VIOLATION: /* It's a paging file. */
/* Try reading the parent directory. */
if (!attributes_from_dir(path, &fileInfo, &tagInfo.ReparseTag)) {
/* Cannot read the parent directory. */
SetLastError(error);
return -1;
}
if (fileInfo.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) {
if (traverse ||
!IsReparseTagNameSurrogate(tagInfo.ReparseTag)) {
/* The stat call has to traverse but cannot, so fail. */
SetLastError(error);
return -1;
}
}
break;
case ERROR_INVALID_PARAMETER:
/* \\.\con requires read or write access. */
hFile = CreateFileW(path, access | GENERIC_READ,
FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
OPEN_EXISTING, flags, NULL);
if (hFile == INVALID_HANDLE_VALUE) {
SetLastError(error);
return -1;
}
break;
case ERROR_CANT_ACCESS_FILE:
/* bpo37834: open unhandled reparse points if traverse fails. */
if (traverse) {
traverse = FALSE;
isUnhandledTag = TRUE;
hFile = CreateFileW(path, access, 0, NULL, OPEN_EXISTING,
flags | FILE_FLAG_OPEN_REPARSE_POINT, NULL);
}
if (hFile == INVALID_HANDLE_VALUE) {
SetLastError(error);
return -1;
}
break;
default:
return -1;
}
}
if (hFile != INVALID_HANDLE_VALUE) {
/* Query the reparse tag, and traverse a non-link. */
if (!traverse) {
if (!GetFileInformationByHandleEx(hFile, FileAttributeTagInfo,
&tagInfo, sizeof(tagInfo))) {
/* Allow devices that do no support FileAttributeTagInfo. */
error = GetLastError() ;
if (error == ERROR_INVALID_PARAMETER ||
error == ERROR_INVALID_FUNCTION ||
error == ERROR_NOT_SUPPORTED) {
tagInfo.FileAttributes = FILE_ATTRIBUTE_NORMAL;
tagInfo.ReparseTag = 0;
} else {
retval = -1;
goto cleanup;
}
} else if (tagInfo.FileAttributes &
FILE_ATTRIBUTE_REPARSE_POINT) {
if (IsReparseTagNameSurrogate(tagInfo.ReparseTag)) {
if (isUnhandledTag) {
/* Traversing previously failed for either this link
or its target. */
SetLastError(ERROR_CANT_ACCESS_FILE);
retval = -1;
goto cleanup;
}
/* Traverse a non-link, but not if traversing already failed
for an unhandled tag. */
} else if (!isUnhandledTag) {
CloseHandle(hFile);
return win32_xstat_impl(path, result, TRUE);
}
}
}
fileType = GetFileType(hFile);
if (fileType != FILE_TYPE_DISK) {
if (fileType == FILE_TYPE_UNKNOWN && GetLastError() != 0) {
retval = -1;
goto cleanup;
}
DWORD fileAttributes = GetFileAttributesW(path);
memset(result, 0, sizeof(*result));
if (fileAttributes != INVALID_FILE_ATTRIBUTES &&
fileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
/* \\.\pipe\ or \\.\mailslot\ */
result->st_mode = _S_IFDIR;
} else if (fileType == FILE_TYPE_CHAR) {
/* \\.\nul */
result->st_mode = _S_IFCHR;
} else if (fileType == FILE_TYPE_PIPE) {
/* \\.\pipe\spam */
result->st_mode = _S_IFIFO;
}
/* FILE_TYPE_UNKNOWN, e.g. \\.\mailslot\waitfor.exe\spam */
goto cleanup;
}
if (!GetFileInformationByHandle(hFile, &fileInfo)) {
error = GetLastError();
if (error != ERROR_INVALID_PARAMETER &&
error != ERROR_INVALID_FUNCTION &&
error != ERROR_NOT_SUPPORTED) {
retval = -1;
goto cleanup;
}
/* Volumes and physical disks are block devices, e.g.
\\.\C: and \\.\PhysicalDrive0. */
memset(result, 0, sizeof(*result));
result->st_mode = 0x6000; /* S_IFBLK */
goto cleanup;
}
}
if (!(fileInfo.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
/* Fix the file execute permissions. This hack sets S_IEXEC if
the filename has an extension that is commonly used by files
that CreateProcessW can execute. A real implementation calls
GetSecurityInfo, OpenThreadToken/OpenProcessToken, and
AccessCheck to check for generic read, write, and execute
access. */
const wchar_t *fileExtension = wcsrchr(path, '.');
if (fileExtension) {
if (_wcsicmp(fileExtension, L".exe") == 0 ||
_wcsicmp(fileExtension, L".bat") == 0 ||
_wcsicmp(fileExtension, L".cmd") == 0 ||
_wcsicmp(fileExtension, L".com") == 0) {
result->st_mode |= 0111;
}
}
}
cleanup:
if (hFile != INVALID_HANDLE_VALUE) {
CloseHandle(hFile);
}
return retval;
} |
Here are two additional differences between mount points and symlinks: (1) A mount point in a remote path is always evaluated on the server and restricted to devices that are local to the server. So if we handle a mount point as if it's a POSIX symlink that works with readlink(), then what are we to do with the server's drive "Z:"? Genuine symlinks are evaluated on the client, so readlink() always makes sense. (Though if we resolve a symlink manually, then we're bypassing the system's R2L symlink policy.) (2) A mount point has its own security that's checked in addition to the security on the target directory when it's reparsed. In contrast, security set on a symlink is not checked when the link is reparsed, which is why icacls.exe implicitly resolves a symlink when setting and viewing security unless the /L option is used.
I wanted lstat in Windows to traverse mount points by default (but I gave up on this), as it does in Unix, because a mount point behaves like a hard name grafting in a path. This is important for relative symlinks that use ".." components to traverse above their parent directory. The result is different from a directory symlink that targets the same path. A counter-argument (in favor of winlinks) is that a mount point is still ultimately a name-surrogate reparse point, so, unlike a hard link, its existence doesn't prevent the directory from being deleted. It's left in place as a dangling link if the target is deleted or the device is removed from the system. Trying to follow it fails with ERROR_PATH_NOT_FOUND or ERROR_FILE_NOT_FOUND. Also, handling a mount point as a directory by default would require an additional parameter because in some cases we need to be able to open a junction instead of traversing it, such as to implement shutil.rmtree to behave like CMD's Another place identifying a mount point is required, unfortunately, is in realpath(). Ideally we would be able to handle mount points as just directories. The problem is that NT allows a mount point to target a symlink, something that's not allowed in Unix. Traversing the mount point is effectively the same as traversing the symlink. So we have to read the mount-point target, and if it's a symlink, we have to read and evaluate it. (Consequently it seems that getting the real path for a remote path is an intractable problem when mount points are involved. We can only get the final path.) --- Even without the addition of a new parameter, we may still want to limit the definition of 'link' in Windows lstat to name-surrogate reparse points, i.e. winlinks. Reparse points that aren't name surrogates don't behave like links. They behave like the file itself, and reparsing may automatically replace the reparse point with the real file. Some of them are even directories that have the directory bit (28) set in the tag value, which means they're allowed to contain other files. (Without the directory tag bit, setting a reparse point on a non-empty directory should fail.) The counter-argument to changing lstat to only open winlinks is that changing the meaning of 'link' in lstat is too disruptive to existing software that may depend on the old behavior, i.e. opening any reparse point. I think the use cases for opening non-links are rare enough that it's not beyond the pale to change this behavior in 3.8 or 3.9.
For copytree it makes sense to traverse a mount point as a directory. We can't reliably copy a mount point. In Unix, even when a volume mount or bind mount can be detected, there's no standard way to clone it to a new mount point, and even if there were, that would require super-user access. In Windows, we could wrap CreateDirectorExW, which can copy a mount point, but it requires administrator access to copy a volume mount point (i.e. "\\\\?\\Volume{...}\\"), for which it calls SetVolumeMountPointW in order to update the mount-point manager in the kernel. We also have a limited ability to create mount points via _winapi.CreateJunction, but it's buggy in corner cases and incomplete. It suffices for the reason it was added -- testing the ability to delete a junction via os.remove().
This is similar in spirit to Unix, except Unix refuses to delete a mount point. For example, if we have a Unix bind mount to a non-empty directory, rmdir() fails with EBUSY. On the other hand, rmdir() on the real directory fails with ENOTEMPTY. If Unix handled the mount point as if it's just the mounted directory, I'd expect the error to be the same. It's not particularly special in Windows unless it's a volume mount point. Then RemoveDirectoryW tries to call DeleteVolumeMountPointW. This could be a case where it would fail to remove a mount point, just like Unix. But the internal DeleteVolumeMountPointW call is allowed to fail if the caller doesn't have access to update the mount-point manager, in which case it removes the junction anyway. The consequence of failing to update the mount-point manager is that GetFinalPathNameByHandleW calls will subsequently return a non-existing path for a volume that was mounted only in the deleted folder (i.e. the volume isn't also assigned a drive letter). Thus we can't assume the result from GetFinalPathNameByHandleW exists. This just pertains to volume mount points, which are special to the mount-point manager because it uses them to translate a native device path into a canonical DOS path. Bind mount points have no special significance to the mount-point manager.
Then copying the symlink fails, which I think is better than silently transforming the behavior from a mount point to a symlink. Defensive code can fall back on physically copying the target file or directory. The latter is the default behavior for copytree. It's only an issue if code calls copytree(src, dst, symlinks=True). However, it's always a concern with shutil.move(), which attempts to move a file via os.rename. This fails for a cross-volume rename. Then if islink() is true, it falls back on os.symlink(os.readlink(src), real_dst) and os.unlink(src). (On my own systems, I grant the symlink privilege to the Authenticated Users group, which allows symlink creation by standard users and administrators -- elevated or not. But in general, a fear of symlinks is warranted, even in Unix.)
unlink() didn't used to remove junctions prior to 3.5 (see bpo-18314). Instead of rolling back the change, or conflating the meaning of S_IFLNK, a counter-proposal is to harmonize unlink with the proposed change to lstat, i.e. to allow removing all name-surrogate directories. A name-surrogate directory cannot have children in the directory itself, so allowing it for os.unlink is in the spirit of the function, even if doing so is inconsistent with the literal specification. This is documented in ntifs.h:
Regarding the directory bit, the registered tags with this bit are IO_REPARSE_TAG_CLOUD*, IO_REPARSE_TAG_WCI_1, and IO_REPARSE_TAG_PROJFS (for projected file systems).
That's like Unix mount-point behavior, except Windows allows a volume mount point to be deleted (not just a bind mount point), despite negative consequences to API functions such as GetFinalPathNameByHandleW if the user isn't allowed to update the system database of volume mount points. An issue here, and with all code that walks a tree (especially destructively), is the link behavior of mount points. Bind mount points have the same problem in both Unix and Windows. For example, shutil.rmtree will fail to remove a mount point that targets a directory that it already removed. It's a different OSError in Unix vs Windows (EBUSY vs ENOENT or ERROR_PATH_NOT_FOUND), but an error all the same. That in itself is not an argument to handle a junction as a symlink, because it's still a mount point that behaves as such, even if someone is using it as a symlink. However, it is an argument for special handling of winlinks, which would allow the Windows implementation to behave better than Unix, IMO, in addition to helping Windows users that are forced to use mount points instead of symlinks.
Changing rmtree to work on a target directory that claims to be a symlink would require special casing Windows in shutil.rmtree. But in general this is a problem that affects all code that looks for symlinks, not just code in the standard library. If the meaning of S_IFLNK remains the same, then existing code has the option of being upgraded to delete directory winlinks without traversing them, but nothing is forced on them. In this case, for example, we could wrap the os.scandir call: if not _WINDOWS:
_rmtree_unsafe_scandir = os.scandir
else:
import contextlib
def _rmtree_unsafe_scandir(path):
try:
st = os.lstat(path)
attr, tag = st.st_file_attributes, st.st_reparse_tag
except OSError:
attr = tag = 0
if (attr & stat.FILE_ATTRIBUTE_DIRECTORY
and attr & stat.FILE_ATTRIBUTE_REPARSE_POINT
and tag & 0x2000_0000): # IsReparseTagNameSurrogate
return contextlib.nullcontext([])
else:
return os.scandir(path) For a directory winlink, the above _rmtree_unsafe_scandir function returns a context manager that yields an empty list, so _rmtree_unsafe skips to os.rmdir(path). This reproduces the behavior of CMD's |
Thanks for the code snippet, that helped me a lot (and since you went to the trouble of fixing other bugs, I guess I'll have to merge it into my PR now). Any particular reason you did GetFileAttributesW(path) in the non-FILE_TYPE_DISK case when we have the hFile around? I'm trying to get one more opinion from a colleague on setting S_IFLNK for all name surrogate reparse points vs. only symlinks by default (the Python/fileutils.c change, and implicitly the fixes to Lib/shutil.py). I might try and get some broader opinions as well on whether "is_dir() is true, do you suspect it could be a junction" vs "is_link() is true, do you suspect it could be a junction", since that is what it really comes down to. (We need to make changes to shutil to match Explorer anyway - rmtree should not recurse, and copytree should.) However, the minimal change is to leave S_IFLNK only for symlinks, so unless I get a strong case for treating all name surrogates as links, I'll revert to that. |
The point of getting the file attributes is to identify the root directory of the namedpipe and mailslot file systems. For example, os.listdir('//./pipe') works because we append "\\*.*" to the path. GetFileInformationByHandle[Ex] forbids a pipe handle, for reasons that may no longer be relevant in Windows 10 (?). I remembered the restriction in the above case, but it seems I forgot about it when querying the tag. So the order of the GetFileInformationByHandleEx and GetFileType blocks actually needs to be flipped. That would be a net improvement anyway since there's no point in querying a reparse tag from a device that's not a file system (namedpipe and mailslot are 'file systems', but only at the most basic level). I can't imagine there being a problem with querying FileBasicInfo to get the file attributes. I just checked that it works fine with "//./pipe/" and "//./mailslot/" -- at least in Windows 10. Anyway, GetFileAttributesW uses a query-only open that doesn't create a real file object or even require an IRP usually, so it's not adding much cost compared to querying FileBasicInfo using the handle. |
I've done that now. And thanks for confirming. That was my suspicion, but I wasn't sure if you knew something here that I didn't (v. likely!). |
So my colleagues confirmed that they deliberately represent junction points as symlinks within WSL, including translating the target to the mounted location (assuming it is mounted) and letting the Linux code traverse it normally. They also said they haven't heard any feedback suggesting it causes any trouble. However, the more I've thought about the implications of islink() returning true for a junction, the more I've come around to Eryk's point of view, so I'm going to merge this as is (the PR currently only sets S_IFLNK for actual symlinks). That said, nt.readlink() will still be able to read the target of a junction, and code like I have in PR 15287 that uses readlink() to follow links will resolve them (noting that that implementation of realpath() attempts to let the OS follow it first). I think that covers the intended use cases best. The only updates left are a couple of docs here, but I'll finish up bpo-9949 first and rebase on those changes as well. |
Adding a small fix for the Win7 buildbots in PR 15377 |
That should be all the buildbot issues fixes, so I'm marking this resolved and will wait for the inevitable 3.8.0b4 feedback! |
There's a bug on macOS that is blocking the release regarding |
Unit tests didn't catch it since it fails on older macOS releases. |
One problem seems to be that the code added for this issue assumes that the documentation is correct in implying that the stat.FILE_ATTRIBUTE_* constants (like stat.FILE_ATTRIBUTE_REPARSE_POINT) are only present on Windows. But besides being conditionally created in _stat.c, they are also undconditionally defined in stat.py on all platforms. That makes some of the tests in shutil.py, like: |
... and the other important difference is that older versions of macOS do not support fd functions so _use_fd_functions is false and the alternate path is taken in shutil.rmtree, the path that calls _rm_tree_islink which fails. >>> shutil.rmtree('a')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/shutil.py", line 718, in rmtree
if _rmtree_islink(path):
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/shutil.py", line 564, in _rmtree_islink
(st.st_file_attributes & stat.FILE_ATTRIBUTE_REPARSE_POINT
AttributeError: 'os.stat_result' object has no attribute 'st_file_attributes' |
Huh, didn't realise those were always defined in stat.py. Changing the test to "hasattr(... "st_file_attributes")" should be fine. I can get to it in a couple of hours if nobody else gets there first. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: