bpo-37609 - Support "UNC" and "GLOBAL" junctions in ntpath.splitdrive(). #31702
…e()`. Co-authored-by: Eryk Sun <eryksun@gmail.com>
The UNC root must be exactly two separators. Other separators may be repeated.
Thank you for picking this up, but, fair warning, it's going to need more work. If you're up for that, great. I'd love to get your opinions on what needs to be done.
Steve Dower introduced the idea of leveraging PathCchSkipRoot() in Windows, and I like that idea, despite some reservations on my part. So I think splitdrive() should try to conform with PathCchSkipRoot(). As such, repetition of separators between the domain and junctions (e.g. UNC server and share) should be parsed as empty values for those components. Similar behavior would be extended to "\\?\UNC" paths.
At the time I wrote this, I was thinking to support whatever paths work in practice, but on closer scrutiny even GetFullPathNameW() handles the initial slashes for the server and share components without normalizing repeated slashes. For example:
>>> n = GetFullPathNameW('//server//file', len(buf), buf, byref(filepart))
>>> print(buf.value)
\\server\file
>>> print(filepart.value)
file
>>> n = GetFullPathNameW('////file', len(buf), buf, byref(filepart))
>>> print(buf.value)
\\\file
>>> print(filepart.value)
file
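The session above relies on buf and filepart having been defined beforehand. A minimal sketch of that setup with ctypes follows, assuming a Windows host. The wrapper name full_path_name and the buffer size are illustrative choices, not something taken from the review.

```python
import ctypes
import sys


def full_path_name(path, size=32768):
    """Return (full_path, file_part) as reported by GetFullPathNameW.

    Hypothetical wrapper for illustration; it only works on Windows.
    """
    if sys.platform != "win32":
        raise OSError("GetFullPathNameW is only available on Windows")
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetFullPathNameW.argtypes = (
        wintypes.LPCWSTR,                 # lpFileName
        wintypes.DWORD,                   # nBufferLength
        wintypes.LPWSTR,                  # lpBuffer
        ctypes.POINTER(wintypes.LPWSTR),  # lpFilePart
    )
    kernel32.GetFullPathNameW.restype = wintypes.DWORD

    buf = ctypes.create_unicode_buffer(size)
    filepart = wintypes.LPWSTR()
    n = kernel32.GetFullPathNameW(path, len(buf), buf, ctypes.byref(filepart))
    if n == 0:
        # A return value of zero signals failure.
        raise ctypes.WinError(ctypes.get_last_error())
    return buf.value, filepart.value
```

On Windows, full_path_name('//server//file') would return the same two values that the session prints as buf.value and filepart.value.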
For normal UNC filepaths, if os.chdir() handles the path as a UNC path instead of as a rooted path on the current drive, then splitdrive() should split out the 'drive' component, even if it's malformed (e.g. empty server or share component). Extend this behavior to "\\?\UNC" paths, as PathCchSkipRoot() does, even though they're not valid for the current working directory.
UNC drive examples in the file namespace:

splitdrive('//server/share') == ('//server/share', '')
splitdrive('//server///share') == ('//server///share', '')
This example should be changed to match PathCchSkipRoot(). For example:
>>> buf.value = r'\\server\\\dir'
>>> hr = PathCchSkipRoot(buf, byref(filepath))
>>> print(filepath.value)
\\dir
Corresponding splitdrive() result:
splitdrive('//server///dir') == ('//server/', '//dir')
The share component is an empty string. This example can be moved to a section that discusses malformed drives, particularly a section about paths that normalize as functional paths. In this case the path normalizes as if "dir" is a UNC share component.
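The rule described here can be sketched in pure Python. This is only an illustration for plain UNC paths, not the PR's code: the helper name split_unc_drive is hypothetical, and it assumes that the server and share components each end at the very next separator, so a repeated separator yields an empty component.

```python
def split_unc_drive(p):
    """Split a plain UNC path into (drive, rest).

    Hypothetical helper: the server and share components each end at the
    next separator, so a repeated separator produces an empty component.
    """
    normp = p.replace('/', '\\')
    if normp[:2] != '\\\\':
        return '', p              # not a UNC path
    # Find the separator that ends the server component.
    index = normp.find('\\', 2)
    if index == -1:
        return p, ''              # only a server, nothing after it
    # Find the separator that ends the share component (possibly empty).
    index = normp.find('\\', index + 1)
    if index == -1:
        return p, ''
    return p[:index], p[index:]


print(split_unc_drive('//server/share/dir'))  # ('//server/share', '/dir')
print(split_unc_drive('//server///dir'))      # ('//server/', '//dir')
```

In the second call the share component is empty, so the drive ends right after the separator that follows the server.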
insensitive. Any device junction is recognized as a UNC drive, with two exceptions that require additional qualification: "GLOBAL" and "UNC".

Normally the device namespace includes the local device junctions of a user, such as mapped and subst drives. The "GLOBAL" junction limits this view to just global devices. It must be followed either by a device junction or another "GLOBAL" junction.
In light of the desire to natively use PathCchSkipRoot(), the splitdrive() implementation should not support "Global". PathCchSkipRoot() doesn't support this prefix, and it's uncommon in practice. It's mostly used by device drivers that need to ensure that they're accessing a global device when executing in an arbitrary thread context. For example, a device driver would use the NT path "\??\Global\SpamDevice" instead of "\??\SpamDevice".
splitdrive('//') == ('', '//')
splitdrive('//server/') == ('', '//server/')
splitdrive('///server/share') == ('', '///server/share')

splitdrive('//?/UNC/') == ('', '//?/UNC/')
splitdrive('//?/UNC/server/') == ('', '//?/UNC/server/')
As mentioned above, to align with PathCchSkipRoot(), these examples should have a drive, as follows:
splitdrive('//') == ('//', '')
splitdrive('//server/') == ('//server/', '')
splitdrive('///share/file') == ('///share', '/file')
splitdrive('//?/UNC/') == ('//?/UNC/', '')
splitdrive('//?/UNC/server/') == ('//?/UNC/server/', '')
I changed some component names to reflect how they're classified. These examples can be included in a section about paths with malformed drives.
    if isinstance(p, bytes):
        empty = b''
        colon = b':'
        sep = b'\\'
        altsep = b'/'
        device_domains = (b'?', b'.')
        global_name = b'GLOBAL'
        unc_name = b'UNC'
    else:
        empty = ''
        colon = ':'
        sep = '\\'
        altsep = '/'
        device_domains = ('?', '.')
        global_name = 'GLOBAL'
        unc_name = 'UNC'

    # Check for a DOS drive.
    if p[1:2] == colon:
        return p[:2], p[2:]

    # UNC drive for the file and device namespaces.
    # \\domain\junction\object
    # Separators may be repeated, except at the root.

    def _next():
        '''Get the next component, ignoring repeated separators.'''
        i0 = index
        while normp[i0:i0+1] == sep:
            i0 += 1
        if i0 >= len(p):
            return -1, len(p)
        i1 = normp.find(sep, i0)
        if i1 == -1:
            i1 = len(p)
        return i0, i1

    index = 0
    normp = p.replace(altsep, sep)
    # Consume the domain (server).
    i, index = _next()
    if i != 2:
        return empty, p
    domain = p[i:index]
    # Consume the junction (share).
    i, index = _next()
    if i == -1:
        return empty, p

    if domain not in device_domains:
        return p[:index], p[index:]

    # GLOBAL and UNC are special in the device domain.
    junction = p[i:index].upper()
    # GLOBAL can be repeated.
    while junction == global_name:
        i, index = _next()
        if i == -1:
            # GLOBAL must be a prefix.
            return empty, p
        junction = p[i:index].upper()

    if junction == unc_name:
        # Allow the "UNC" device with no remaining path.
        if index == len(p):
            return p, empty
        # Consume the meta-domain (server).
        i, index = _next()
        if i == -1:
            return empty, p
        # Consume the meta-junction (share).
        i, index = _next()
        if i == -1:
            return empty, p

    return p[:index], p[index:]
Here's a first attempt at implementing the simplified proposal:
import os

def splitdrive(p):
    p = os.fspath(p)
    if isinstance(p, bytes):
        empty = b''
        sep = b'\\'
        altsep = b'/'
        colon = b':'
        device_domains = (b'?', b'.')
        unc_root = b'\\\\'
        unc_name = b'UNC'
    else:
        empty = ''
        sep = '\\'
        altsep = '/'
        colon = ':'
        device_domains = ('?', '.')
        unc_root = '\\\\'
        unc_name = 'UNC'
    # Handle a DOS drive path, rooted path, or relative path.
    #
    #     drive
    #  vvvvvvvvvvv
    #  ([A-Z] ":")? ("\"? name ("\"+ name)*)?
    #               ^^^^^^^^^^^^^^^^^^^^^^^^
    #                      file path
    normp = p.replace(altsep, sep)
    if normp[:2] != unc_root:
        if p[1:2] == colon and p[:1].isalpha():
            return p[:2], p[2:]
        return empty, p
    # Handle a UNC drive path.
    #
    #            drive
    #  vvvvvvvvvvvvvvvvvvvvvvvvvv
    #  "\\" (domain ("\" junction ("\"+ name)*)?)?
    #                             ^^^^^^^^^^^
    #                            namespace path
    #                     drive
    #  vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    #  "\\" ("?"|".") "\UNC" ("\" server ("\" share ("\"+ name)*)?)?
    #                                               ^^^^^^^^^^^
    #                                                file path
    parts = []
    start = index = 1
    # Consume the domain and the junction; each ends at the next separator.
    for _ in range(2):
        start = index + 1
        index = normp.find(sep, start)
        if index == -1:
            return p, empty
        parts.append(p[start:index])
    if parts[0] not in device_domains or parts[1].upper() != unc_name:
        return p[:index], p[index:]
    # "UNC" device path: also consume the server and the share.
    for i in range(2):
        start = index + 1
        index = normp.find(sep, start)
        if index == -1:
            return p, empty
    return p[:index], p[index:]
The above is only an initial attempt at implementing the proposed behavior. The final version should use a new ntpath.splitroot() function. In Windows, this will leverage nt._path_splitroot(), with extensions, and otherwise use a fallback implementation written in Python. splitdrive() is a bit modified since the filesystem or namespace root slash should not be part of the drive in the splitdrive() result. Also, both splitroot() and splitdrive() should support all DOS device names as 'drives'. For example, "\\?\BootPartition" is another name for "\\?\C:" in Windows 10+. New DOS devices can be created in the context of the current user via DefineDosDeviceW(). They can target a volume device name, or an arbitrary path on the volume (i.e. the way subst.exe creates substitute drives).
>>> os.path.splitdrive('//?/BootPartition/Windows')
('//?/BootPartition', '/Windows')
>>> os.path.samefile('//?/BootPartition/Windows', 'C:/Windows')
True
Here's an implementation that splits a path into its component parts: drive, root, and rest. This could be useful for direct consumption in pathlib. It's trivial to use this function to implement splitdrive().
import os

def splitparts(p):
    p = os.fspath(p)
    if isinstance(p, bytes):
        sep = b'\\'
        altsep = b'/'
        colon = b':'
        extendedpath = b'?'
        uncroot = b'\\\\'
        uncname = b'UNC'
    else:
        sep = '\\'
        altsep = '/'
        colon = ':'
        extendedpath = '?'
        uncroot = '\\\\'
        uncname = 'UNC'
    normp = p.replace(altsep, sep)
    # Handle a DOS drive path, rooted path, or relative path.
    if normp[:2] != uncroot:
        if p[1:2] == colon and p[:1].isalpha():
            index = 2
        else:
            index = 0
        if normp[index:index+1] == sep:
            return p[:index], p[index:index+1], p[index+1:]
        return p[:index], p[:0], p[index:]
    # Handle a UNC drive path.
    # Find the end of the domain component.
    index = normp.find(sep, 2)
    if index == -1:
        index = len(p)
    domain = p[2:index]
    # Find the end of the junction component.
    index += 1
    i = normp.find(sep, index)
    if i == -1:
        i = len(p)
    junction = p[index:i]
    index = i
    if domain == extendedpath and junction.upper() == uncname:
        # Extended "UNC" device: also consume the server and share.
        index = normp.find(sep, index + 1)
        if index != -1:
            index = normp.find(sep, index + 1)
        if index == -1:
            index = len(p)
    return p[:index], p[index:index+1], p[index+1:]
Fixing up the result from PathCchSkipRoot() is currently more trouble than it's worth. It only supports extended paths for drives, volume GUID names, and "UNC" paths. This distinction goes too far because there are reasons to need an extended path with arbitrary device names, either to get a literal path (i.e. forward slashes in a device path) or to use a long path if long DOS paths are disabled for the current process or the system. Plus mounted volumes can have any device name, such as "BootPartition", which is legitimate within the Windows API (not just the NT API) by extension of DefineDosDeviceW(). Also, the support for drives in extended paths and volume GUID names has a serious bug. It splits r'\\?\C:spam' as (r'\\?\C:', 'spam'). The OS will try to access a device named "C:spam", so splitting a DOS drive-letter drive out of the device name is wrong.
Maybe using PathCchSkipRoot() will be worth it if these quirks can be worked around in the C implementation of nt._path_splitroot().
I'm trying a more targeted approach to avoid backwards compatibility problems. PR here: #91882. Closing this PR.
Implementation by @eryksun. They note:
https://bugs.python.org/issue37609