bpo-37609 - Support "UNC" and "GLOBAL" junctions in ntpath.splitdrive() #31702
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -122,54 +122,169 @@ def join(path, *paths): | |
# colon) and the path specification. | ||
# It is always true that drivespec + pathspec == p | ||
def splitdrive(p): | ||
"""Split a pathname into drive/UNC sharepoint and relative path specifiers. | ||
Returns a 2-tuple (drive_or_unc, path); either part may be empty. | ||
|
||
If you assign | ||
result = splitdrive(p) | ||
It is always true that: | ||
result[0] + result[1] == p | ||
|
||
If the path contained a drive letter, drive_or_unc will contain everything | ||
up to and including the colon. e.g. splitdrive("c:/dir") returns ("c:", "/dir") | ||
|
||
If the path contained a UNC path, the drive_or_unc will contain the host name | ||
and share up to but not including the fourth directory separator character. | ||
e.g. splitdrive("//host/computer/dir") returns ("//host/computer", "/dir") | ||
|
||
Paths cannot contain both a drive letter and a UNC path. | ||
|
||
"""Split path p conservatively into a drive and remaining path. | ||
Returns a 2-tuple, (drive, rest). Either component may be empty. | ||
|
||
If the source path contains a DOS drive (i.e. a letter plus a colon), the | ||
remaining path is everything after the colon. | ||
|
||
DOS drive examples: | ||
|
||
splitdrive('C:') == ('C:', '') | ||
splitdrive('C:dir') == ('C:', 'dir') | ||
splitdrive('C:/') == ('C:', '/') | ||
splitdrive('C:/dir') == ('C:', '/dir') | ||
|
||
A UNC path is parsed as follows: | ||
|
||
drive | ||
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv | ||
"//" domain "/" junction ["/" junction] ["/" object] | ||
^^^^^^^^^^^^ | ||
rest | ||
|
||
The UNC root must be exactly two separators. Other separators may be | ||
repeated. | ||
|
||
This is a generalization of the UNC specification in [MS-DTYP] 2.2.57. The | ||
latter specifies the file namespace, for which the domain is referred to | ||
as "host-name" (more generally "server") and the junction as "share-name". | ||
The server is commonly a local or remote network name (i.e. NETBIOS name, | ||
DNS name, or IP address). It can also be a non-network server provided by | ||
a local redirector. The share is a resource provided by the server, such | ||
as a file-system directory. | ||
|
||
UNC drive examples in the file namespace: | ||
|
||
splitdrive('//server/share') == ('//server/share', '') | ||
splitdrive('//server///share') == ('//server///share', '') | ||
This example should be changed to match
Corresponding
The share component is an empty string. This example can be moved to a section that discusses malformed drives, particularly a section about paths that normalize as functional paths. In this case the path normalizes as if "dir" is a UNC share component.
||
splitdrive('//server/share/') == ('//server/share', '/') | ||
splitdrive('//server/share/dir') == ('//server/share', '/dir') | ||
|
||
The other supported namespace is the device namespace, which is mapped as | ||
two domains, "." and "?". These domains are handled differently in some | ||
contexts, such as when creating or opening a file, but for our purposes | ||
here they are equivalent. In this namespace, the junction is case- | ||
insensitive. Any device junction is recognized as a UNC drive, with | ||
two exceptions that require additional qualification: "GLOBAL" and "UNC". | ||
|
||
Normally the device namespace includes the local device junctions of a | ||
user, such as mapped and subst drives. The "GLOBAL" junction limits this | ||
view to just global devices. It must be followed either by a device | ||
junction or another "GLOBAL" junction. | ||
Comment on lines +168 to +174
In light of the desire to natively use
||
|
||
The equivalent of the UNC file namespace in the device namespace is the | ||
"UNC" device junction, but only when there is a remaining path (e.g. at | ||
least a trailing separator). For consistency with the file namespace, if | ||
the "UNC" device junction has a remaining path, it must include a server | ||
and share in order to be recognized as a drive. | ||
|
||
UNC drive examples in the device namespace: | ||
|
||
splitdrive('//./C:') == ('//./C:', '') | ||
splitdrive('//?/C:/dir') == ('//?/C:', '/dir') | ||
|
||
splitdrive('//./UNC') == ('//./UNC', '') | ||
splitdrive('//?/UNC/server/share') == ('//?/UNC/server/share', '') | ||
splitdrive('//?/UNC/server/share/dir') == ( | ||
'//?/UNC/server/share', '/dir') | ||
|
||
splitdrive('//./Global/C:') == ('//./Global/C:', '') | ||
splitdrive('//?/Global/Global/C:/') == ('//?/Global/Global/C:', '/') | ||
splitdrive('//?/Global/UNC/server/share/dir') == ( | ||
'//?/Global/UNC/server/share', '/dir') | ||
|
||
Examples with no drive: | ||
|
||
splitdrive('') == ('', '') | ||
splitdrive('dir') == ('', 'dir') | ||
splitdrive('/dir') == ('', '/dir') | ||
|
||
splitdrive('//') == ('', '//') | ||
splitdrive('//server/') == ('', '//server/') | ||
splitdrive('///server/share') == ('', '///server/share') | ||
|
||
splitdrive('//?/UNC/') == ('', '//?/UNC/') | ||
splitdrive('//?/UNC/server/') == ('', '//?/UNC/server/') | ||
Comment on lines +203 to +208
As mentioned above, to align with
I changed some component names to reflect how they're classified. These examples can be included in a section about paths with malformed drives.
||
splitdrive('//?/Global') == ('', '//?/Global') | ||
""" | ||
p = os.fspath(p) | ||
if len(p) >= 2: | ||
if isinstance(p, bytes): | ||
sep = b'\\' | ||
altsep = b'/' | ||
colon = b':' | ||
else: | ||
sep = '\\' | ||
altsep = '/' | ||
colon = ':' | ||
normp = p.replace(altsep, sep) | ||
if (normp[0:2] == sep*2) and (normp[2:3] != sep): | ||
# is a UNC path: | ||
# vvvvvvvvvvvvvvvvvvvv drive letter or UNC path | ||
# \\machine\mountpoint\directory\etc\... | ||
# directory ^^^^^^^^^^^^^^^ | ||
index = normp.find(sep, 2) | ||
if index == -1: | ||
return p[:0], p | ||
index2 = normp.find(sep, index + 1) | ||
# a UNC path can't have two slashes in a row | ||
# (after the initial two) | ||
if index2 == index + 1: | ||
return p[:0], p | ||
if index2 == -1: | ||
index2 = len(p) | ||
return p[:index2], p[index2:] | ||
if normp[1:2] == colon: | ||
return p[:2], p[2:] | ||
return p[:0], p | ||
if isinstance(p, bytes): | ||
empty = b'' | ||
colon = b':' | ||
sep = b'\\' | ||
altsep = b'/' | ||
device_domains = (b'?', b'.') | ||
global_name = b'GLOBAL' | ||
unc_name = b'UNC' | ||
else: | ||
empty = '' | ||
colon = ':' | ||
sep = '\\' | ||
altsep = '/' | ||
device_domains = ('?', '.') | ||
global_name = 'GLOBAL' | ||
unc_name = 'UNC' | ||
|
||
# Check for a DOS drive. | ||
if p[1:2] == colon: | ||
return p[:2], p[2:] | ||
|
||
# UNC drive for the file and device namespaces. | ||
# \\domain\junction\object | ||
# Separators may be repeated, except at the root. | ||
|
||
def _next(): | ||
'''Get the next component, ignoring repeated separators.''' | ||
i0 = index | ||
while normp[i0:i0+1] == sep: | ||
i0 += 1 | ||
if i0 >= len(p): | ||
return -1, len(p) | ||
i1 = normp.find(sep, i0) | ||
if i1 == -1: | ||
i1 = len(p) | ||
return i0, i1 | ||
|
||
index = 0 | ||
normp = p.replace(altsep, sep) | ||
# Consume the domain (server). | ||
i, index = _next() | ||
if i != 2: | ||
return empty, p | ||
domain = p[i:index] | ||
# Consume the junction (share). | ||
i, index = _next() | ||
if i == -1: | ||
return empty, p | ||
|
||
if domain not in device_domains: | ||
return p[:index], p[index:] | ||
|
||
# GLOBAL and UNC are special in the device domain. | ||
junction = p[i:index].upper() | ||
# GLOBAL can be repeated. | ||
while junction == global_name: | ||
i, index = _next() | ||
if i == -1: | ||
# GLOBAL must be a prefix. | ||
return empty, p | ||
junction = p[i:index].upper() | ||
|
||
if junction == unc_name: | ||
# Allow the "UNC" device with no remaining path. | ||
if index == len(p): | ||
return p, empty | ||
# Consume the meta-domain (server). | ||
i, index = _next() | ||
if i == -1: | ||
return empty, p | ||
# Consume the meta-junction (share). | ||
i, index = _next() | ||
if i == -1: | ||
return empty, p | ||
|
||
return p[:index], p[index:] | ||
Comment on lines +212 to +287
Here's a first attempt at implementing the simplified proposal:

def splitdrive(p):
    p = os.fspath(p)
    if isinstance(p, bytes):
        empty = b''
        sep = b'\\'
        altsep = b'/'
        colon = b':'
        device_domains = (b'?', b'.')
        unc_root = b'\\\\'
        unc_name = b'UNC'
    else:
        empty = ''
        sep = '\\'
        altsep = '/'
        colon = ':'
        device_domains = ('?', '.')
        unc_root = '\\\\'
        unc_name = 'UNC'
    # Handle a DOS drive path, rooted path, or relative path.
    #
    #    drive
    # vvvvvvvvvvv
    # ([A-Z] ":")? ("\"? name ("\"+ name)*)?
    #              ^^^^^^^^^^^^^^^^^^^^^^^^
    #                     file path
    normp = p.replace(altsep, sep)
    if normp[:2] != unc_root:
        if p[1:2] == colon and p[:1].isalpha():
            return p[:2], p[2:]
        return empty, p
    # Handle a UNC drive path.
    #
    #           drive
    # vvvvvvvvvvvvvvvvvvvvvvvvvv
    # "\\" (domain ("\" junction ("\"+ name)*)?)?
    #                            ^^^^^^^^^^^
    #                          namespace path
    #                    drive
    # vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    # "\\" ("?"|".") "\UNC" ("\" server ("\" share ("\"+ name)*)?)?
    #                                              ^^^^^^^^^^^
    #                                               file path
    parts = []
    start = index = 1
    for _ in range(2):
        start = index + 1
        index = normp.find(sep, start)
        if index == -1:
            return p, empty
        parts.append(p[start:index])
    if parts[0] not in device_domains or parts[1].upper() != unc_name:
        return p[:index], p[index:]
    # "UNC" device path
    for i in range(2):
        start = index + 1
        index = normp.find(sep, start)
        if index == -1:
            return p, empty
    return p[:index], p[index:]

The above is only an initial attempt at implementing the proposed behavior. The final version should use a new
Here's an implementation that splits a path into its component parts: drive, root, and rest. This could be useful for direct consumption in pathlib. It's trivial to use this function to implement

def splitparts(p):
    p = os.fspath(p)
    if isinstance(p, bytes):
        sep = b'\\'
        altsep = b'/'
        colon = b':'
        extendedpath = b'?'
        uncroot = b'\\\\'
        uncname = b'UNC'
    else:
        sep = '\\'
        altsep = '/'
        colon = ':'
        extendedpath = '?'
        uncroot = '\\\\'
        uncname = 'UNC'
    normp = p.replace(altsep, sep)
    # Handle a DOS drive path, rooted path, or relative path.
    if normp[:2] != uncroot:
        if p[1:2] == colon and p[:1].isalpha():
            index = 2
        else:
            index = 0
        if normp[index:index+1] == sep:
            return p[:index], p[index:index+1], p[index+1:]
        return p[:index], p[:0], p[index:]
    # Handle a UNC drive path.
    index = normp.find(sep, 2)
    if index == -1:
        index = len(p)
    domain = p[2:index]
    index += 1
    i = normp.find(sep, index)
    if i == -1:
        i = len(p)
    junction = p[index:i]
    index = i
    if domain == extendedpath and junction.upper() == uncname:
        index = normp.find(sep, index + 1)
        if index != -1:
            index = normp.find(sep, index + 1)
        if index == -1:
            index = len(p)
    return p[:index], p[index:index+1], p[index+1:]

Fixing up the result from Maybe using
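As a rough illustration of how a two-part drive split could be layered on top of a three-part split like the one in the comment above, here is a minimal sketch. It reproduces the reviewer's splitparts() logic for str paths only (the bytes branch is dropped for brevity), and the wrapper name splitdrive_from_parts is hypothetical, not something proposed in this PR.

```python
def splitparts(p):
    """Split a str path into (drive, root, rest).

    Condensed from the review comment; str paths only (an assumption
    made here to keep the sketch short).
    """
    sep, altsep, colon = '\\', '/', ':'
    normp = p.replace(altsep, sep)
    # DOS drive path, rooted path, or relative path.
    if normp[:2] != sep * 2:
        index = 2 if (p[1:2] == colon and p[:1].isalpha()) else 0
        if normp[index:index+1] == sep:
            return p[:index], p[index:index+1], p[index+1:]
        return p[:index], p[:0], p[index:]
    # UNC drive path: consume the domain, then the junction.
    index = normp.find(sep, 2)
    if index == -1:
        index = len(p)
    domain = p[2:index]
    index += 1
    i = normp.find(sep, index)
    if i == -1:
        i = len(p)
    junction = p[index:i]
    index = i
    # "\\?\UNC" paths also consume a server and a share.
    if domain == '?' and junction.upper() == 'UNC':
        index = normp.find(sep, index + 1)
        if index != -1:
            index = normp.find(sep, index + 1)
        if index == -1:
            index = len(p)
    return p[:index], p[index:index+1], p[index+1:]


def splitdrive_from_parts(p):
    """Hypothetical wrapper: fold the root back into the remaining path."""
    drive, root, rest = splitparts(p)
    return drive, root + rest


for path in ('C:/dir', 'C:dir', '//server/share/dir',
             '//?/UNC/server/share/dir'):
    print(path, splitparts(path), splitdrive_from_parts(path))
```

Because the three parts are contiguous slices of the input, the wrapper keeps the property that the returned pieces concatenate back to the original path.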
||
|
||
|
||
# Split a path in head (everything up to the last '/') and tail (the | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
:func:`os.path.splitdrive` now understands ``UNC`` and ``GLOBAL`` junctions | ||
in Windows device paths. Contributed by Barney Gale and Eryk Sun. |
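The invariant stated in the docstring, that the two returned parts always concatenate back to the original path, can be spot-checked against whichever ntpath.splitdrive() ships with the running interpreter. The sketch below is only an illustration. The sample paths are taken from the docstring examples in this diff, and the loop asserts nothing beyond the round-trip property, because the handling of device paths and malformed UNC drives differs between Python versions.

```python
import ntpath

# Sample paths borrowed from the docstring examples in the diff.
samples = [
    '', 'dir', '/dir', 'C:', 'C:dir', 'C:/dir',
    '//server/share/dir', '//server///share',
    '//?/UNC/server/share/dir', '//./Global/C:',
]

for path in samples:
    drive, rest = ntpath.splitdrive(path)
    # The split must never drop or rewrite characters.
    assert drive + rest == path
    print(f'{path!r:30} -> {(drive, rest)!r}')
```

ntpath can be imported on any platform, so the check does not need to run on Windows.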
Thank you for picking this up, but, fair warning, it's going to need more work. If you're up for that, great. I'd love to get your opinions on what needs to be done.

Steve Dower introduced the idea of leveraging PathCchSkipRoot() in Windows, and I like that idea, despite some reservations on my part. So I think splitdrive() should try to conform with PathCchSkipRoot(). As such, repetition of separators between the domain and junctions (e.g. UNC server and share) should be parsed as empty values for those components. Similar behavior would be extended to "\\?\UNC" paths.

At the time I wrote this, I was thinking to support whatever paths work in practice, but on closer scrutiny even GetFullPathNameW() handles the initial slashes for the server and share components without normalizing repeated slashes. For example:

For normal UNC filepaths, if os.chdir() handles the path as a UNC path instead of as a rooted path on the current drive, then splitdrive() should split out the 'drive' component, even if it's malformed (e.g. empty server or share component). Extend this behavior to "\\?\UNC" paths, as PathCchSkipRoot() does, even though they're not valid for the current working directory.
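To make the "empty component" idea concrete, here is a minimal sketch of a UNC drive split that does not collapse repeated separators, so a doubled separator yields an empty server or share rather than being skipped. The helper name split_unc_drive is hypothetical, it handles str paths only, and it ignores DOS drives and the "\\?\UNC" case. It is not the implementation proposed in this PR.

```python
def split_unc_drive(p):
    """Hypothetical helper: split a str UNC path into (drive, rest).

    Each separator after the two-separator root ends exactly one
    component, so repeated separators produce empty components.
    """
    sep = '\\'
    normp = p.replace('/', sep)
    if normp[:2] != sep * 2:
        # Not a UNC path; this sketch does not look for DOS drives.
        return '', p
    index = 1
    # Consume the domain (server), then the junction (share).
    for _ in range(2):
        index = normp.find(sep, index + 1)
        if index == -1:
            # The path ends inside the drive; there is no remaining path.
            return p, ''
    return p[:index], p[index:]


print(split_unc_drive('//server/share/dir'))  # ('//server/share', '/dir')
print(split_unc_drive('//server///share'))    # ('//server/', '//share')
```

In the second call the share component is the empty string, which is the malformed-drive reading described above, as opposed to treating "share" as the junction after skipping extra separators.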