Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ntpath.normpath('\\\\') produces different result on Windows #96290

Closed
barneygale opened this issue Aug 25, 2022 · 13 comments
Closed

ntpath.normpath('\\\\') produces different result on Windows #96290

barneygale opened this issue Aug 25, 2022 · 13 comments
Labels
3.11 only security fixes 3.12 bugs and security fixes OS-windows type-bug An unexpected behavior, bug, or error

Comments

@barneygale
Copy link
Contributor

barneygale commented Aug 25, 2022

In Python 3.11.0rc1 on Windows:

>>> import ntpath
>>> ntpath.normpath('\\\\')
'\\\\'

In Python 3.11.0rc1 on non-Windows platforms, and in 3.10 across all platforms:

>>> import ntpath
>>> ntpath.normpath('\\\\')
'\\'

Linked PRs

@barneygale barneygale added the type-bug An unexpected behavior, bug, or error label Aug 25, 2022
@barneygale
Copy link
Contributor Author

@zooba I think this might be related to 99fcf15

@zooba
Copy link
Member

zooba commented Aug 25, 2022

Is this an edge case? Or an example of a more general issue?

It's not an interesting path if it's only this case, so may as well special-case it. It may also be because of some code to skip \\?\ prefixes just bailing out when it hits the end of the string early.

@barneygale
Copy link
Contributor Author

I spotted this while working on #31691, which makes pathlib use os.path.normpath() to normalize path "anchors" (drive + root). It causes this test to fail. I could remove the test if we don't care about this case.

@zooba
Copy link
Member

zooba commented Aug 25, 2022

I think we care about unnecessary regressions, but can't promise I'll get time myself to address it for 3.11.

@barneygale
Copy link
Contributor Author

barneygale commented Aug 25, 2022

Of course, no worries. My PR targets main anyway.

In the old implementation, leading slashes are collapsed unless a UNC path is supplied. splitdrive() identifies UNC paths that match a pattern like r'//[^/]+/([^/].*)?$'. So:

>>> ntpath.normpath('\\\\foo\\bar')
'\\\\foo\\bar'
>>> ntpath.normpath('\\\\foo\\\\')
'\\foo'
>>> ntpath.normpath('\\\\foo\\')
'\\\\foo\\'
>>> ntpath.normpath('\\\\foo')
'\\foo'
>>> ntpath.normpath('\\\\')
'\\'

The new implementation returns the input unchanged in each case above.

@eryksun
Copy link
Contributor

eryksun commented Aug 26, 2022

The pure-Python implementation of normpath() returns the correct result for "//foo/" because in this case splitdrive() returns a UNC drive with an empty share component. A practical solution would be to simply double down on this behavior. Allow splitdrive() to return all of the following UNC fragments as a drive:

  • empty server & missing share: "//"
  • empty server & empty share: "///"
  • empty server & valid share: "///y"
  • valid server & missing share: "//x"
  • valid server & empty share: "//x/"

This behavior is consistent with the system's PathCchSkipRoot() function, which is the basis for nt._path_splitroot().

>>> nt._path_splitroot('//')
('//', '')
>>> nt._path_splitroot('///')
('///', '')
>>> nt._path_splitroot('///y')
('///y', '')
>>> nt._path_splitroot('//x')
('//x', '')
>>> nt._path_splitroot('//x/')
('//x/', '')

GetFullPathNameW() also handles these UNC fragments as part of the root path. If a path begins with two or more slashes, everything up to and including the third slash absorbs a ".." component. The share component to the right of the third slash also absorbs ".." components, as long as the name isn't "." or "..".

>>> print(nt._getfullpathname('//..'))
\\
>>> print(nt._getfullpathname('///..'))
\\\
>>> print(nt._getfullpathname('//x/..'))
\\x\
>>> print(nt._getfullpathname('//x//..'))
\\x\

>>> print(nt._getfullpathname('//x/y/..'))
\\x\y
>>> print(nt._getfullpathname('//x/./..'))
\\x\
>>> print(nt._getfullpathname('//x/../..'))
\\x\

Unfortunately, GetFullPathNameW() doesn't implement an extended drive concept for device paths, unlike PathCchSkipRoot() and Python. Instead, GetFullPathNameW() considers the root of a device path to be the "//?/" or "//./" prefix. For example:

>>> print(nt._getfullpathname('//?/C:/..'))
\\?\
>>> print(nt._getfullpathname('//./C:/..'))
\\.\

The C++ path class makes the same choice. This makes it consistent with opening such paths via CreateFileW(), which uses an internal analog of GetFullPathNameW(). For example, here's the output of a C++ program that shows the components of a path:

Info for path: \\?\C:\dir\file.txt

root name: \\?
root path: \\?\
root directory: \
parent path: \\?\C:\dir
filename: file.txt
stem: file
extension: .txt

Iteration
item 0: \\?
item 1: \
item 2: C:
item 3: dir
item 4: file.txt

That said, I think the extended drive concept is more useful and worth the minor inconsistency with GetFullPathNameW(). Fortunately, Python's os.path.abspath() calls normpath() before it calls _getfullpathname(). The C implementation of normpath() implements extended drives by way of simple UNC parsing. For example:

>>> nt._path_normpath('//./pipe/..')
'\\\\.\\pipe\\'
>>> nt._path_normpath('//?/C:/..')
'\\\\?\\C:\\'
>>> nt._path_normpath('//server/share/..')
'\\\\server\\share\\'

It's just missing the special case for "UNC" device paths that was recently implemented for splitdrive(). For example:

>>> nt._path_normpath('//?/UNC/server/share/../..')
'\\\\?\\UNC\\'

The result should keep the server and share components as part of the root. For example:

>>> nt._path_splitroot('//?/UNC/server/share/../..')
('//?/UNC/server/share/', '../..')

Additional notes on UNC handling

The special_prefixes handling from the pure-Python implementation of normpath() has to be removed. It was never correct, never made sense, was added long ago just to avoid a bug that no longer exists, and it's inconsistent with the C implementation.

isabs() wrongly returns false for some UNC paths, such as "//server/share" and "//./con". UNC paths are always absolute. If a path starts with at least two slashes, isabs() should return true. There's no need to call splitdrive() in this case.

barneygale added a commit to barneygale/cpython that referenced this issue Aug 27, 2022
@eryksun
Copy link
Contributor

eryksun commented Aug 29, 2022

If splitdrive() is modified to allow returning UNC fragments as drives, then the following tests would have to change, as well the test with backslashes instead of forward slashes:

>>> ntpath.splitdrive("///conky/mountpoint/foo/bar")
('///conky', '/mountpoint/foo/bar')
>>> ntpath.splitdrive("//conky//mountpoint/foo/bar")
('//conky/', '/mountpoint/foo/bar')
>>> ntpath.splitdrive("//?/UNC")
('//?/UNC', '')
>>> ntpath.splitdrive("//?/UNC/")
('//?/UNC/', '')

For the most part, this change aligns with WinAPI PathCchSkipRoot() and, for regular drive-letter and UNC paths, how GetFullPathNameW() handles the drive and root. Compared to PathCchSkipRoot(), a simple implementation in Python is more generalized in how it handles "\\?\" extended device paths since support for extended paths isn't limited to just "UNC", driver-letter, and volume GUID device names.

Here's an example implementation:

def splitdrive(p):
    p = os.fspath(p)
    if isinstance(p, bytes):
        sep = b'\\'
        altsep = b'/'
        colon = b':'
        unc_prefix = b'\\\\?\\UNC\\'
    else:
        sep = '\\'
        altsep = '/'
        colon = ':'
        unc_prefix = '\\\\?\\UNC\\'
    normp = p.replace(altsep, sep)
    if normp[1:2] == colon and normp[:1] != sep:
        # Drive-letter logical drives, e.g. X:
        return p[:2], p[2:]
    if normp[:2] == sep * 2:
        # UNC drives, e.g. \\server\share or \\?\UNC\server\share
        # Device drives, e.g. \\.\device or \\?\device
        start = 8 if normp[:8].upper() == unc_prefix else 2
        index = normp.find(sep, start)
        if index == -1:
            return p, p[:0]
        index = normp.find(sep, index + 1)
        if index == -1:
            return p, p[:0]
        return p[:index], p[index:]
    return p[:0], p

@eryksun
Copy link
Contributor

eryksun commented Aug 30, 2022

Here's a small change to _Py_normpath() in "Python/fileutils.c" that adds support for extended UNC paths. It also excludes paths that start with a path separator when skipping past a logical drive (e.g. "/:" is not a drive, so the slash should be normalized to backslash).

#ifdef MS_WINDOWS
    // Skip past drive segment and update minP2
    else if (p1[0] && !IS_SEP(&p1[0]) && p1[1] == L':') {
        *p2++ = *p1++;
        *p2++ = *p1++;
        minP2 = p2;
        lastC = L':';
    }
    // Skip past all \\ prefixed paths, including \\?\, \\.\,
    // and network paths, including the first segment.
    else if (IS_SEP(&p1[0]) && IS_SEP(&p1[1])) {
        *p2++ = SEP;
        *p2++ = SEP;
        p1 += 2;
        int maxSeps = 2;
        int sepCount = 0;
        for (; !IS_END(p1) && sepCount < maxSeps; ++p1) {
            if (IS_SEP(p1)) {
                sepCount++;
                *p2++ = lastC = SEP;
                if (sepCount == 2 && !_wcsnicmp(path, L"\\\\?\\UNC\\", 8)) {
                    maxSeps = 4;
                }
            } else {
                *p2++ = lastC = *p1;
            }
        }
        if (sepCount < maxSeps) {
            minP2 = p2;      // Implicit root or invalid path
        } else {
            minP2 = p2 - 1;  // Absolute path has SEP at minP2
        }
    }
#else

I switched to incrementing sepCount from 0 instead of decrementing from 2, and I added a maxSeps variable. Then if the in-place normalized value of path is "\\?\UNC\", it simply increases maxSeps to 4.

@zooba
Copy link
Member

zooba commented Aug 30, 2022

That's a lot of great analysis - I admit I haven't absorbed it all well enough to implement the changes, so hopefully Barney has.

All I'll add is that I think it's critical to maintain the existing base cases when the split/dirname functions are applied recursively. So if x = dirname(x) currently regresses to "" or the root of x, it should keep doing that. Similarly normpath("x\\..") and anything else we can think of. This is to ensure we don't break anyone doing something like:

path = get_path_for_this_demo()
while path:
    do_something_with_path(path)
    path = os.path.dirname(path)

@barneygale
Copy link
Contributor Author

That's a lot of great analysis - I admit I haven't absorbed it all well enough to implement the changes, so hopefully Barney has.

I agree with Eryk's earlier comment that we should handle everything beginning // as a UNC path. But my C is pretty terrible, so I won't attempt to fix this myself, sorry!

@eryksun
Copy link
Contributor

eryksun commented Aug 30, 2022

Supporting extended UNC paths in _path_normpath() can be moved to a new issue. The "\\" case is due to ntpath.splitdrive(). If it's agreed to split UNC fragments as drives, then a PR only has to modify ntpath and some test cases.

Additionally, the special_prefixes kludge should be removed from normpath(), and isabs() should always return true for UNC and device paths. I don't think this entails modifying any of the existing tests, but new ones should be added.

The implementation in isabs() could be inlined instead of calling splitdrive(). I think the following covers all cases, including preserving the bug with paths that have a root but no drive, because legacy.

def isabs(s):
    """Test whether a path is absolute"""
    s = os.fspath(s)
    if isinstance(s, bytes):
        sep = b'\\'
        altsep = b'/'
        colon_sep = b':\\'
    else:
        sep = '\\'
        altsep = '/'
        colon_sep = ':\\'
    s = s[:3].replace(altsep, sep)
    # Absolute: UNC, device, and paths with a drive and root.
    # LEGACY BUG: isabs("/x") should be false since the path has no drive.
    if s.startswith(sep) or s.startswith(colon_sep, 1):
        return True
    return False

@iritkatriel iritkatriel added OS-windows 3.11 only security fixes labels Oct 3, 2022
barneygale added a commit to barneygale/cpython that referenced this issue Nov 15, 2022
barneygale added a commit to barneygale/cpython that referenced this issue Nov 15, 2022
barneygale added a commit to barneygale/cpython that referenced this issue Dec 19, 2022
…splitdrive()

This brings the Python implementation of `ntpath.normpath()` in line with
the C implementation added in 99fcf15

Co-authored-by: Eryk Sun <eryksun@gmail.com>
barneygale added a commit to barneygale/cpython that referenced this issue Dec 20, 2022
…splitdrive()

This brings the Python implementation of `ntpath.normpath()` in line with
the C implementation added in 99fcf15

Co-authored-by: Eryk Sun <eryksun@gmail.com>
barneygale added a commit to barneygale/cpython that referenced this issue Dec 20, 2022
…splitdrive()

This brings the Python implementation of `ntpath.normpath()` in line with
the C implementation added in 99fcf15

Co-authored-by: Eryk Sun <eryksun@gmail.com>
barneygale added a commit to barneygale/cpython that referenced this issue Dec 24, 2022
@barneygale
Copy link
Contributor Author

PR available: #100351

@barneygale barneygale added the 3.12 bugs and security fixes label Jan 11, 2023
zooba pushed a commit that referenced this issue Jan 12, 2023
… splitdrive() (GH-100351)

This brings the Python implementation of `ntpath.normpath()` in line with the C implementation added in 99fcf15

Co-authored-by: Eryk Sun <eryksun@gmail.com>
zooba pushed a commit to zooba/cpython that referenced this issue Jan 12, 2023
…() and splitdrive() (pythonGH-100351)

This brings the Python implementation of `ntpath.normpath()` in line with the C implementation added in 99fcf15

Co-authored-by: Eryk Sun <eryksun@gmail.com>
zooba added a commit that referenced this issue Jan 12, 2023
… splitdrive() (GH-100351)

This brings the Python implementation of `ntpath.normpath()` in line with the C implementation added in 99fcf15

Co-authored-by: Barney Gale <barney.gale@gmail.com>
Co-authored-by: Eryk Sun <eryksun@gmail.com>
@barneygale
Copy link
Contributor Author

Fixed in 3.11 and 3.12.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes OS-windows type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants