Skip to content

os.path.realpath(symlink to DOS devices path that starts with '\\?\Some without ":"\') returns without prefix. #102440

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wintermaples opened this issue Mar 5, 2023 · 7 comments
Labels
OS-windows type-bug An unexpected behavior, bug, or error

Comments

@wintermaples
Copy link

os.path.realpath(symblic link to DOS Device Paths) returns a path without \\?\ prefix even if patch that fix.

For example, there is below symlinks on C:\test.

C:\test>dir
03/05/2023  06:16 AM    <SYMLINKD>     media [\\?\ContainerMappedDirectories\CE94A662-0837-4F45-B403-55B3E57CE848]
03/05/2023  06:19 AM    <SYMLINKD>     to_c [\\?\C:\]

Then exexute python and call os.path.realpath and call some functions to inspect.

>>> import os
>>> from ntpath import normpath, normcase, devnull, join, isabs, _getfinalpathname, _getfinalpathname_nonstrict
>>>
>>> to_c_fn = r'to_c'
>>> media_fn = r'media'
>>> os.path.realpath(to_c_fn)
'C:\\'
>>> os.path.realpath(media_fn)
'\\ContainerMappedDirectories\\CE94A662-0837-4F45-B403-55B3E57CE848'
>>> _getfinalpathname(to_c_fn)
'\\\\?\\C:\\'
>>> _getfinalpathname(media_fn)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'media'
>>> _getfinalpathname_nonstrict(to_c_fn)
'\\\\?\\C:\\'
>>> _getfinalpathname_nonstrict(media_fn)
'\\ContainerMappedDirectories\\CE94A662-0837-4F45-B403-55B3E57CE848'

As the log indicates, os.path.realpath returns without a prefix.

I think it is wrong that _getfinalpathname_nonstrict removes DOS devices paths prefix.
So to fix the problem, I think we need to fix _getfinalpathname function on "Modules/posixmodule.c".

Please some opinions.

@wintermaples wintermaples added the type-bug An unexpected behavior, bug, or error label Mar 5, 2023
@wintermaples wintermaples changed the title os.path.realpath(symlink to DOS devices path that starts with '\\?\') returns without prefix. os.path.realpath(symlink to DOS devices path that starts with '\\?\Some without ":"\') returns without prefix. Mar 5, 2023
@eryksun
Copy link
Contributor

eryksun commented Mar 5, 2023

I don't see how ntpath._getfinalpathname_nonstrict() is responsible for removing the "\\?\" prefix, unless maybe there's something wrong with the target path of the "media" symlink.

A symbolic link reparse point contains a substitute path that the system uses, plus an optional print path for display1. What you see in the shell is the print path, but os.readlink() returns the substitute path.

What is the result of os.readlink(media_fn)?

As the log indicates, os.path.realpath returns without a prefix.

It's a known issue (#89760) that ntpath.realpath() mistakenly strips the prefix from some device paths for which the prefix must be retained. But that's unrelated to the given example in which the prefix gets stripped by ntpath._getfinalpathname_nonstrict().

Footnotes

  1. The protocol specifies what the print path "SHOULD" be. Per RFC 2119, "SHOULD" means that an item is strongly recommended but optional.

@wintermaples
Copy link
Author

What is the result of os.readlink(media_fn)?

>>> os.readlink('media')
'\\ContainerMappedDirectories\\93AA221C-4061-460F-8C4E-A2EAC1BF3324'

I read a code of ntpath._getfinalpathname_nonstrict(), perhaps the bug is caused by os.readlink behaviors.

The code of ntpath.realpath() processes L702-L706(Python3.11.0 amd64) because _getfinalpathname() causes FileNotFoundError (The specified path is symlink to DOS devices path that starts with '\?\Some without ":"').

        except OSError as ex:
            if strict:
                raise
            initial_winerror = ex.winerror
            path = _getfinalpathname_nonstrict(path)

That calls _getfinalpathname_nonstrict() and _getfinalpathname_nonstrict() calls os.readlink().

It's a known issue (#89760) that ntpath.realpath() mistakenly strips the prefix from some device paths for which the prefix must be retained. But that's unrelated to the given example in which the prefix gets stripped by ntpath._getfinalpathname_nonstrict().

I know and read #89760 but the bug is remained even if patch #89760 (comment).

@eryksun
Copy link
Contributor

eryksun commented Mar 6, 2023

This appears to be a shortcoming in os.readlink(), but a problem that's uncommon in practice. Normal symlinks created by WinAPI CreateSymbolicLinkW() are restricted to target a volume device name in NT's "\??\" directory, which contains local and global aliases for native NT device names, such as "\??\C:" -> "\Device\HarddiskVolume2". However, when manually creating a symlink via FSCTL_SET_REPARSE_POINT, it's possible to target an arbitrary NT path instead of a "\??\" prefixed path.

If the reparse point is a symlink with SYMLINK_FLAG_RELATIVE set in Flags, and the substitute path is a rooted path such as "\spam", then readlink() has to return the rooted path "\spam". This path is relative to the volume device of the opened path of the symlink. We currently implement this, except we don't check for SYMLINK_FLAG_RELATIVE.

If the reparse point is a mountpoint or if it's a symlink without SYMLINK_FLAG_RELATIVE set in Flags, and the substitute path is a rooted path such as "\spam", then readlink() has to prepend the prefix "\\?\GLOBALROOT" to the substitute path. Currently this is not implemented.

Assuming the target of "media" really exists, then the following stat() call should work:

os.stat(r'\\?\GLOBALROOT\ContainerMappedDirectories\93AA221C-4061-460F-8C4E-A2EAC1BF3324')

@wintermaples
Copy link
Author

wintermaples commented Mar 6, 2023

This appears to be a shortcoming in os.readlink(), but a problem that's uncommon in practice.

This problem happens on volume mount of windows container(docker), so that is not uncommon in practice...

Assuming the target of "media" really exists, then the following stat() call should work:

>>> os.stat(r'\\?\GLOBALROOT\ContainerMappedDirectories\3284C586-8F0D-4325-99EB-76DFF9D5D419')
os.stat_result(st_mode=16895, st_ino=3096224744370362, st_dev=3393136209, st_nlink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1678028615, st_mtime=1677934524, st_ctime=1677934246)
>>>
>>> os.stat(r'\\?\ContainerMappedDirectories\3284C586-8F0D-4325-99EB-76DFF9D5D419')
os.stat_result(st_mode=16895, st_ino=3096224744370362, st_dev=3393136209, st_nlink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1678028615, st_mtime=1677934524, st_ctime=1677934246)

I show reparsepoint of media symbolic link below.

C:\inetpub\wwwroot>fsutil reparsepoint query media
Reparse Tag Value : 0xa000000c
Tag value: Microsoft
Tag value: Name Surrogate
Tag value: Symbolic Link

Reparse Data Length: 0x00000116
Reparse Data:
0000:  00 00 80 00 82 00 86 00  00 00 00 00 5c 00 43 00  ............\.C.
0010:  6f 00 6e 00 74 00 61 00  69 00 6e 00 65 00 72 00  o.n.t.a.i.n.e.r.
0020:  4d 00 61 00 70 00 70 00  65 00 64 00 44 00 69 00  M.a.p.p.e.d.D.i.
0030:  72 00 65 00 63 00 74 00  6f 00 72 00 69 00 65 00  r.e.c.t.o.r.i.e.
0040:  73 00 5c 00 33 00 32 00  38 00 34 00 43 00 35 00  s.\.3.2.8.4.C.5.
0050:  38 00 36 00 2d 00 38 00  46 00 30 00 44 00 2d 00  8.6.-.8.F.0.D.-.
0060:  34 00 33 00 32 00 35 00  2d 00 39 00 39 00 45 00  4.3.2.5.-.9.9.E.
0070:  42 00 2d 00 37 00 36 00  44 00 46 00 46 00 39 00  B.-.7.6.D.F.F.9.
0080:  44 00 35 00 44 00 34 00  31 00 39 00 00 00 5c 00  D.5.D.4.1.9...\.
0090:  5c 00 3f 00 5c 00 43 00  6f 00 6e 00 74 00 61 00  \.?.\.C.o.n.t.a.
00a0:  69 00 6e 00 65 00 72 00  4d 00 61 00 70 00 70 00  i.n.e.r.M.a.p.p.
00b0:  65 00 64 00 44 00 69 00  72 00 65 00 63 00 74 00  e.d.D.i.r.e.c.t.
00c0:  6f 00 72 00 69 00 65 00  73 00 5c 00 33 00 32 00  o.r.i.e.s.\.3.2.
00d0:  38 00 34 00 43 00 35 00  38 00 36 00 2d 00 38 00  8.4.C.5.8.6.-.8.
00e0:  46 00 30 00 44 00 2d 00  34 00 33 00 32 00 35 00  F.0.D.-.4.3.2.5.
00f0:  2d 00 39 00 39 00 45 00  42 00 2d 00 37 00 36 00  -.9.9.E.B.-.7.6.
0100:  44 00 46 00 46 00 39 00  44 00 35 00 44 00 34 00  D.F.F.9.D.5.D.4.
0110:  31 00 39 00 00 00                                 1.9...

(Flags is 0x00000000)

@wintermaples
Copy link
Author

I'm trying to write a fix code to implement prepending "\?GLOBALROOT" to the result of os.readlink().

Modules/posixmodule.c -> os_readlink_impl

    wchar_t *name = NULL;
    Py_ssize_t nameLen = 0;
    if (rdb->ReparseTag == IO_REPARSE_TAG_SYMLINK)
    {
        name = (wchar_t *)((char*)rdb->SymbolicLinkReparseBuffer.PathBuffer +
                           rdb->SymbolicLinkReparseBuffer.SubstituteNameOffset);
        nameLen = rdb->SymbolicLinkReparseBuffer.SubstituteNameLength / sizeof(wchar_t);
    }
    else if (rdb->ReparseTag == IO_REPARSE_TAG_MOUNT_POINT)
    {
        name = (wchar_t *)((char*)rdb->MountPointReparseBuffer.PathBuffer +
                           rdb->MountPointReparseBuffer.SubstituteNameOffset);
        nameLen = rdb->MountPointReparseBuffer.SubstituteNameLength / sizeof(wchar_t);
    }
    else
    {
        PyErr_SetString(PyExc_ValueError, "not a symbolic link");
    }

    if (name == NULL) {
        return NULL;
    }

    if (nameLen > 4 && wcsncmp(name, L"\\??\\", 4) == 0) {
        /* Our buffer is mutable, so this is okay */
        name[1] = L'\\';
    }

    /* If the substitute name is a rooted path then */
    /* the reparse point is MountPoint || ( Symlink && Flags == 0 ) then */
    /* should prepend \\?\GLOBALROOT prefix. */
    wchar_t prefix[] = L"\\\\?\\GLOBALROOT";
    if (!(nameLen > 4 && wcsncmp(name, L"\\\\?\\", 4) == 0) && (nameLen > 1 && wcsncmp(name, L"\\", 1) == 0)) {
        bool shouldPrependPrefix =
            (rdb->ReparseTag == IO_REPARSE_TAG_MOUNT_POINT) ||
            (rdb->ReparseTag == IO_REPARSE_TAG_SYMLINK && rdb->SymbolicLinkReparseBuffer.Flags == 0);

        if (shouldPrependPrefix) {
            wchar_t *oldName = name;
            nameLen = wcslen(prefix) + wcslen(oldName);
            name = (wchar_t *) PyMem_Malloc(sizeof(wchar_t) * (nameLen + 1));
            wcscpy_s(name, nameLen + 1, prefix);
            wcscat_s(name, nameLen + 1, oldName);
            PyMem_Free(oldName);
        }
    }

    result = PyUnicode_FromWideChar(name, nameLen);
    if (result && path->narrow) {
        Py_SETREF(result, PyUnicode_EncodeFSDefault(result));
    }

    return result;

If without PyMem_Free(oldName), the compiled python runs expectedly(but memory leaking).
If with PyMem_Free(oldName), the compiled python crashes when os.readlink('media').
I have no idea why this happens.

Could you give me some advice and review it?

@eryksun
Copy link
Contributor

eryksun commented Mar 6, 2023

This problem happens on volume mount of windows container(docker), so that is not uncommon in practice...

Using containers on Windows is still a niche usage that's relatively uncommon compared to the thousands of Python installations on Windows that will never see such a symlink that cannot be created by the normal CreateSymbolicLinkW() API function. That doesn't mean that it's not worth enhancing os.readlink() to support symlinks and mountpoints that target native paths other than "\??\" device paths.

Note that "\\?\ContainerMappedDirectories\3284C586-8F0D-4325-99EB-76DFF9D5D419" is the "print path" in the reparse point data buffer. We can't rely on the print path because the specification only says what it "SHOULD" be.

If without PyMem_Free(oldName)

You can't call PyMem_Free() on a stack allocated buffer. Anyway, oldName isn't even the base address. After creating the str result, however, you do have to free the newly allocated buffer. Also, you can't call wcslen(oldName) because it isn't a null-terminated string; it's a counted string.

Here's what I wrote to address this problem before I saw your code. The SYMLINK_FLAG_RELATIVE macro should be defined in "Modules/winreparse.h" instead of here, but this was just for testing.

#elif defined(MS_WINDOWS)
#ifndef SYMLINK_FLAG_RELATIVE
#define SYMLINK_FLAG_RELATIVE 0x00000001
#endif
    PyObject *result = NULL;
    char buf[_Py_MAXIMUM_REPARSE_DATA_BUFFER_SIZE];
    _Py_REPARSE_DATA_BUFFER *rdb = (_Py_REPARSE_DATA_BUFFER *)buf;

    BOOL fs_result = FALSE;
    Py_BEGIN_ALLOW_THREADS
    HANDLE hfile = CreateFileW(path->wide, 0, 0, NULL, OPEN_EXISTING,
                               FILE_FLAG_OPEN_REPARSE_POINT |
                               FILE_FLAG_BACKUP_SEMANTICS, NULL);
    if (hfile != INVALID_HANDLE_VALUE) {
        fs_result = DeviceIoControl(hfile, FSCTL_GET_REPARSE_POINT,
                                    NULL, 0, buf, sizeof(buf),
                                    NULL, NULL);
        CloseHandle(hfile);
    }
    Py_END_ALLOW_THREADS

    if (!fs_result) {
        return path_error(path);
    }

    wchar_t *name = NULL;
    Py_ssize_t length;
    BOOL is_absolute;
    if (rdb->ReparseTag == IO_REPARSE_TAG_SYMLINK) {
        name = (wchar_t *)((char *)rdb->SymbolicLinkReparseBuffer.PathBuffer +
                    rdb->SymbolicLinkReparseBuffer.SubstituteNameOffset);
        length = rdb->SymbolicLinkReparseBuffer.SubstituteNameLength /
                    sizeof(wchar_t);
        is_absolute = rdb->SymbolicLinkReparseBuffer.Flags &
                        SYMLINK_FLAG_RELATIVE ? FALSE : TRUE;
    }
    else if (rdb->ReparseTag == IO_REPARSE_TAG_MOUNT_POINT) {
        name = (wchar_t *)((char *)rdb->MountPointReparseBuffer.PathBuffer +
                    rdb->MountPointReparseBuffer.SubstituteNameOffset);
        length = rdb->MountPointReparseBuffer.SubstituteNameLength /
                    sizeof(wchar_t);
        is_absolute = TRUE;
    }

    if (name == NULL) {
        PyErr_SetString(PyExc_ValueError, "not a symbolic link");
    }
    else {
        wchar_t *temp_name = NULL;
        if (is_absolute) {
            assert(name[0] == L'\\');
            if (length >= 4 && wcsncmp(name, L"\\??\\", 4) == 0) {
                name[1] = L'\\';
            }
            else {
                temp_name = PyMem_Malloc((length + 14 + 1) * sizeof(wchar_t));
                if (temp_name == NULL) {
                    return PyErr_NoMemory();
                }
                wcscpy(temp_name, L"\\\\?\\GLOBALROOT");
                wcsncat(temp_name, name, length);
                name = temp_name;
                length += 14;
            }
        }
        result = PyUnicode_FromWideChar(name, length);
        if (result && path->narrow) {
            Py_SETREF(result, PyUnicode_EncodeFSDefault(result));
        }
        if (temp_name) {
            PyMem_Free(temp_name);
        }
    }

    return result;
#endif

@wintermaples
Copy link
Author

Thankyou your advice!
I'll fix mycode based on your code and advice.

We can't rely on the print path because the specification only says what it "SHOULD" be.

I agree your notion. We must use substitute name. But if os.readlink use substitute name without prepending suffix, we cannot resolve these symbolic paths on Python(except read and analyze reparse point on Python directly but it may be bad).
I think we should enhance os.readlink or make new function.

  • Enhance os.readlink without optional args to prepend suffix if absolute
  • Enhance os.readlink with optional args to prepend suffix if absolute
  • Make new function to read SYMLINK_FLAG_RELATIVE Flag is true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OS-windows type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants