Add os.listdrives on Windows #102519

Closed
zooba opened this issue Mar 8, 2023 · 12 comments
Labels: 3.12 (bugs and security fixes), OS-windows, type-feature (a feature request or enhancement)

Comments

@zooba
Member

zooba commented Mar 8, 2023

We don't currently have a way to get all the root directories on Windows, since os.listdir('/') doesn't contain everything.

I propose a very simple API:

>>> os.listdrives()
["C:\\", "D:\\", ...]
>>> os.listdrives(uuids=True)
["\\\\?\\Volume{4c1b02c1-d990-11dc-99ae-806e6f6e6963}\\", ...]

Basically, listdrives(uuids=True) would return everything found by FindNextVolume, while listdrives() would apply GetVolumePathNamesForVolumeName to each GUID and return all of those. (This may return the same volume multiple times under different names, which is fine - os.stat can be used to see if they're the same st_dev.)

There are endless variations we could also apply here, but the most important functionality in my mind is to expose the OS API. App developers then at least have a starting point to do whatever they may need.
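The remark about using os.stat to spot aliases of one volume could be sketched roughly as follows. This is only an illustration, not part of the proposal: `group_by_device` is a hypothetical helper name, and the `stat` parameter is there purely so the logic can be tried without real drives.

```python
import os
from collections import defaultdict


def group_by_device(paths, stat=os.stat):
    """Group paths by st_dev so aliases of the same volume end up together.

    Paths that cannot be stat'ed (e.g. an empty card reader) are skipped.
    """
    groups = defaultdict(list)
    for path in paths:
        try:
            device = stat(path).st_dev
        except OSError:
            continue
        groups[device].append(path)
    return dict(groups)
```

Any group with more than one entry would then be the same volume reachable under several names.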

zooba added the type-feature, OS-windows, and 3.12 labels on Mar 8, 2023
zooba self-assigned this on Mar 8, 2023
@zooba
Member Author

zooba commented Mar 8, 2023

Ping @eryksun in case you have any thoughts :)

@eryksun
Contributor

eryksun commented Mar 8, 2023

This would really be a listvolumes() function. In particular, it would list volume devices that are registered with the mountpoint manager, but not all defined logical drives (e.g. not mapped drives or subst drives).

The mountpoint manager was added to the Windows kernel over 20 years ago, but there are still some legacy drivers that don't support it. Instead, they manually create their own drive-letter names. The ImDisk RAM disk driver does this, IIRC, which makes GetFinalPathNameByHandleW() fail. Such legacy volumes would not be included in the list.

I presume if a volume has no DOS path name, then it won't be included in the list, unless guid_name is true.

> (This may return the same volume multiple times under different names, which is fine - os.stat can be used to see if they're the same st_dev.)

A volume can only be assigned one drive-letter name, a policy that's enforced by IOCTL_MOUNTMGR_CREATE_POINT. The drive-letter name should be used if the volume has one. This should be first in the list, but I'd check the whole list to be certain. Otherwise use the first path in the list (e.g. "\\?\C:\Mount\SpamVolume\"). This is the canonical DOS path for the volume, which is what GetFinalPathNameByHandleW() uses.

If you don't mind diverging from the latter, then use the first path that exists. The problem is that it's possible to mount a volume on a directory that a standard user is allowed to delete. RemoveDirectoryW() tries to call DeleteVolumeMountPointW(), but only an administrator is allowed to update the kernel mountpoint manager. Thus, if a standard user deletes the canonical mount point, then GetFinalPathNameByHandleW() will return a path that doesn't exist. It should be corrected after a reboot.
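A rough sketch of the selection rule described above, in the variant that diverges from GetFinalPathNameByHandleW(): prefer a drive-letter root wherever it appears in the list, otherwise take the first path that still exists. The name `pick_mount_path` and the injectable `exists` callable are assumptions made for illustration only.

```python
import os
import re

# Matches a bare drive root such as "C:\" or "\\?\C:\".
_DRIVE_ROOT = re.compile(r"(?:\\\\\?\\)?[A-Za-z]:\\")


def pick_mount_path(paths, exists=os.path.exists):
    """Pick one DOS path for a volume from its list of volume path names."""
    # The drive-letter name wins if the volume has one; check the whole
    # list rather than trusting that it is the first entry.
    for path in paths:
        if _DRIVE_ROOT.fullmatch(path):
            return path
    # Otherwise fall back to the first mount point that has not been deleted.
    for path in paths:
        if exists(path):
            return path
    return None
```

Returning None for a volume with no usable path leaves it to the caller to decide whether such a volume should be listed at all.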


Regarding drives in general

Logical drives "[A-Z]:" are an abstraction implemented by symbolic links in the object namespace. A logical drive can target an object path that traverses only directory objects and symbolic link objects up to a named device object, with possibly a remaining path that's traversed by the I/O manager (via IRP_MJ_CREATE) -- e.g. "\??\W:" -> "\??\C:\Windows" -> "\Device\HarddiskVolume2\Windows". If the target path includes a remaining path on the device object, it's usually called a mapped drive or substitute drive, though generally speaking all logical drives are mapped.

Every thread has an associated logon session based on its effective token (either the process token or an impersonation token). Every logon session has an associated device mapping, which currently has the following structure:

lkd> dt nt!_DEVICE_MAP
   +0x000 DosDevicesDirectory : Ptr64 _OBJECT_DIRECTORY
   +0x008 GlobalDosDevicesDirectory : Ptr64 _OBJECT_DIRECTORY
   +0x010 ServerSilo       : Ptr64 _EJOB
   +0x018 GlobalDeviceMap  : Ptr64 _DEVICE_MAP
   +0x020 DriveObject      : [26] _EX_FAST_REF
   +0x0f0 ReferenceCount   : Int8B
   +0x0f8 DosDevicesDirectoryHandle : Ptr64 Void
   +0x100 DriveMap         : Uint4B
   +0x104 DriveType        : [32] UChar

A logon session's device mapping contains references to

  • the local object directory for DOS device mappings [1] (e.g. "\Sessions\0\DosDevices\<logon session ID>");
  • the system's global object directory for DOS device mappings (e.g. "\GLOBAL??");
  • possibly a silo (job object) that implements a container (e.g. for Docker);
  • the global device map for the SYSTEM logon session (i.e. logon ID 0x3E7).

It also contains

  • DriveMap, a bitmap of the defined logical drives that are local to the logon session;
  • DriveType, a corresponding array of drive types that match the types used by WinAPI GetDriveTypeW():
    • DRIVE_UNKNOWN,
    • DRIVE_FIXED,
    • DRIVE_REMOVABLE,
    • DRIVE_CDROM,
    • DRIVE_RAMDISK,
    • DRIVE_REMOTE;
  • DriveObject, a corresponding array of references to volume device objects, if cached.

The bitmap of defined logical drives can be queried via GetLogicalDrives() or GetLogicalDriveStringsW(). Whenever the object manager adds or removes a DOS drive-letter name of the form "X:" in a device-mapping directory, it updates the device mapping in the logon session to set or clear the corresponding bit in the drive bitmap. This could be due to

  • a volume coming online or going offline that has a registered or automatically assigned logical drive;
  • assigning or deleting a logical drive via "mountvol.exe", SetVolumeMountPointW(), or DeleteVolumeMountPointW();
  • creating or removing a mapped drive via "net.exe", NetUseAdd(), NetUseDel(), WNetUseConnectionW(), or WNetCancelConnection2W();
  • creating or removing a substitute drive via subst.exe or DefineDosDeviceW().

For example:

>>> import win32api, win32file  # from the pywin32 package
>>> logical_drives = win32api.GetLogicalDriveStrings().split('\0')
>>> 'W:\\' in logical_drives
False
>>> win32file.DefineDosDevice(0, 'W:', 'C:\\Windows')
>>> logical_drives = win32api.GetLogicalDriveStrings().split('\0')
>>> 'W:\\' in logical_drives
True
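For the bitmap form returned by GetLogicalDrives(), decoding is straightforward: bit 0 stands for "A:", bit 1 for "B:", and so on up to bit 25 for "Z:". A minimal sketch follows; `drives_from_bitmask` is a hypothetical helper name, and on Windows the mask could come from `ctypes.windll.kernel32.GetLogicalDrives()`.

```python
import string


def drives_from_bitmask(bitmask):
    """Translate a GetLogicalDrives()-style bitmask into drive root paths."""
    return [
        f"{letter}:\\"
        for bit, letter in enumerate(string.ascii_uppercase)
        if bitmask & (1 << bit)
    ]
```

A mask of 0b1100, for instance, has bits 2 and 3 set, which corresponds to the C: and D: drives.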

Note that a defined drive doesn't necessarily resolve to an existing path. The following example creates an "X:" drive that resolves to a path that doesn't exist.

>>> 'Z:\\' in logical_drives
False
>>> win32file.DefineDosDevice(0, 'X:', 'Z:\\')
>>> logical_drives = win32api.GetLogicalDriveStrings().split('\0')
>>> 'X:\\' in logical_drives
True
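Since a defined drive may not resolve to anything, a caller that wants only usable roots would have to filter the list. A small sketch, assuming the NUL-separated string format used in the examples above; `existing_drives` is a hypothetical name and `exists` is injectable so the logic can be tried off Windows.

```python
import os


def existing_drives(drive_strings, exists=os.path.exists):
    """Split a NUL-separated drive string and keep only roots that resolve."""
    # The buffer ends with a NUL, so a plain split leaves an empty string
    # at the end; drop empty entries before checking.
    drives = [drive for drive in drive_strings.split("\0") if drive]
    return [drive for drive in drives if exists(drive)]
```

With the "X:" drive from the example, `existing_drives` would drop "X:\\" because its target "Z:\\" does not exist.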

Footnotes

  1. The term "DOS" (Disk Operating System) in this context originally referred to MS-DOS. However, the system's DOS device mapping is not for just devices related to disks and drives, and it's not just for the names of classic MS-DOS devices such as "CON", "PRN", and "AUX". It implements the canonical and registered names of all system devices. Even in native NT programming, it's preferred to use persistent "\??\" device names instead of using non-persistent names in "\Device".

@zooba
Member Author

zooba commented Mar 8, 2023

> This would really be a listvolumes() function.

I agree, but I think user expectations win out.

What if listvolumes() returned the GUID names and listdrives() returned the logical names from GetLogicalDriveStrings? listvolumes(mountpoints=True) could return the resolved names.

> The drive-letter name should be used if the volume has one.

Surely the use depends on the user's intent? I'm keen to include the mount point so that code like this can find the longest matching segment (in particular, because this enables user-friendly path handling that realpath breaks):

device_root = next(p for p in pathlib.Path(x).parents if str(p) in os.listdrives()) # listvolumes(mountpoints=True)

You could write something similar with stat() and return the last parent with the same st_dev as the full path, I guess.
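The stat()-based alternative might look something like this. It is a sketch only: `device_root_by_stat` is a made-up name, the path is made absolute without resolving links (to stay clear of realpath), and errors from stat'ing a parent are simply allowed to propagate.

```python
import os
from pathlib import Path


def device_root_by_stat(path):
    """Return the topmost ancestor that shares st_dev with the given path."""
    path = Path(path).absolute()  # absolute, but links are not resolved
    target_device = path.stat().st_dev
    root = path
    for parent in path.parents:
        # Stop at the first ancestor that lives on a different device.
        if parent.stat().st_dev != target_device:
            break
        root = parent
    return root
```
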

(I know you didn't suggest this, but I'm quite happy to leave defining aliases to other libraries. We only really need querying, since without this it's impossible to enumerate all files on the system.)
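For what it's worth, the longest-matching-mount lookup can be tried out without the new functions by passing the mount list in explicitly. The sketch below is an assumption-laden variant of the one-liner: `device_root` is a hypothetical name, and it compares `PureWindowsPath` objects instead of strings so that case differences and trailing backslashes don't cause misses.

```python
from pathlib import PureWindowsPath


def device_root(path, mounts):
    """Return the longest mount point that is a prefix of path, or None."""
    roots = {PureWindowsPath(mount) for mount in mounts}
    target = PureWindowsPath(path)
    # parents runs from the nearest ancestor up to the drive root, so the
    # first hit is the longest matching mount point.
    for candidate in (target, *target.parents):
        if candidate in roots:
            return candidate
    return None
```

Because `PureWindowsPath` is a pure path class, this runs on any platform, which makes the matching logic easy to test in isolation.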

@eryksun
Contributor

eryksun commented Mar 8, 2023

> What if listvolumes() returned the GUID names and listdrives() returned the logical names from GetLogicalDriveStrings?

That seems reasonable.

> listvolumes(mountpoints=True) could return the resolved names.

If listvolumes(mountpoints=True) returns all of the volume mountpoints (i.e. the paths that are mounted by each volume's root directory), then I'd prefer to return a dict {volume_name0: [mountpoint0, ...], ...}. With mountpoints=False it would return a list of strings [volume_name0, ...], or maybe a set {volume_name0, ...}.
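To make the dict shape concrete, here is a minimal sketch that builds it from two callables. Both `list_volumes` and `list_mounts` are placeholders for whatever functions end up doing the enumeration; nothing here is an existing API.

```python
def mounts_by_volume(list_volumes, list_mounts):
    """Build {volume_name: [mountpoint, ...]} from two enumeration callables."""
    return {volume: list(list_mounts(volume)) for volume in list_volumes()}
```

Iterating over the result yields the volume names, and a volume with no mount points still appears with an empty list.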

@zooba
Member Author

zooba commented Mar 8, 2023

> If listvolumes(mountpoints=True) returns all of the volume mountpoints ... then I'd prefer to return a dict

I thought you would, and on some level I'd prefer it too 😄 But I definitely don't want changing return types from a Boolean argument, and the list part of the name really does suggest it should be a list.

Maybe it should just be a three part API? listvolumes(), listmounts(volume=None), listdrives()?

@eryksun
Contributor

eryksun commented Mar 8, 2023

> I thought you would, and on some level I'd prefer it too 😄 But I definitely don't want changing return types from a Boolean argument, and the list part of the name really does suggest it should be a list.

I first considered returning a list of tuples [(volume_name0, [mountpoint0, ...]), ...], but that would be too awkward, and unlike a dict, it wouldn't default to iterating over the sequence of volume names.

I suppose listing all of the mountpoints without grouping is similar to what you'd get on Linux, listing mountpoints via glibc setmntent(), getmntent(), and endmntent(). The latter yields a record that includes the device, mount path, filesystem type, and mount options. Something similar on Windows would emit the volume GUID name, mount path, and information from GetVolumeInformationW(). If the latter fails because there's no mounted filesystem (i.e. ERROR_UNRECOGNIZED_VOLUME), the volume could be omitted from listmounts() because there's no filesystem mounted on it. The information from GetVolumeInformationW() includes the volume name (label), volume serial number, filesystem type (e.g. "NTFS"), filesystem flags (e.g. FILE_SUPPORTS_REPARSE_POINTS), and maximum component length (usually 255; but may be less, e.g. 110 for CDFS Joliet).

@zooba
Member Author

zooba commented Mar 9, 2023

> The information from GetVolumeInformationW() includes the volume name (label), volume serial number, filesystem type (e.g. "NTFS"), filesystem flags (e.g. FILE_SUPPORTS_REPARSE_POINTS), and maximum component length (usually 255; but may be less, e.g. 110 for CDFS Joliet).

A DirEntry-style record would make sense, though we really don't have any precedent for it. I think what I've proposed is a reasonable minimum to match os.listdir('/') on POSIX, but unless there's a way to get the mount info on POSIX that I haven't seen, I don't think that's filling the same gap.

scandir was a whole PEP. So a similar API for mounts would probably be as well.

@eryksun
Contributor

eryksun commented Mar 9, 2023

> A DirEntry-style record would make sense, though we really don't have any precedent for it. I think what I've proposed is a reasonable minimum to match os.listdir('/') on POSIX, but unless there's a way to get the mount info on POSIX that I haven't seen, I don't think that's filling the same gap.

POSIX doesn't specify a way to list mountpoints. As mentioned, glibc on Linux provides an abstraction for iterating the contents of the mount table, "/proc/self/mounts" (or use the old name, "/etc/mtab", which is a symlink now). Open a FILE stream for the table via fp = setmntent("/proc/self/mounts", "r"); iterate the mntent records via getmntent(fp) or getmntent_r(fp, mntbuf, buf, buflen); and close the FILE stream via endmntent(fp).
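From Python, the same table can be read as plain text without going through glibc. The sketch below parses the whitespace-separated fields (device, mount path, filesystem type, options) and undoes the octal escapes the kernel uses for characters such as spaces. `MountEntry` and `parse_mounts` are names invented for this illustration.

```python
import re
from typing import NamedTuple


class MountEntry(NamedTuple):
    device: str
    mount_path: str
    fs_type: str
    options: list


_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field):
    # Space, tab, newline and backslash appear as \040, \011, \012, \134.
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(text):
    """Parse the text of a mount table such as /proc/self/mounts."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        device, mount_path, fs_type, options = fields[:4]
        entries.append(
            MountEntry(
                _unescape(device),
                _unescape(mount_path),
                fs_type,
                options.split(","),
            )
        )
    return entries
```

On Linux this could be fed the contents of "/proc/self/mounts"; the trailing dump and pass fields are ignored here.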

I haven't tested, but BSD and macOS apparently support similar functionality via getmntinfo64(), which returns an array of statfs64 records.

In common on Windows, Linux, and macOS/BSD, we could return the device name, mount path, and filesystem type. Extra platform data could also be returned, such as the mount options on Linux; flags on macOS/BSD; and filesystem flags on Windows.

@ambv
Contributor

ambv commented Mar 14, 2023

Hi there, the bigmem buildbot is failing funnily with:

Traceback (most recent call last):
  File "R:\buildarea\3.x.ambv-bb-win11.bigmem\build\Lib\test\test_os.py", line 2686, in test_listmounts
    mounts = os.listmounts(volume)
             ^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified

See example failure:
https://buildbot.python.org/all/#/builders/1079/builds/843/steps/4/logs/stdio

@eryksun
Contributor

eryksun commented Mar 14, 2023

cpython/Lib/test/test_os.py

Lines 2684 to 2691 in d77c487

    def test_listmounts(self):
        for volume in os.listvolumes():
            mounts = os.listmounts(volume)
            self.assertIsInstance(mounts, list)
            self.assertSetEqual(
                set(mounts),
                self.known_mounts & set(mounts),
            )

This test should ignore FileNotFoundError if the volume name doesn't exist. Maybe the mountpoint manager database is stale, or maybe there's a race condition with a volume going offline right as or right after os.listvolumes() is called.

The test should also ignore an error due to a raw volume that has no filesystem (i.e. ERROR_UNRECOGNIZED_VOLUME, 1005). Or maybe os_listmounts_impl() in "Modules/posixmodule.c" should be modified to return an empty list in this case. But definitely listmounts() shouldn't call GetVolumePathNamesForVolumeNameW() on a raw volume. The mountpoint manager can return registered volume path names for the root path of a volume that isn't actually mounted by a filesystem (via IRP_MJ_FILE_SYSTEM_CONTROL: IRP_MN_MOUNT_VOLUME). It's the difference between asking for "mount points" vs "volume path names".
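One possible shape for tolerating those two failures is sketched below. It is not the change that was actually made to the test or to "Modules/posixmodule.c"; `list_mounts_or_empty` is a hypothetical wrapper, and the `list_mounts` callable is passed in so the error handling can be exercised on any platform.

```python
ERROR_UNRECOGNIZED_VOLUME = 1005  # raw volume with no recognized filesystem


def list_mounts_or_empty(volume, list_mounts):
    """Call list_mounts(volume), treating vanished or raw volumes as empty."""
    try:
        return list(list_mounts(volume))
    except FileNotFoundError:
        # Stale mountpoint manager entry, or the volume went offline.
        return []
    except OSError as exc:
        # winerror is only populated on Windows; elsewhere this re-raises.
        if getattr(exc, "winerror", None) == ERROR_UNRECOGNIZED_VOLUME:
            return []
        raise
```

Any other OSError, such as a permission problem, still propagates so that genuine failures are not hidden.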

@zooba
Member Author

zooba commented Mar 14, 2023

Yeah, that's real weird. I'll handle errors in the test for now, because I'm not sure what the right errors to suppress in the function would be. Seems like unsupported volumes ought to raise an error, but we definitely want to just skip over those for the test.

@ambv
Contributor

ambv commented Mar 15, 2023

This latest PR fixed the problem, thanks!
