Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Windows performance of pathlib.Path.is_file and friends #101357

Closed
mdboom opened this issue Jan 26, 2023 · 14 comments
Closed

Improve Windows performance of pathlib.Path.is_file and friends #101357

mdboom opened this issue Jan 26, 2023 · 14 comments
Assignees
Labels
3.14 new features, bugs and security fixes OS-windows performance Performance or resource usage stdlib Python modules in the Lib dir topic-pathlib type-feature A feature request or enhancement

Comments

@mdboom
Copy link
Contributor

mdboom commented Jan 26, 2023

Feature or enhancement

We could implement / reuse the same fast paths that we did for os.path.* in #101324 for pathlib.Path.*. It will require a little more work, since the pathlib.Path API can return OSError under certain circumstances, whereas os.path would return False in those cases.

Pitch

We saw a 12-25% speedup doing this for os.path. It would be unfortunate if people chose the lower-level API for performance reasons because pathlib.Path didn't match it in performance.

Linked PRs

@mdboom mdboom added type-feature A feature request or enhancement performance Performance or resource usage OS-windows labels Jan 26, 2023
@barneygale
Copy link
Contributor

It would be a shame to lose the generic implementations in pathlib -- they'll be useful in a future AbstractPath class where stat() is abstract.

@eryksun
Copy link
Contributor

eryksun commented Jan 26, 2023

@barneygale, wouldn't this enhancement entail WindowsPath overriding the base Path implementations?

@barneygale
Copy link
Contributor

We support subclassing as of #31691, which means that a user on Windows can do this:

class MyClass(pathlib.Path):
    pass

P = MyClass()

... and expect that p works exactly as if it was a WindowsPath. But it's not actually a subclass of WindowsPath - its behaviour is controlled entirely by the value of its _flavour attribute.

Hence I'm hesistant to add things to WindowsPath and PosixPath.

barneygale added a commit to barneygale/cpython that referenced this issue Jan 27, 2023
The following `pathlib.Path` methods now directly call functions in
`os.path`:

- `samefile()`
- `exists()`
- `is_dir()`
- `is_file()`
- `is_symlink()`

The original generic implementations of these methods are preserved in a
new, internal `_PathWithStat` class. In future this may form the basis of
a public `AbstractPath` class. The generic implementation of `is_mount()`,
which was removed in 29650fea, is also restored to this class.
barneygale added a commit to barneygale/cpython that referenced this issue Jan 27, 2023
The following `pathlib.Path` methods now directly call functions in
`os.path`:

- `samefile()`
- `exists()`
- `is_dir()`
- `is_file()`
- `is_symlink()`

In conjunction with python#101324, this improves performance of these methods
by using making fewer system calls than `stat()`

The original generic implementations of these methods are preserved in a
new, internal `_PathWithStat` class. In future this may form the basis of
a public `AbstractPath` class. The generic implementation of `is_mount()`,
which was removed in 29650fea, is also restored to this class.
@eryksun
Copy link
Contributor

eryksun commented Jan 27, 2023

We can do like what you implemented for os.path.realpath(). Add a strict=False keyword-only parameter to exists(), isfile(), isdir(), islink(), and isjunction(). In strict mode it ignores only a select set of errors. Move the definition of _ignore_error() into ntpath and posixpath, with _IGNORED_ERRNOS defined in genericpath.

@barneygale
Copy link
Contributor

barneygale commented Jan 27, 2023

👍 that sounds good to me! Just as long as we adjust the pathlib.Path code and not WindowsPath specifically if that's OK. Hopefully my draft PR illustrates what I mean here.

@eryksun
Copy link
Contributor

eryksun commented Jan 27, 2023

Should nt._exists() and the others be modified to raise an exception that gets handled by a wrapper function such as ntpath.exists()? Or should we have a list of ignored error codes in the C implementation that's kept in sync with _IGNORED_ERRNOS and _IGNORED_WINERRORS? I guess it would help to benchmark the performance cost of raising and handling an exception for common errors that should always be handled by returning False, such as ERROR_FILE_NOT_FOUND and ERROR_PATH_NOT_FOUND, compared to simply returning False from the builtin function.

This should be the complete set of ignored Windows error codes:

// ENOENT
ERROR_FILE_NOT_FOUND
ERROR_PATH_NOT_FOUND
ERROR_INVALID_DRIVE
ERROR_NO_MORE_FILES
ERROR_BAD_NETPATH
ERROR_BAD_NET_NAME
ERROR_BAD_PATHNAME
ERROR_FILENAME_EXCED_RANGE

// ENOTDIR
ERROR_DIRECTORY

// EBADF
ERROR_INVALID_HANDLE
ERROR_INVALID_TARGET_HANDLE
ERROR_DIRECT_ACCESS_HANDLE

// ELOOP
ERROR_CANT_RESOLVE_FILENAME

// others
ERROR_NOT_READY
ERROR_INVALID_NAME

@iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 23, 2023
@barneygale
Copy link
Contributor

I wouldn't mind if the is_*() methods were changed to suppress all OSErrors. The current selection is undocumented and fairly arbitrary. Users can call stat() if they need an exception to be raised.

@barneygale
Copy link
Contributor

barneygale commented Jan 26, 2024

Oh, but we'd need to add optional follow_symlinks arguments to os.path.exists(), isfile() and isdir(), in order to support the equivalent arguments in pathlib methods. Or we'd need to add an islink() check to each of those pathlib methods, which would slow things back down.

@zooba
Copy link
Member

zooba commented Jan 29, 2024

Or switch function called in the pathlib methods based on that parameter.

In newer versions of Windows, stat is going to be almost as fast as the "fast path" functions (and only because we updated the fast path functions to use the faster stat call, or else they'd be slower), so eventually it won't matter too much which call is used. Not following symlinks should always be faster anyway.

barneygale added a commit to barneygale/cpython that referenced this issue Apr 24, 2024
…`is_*()`

Suppress all `OSError` exceptions from `pathlib.Path.exists()` and `is_*()`
rather than a selection of more common errors as we do presently. Also
adjust the implementations to call `os.path.exists()` etc, which are much
faster on Windows thanks to pythonGH-101196.
@barneygale barneygale added the 3.14 new features, bugs and security fixes label May 8, 2024
barneygale added a commit to barneygale/cpython that referenced this issue May 8, 2024
@barneygale
Copy link
Contributor

PR: #118243

@barneygale barneygale self-assigned this May 11, 2024
barneygale added a commit that referenced this issue May 14, 2024
…)` (#118243)

Suppress all `OSError` exceptions from `pathlib.Path.exists()` and `is_*()`
rather than a selection of more common errors as we do presently. Also
adjust the implementations to call `os.path.exists()` etc, which are much
faster on Windows thanks to GH-101196.
@barneygale
Copy link
Contributor

Resolved in 3.14 / fbe6a09 / #118243. Thanks everyone.

@fireattack
Copy link
Contributor

fireattack commented May 14, 2024

Apologize for my confusion, it has been a while since I followed this ticket. I was wondering if someone could clarify a little bit about the situation.

From what I can see, the closing commit/PR only seems to change behaviors about OSErrors.
Would it still improve the performance of pathlib.Path.is_file (i.e. what this ticket is about)?

@zooba's comment seems to imply that the improvement was already made at other places, but needing "newer version of Windows".

In other words, if I upgraded to 3.14 in future, but have to use "older" version of Windows (for example. Win10), would I still get any notifiable performance improvement on pathlib.Path.is_file after this patch?

Thanks in advance!

@barneygale
Copy link
Contributor

From what I can see, the closing commit/PR only seems to change behaviors about OSErrors. Would it still improve the performance of pathlib.Path.is_file (i.e. what this ticket is about)?

The change of OSError handling means the pathlib methods are now compatible with the os.path functions, and so we now call those functions:

def exists(self, *, follow_symlinks=True):
"""
Whether this path exists.
This method normally follows symlinks; to check whether a symlink exists,
add the argument follow_symlinks=False.
"""
if follow_symlinks:
return os.path.exists(self)
return os.path.lexists(self)
def is_dir(self, *, follow_symlinks=True):
"""
Whether this path is a directory.
"""
if follow_symlinks:
return os.path.isdir(self)
return PathBase.is_dir(self, follow_symlinks=follow_symlinks)
def is_file(self, *, follow_symlinks=True):
"""
Whether this path is a regular file (also True for symlinks pointing
to regular files).
"""
if follow_symlinks:
return os.path.isfile(self)
return PathBase.is_file(self, follow_symlinks=follow_symlinks)
def is_mount(self):
"""
Check if this path is a mount point
"""
return os.path.ismount(self)
def is_symlink(self):
"""
Whether this path is a symbolic link.
"""
return os.path.islink(self)
def is_junction(self):
"""
Whether this path is a junction.
"""
return os.path.isjunction(self)

@fireattack
Copy link
Contributor

Thank you, now I understand.

estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…`is_*()` (python#118243)

Suppress all `OSError` exceptions from `pathlib.Path.exists()` and `is_*()`
rather than a selection of more common errors as we do presently. Also
adjust the implementations to call `os.path.exists()` etc, which are much
faster on Windows thanks to pythonGH-101196.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.14 new features, bugs and security fixes OS-windows performance Performance or resource usage stdlib Python modules in the Lib dir topic-pathlib type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

7 participants