Speedup posixpath.abspath() for relative paths #117587
Comments
No, because it's a Python implementation, it's WAY slower than the old code using But, comparing it with the Python implementation of |
I don't have time to pursue it, but this may achieve what you're describing:
diff --git a/Lib/posixpath.py b/Lib/posixpath.py
index 0e8bb5ab10..54d325fe28 100644
--- a/Lib/posixpath.py
+++ b/Lib/posixpath.py
@@ -386,20 +386,13 @@ def normpath(path):

 def abspath(path):
     """Return an absolute path."""
-    path = os.fspath(path)
-    if isinstance(path, bytes):
-        if not path.startswith(b'/'):
-            path = join(os.getcwdb(), path)
-    else:
-        if not path.startswith('/'):
-            path = join(os.getcwd(), path)
-    return normpath(path)
+    return realpath(path, querying=False)


 # Return a canonical path (i.e. the absolute location of a file on the
 # filesystem).

-def realpath(filename, *, strict=False):
+def realpath(filename, *, strict=False, querying=True):
     """Return the canonical path of the specified filename, eliminating any
 symbolic links encountered in the path."""
     filename = os.fspath(filename)
@@ -433,7 +426,7 @@ def realpath(filename, *, strict=False):
     # Whether we're calling lstat() and readlink() to resolve symlinks. If we
     # encounter an OSError for a symlink loop in non-strict mode, this is
     # switched off.
-    querying = True
+    #querying = True

     while rest:
         name = rest.pop()
You'll need to add a little private helper function to avoid exposing a new |
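A rough sketch of the private-helper arrangement this comment hints at, so that the public signatures stay unchanged. The names `_resolve`, `abspath_sketch`, `realpath_sketch` and `_MAX_LINKS` are hypothetical and are not part of `posixpath`; the resolver is deliberately simplified (str paths only, no `strict` mode, a fixed cap instead of real loop detection):

```python
import os

_MAX_LINKS = 40  # arbitrary cap for this sketch, not a CPython constant


def _resolve(path, *, querying):
    """Hypothetical private helper shared by both public functions."""
    path = os.fspath(path)
    if not path.startswith("/"):
        path = os.getcwd() + "/" + path
    rest = path.split("/")[::-1]  # components, next one at the end
    out = ""                      # resolved prefix; "" stands for the root
    links_followed = 0
    while rest:
        name = rest.pop()
        if name in ("", "."):
            continue
        if name == "..":
            # Drop the last component, but never climb above the root.
            out = out[:out.rfind("/")] if out else out
            continue
        candidate = out + "/" + name
        if querying:
            try:
                target = os.readlink(candidate)
            except OSError:
                target = None  # missing, or not a symlink
            if target is not None:
                links_followed += 1
                if links_followed > _MAX_LINKS:
                    querying = False  # give up on symlinks, stay lexical
                else:
                    if target.startswith("/"):
                        out = ""
                    rest.extend(target.split("/")[::-1])
                    continue
        out = candidate
    return out or "/"


def abspath_sketch(path):
    # Public signature unchanged: no `querying` parameter is exposed.
    return _resolve(path, querying=False)


def realpath_sketch(path):
    return _resolve(path, querying=True)
```

Note that in this form a relative path still causes the whole cwd to be walked component by component, which is the cost the issue is about.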
That's even slower:
We really need a C implementation for this idea. |
Are you planning to write this implementation? If not I'll close this issue - the bug tracker isn't the right place for unproven optimization targets. |
Sadly, I can't; I don't understand how the C implementation works.
OK, here's a more extreme benchmark to prove this would work:
Let's break down the time loss:
Two observations:
|
Theoretical speedups are not a good use of the issue tracker. Without a patch to discuss there's nothing more to do here. |
OK, here's my suggested patch, add a parameter
Lines 2414 to 2415 in 733e56e
+    // Skip start
+    if (start > 0) {
+        path += start;
+        p1 = p2 = minP2 = path;
+        lastC = *(path - 1);
+    }
     // Skip leading '.\'
-    if (p1[0] == L'.' && IS_SEP(&p1[1])) {
+    else if (p1[0] == L'.' && IS_SEP(&p1[1])) {
I would love to try this, but I have not the slightest clue how to add arguments to this function and its callers. |
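To make the idea behind the offset concrete, here is a minimal Python sketch, not the C patch itself. `normpath_from` and its `start` parameter are hypothetical names; the sketch assumes an absolute POSIX `str` path whose first `start` characters are already normalised and end on a component boundary:

```python
def normpath_from(path, start=0):
    """Hypothetical: normalise `path`, trusting path[:start] as normalised."""
    # The trusted prefix is taken as-is; only a trailing slash is trimmed.
    out = path[:start].rstrip("/")  # "" stands for the root
    for name in path[start:].split("/"):
        if name in ("", "."):
            continue  # empty components and "." contribute nothing
        if name == "..":
            # Cut at the last separator. This may eat into the trusted
            # prefix, but never goes above the root.
            out = out[:out.rfind("/")] if out else out
        else:
            out += "/" + name
    return out or "/"
```

For example, with a cwd of `/usr/lib/python3` and a relative path of `../../bin/./tool`, calling `normpath_from("/usr/lib/python3/../../bin/./tool", len("/usr/lib/python3"))` only splits the part after the cwd and returns `/usr/bin/tool`.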
@eryksun, could you help me with this? |
The implementation of the new I wouldn't expose The PR may as well fix Like the |
Sorry for misunderstanding the code; everything is much clearer in the Python implementation. Could something like this work?
Lines 2458 to 2459 in 733e56e
+    if (path + start > p1) {
+        p1 = p2 = path + start;
+        lastC = *(p1-1);
+    }
     /* if pEnd is specified, check that. Else, check for null terminator */
     for (; !IS_END(p1); ++p1) {
In the long run, it should be implemented as a separate function, but first I want to check whether this idea works. |
Add a I'd copy |
The idea is working, but I don't know what to do now:
/*[clinic input]
os._path_abspath

    path: object
    start: Py_ssize_t=0

Make path absolute.
[clinic start generated code]*/

static PyObject *
os__path_abspath_impl(PyObject *module, PyObject *path, Py_ssize_t start)
/*[clinic end generated code: output=69e536dbe18ecf3a input=29df0995bc21a9cf]*/
{
    if (!PyUnicode_Check(path)) {
        PyErr_Format(PyExc_TypeError, "expected 'str', not '%.200s'",
                     Py_TYPE(path)->tp_name);
        return NULL;
    }
    Py_ssize_t len;
    wchar_t *buffer = PyUnicode_AsWideCharString(path, &len);
    if (!buffer) {
        return NULL;
    }
    Py_ssize_t abs_len;
    wchar_t *abs_path = _Py_normpath_and_size(buffer, len, start, &abs_len);
    PyObject *result = PyUnicode_FromWideChar(abs_path, abs_len);
    PyMem_Free(buffer);
    return result;
} |
Work on getting |
Reading through the posts in the issue, it is not clear to me if there is significant core dev support for this idea. It would surprise me if @barneygale and @serhiy-storchaka: thoughts? |
I reckon it unlikely that It's a slightly dangerous function too - it calls |
Note that I also extended the capabilities of
And @eryksun asked specifically to implement
I just don't like the fact that we're normalising the cwd, which is already normalised. Otherwise I definitely wouldn't have implemented it in C (there are other functions that would benefit more). And making it faster certainly doesn't hurt, right? I think it's even faster now that I got rid of the Python wrapper. |
We've implemented enough of the implementation in C in terms of |
When adding new code, we try hard to weigh a lot of different factors: performance, usefulness, maintainability, readability, code complexity, the number of lines added or subtracted, and backwards compatibility, to name a few. We also try to understand any opposition to a change: "why is my PR/issue being criticised or rejected?"
I do not think this change is worth it. Performance optimisations are nice, but only if the benefits outweigh the added cost. Adding a C implementation of existing Python code needs a really good reason. C code has a lot more maintenance overhead than Python code. If the added C code is already in a state where it needs refactoring, you're directly adding technical debt.
Moreover, there has been talk about rewriting C code in Python when the interpreter has become fast enough [1]. This should also be taken into account when considering adding C code.
Footnotes |
I'll leave it for Barney to decide if this issue should be kept open or if it should be closed. |
Sorry @nineteendo, I don't think the C implementation is worth it on balance, for the same reasons as Erlend gave. |
@barneygale, what about implementing |
Hmm, |
No; please do not pollute the already too long list of open PRs. Discuss any possible change in an actionable issue first. If you need to open an experimental PR, do so on your own fork. The CPython repo is not the place for experimentation. |
I have already done that: #119826 |
Feature or enhancement
Proposal:
When normalising a relative path, we don't need to process the cwd as it's already normalised, which could get expensive if it's long:
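As an illustration of the proposal (not the benchmark from the original post), here is a minimal pure-Python sketch. The name `abspath_skip_cwd` and its `cwd` parameter are hypothetical; the sketch handles `str` paths only and assumes the cwd is absolute, already normalised, and starts with a single slash:

```python
import os
import posixpath


def abspath_skip_cwd(path, cwd=None):
    """Hypothetical helper: absolute path without re-normalising the cwd."""
    path = os.fspath(path)
    if isinstance(path, bytes):
        raise TypeError("this sketch handles str paths only")
    if path.startswith("/"):
        # Already absolute: there is no cwd to skip.
        return posixpath.normpath(path)
    if cwd is None:
        cwd = os.getcwd()
    # Assumption: cwd is normalised, so its components are trusted
    # without checking them for "", "." or "..".
    base = [] if cwd == "/" else cwd.split("/")[1:]
    # Only the components of the relative part are inspected.
    for name in path.split("/"):
        if name in ("", "."):
            continue
        if name == "..":
            if base:
                base.pop()  # ".." at the root stays at the root
        else:
            base.append(name)
    return "/" + "/".join(base)
```

With `cwd="/a/b/c"`, `abspath_skip_cwd("x/../y", cwd="/a/b/c")` returns `/a/b/c/y`, the same result as `posixpath.normpath(posixpath.join("/a/b/c", "x/../y"))`. Splitting the cwd is still linear in its length here, so this shows the logic rather than the speedup a C implementation would aim for.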
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response
Linked PRs
os.path.abspath
#117855