Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-117201: Handle leading // for posixpath.commonpath #117334

Closed
wants to merge 33 commits into from

Conversation

nineteendo
Copy link
Contributor

@nineteendo nineteendo commented Mar 28, 2024

NOTE: this is NOT a duplicate of #117202, the branch has been renamed, causing the old pull request to be closed.

Benchmark:

# test.sh
echo short && python -m timeit -s "import before.posixpath" "before.posixpath.commonpath(['/foo/bar', '/foo/baz'])" && python -m timeit -s "import after_alt.posixpath" "after_alt.posixpath.commonpath(['/foo/bar', '/foo/baz'])" && python -m timeit -s "import after.posixpath" "after.posixpath.commonpath(['/foo/bar', '/foo/baz'])"
echo long && python -m timeit -s "import before.posixpath" "before.posixpath.commonpath(['/usr/lib/python2.6/site-packages/gnome_sudoku/game_selector.py', '/usr/lib/python2.6/site-packages/gnome_sudoku/dialog_swallower.py'])" && python -m timeit -s "import after_alt.posixpath" "after_alt.posixpath.commonpath(['/usr/lib/python2.6/site-packages/gnome_sudoku/game_selector.py', '/usr/lib/python2.6/site-packages/gnome_sudoku/dialog_swallower.py'])" && python -m timeit -s "import after.posixpath" "after.posixpath.commonpath(['/usr/lib/python2.6/site-packages/gnome_sudoku/game_selector.py', '/usr/lib/python2.6/site-packages/gnome_sudoku/dialog_swallower.py'])"
echo many && python -m timeit -s "import before.posixpath" "before.posixpath.commonpath(['/foo/bar', '/foo/baz'] * 10)" && python -m timeit -s "import after_alt.posixpath" "after_alt.posixpath.commonpath(['/foo/bar', '/foo/baz'] * 10)" && python -m timeit -s "import after.posixpath" "after.posixpath.commonpath(['/foo/bar', '/foo/baz'] * 10)"
short
100000 loops, best of 5: 2.52 usec per loop # before
100000 loops, best of 5: 2.92 usec per loop # after without zip
100000 loops, best of 5: 2.91 usec per loop # after
# -> 1.15x slower
long
100000 loops, best of 5: 3.35 usec per loop # before
100000 loops, best of 5: 3.96 usec per loop # after without zip
100000 loops, best of 5: 3.73 usec per loop # after
# -> 1.11x slower
many
20000 loops, best of 5: 11.4 usec per loop # before
20000 loops, best of 5: 16.6 usec per loop # after without zip
20000 loops, best of 5: 16.5 usec per loop # after
# -> 1.45x slower

@nineteendo nineteendo marked this pull request as ready for review March 28, 2024 19:18
@nineteendo nineteendo mentioned this pull request Mar 29, 2024
16 tasks
@barneygale
Copy link
Contributor

Could you please add test cases covering:

commonpath(['//foo', '/foo', 'foo'])
commonpath(['//foo', '/foo'])
commonpath(['//foo', 'foo'])

I think I'd expect all three to raise ValueError with an appropriate message.

@nineteendo
Copy link
Contributor Author

Could you please add test cases covering:

commonpath(['//foo', '/foo'])

Do I have permission to change this test then to 3 slashes? In the current implementation I made sure it wouldn't break.

check(['/usr//local', '//usr/local'], '/usr/local')

Achieving the desired behaviour would be easy:

if min(roots) != max(roots):  # -> not all roots are the same
    raise ValueError

But I'm not sure what the error message would have to be. //foo & /foo are both absolute paths.

@barneygale
Copy link
Contributor

Ah interesting, OK! Preserving existing behaviour for commonpath(['//foo', '/foo']) sounds fine.

@barneygale
Copy link
Contributor

The patch seems larger than necessary to fix the bug. Could something like this work instead?

diff --git a/Lib/posixpath.py b/Lib/posixpath.py
index 4fc02be69b..4cbe1b1367 100644
--- a/Lib/posixpath.py
+++ b/Lib/posixpath.py
@@ -541,10 +541,9 @@ def commonpath(paths):
     try:
         split_paths = [path.split(sep) for path in paths]
 
-        try:
-            isabs, = set(p[:1] == sep for p in paths)
-        except ValueError:
-            raise ValueError("Can't mix absolute and relative paths") from None
+        prefixes = [splitroot(path)[1] for path in paths]
+        if any(prefixes) != all(prefixes):
+            raise ValueError("Can't mix absolute and relative paths")
 
         split_paths = [[c for c in s if c and c != curdir] for s in split_paths]
         s1 = min(split_paths)
@@ -555,7 +554,7 @@ def commonpath(paths):
                 common = s1[:i]
                 break
 
-        prefix = sep if isabs else sep[:0]
+        prefix = min(prefixes)
         return prefix + sep.join(common)
     except (TypeError, AttributeError):
         genericpath._check_arg_types('commonpath', *paths)

@nineteendo
Copy link
Contributor Author

I've tried to keep as much of the old structure as possible. Your code appears to do more work.

@nineteendo
Copy link
Contributor Author

The patch seems larger than necessary to fix the bug. Could something like this work instead?

python -m timeit -s "from test import commonpath1" "commonpath1(('/foo/bar', '/foo/baz'))"; python -m timeit -s "from test import commonpath2" "commonpath2(('/foo/bar', '/foo/baz'))"
100000 loops, best of 5: 3.02 usec per loop # current fix
100000 loops, best of 5: 3.34 usec per loop # your fix
# -> 1.11x slower

@nineteendo
Copy link
Contributor Author

I've made some changes to further reduce the diff. Now the fix is way more readable.

@nineteendo nineteendo requested a review from barneygale April 1, 2024 20:04
Copy link
Contributor

@barneygale barneygale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be faster to sort the roots up front and then check the first and last elements? (Untested).

Lib/posixpath.py Outdated Show resolved Hide resolved
Lib/posixpath.py Show resolved Hide resolved
Lib/posixpath.py Show resolved Hide resolved
@nineteendo
Copy link
Contributor Author

@barneygale, do you think this can be merged now? There don't seem to be any optimisations remaining.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change can break user code which expects the result always has a single slash prefix. It is unsafe to backport it. So it should be documented as a new feature.

How does this change affect performance? Please test for short (2 items) and long (e.g. /usr/lib/**) sequences.

Lib/posixpath.py Outdated Show resolved Hide resolved
Lib/test/test_posixpath.py Outdated Show resolved Hide resolved
Lib/posixpath.py Outdated Show resolved Hide resolved
Lib/posixpath.py Outdated Show resolved Hide resolved
Lib/test/test_posixpath.py Outdated Show resolved Hide resolved
Lib/posixpath.py Outdated Show resolved Hide resolved
@nineteendo
Copy link
Contributor Author

How does this change affect performance? Please test for short (2 items) and long (e.g. /usr/lib/**) sequences.

The speed difference is smaller for long paths, but bigger for many paths (20). See, the updated benchmark.
I don't think the latter can be easily fixed.

@erlend-aasland
Copy link
Contributor

This change can break user code which expects the result always has a single slash prefix.

We should be careful about subtle breaking changes like this.

@nineteendo: Ideally, a discussion would happen on the issue first, and then if a consensus on how to proceed has been reached, a PR can be created. Creating a lot of PRs before a decision has been reached tie up a reviewer resources, creating unneeded review churn and CI churn.

@nineteendo nineteendo marked this pull request as draft April 5, 2024 11:31
@nineteendo nineteendo closed this Apr 5, 2024
@nineteendo nineteendo deleted the fix-posixpath.commonpath branch April 6, 2024 10:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants