Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-43757: make pathlib use os.path.realpath() to resolve all symlinks in a path #25264

Merged
merged 23 commits into from
Apr 28, 2021

Conversation

barneygale
Copy link
Contributor

@barneygale barneygale commented Apr 7, 2021

Removes a pathlib-specific implementation of realpath() that's mostly
copied from ntpath and posixpath.

https://bugs.python.org/issue43757

Lib/pathlib.py Show resolved Hide resolved
@barneygale barneygale force-pushed the bpo-43757-accessor-resolve branch from 46b35c6 to 2bb4dd4 Compare April 7, 2021 22:18
@barneygale
Copy link
Contributor Author

Summary of the differences between os.path.realpath() and pathlib.Path.resolve().

Before this patch:

symlink loop missing ancestors
os.path.realpath(p) return path muddle on!
p.resolve(strict=False) raise RuntimeError muddle on!
p.resolve(strict=True) raise RuntimeError raise OSError

After this patch:

symlink loop missing ancestors
os.path.realpath(p, strict=False) raise OSError muddle on!
os.path.realpath(p, strict=True) raise OSError raise OSError
p.resolve(strict=False) raise RuntimeError muddle on!
p.resolve(strict=True) raise RuntimeError raise OSError

@eryksun
Copy link
Contributor

eryksun commented Apr 8, 2021

Note that ntpath._getfinalpathname_nonstrict() currently ignores ERROR_CANT_RESOLVE_FILENAME (1921) in order to match the behavior of posixpath.realpath() for a symlink loop. It has to be removed from the list of allowed errors if you want strict=False to raise OSError for a reparse loop.

@barneygale barneygale force-pushed the bpo-43757-accessor-resolve branch from b6ca430 to 984a6fc Compare April 8, 2021 01:59
@barneygale
Copy link
Contributor Author

Note that ntpath._getfinalpathname_nonstrict() currently ignores ERROR_CANT_RESOLVE_FILENAME (1921) in order to match the behavior of posixpath.realpath() for a symlink loop. It has to be removed from the list of allowed errors if you want strict=False to raise OSError for a reparse loop.

Thanks. I've made this adjustment and also rebased to fix conflicts.

@barneygale
Copy link
Contributor Author

barneygale commented Apr 8, 2021

Looks like the realpath() symlink loop behaviour dates back to bpo-930024:

Create a symlink pointing to itself: ln -s infinite infinite
/home/amk/infinite is a perfectly good path, though it can't be followed.

This doesn't agree with my expectations, nor with the first sentence of the function documentation:

Return the canonical path of the specified filename, eliminating any symbolic links encountered in the path

A symlink loop doesn't seem like a good reason to break the bolded guarantee.

Lib/pathlib.py Outdated Show resolved Hide resolved
@barneygale
Copy link
Contributor Author

Path.resolve() currently raises a RuntimeError when it hits a symlink loop. Arguably an OSError would make more sense, and would conform with the proposed realpath() behaviour. Any thoughts on whether it's worth changing?

Lib/test/test_ntpath.py Outdated Show resolved Hide resolved
@barneygale barneygale marked this pull request as ready for review April 8, 2021 20:39
@barneygale barneygale changed the title WIP: bpo-43757: make pathlib use os.path.realpath() to resolve all symlinks in a path bpo-43757: make pathlib use os.path.realpath() to resolve all symlinks in a path Apr 8, 2021
@eryksun
Copy link
Contributor

eryksun commented Apr 8, 2021

Path.resolve() currently raises a RuntimeError when it hits a symlink loop. Arguably an OSError would make more sense, and would conform with the proposed realpath() behaviour. Any thoughts on whether it's worth changing?

I have nothing useful to add. I think @serhiy-storchaka is the person to ask.

Copy link
Member

@zooba zooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not declaring an opinion yet on the behaviour change, but I think it's probably okay. I'd be happier if the link cycle change was also tied to the strict argument. (My main concern is that virtually nobody will be testing for it, and it can become an issue based entirely on user's setup - nothing to do with the code that will fail.)

Doc/library/os.path.rst Outdated Show resolved Hide resolved
Lib/test/test_ntpath.py Outdated Show resolved Hide resolved
@barneygale
Copy link
Contributor Author

Not declaring an opinion yet on the behaviour change, but I think it's probably okay. I'd be happier if the link cycle change was also tied to the strict argument. (My main concern is that virtually nobody will be testing for it, and it can become an issue based entirely on user's setup - nothing to do with the code that will fail.)

I can make the new symlink loop handling optional, but in order to use it from pathlib it would need to be an additional argument, as pathlib.Path.resolve() always raises on broken symlinks regardless of the value of strict. So I'd need to adjust this patch to add two arguments to realpath(), which I'm happy to do if we feel it's the way forward.

I'd argue that the new behaviour is justified because the current behaviour seems entirely non-useful. I can't find another example of a realpath function/executable that doesn't error out when it hits a symlink loop - even PHP's version seems to return false in this case. I suspect the system misconfiguration in your example will result in an exception regardless of this change, as the result of realpath() will be fed into something that will barf on the symlink loop. Better to fail earlier?

Lib/ntpath.py Show resolved Hide resolved
@zooba
Copy link
Member

zooba commented Apr 12, 2021

I suspect the system misconfiguration in your example will result in an exception regardless of this change, as the result of realpath() will be fed into something that will barf on the symlink loop. Better to fail earlier?

This is probably true, but I don't want to add an earlier (and less expected) failure without more warning. Raising a warning in this case but not an error would be fine for 3.10 and 3.11, and then we could confidently make it raise an error in 3.12. That gives devs a few years to realise that an error might be occurring here and handle it properly (or choose to suppress it explicitly).

@barneygale
Copy link
Contributor Author

Sounds very reasonable, thanks! I'll revise this patch.

@barneygale
Copy link
Contributor Author

barneygale commented Apr 12, 2021

Some options for how we could retain the current realpath() behaviour and still remove the duplicate implementation in pathlib:

  1. We could make Path.resolve() match realpath() behaviour, i.e. don't raise for symlink loops in non-strict mode.
    • Just moves the behaviour change elsewhere
    • Seems like a step backwards
  2. We could add another argument to realpath() that controls whether to raise on symlink loops. strict_loops? raise_on_symlink_loop?
    • Makes the realpath() signature a bit unwieldy
    • Hard to document a reasonable use case for strict_loops=False
  3. We could perform some post-processing of the return value of realpath() in Path.resolve(). Basically lstat() every ancestor, and if we detect a symlink, raise an error.
    • Entails making additional lstat() calls, which could be slower

Leaning towards option 3. Any thoughts, or options I've missed?

@barneygale
Copy link
Contributor Author

barneygale commented Apr 13, 2021

OK, I've updated this such that there's no change in non-strict realpath() behaviour (the default), so this should be fully backwards-compatible.

Before this PR:

symlink loop other OSError
os.path.realpath(p) muddle on muddle on
p.resolve() raise RuntimeError muddle on
p.resolve(strict=True) raise RuntimeError raise OSError

After this PR:

symlink loop other OSError
os.path.realpath(p) muddle on muddle on
os.path.realpath(p, strict=True) raise OSError raise OSError
p.resolve() raise RuntimeError muddle on
p.resolve(strict=True) raise RuntimeError raise OSError

(by "muddle on" I mean append the remaining path components and return)

@barneygale barneygale force-pushed the bpo-43757-accessor-resolve branch from 8ee02f2 to df04357 Compare April 24, 2021 21:15
Copy link
Member

@zooba zooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this now, just want to wait for one more approval before merging.

@zooba
Copy link
Member

zooba commented Apr 28, 2021

Going to interpret silence as approval. Good work! Thanks for this patch.

@zooba zooba merged commit baecfbd into python:master Apr 28, 2021
@barneygale
Copy link
Contributor Author

Thanks very much for your help!

kfollstad added a commit to kfollstad/cpython that referenced this pull request May 6, 2021
Bug was fixed by pythonGH-25264 which, for other reasons, moved to a system
independent resolve. Adds tests for verifying resolve() on cases other
than for symlinks. New cases include relative and absolute paths with
and without dotted (./..) paths. Also renames test helper function to
avoid name overloading and adds new REL_PATH test constant.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants