pathlib strips trailing slash #65238

Closed
hvenev mannequin opened this issue Mar 23, 2014 · 21 comments
Labels
stdlib Python modules in the Lib dir topic-pathlib type-bug An unexpected behavior, bug, or error


@hvenev
Mannequin

hvenev mannequin commented Mar 23, 2014

BPO 21039
Nosy @akuchling, @pitrou, @serhiy-storchaka, @sigmavirus24, @hvenev
Files
  • pathlib.patch: patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2015-04-14.16:54:25.933>
    created_at = <Date 2014-03-23.13:55:56.401>
    labels = ['type-bug', 'library']
    title = 'pathlib strips trailing slash'
    updated_at = <Date 2015-04-14.16:54:25.932>
    user = 'https://github.com/hvenev'

    bugs.python.org fields:

    activity = <Date 2015-04-14.16:54:25.932>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-04-14.16:54:25.933>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2014-03-23.13:55:56.401>
    creator = 'h.venev'
    dependencies = []
    files = ['34586']
    hgrepos = []
    issue_num = 21039
    keywords = ['patch']
    message_count = 15.0
    messages = ['214581', '214594', '214595', '214596', '214597', '214598', '214599', '214600', '214601', '214604', '224889', '224942', '240930', '240947', '240948']
    nosy_count = 7.0
    nosy_names = ['akuchling', 'pitrou', 'BreamoreBoy', 'serhiy.storchaka', 'icordasc', 'h.venev', 'ischwabacher']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue21039'
    versions = ['Python 3.4', 'Python 3.5']


    @hvenev
    Mannequin Author

    hvenev mannequin commented Mar 23, 2014

    Some programs behave differently depending on whether a path has a trailing slash. Examples include ls, cp, mv, ln, rm and rsync. URL paths may also behave differently; for example, http://xkcd.com/1 redirects to http://xkcd.com/1/.

    Boost.Filesystem's path class also supports trailing slashes, and the proposed C++ filesystem library is based on Boost.Filesystem.

    @hvenev hvenev mannequin added the stdlib Python modules in the Lib dir label Mar 23, 2014
    @pitrou
    Member

    pitrou commented Mar 23, 2014

    Yes, this is by design. The occasional difference between slash-ended and non-slash-ended paths is unexpected and potentially confusing. Moreover, it's not a property of the OS itself - it's just some syntactic sugar to enable an option such as resolving symlinks. pathlib paths represent filesystem paths, not arbitrary shell arguments.

    Similarly, pathlib doesn't have special processing for "~someuser" parts.

    (as for URL paths, they are not part of the design space of pathlib)
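A minimal check of the two design points in this comment, using `PurePosixPath` so the result does not depend on the host OS (a sketch, assuming Python 3.4+): the trailing slash is dropped when the path is parsed, and a `~user` segment is kept as an ordinary name. Tilde expansion only happens if `Path.expanduser()` is called explicitly.

```python
from pathlib import PurePosixPath

# The trailing slash is not part of the parsed path.
docs = PurePosixPath("docs/")
print(str(docs))            # docs
print(docs.parts)           # ('docs',)

# "~alice" is treated as a literal first component, not expanded.
notes = PurePosixPath("~alice/notes")
print(notes.parts)          # ('~alice', 'notes')
print(notes.is_absolute())  # False
```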

    @pitrou
    Member

    pitrou commented Mar 23, 2014

    Closing as rejected, sorry.

    @pitrou pitrou closed this as completed Mar 23, 2014
    @hvenev
    Mannequin Author

    hvenev mannequin commented Mar 23, 2014

    What about OpenVMS?

    @pitrou
    Member

    pitrou commented Mar 23, 2014

    Can you elaborate? Python hasn't supported VMS for quite some time...

    @hvenev
    Mannequin Author

    hvenev mannequin commented Mar 23, 2014

    AFAIK paths on OpenVMS are represented in a strange way. [dir.subdir]filename is a path for a file and [dir.subdir.anothersubdir] is a path for a directory.

    @pitrou
    Member

    pitrou commented Mar 23, 2014

    Then I'm afraid the current Path classes won't do a good job of representing them :-)

    But as I said, Python probably doesn't run on VMS anymore, so this is a rather theoretical problem. Maybe if some day Python supports VMS again, someone can contribute a VMSPath implementation.

    @hvenev
    Mannequin Author

    hvenev mannequin commented Mar 23, 2014

    Or maybe URLPath?

    @pitrou
    Member

    pitrou commented Mar 23, 2014

    > Or maybe URLPath?

    I'm skeptical about that. I think someone should first prototype a
    PureURLPath and maybe publish it on PyPI.
    (as for the non-pure variant, URLPath, it doesn't seem to make sense)

    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Mar 23, 2014

    PEP-11 states that VMS was unsupported in 3.3. Code was removed from 3.4 via bpo-16136.

    @ischwabacher
    Mannequin

    ischwabacher mannequin commented Aug 5, 2014

    This may be only syntactic sugar, but it is POSIX-specified syntactic sugar: according to http://pubs.opengroup.org/onlinepubs/9699919799/, trailing slashes in pathnames are semantically meaningful in pathname resolution. Tilde escapes are not mentioned.

    4.12 Pathname Resolution
    ========================

    [...]

    A pathname that contains at least one non-<slash> character and that ends with one or more trailing <slash> characters shall not be resolved successfully unless the last pathname component before the trailing <slash> characters names an existing directory or a directory entry that is to be created for a directory immediately after the pathname is resolved. Interfaces using pathname resolution may specify additional constraints[1] when a pathname that does not name an existing directory contains at least one non-<slash> character and contains one or more trailing <slash> characters.

    If a symbolic link is encountered during pathname resolution, the behavior shall depend on whether the pathname component is at the end of the pathname and on the function being performed. If all of the following are true, then pathname resolution is complete:

    1. This is the last pathname component of the pathname.
    
    2. The pathname has no trailing <slash>.
    
    3. The function is required to act on the symbolic link itself, or certain arguments direct that the function act on the symbolic link itself.
    

    In all other cases, the system shall prefix the remaining pathname, if any, with the contents of the symbolic link. [...]
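On a POSIX system the first rule quoted above is observable from Python. The sketch below (assuming Linux or another POSIX system, and Python 3.9+ for the `tuple[...]` annotation) creates a regular file and probes `"<file>/"`: `stat()` fails with ENOTDIR, so `os.path.exists()` reports False, whereas `Path` has already dropped the slash and reports True. This is exactly the change of meaning the issue is about. `trailing_slash_check` is just an illustrative name.

```python
import os
import tempfile
from pathlib import Path


def trailing_slash_check() -> tuple[bool, bool, bool]:
    """Compare os.path and pathlib on "<regular file>/" (POSIX-only sketch)."""
    with tempfile.TemporaryDirectory() as tmp:
        file_path = os.path.join(tmp, "data.txt")
        with open(file_path, "w") as f:
            f.write("x")
        raw = file_path + "/"
        return (
            # stat() on "data.txt/" fails with ENOTDIR, so exists() is False.
            os.path.exists(raw),
            # pathlib drops the slash while parsing, so it stats plain "data.txt".
            Path(raw).exists(),
            # A directory followed by a slash resolves normally.
            os.path.exists(tmp + "/"),
        )


print(trailing_slash_check())  # (False, True, True) on Linux
```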

    @pitrou
    Member

    pitrou commented Aug 6, 2014

    Isaac, thanks for the reference. I'm reopening the issue for discussion (although I'm still not convinced this would be actually a good thing).
    May I ask you to post on the python-dev mailing-list for further feedback?

    @pitrou pitrou reopened this Aug 6, 2014
    @serhiy-storchaka serhiy-storchaka added the type-bug An unexpected behavior, bug, or error label Aug 6, 2014
    @akuchling
    Member

    The general mood on python-dev seemed to be that the trailing slash shouldn't be normalized. Can this still be fixed, or is it too late since pathlib was shipped in 3.4?

    The python-dev discussion was at https://mail.python.org/pipermail/python-dev/2014-August/135670.html

    @pitrou
    Member

    pitrou commented Apr 14, 2015

    I beg to disagree :) Pathlib tries to find a compromise between user-friendliness and power, but it's definitely more on the user-friendliness side than, say, the os module APIs. In other words, I don't think it's a problem if not all details of OS semantics can be encoded in Path objects. In practice, the situations where it's useful to make a difference between a slash-ending path and a non-slash-ending path are few and far between.

    There are all kinds of small API decisions which would have to be revisited if we allow trailing slashes to be significant. For example, what should be the last component of the path? The component just before the ending slash, or the empty string? What if you slice off the last part? What is the name, stem, suffix? etc.
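For comparison, `os.path` already gives one concrete answer to these questions for slash-ended strings, and `PurePosixPath` gives the other. The sketch uses `posixpath` directly so the output is the same on any host; the path string is an arbitrary example.

```python
import posixpath
from pathlib import PurePosixPath

raw = "archive/data.tar.gz/"

# os.path treats the text after the last slash as the final component: "".
print(posixpath.split(raw))      # ('archive/data.tar.gz', '')
print(posixpath.basename(raw))   # ''
print(posixpath.splitext(raw))   # ('archive/data.tar.gz/', '')

# pathlib has already dropped the slash, so the last component is "data.tar.gz".
p = PurePosixPath(raw)
print(p.name, p.stem, p.suffix)  # data.tar.gz data.tar .gz
print(p.parent)                  # archive
```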

    A path, conceptually, is just that: a sequence of names designating the nodes in the filesystem tree that you walk to get to the terminal node. If you start making trailing slashes significant, then this simple, intuitive abstraction breaks and things become much more awkward to understand.

    @pitrou
    Member

    pitrou commented Apr 14, 2015

    Therefore, I'm finally closing this as won't fix :)

    @pitrou pitrou closed this as completed Apr 14, 2015
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @barneygale
    Contributor

    barneygale commented Feb 25, 2023

    I'm ~8 years late here, but I think this might be worth revisiting.

    Pathlib's path normalization is conservative and tries hard not to change the meaning of paths. Some users complain that not collapsing .. segments is unfriendly, or that treating // and / as different roots is unnecessary, but it's the safe thing to do.

    This is one of only two cases (that I'm aware of) where pathlib's normalization can change the meaning of a path (the other is #80486).

    It's also incompatible with POSIX as previously noted.

    > For example, what should be the last component of the path? The component just before the ending slash, or the empty string? What if you slice off the last part? What is the name, stem, suffix? etc.

    The name, stem and suffix should probably be empty to align with PurePath("/") IMO.
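The conservative normalization described in this comment can be checked directly with `PurePosixPath` (pure paths, so no filesystem access is needed). The last lines show the values for the root path that the suggested empty `name`/`stem`/`suffix` would align with. Example paths are arbitrary.

```python
from pathlib import PurePosixPath

# ".." is kept: collapsing "a/../b" to "b" is wrong if "a" is a symlink.
print(PurePosixPath("a/../b"))         # a/../b

# Repeated slashes and "." segments are dropped.
print(PurePosixPath("a//b/./c"))       # a/b/c

# Exactly two leading slashes form a distinct root (POSIX makes "//" implementation-defined)...
print(PurePosixPath("//srv/x").root)   # //
# ...but three or more collapse to a single "/".
print(PurePosixPath("///srv/x").root)  # /

# The root path has an empty name, stem and suffix.
root = PurePosixPath("/")
print(repr(root.name), repr(root.stem), repr(root.suffix))  # '' '' ''
```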

    @barneygale
    Contributor

    I will work on this once #102476 lands. Turns out it's necessary for the optimization I'm going for in #101560.

    @shiftagain

    shiftagain commented Mar 16, 2023

    For a script I'm currently writing, I have to append a directory path to sys.path to make some Python imports work, and sys.path seems to interpret a path differently depending on whether it has a trailing slash.

    I'm sure there are many other examples of code in the Python ecosystem requiring a trailing slash or behaving differently without one, so arguments about how pathlib is "Only for Python internals" or "Just represents a filesystem path" don't make sense to me.

    I would be content with just adding a resolve_with_trailing_slash function in addition to the resolve function. I think it would make sense to change the default behavior to have a trailing slash for folders (so that you have the additional information in the raw path that it is a folder), but breaking changes are not ideal.
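A rough sketch of the `resolve_with_trailing_slash` idea as a standalone helper. It is hypothetical, not a pathlib API, and it returns a `str` rather than a `Path`, because a `Path` object cannot hold the trailing slash. Using `os.sep` is an assumption made so the helper also behaves sensibly on Windows.

```python
import os
import tempfile
from pathlib import Path


def resolve_with_trailing_slash(path: Path) -> str:
    """Hypothetical helper: resolve, then mark directories with a trailing separator."""
    resolved = path.resolve()
    text = str(resolved)
    # Append a separator for directories, unless one is already there (e.g. "/").
    if resolved.is_dir() and not text.endswith(os.sep):
        text += os.sep
    return text


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "pkg")
    pkg.mkdir()
    (pkg / "mod.py").write_text("x = 1\n")
    print(resolve_with_trailing_slash(pkg))             # ends with "pkg/" on POSIX
    print(resolve_with_trailing_slash(pkg / "mod.py"))  # ends with "pkg/mod.py"
```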

    @gvanrossum
    Member

    (Cross-posted to Discourse: https://discuss.python.org/t/pathlib-preserve-trailing-slash/33389/9)

    I'm sorry Antoine chose to let pathlib deviate from the os.path module's behavior for this. I recall thinking long and hard about edge cases like this and carefully implementing what I thought was best. I haven't read that POSIX standard but I presume I was influenced by actual behavior of various UNIX utilities.

    But I agree that changing this will break plenty of user code relying on the current behavior, and we can't have that. I also don't think that some kind of deprecation path makes sense here. The best we can do is have some way to indicate the preferred behavior when a path is created, and have that inherited by operations that return new paths.

    I'm not sure what form the user preference should take -- I'd say it shouldn't be global, so it could take the form of either an alternative class, an alternative constructor, or a flag keyword argument to path-constructing operations.

    Even so, there could be problems -- suppose we have a library that accepts Path arguments and expects them to behave the old way, and a user constructs paths using the alternate constructor and passes those in. It might be quite a while before the user ends up passing a path that causes the library to crash or misbehave.

    So maybe an alternative approach could be not to have the behavior be indicated by some property of the Path instance but by using different attributes. So maybe e.g. Path("foo/").name would return "foo" but Path("foo/").alt_name would return "". This would avoid the scenario I described just above.

    Now we just have to decide on names for the attributes that could have this alternate behavior (are there others besides .parent and .name?). And probably the implementation will have to keep track of the trailing slash somehow.
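A rough sketch of the alternative-attribute idea, assuming a hypothetical wrapper (`SlashAwarePath` and `alt_name` are illustrative names only, not pathlib APIs). The ordinary `.name` keeps today's pathlib behaviour, so a library expecting that is unaffected, while `.alt_name` follows `os.path.basename()` on the original string. The same pattern would extend to `.parent`.

```python
import posixpath
from pathlib import PurePosixPath


class SlashAwarePath:
    """Hypothetical wrapper: keeps the raw string so a trailing slash is not lost."""

    def __init__(self, raw: str) -> None:
        self._raw = raw
        self._path = PurePosixPath(raw)  # normalized; trailing slash dropped

    @property
    def name(self) -> str:
        # Existing pathlib semantics.
        return self._path.name

    @property
    def alt_name(self) -> str:
        # os.path semantics: empty when the raw path ends with a slash.
        return posixpath.basename(self._raw)

    def __fspath__(self) -> str:
        # Hand the original spelling, slash included, to os-level functions.
        return self._raw


p = SlashAwarePath("foo/")
print(repr(p.name), repr(p.alt_name))  # 'foo' ''
```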

    matthewfeickert pushed a commit to scikit-hep/pyhf that referenced this issue Oct 24, 2023
    * Use pathlib to build the stem for the schema to use (version + type of schema).
       - c.f. python/cpython#65238
    matthewfeickert pushed a commit to scikit-hep/pyhf that referenced this issue Oct 25, 2023
    * Backport PR https://github.com/scikit-hep/pyhf/pull/2357
    matthewfeickert added a commit to scikit-hep/pyhf that referenced this issue Oct 25, 2023
    * Backport PR #2357
    
    Co-authored-by: Giordon Stark <kratsg@gmail.com>
    @barneygale
    Contributor

    > So maybe an alternative approach could be not to have the behavior be indicated by some property of the Path instance but by using different attributes. So maybe e.g. Path("foo/").name would return "foo" but Path("foo/").alt_name would return "". This would avoid the scenario I described just above.
    >
    > Now we just have to decide on names for the attributes that could have this alternate behavior (are there others besides .parent and .name?). And probably the implementation will have to keep track of the trailing slash somehow.

    I've been working on a patch along these lines. The idea is to ignore any trailing slash whenever we split a path into (dirname, basename). So:

    >>> from pathlib import PurePosixPath
    >>> p = PurePosixPath('/home/barney/')
    >>> p.parent
    PurePosixPath('/home')
    >>> list(p.parents)
    [PurePosixPath('/home'), PurePosixPath('/')]
    >>> p.name
    'barney'
    >>> p.with_name('fred')
    PurePosixPath('/home/fred/')

    This also applies to [with_]stem, [with_]suffix, suffixes, [is_]relative_to.

    Users wouldn't be able to call path.parent to remove a trailing slash, so we'd need to add something like PurePath.[with_]trailer I think.
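The proposed splitting rule can be emulated on top of today's pathlib by peeling off the trailing slashes, doing the split on the remainder, and re-attaching them. `split_trailer()` and `with_name_keeping_trailer()` are hypothetical helpers for illustration, assuming POSIX-style strings and Python 3.9+; they are not the patch itself or a pathlib API.

```python
from pathlib import PurePosixPath


def split_trailer(raw: str) -> tuple[str, str]:
    """Split "/home/barney/" into ("/home/barney", "/")."""
    body = raw.rstrip("/")
    if not body:
        # "/" or "//": the slashes are the root, not a trailer.
        return raw, ""
    return body, raw[len(body):]


def with_name_keeping_trailer(raw: str, name: str) -> str:
    # Split on the body only, then re-attach whatever trailer was there.
    body, trailer = split_trailer(raw)
    return str(PurePosixPath(body).with_name(name)) + trailer


print(split_trailer("/home/barney/"))                      # ('/home/barney', '/')
print(with_name_keeping_trailer("/home/barney/", "fred"))  # /home/fred/
print(with_name_keeping_trailer("/home/barney", "fred"))   # /home/fred
```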

    barneygale added a commit to barneygale/cpython that referenced this issue Nov 24, 2023
    barneygale added a commit to barneygale/cpython that referenced this issue Nov 24, 2023
    Add trailing slashes to expected `Path.glob()` results wherever a pattern
    has a trailing slash. This matches what `glob.glob()` produces.
    
    Due to another bug (pythonGH-65238) pathlib strips all trailing slashes, so this
    change is academic for now.
    barneygale added a commit that referenced this issue Dec 3, 2023
    barneygale added a commit to barneygale/cpython that referenced this issue Dec 18, 2023
    Ensure that trailing slashes are ignored whenever pathlib splits a basename
    from a dirname. This commit adds test cases for `parent`, `parents`,
    `name`, `stem`, `suffix`, `suffixes`, `with_name()`, `with_stem()`,
    `with_suffix()`, `relative_to()`, `is_relative_to()`, `expanduser()` and
    `absolute()`.
    
    Any solution for pythonGH-65238 should keep these tests passing.
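Because today's pathlib strips the slash while parsing, every pure property listed in this commit message already agrees between `'a/b.tar.gz'` and `'a/b.tar.gz/'`. A small sketch of such a check, in the spirit of (not copied from) those tests, assuming Python 3.9+ for `with_stem()` and `is_relative_to()`; `expanduser()` and `absolute()` are left out because they need a concrete `Path`.

```python
from pathlib import PurePosixPath

plain = PurePosixPath("a/b.tar.gz")
slashed = PurePosixPath("a/b.tar.gz/")

# Each entry extracts one property or derived path to compare.
checks = {
    "parent": lambda p: p.parent,
    "parents": lambda p: list(p.parents),
    "name": lambda p: p.name,
    "stem": lambda p: p.stem,
    "suffix": lambda p: p.suffix,
    "suffixes": lambda p: p.suffixes,
    "with_name": lambda p: p.with_name("c.txt"),
    "with_stem": lambda p: p.with_stem("c.tar"),
    "with_suffix": lambda p: p.with_suffix(".bz2"),
    "relative_to": lambda p: p.relative_to("a"),
    "is_relative_to": lambda p: p.is_relative_to("a"),
}

for label, get in checks.items():
    assert get(plain) == get(slashed), label
print("all properties match")
```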
    @barneygale
    Contributor

    Re-resolving as "won't fix" - the present behaviour is desirable for many users and it's too dangerous to change now. The crux of the issue is:

    >>> str(PurePath('a/')) == 'a'
    True
    >>> PurePath('a/') == PurePath('a')
    True

    Fixing this bug entails changing the first result, which entails changing the second too. For some this is clearly a bugfix, but for many others it comes close to defeating the purpose of pathlib.

    I don't think an initialiser argument would help: users handed a Path object would still need to cope with the possible equality and string representation change.

    Much more discussion here: https://discuss.python.org/t/pathlib-preserve-trailing-slash/33389/

    @barneygale barneygale closed this as not planned Dec 19, 2023
    aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
    …ython#112365)
    
    Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
    …ython#112365)
    