pathlib strips trailing slash #65238
Some programs behave differently depending on whether a path has a trailing slash. Examples include ls, cp, mv, ln, rm and rsync. URL paths may also behave differently. For example, http://xkcd.com/1 redirects to http://xkcd.com/1/. Boost.Filesystem's path class also supports trailing slashes in paths. C++'s filesystem library proposal is also based on Boost.Filesystem.
Yes, this is by design. The occasional difference between slash-ended and non-slash-ended paths is unexpected and potentially confusing. Moreover, it's not a property of the OS itself - it's just some syntactic sugar to enable an option such as resolving symlinks. pathlib paths represent filesystem paths, not arbitrary shell arguments. Similarly, pathlib doesn't have special processing for "~someuser" parts. (As for URL paths, they are not part of the design space of pathlib.)
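To make the behaviour under discussion concrete, here is a minimal sketch of how pathlib normalizes a few inputs. The sample strings are illustrative and not taken from the thread; `PurePosixPath` is used so the result does not depend on the host OS.

```python
from pathlib import PurePosixPath

# Illustrative inputs (assumptions for this sketch, not from the issue).
examples = ["/home/barney/", "/home/barney///", "a/./b/", "a/../b/"]

# pathlib drops trailing slashes and "." components, but keeps "..".
normalized = {text: str(PurePosixPath(text)) for text in examples}
for text, result in normalized.items():
    print(f"{text!r:20} -> {result!r}")

# Because the slash is dropped at construction, the two spellings compare equal.
same = PurePosixPath("a/") == PurePosixPath("a")
print(same)
```

Note that ".." survives normalization because collapsing it could change the meaning of a path that crosses a symlink, whereas the trailing slash is discarded.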
Closing as rejected, sorry.
What about OpenVMS?
Can you elaborate? Python hasn't supported VMS for quite some time...
AFAIK paths on OpenVMS are represented in a strange way. [dir.subdir]filename is a path for a file and [dir.subdir.anothersubdir] is a path for a directory.
Then I'm afraid the current Path classes won't do a good job of representing them :-) But as I said, Python probably doesn't run on VMS anymore, so this is a rather theoretical problem. Maybe if some day Python supports VMS again, someone can contribute a VMSPath implementation.
Or maybe URLPath?
I'm skeptical about that. I think someone should first prototype a
This may be only syntactic sugar, but it is POSIX-specified syntactic sugar: according to http://pubs.opengroup.org/onlinepubs/9699919799/, trailing slashes in pathnames are semantically meaningful in pathname resolution. Tilde escapes are not mentioned.
4.12 Pathname Resolution
[...]
A pathname that contains at least one non-<slash> character and that ends with one or more trailing <slash> characters shall not be resolved successfully unless the last pathname component before the trailing <slash> characters names an existing directory or a directory entry that is to be created for a directory immediately after the pathname is resolved. Interfaces using pathname resolution may specify additional constraints[1] when a pathname that does not name an existing directory contains at least one non-<slash> character and contains one or more trailing <slash> characters.
If a symbolic link is encountered during pathname resolution, the behavior shall depend on whether the pathname component is at the end of the pathname and on the function being performed. If all of the following are true, then pathname resolution is complete:
In all other cases, the system shall prefix the remaining pathname, if any, with the contents of the symbolic link.
[...]
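The POSIX rule quoted above can be observed from Python. The following is a rough sketch that assumes a POSIX system (Linux or macOS); the file name `data.txt` is made up for the example. It stats a regular file through a string with a trailing slash, then does the same through pathlib, which drops the slash before the OS ever sees it.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create a regular file (hypothetical name, just for this sketch).
    file_path = os.path.join(tmp, "data.txt")
    with open(file_path, "w") as f:
        f.write("x")

    slashed = file_path + "/"

    # Raw string: the trailing slash demands a directory, so resolution fails.
    try:
        os.stat(slashed)
        raw_result = "resolved"
    except NotADirectoryError:
        raw_result = "NotADirectoryError"

    # pathlib: the slash is stripped at construction, so the file is found.
    pathlib_str = str(Path(slashed))
    pathlib_resolved = Path(slashed).exists()

print(raw_result, pathlib_resolved)
```

On a POSIX system the raw `os.stat()` call is expected to fail while the pathlib lookup succeeds, which is the change in meaning this comment is pointing at.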
Isaac, thanks for the reference. I'm reopening the issue for discussion (although I'm still not convinced this would actually be a good thing).
The general mood on python-dev seemed to be that the trailing slash shouldn't be normalized. Can this still be fixed, or is it too late since pathlib was shipped in 3.4? The python-dev discussion was at https://mail.python.org/pipermail/python-dev/2014-August/135670.html
I beg to disagree :) Pathlib tries to find a compromise between user-friendliness and power, but it's definitely more on the user-friendliness side than, say, the os module APIs. In other words, I don't think it's a problem if not all details of OS semantics can be encoded in Path objects. In practice, the situations where it's useful to distinguish between a slash-ending path and a non-slash-ending path are few and far between. There are all kinds of small API decisions which have to be revisited if we allow trailing slashes to be significant. For example, what should be the last component of the path? The component just before the ending slash, or the empty string? What if you slice off the last part? What is the name, stem, suffix? etc. A path, conceptually, is just that: a sequence of names designating the nodes in the filesystem tree that you walk to get to the terminal node. If you start making trailing slashes significant, then this simple, intuitive abstraction breaks and things become much more awkward to understand.
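The API questions raised above (last component, name, stem, suffix) can be illustrated by comparing what pathlib currently answers with what the string-based `posixpath` functions answer for the same input. This is only a sketch; the sample path is an assumption chosen to have a multi-part suffix.

```python
import posixpath
from pathlib import PurePosixPath

text = "/home/barney/archive.tar.gz/"  # illustrative input, not from the thread

# pathlib: the trailing slash is gone, so the last component is the directory name.
p = PurePosixPath(text)
pathlib_view = {"parts": p.parts, "name": p.name, "stem": p.stem, "suffix": p.suffix}

# posixpath: the trailing slash is kept, so the last component is the empty string.
head, tail = posixpath.split(text)
root, ext = posixpath.splitext(text)

print(pathlib_view)
print((head, tail), (root, ext))
```

Here pathlib reports `archive.tar.gz` as the name and `.gz` as the suffix, while `posixpath.split()` and `posixpath.splitext()` report an empty tail and an empty extension. Making the slash significant would force a choice between those two answers.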
Therefore, I'm finally closing this as won't fix :)
I'm ~8 years late here, but I think this might be worth revisiting. Pathlib's path normalization is conservative and tries hard not to change the meaning of paths. Some users complain that not collapsing This is one of only two cases (that I'm aware of) where pathlib's normalization can change the meaning of a path (the other is #80486). It's also incompatible with POSIX as previously noted.
The name, stem and suffix should probably be empty to align with |
For a script I'm currently writing, I have to append a directory path to sys.path in order to make some Python imports work properly, and it seems that sys.path interprets a path differently depending on whether it has a trailing slash or not. I'm sure there are many other examples of code in the Python ecosystem requiring a trailing slash or behaving differently without one, so arguments about how pathlib is "Only for Python internals" or "Just represents a filesystem path" don't make sense to me. I would be content with just adding a
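For situations like this one, a possible workaround is to put the slash back at the point where the path becomes a string. The sketch below relies on the documented behaviour that joining with an empty final component yields a result ending in a separator. The helper name `as_dir_string` and the sample path are hypothetical and not part of pathlib.

```python
import posixpath
from pathlib import PurePosixPath

def as_dir_string(path):
    """Return str(path) with a trailing slash (hypothetical helper, not a pathlib API)."""
    # Joining with "" appends a separator unless the string already ends with one.
    return posixpath.join(str(path), "")

p = PurePosixPath("/opt/project/lib/")  # illustrative path; pathlib drops the slash
plain = str(p)
with_slash = as_dir_string(p)
root = as_dir_string(PurePosixPath("/"))

print(plain, with_slash, root)
# A caller could then pass the string on, e.g. sys.path.append(with_slash).
```

In platform-dependent code, `os.path.join(os.fspath(path), "")` would be the equivalent spelling. `posixpath` is used here only to keep the output the same on every OS.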
This brings pathlib in line with *IEEE Std 1003.1-2017*, where trailing slashes are meaningful to path resolution and should not be discarded. See https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_13
(Cross-posted to Discourse: https://discuss.python.org/t/pathlib-preserve-trailing-slash/33389/9) I'm sorry Antoine chose to let But I agree that changing this will break plenty of user code relying on the current behavior, and we can't have that. I also don't think that some kind of deprecation path makes sense here. The best we can do is have some way to indicate the preferred behavior when a path is created, and have that inherited by operations that return new paths. I'm not sure what form the user preference should take -- I'd say it shouldn't be global, so it could take the form of either an alternative class, an alternative constructor, or a flag keyword argument to path-constructing operations. Even so, there could be problems -- suppose we have a library that accepts So maybe an alternative approach could be not to have the behavior be indicated by some property of the Now we just have to decide on names for the attributes that could have this alternate behavior (are there others besides |
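One way to picture the "alternative class" option mentioned above is a thin wrapper that remembers whether the original string ended in a slash and carries that flag through operations that return new paths. Everything in this sketch is hypothetical: the class name `SlashAwarePath`, its `parse()` constructor and its semantics are assumptions made for illustration, not a design proposed in the thread.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

@dataclass(frozen=True)
class SlashAwarePath:
    """Hypothetical wrapper pairing a pure path with a trailing-slash flag."""
    path: PurePosixPath
    trailing_slash: bool = False

    @classmethod
    def parse(cls, text):
        # Record the slash before pathlib normalizes it away; the root "/" is not flagged.
        has_slash = len(text) > 1 and text.endswith("/")
        return cls(PurePosixPath(text), has_slash)

    def __truediv__(self, other):
        # The joined result takes its flag from the right-hand operand.
        other = str(other)
        return SlashAwarePath(self.path / other, other.endswith("/"))

    def __str__(self):
        text = str(self.path)
        if self.trailing_slash and not text.endswith("/"):
            text += "/"
        return text

a = SlashAwarePath.parse("a/")
b = SlashAwarePath.parse("a")
print(str(a), str(b), a == b, str(b / "c/"))
```

Unlike `PurePosixPath`, the two spellings compare unequal here. This is also the compatibility hazard the comment describes: a library handed such an object cannot assume the usual pathlib equality.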
* Use pathlib to build the stem for the schema to use (version + type of schema). - c.f. python/cpython#65238
* Backport PR #2357 * Use pathlib to build the stem for the schema to use (version + type of schema). - c.f. python/cpython#65238 Co-authored-by: Giordon Stark <kratsg@gmail.com>
I've been working on a patch along these lines. The idea is to ignore any trailing slash whenever we split a path into (dirname, basename). So:
>>> from pathlib import PurePosixPath
>>> p = PurePosixPath('/home/barney/')
>>> p.parent
PurePosixPath('/home')
>>> list(p.parents)
[PurePosixPath('/home'), PurePosixPath('/')]
>>> p.name
'barney'
>>> p.with_name('fred')
PurePosixPath('/home/fred/')
This also applies to Users wouldn't be able to call
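The splitting rule described in this comment can be approximated on plain strings. This is only a sketch of the idea, not the actual patch: the function names `split_ignoring_trailing_slash` and `with_name` are made up here, and the real change would live inside pathlib.

```python
import posixpath

def split_ignoring_trailing_slash(path):
    """Split into (dirname, basename, had_slash), ignoring any trailing slash.

    Hypothetical helper for illustration only.
    """
    stripped = path.rstrip("/") or path  # leave an all-slash path such as "/" alone
    had_slash = stripped != path
    dirname, basename = posixpath.split(stripped)
    return dirname, basename, had_slash

def with_name(path, new_name):
    """Replace the final component, keeping a trailing slash if one was present."""
    dirname, _, had_slash = split_ignoring_trailing_slash(path)
    result = posixpath.join(dirname, new_name)
    return result + "/" if had_slash else result

parts = split_ignoring_trailing_slash("/home/barney/")
renamed = with_name("/home/barney/", "fred")
print(parts, renamed)
```

With these definitions, the parent of `/home/barney/` is `/home`, its name is `barney`, and renaming yields `/home/fred/`, which matches the session quoted above.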
Add trailing slashes to expected `Path.glob()` results wherever a pattern has a trailing slash. This matches what `glob.glob()` produces. Due to another bug (pythonGH-65238) pathlib strips all trailing slashes, so this change is academic for now.
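The difference between `glob.glob()` and `Path.glob()` mentioned in this commit message can be seen with a small experiment. This sketch assumes a POSIX system; the directory and file names are invented for the example.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # One subdirectory and one regular file (hypothetical names).
    os.mkdir(os.path.join(tmp, "sub"))
    Path(tmp, "file.txt").touch()

    # glob.glob(): a pattern ending in "/" matches only directories,
    # and each result keeps the trailing slash.
    raw = glob.glob(os.path.join(tmp, "*/"))
    glob_names = sorted(r[len(tmp) + 1:] for r in raw)

    # Path.glob(): results are Path objects, so no string form ends in a slash.
    pathlib_results = sorted(str(p) for p in Path(tmp).glob("*/"))

print(glob_names)
print(pathlib_results)
```

Which entries `Path.glob("*/")` returns has varied between Python versions, so the sketch only inspects whether the resulting strings end in a slash.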
Ensure that trailing slashes are ignored whenever pathlib splits a basename from a dirname. This commit adds test cases for `parent`, `parents`, `name`, `stem`, `suffix`, `suffixes`, `with_name()`, `with_stem()`, `with_suffix()`, `relative_to()`, `is_relative_to()`, `expanduser()` and `absolute()`. Any solution for pythonGH-65238 should keep these tests passing.
Re-resolving as "won't fix" - the present behaviour is desirable for many users and it's too dangerous to change now. The crux of the issue is:
>>> str(PurePath('a/')) == 'a'
True
>>> PurePath('a/') == PurePath('a')
True
Fixing this bug entails changing the first result, which entails changing the second too. For some this is clearly a bugfix, but for many others it comes close to defeating the purpose of pathlib. I don't think an initialiser argument would help: users handed a Much more discussion here: https://discuss.python.org/t/pathlib-preserve-trailing-slash/33389/
Blocks issues: pathlib.PurePath.__fspath__() #102783