Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nturl2path.pathname2url() doesn't handle forward slashes #120423

Closed
nineteendo opened this issue Jun 12, 2024 · 9 comments
Closed

nturl2path.pathname2url() doesn't handle forward slashes #120423

nineteendo opened this issue Jun 12, 2024 · 9 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@nineteendo
Copy link
Contributor

nineteendo commented Jun 12, 2024

Bug report

Bug description:

Paths with forward slashes aren't handled the same as paths with backslashes:

>>> from nturl2path import pathname2url
>>> pathname2url('//?/unc/server/share/dir')
'//%3F/unc/server/share/dir' # NOK
>>> pathname2url(r'\\?\unc\server\share\dir')
'//server/share/dir' # OK

CPython versions tested on:

3.13

Operating systems tested on:

Windows

Linked PRs

@nineteendo nineteendo added the type-bug An unexpected behavior, bug, or error label Jun 12, 2024
@nineteendo
Copy link
Contributor Author

nineteendo commented Jun 12, 2024

Quoting @eryksun:

The standard library doesn't use urllib.request.pathname2url(), so it isn't currently an internal issue. However, the documentation claims to support "the local syntax for a path", which would include using forward slash as the path separator. The Windows implementation of pathname2url() is tested in "Lib/test/test_urllib.py", but AFAICT only with backslash as the path separator. I think the input path should be normalized via normpath() and then split via splitroot() to process it sanely.

Also, currently nothing else in the standard library uses the nturl2path module, and its existence isn't documented. The two functions could be defined directly in urllib.request. The way some of the tests are implemented would have to be updated as well.

@eryksun
Copy link
Contributor

eryksun commented Jun 12, 2024

While using an empty authority (i.e. hostname or namespace) for a UNC path is valid according to RFC8089 (e.g. "file:////spam/eggs/foo"), the two-slash form with an authority is preferred on Windows (e.g. "file://spam/eggs/foo"). For DOS drive paths, the form with an empty authority is preferred (e.g. "file:///C:/eggs/foo"). But using an explicit authority is also supported (e.g. "file://localhost/C:/eggs/foo", or sometimes "file://./C:/eggs/foo").

@barneygale
Copy link
Contributor

barneygale commented Jun 12, 2024

On POSIX, urllib.request.pathname2url() simply calls quote(), so you might expect to use it like this:

>>> 'file://' + urllib.request.pathname2url('/etc/hosts')
'file:///etc/hosts'

But if we try the same thing using the Windows version we get unusual results:

>>> 'file://' + nturl2path.pathname2url(r'c:\foo')
'file://///C:/foo'
>>> 'file://' + nturl2path.pathname2url('c:/foo')
'file://///C://foo'
>>> 'file://' + nturl2path.pathname2url(r'\\server\share\file')
'file://////server/share/file'
>>> 'file://' + nturl2path.pathname2url('//server/share/file')
'file:////server/share/file'

I don't think there's a way to use it correctly. IMHO we should deprecate + remove it.

@barneygale
Copy link
Contributor

barneygale commented Jun 12, 2024

We can recommend Path.as_uri() to users as a replacement. It's been available since pathlib was added to the standard library.

It's tempting to also deprecate and remove url2pathname(), but:

  1. It's used a couple of times internally (unlike pathname2url()), and
  2. The potential replacement (Path.from_uri() was only added in 3.13, and if we want to be kind to package maintainers, I think we ought to wait until 3.12 drops out of support.

@pygeek
Copy link
Contributor

pygeek commented Jun 13, 2024

We can recommend Path.as_uri() to users as a replacement. It's been available since pathlib was added to the standard library.

It's tempting to also deprecate and remove url2pathname(), but:

  1. It's used a couple of times internally (unlike pathname2url()), and

  2. The potential replacement (Path.from_uri() was only added in 3.13, and if we want to be kind to package maintainers, I think we ought to wait until 3.12 drops out of support.

If we want to encourage usage of Path.from_uri over url2pathname wouldn't that justify a depreciation notice, at least a recommendation in the docs?

@barneygale
Copy link
Contributor

#126148 will fix this

@picnixz picnixz added the stdlib Python modules in the Lib dir label Oct 30, 2024
@nineteendo
Copy link
Contributor Author

Thanks, I was already wondering if that was the case.

barneygale added a commit to barneygale/cpython that referenced this issue Nov 8, 2024
…paths

Adjust `urllib.request.pathname2url()` so that forward slashes in Windows
paths are handled identically to backward slashes.
@barneygale
Copy link
Contributor

barneygale commented Nov 8, 2024

I decided to solve URL parsing/generation problems piecemeal rather than doing everything in #126148

Specific PR available: #126593

barneygale added a commit that referenced this issue Nov 12, 2024
…126593)

Adjust `urllib.request.pathname2url()` so that forward slashes in Windows
paths are handled identically to backward slashes.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 12, 2024
…paths (pythonGH-126593)

Adjust `urllib.request.pathname2url()` so that forward slashes in Windows
paths are handled identically to backward slashes.
(cherry picked from commit bf224bd)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 12, 2024
…paths (pythonGH-126593)

Adjust `urllib.request.pathname2url()` so that forward slashes in Windows
paths are handled identically to backward slashes.
(cherry picked from commit bf224bd)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
barneygale added a commit that referenced this issue Nov 12, 2024
… paths (GH-126593) (#126763)

GH-120423: `pathname2url()`: handle forward slashes in Windows paths (GH-126593)

Adjust `urllib.request.pathname2url()` so that forward slashes in Windows
paths are handled identically to backward slashes.
(cherry picked from commit bf224bd)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
barneygale added a commit that referenced this issue Nov 12, 2024
… paths (GH-126593) (#126764)

GH-120423: `pathname2url()`: handle forward slashes in Windows paths (GH-126593)

Adjust `urllib.request.pathname2url()` so that forward slashes in Windows
paths are handled identically to backward slashes.
(cherry picked from commit bf224bd)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
@barneygale
Copy link
Contributor

Fixed! Thanks for your input, all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants