urllib.request.url2pathname() mishandles empty authority sections (mostly) #126766

Closed
barneygale opened this issue Nov 12, 2024 · 1 comment
Labels: 3.12, 3.13, 3.14, stdlib, type-bug

Comments


barneygale commented Nov 12, 2024

Bug report

Bug description:

File URIs that start with three or more slashes should be parsed as having an empty authority section (ref), but urllib.request.url2pathname() incorrectly retains the slashes that introduce the authority section. As a result, it cannot correctly decode the most common form of absolute POSIX file URI (e.g. file:///etc/hosts).
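For illustration, the standard library's `urllib.parse.urlsplit()` shows where the authority (its `netloc` attribute) ends and the path begins in a few `file:` URIs. This only demonstrates the URI grammar; it is not how `url2pathname()` is implemented, and `url2pathname()` itself receives only the part of the URI after `file:` (for example `'///etc/hosts'`).

```python
from urllib.parse import urlsplit

# Split each file URI into its authority (netloc) and path components.
examples = [
    'file:///etc/hosts',           # empty authority
    'file://localhost/etc/hosts',  # 'localhost' authority
    'file:////server/share',       # empty authority, old-fashioned UNC path
]
for uri in examples:
    parts = urlsplit(uri)
    print(f'{uri}  authority={parts.netloc!r}  path={parts.path!r}')
```

With an empty authority, `file:///etc/hosts` leaves `/etc/hosts` as the path, while the old-fashioned UNC form `file:////server/share` still leaves `//server/share`.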

On Windows, url2pathname() correctly discards the slashes before DOS drives (so file:///c:/foo is parsed as c:\foo) and before old-fashioned UNC URIs (so file:////server/share is parsed as \\server\share), but incorrectly retains them when decoding a rooted path with no drive (so file:///foo/bar is decoded as \\\foo\bar instead of \foo\bar). This is much less of a problem because such paths are rare on Windows.
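The Windows behaviour expected above can be modelled with a small helper, `sketch_windows_url2pathname()`. The name is hypothetical and this is not CPython's implementation; it is a sketch that assumes its input is the part of the URI after `file:` and covers only the three cases described in the paragraph above.

```python
import re
from urllib.parse import unquote


def sketch_windows_url2pathname(url_path: str) -> str:
    """Hypothetical model of the expected Windows behaviour (not CPython's code).

    `url_path` is assumed to be the part of a file URI after 'file:'.
    """
    # Three or more leading slashes: the first two introduce an empty
    # authority section and are discarded.
    if url_path.startswith('///'):
        url_path = url_path[2:]
    # A single slash before a DOS drive letter ('/c:/foo') is discarded too.
    if re.match(r'/[A-Za-z]:', url_path):
        url_path = url_path[1:]
    # Whatever is left keeps its leading slashes: '//server/share' stays a
    # UNC path and '/foo/bar' stays a rooted path. Decode percent-escapes
    # and switch to Windows separators.
    return unquote(url_path).replace('/', '\\')


print(sketch_windows_url2pathname('///c:/foo'))         # c:\foo
print(sketch_windows_url2pathname('////server/share'))  # \\server\share
print(sketch_windows_url2pathname('///foo/bar'))        # \foo\bar
```

The sketch deliberately ignores other forms, such as legacy URIs that use `|` in place of the drive colon.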

```python
>>> from urllib.request import url2pathname
>>> url2pathname('///etc/hosts')
'///etc/hosts'  # expected: '/etc/hosts'
```

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux, Windows


barneygale added the type-bug label Nov 12, 2024
barneygale added a commit to barneygale/cpython that referenced this issue Nov 12, 2024
Discard two leading slashes from the beginning of a `file:` URI if they
introduce an empty authority section. As a result, file URIs like
`///etc/hosts` are correctly parsed as `/etc/hosts`.
barneygale added the 3.12, 3.13 and 3.14 labels Nov 13, 2024
picnixz added the stdlib label Nov 14, 2024
barneygale added a commit that referenced this issue Nov 14, 2024
Discard two leading slashes from the beginning of a `file:` URI if they
introduce an empty authority section. As a result, file URIs like
`///etc/hosts` are correctly parsed as `/etc/hosts`.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 14, 2024
…ythonGH-126767)

Discard two leading slashes from the beginning of a `file:` URI if they
introduce an empty authority section. As a result, file URIs like
`///etc/hosts` are correctly parsed as `/etc/hosts`.
(cherry picked from commit cae9d9d)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
barneygale added a commit that referenced this issue Nov 14, 2024
…H-126767) (#126836)

GH-126766: `url2pathname()`: handle empty authority section. (GH-126767)

Discard two leading slashes from the beginning of a `file:` URI if they
introduce an empty authority section. As a result, file URIs like
`///etc/hosts` are correctly parsed as `/etc/hosts`.
(cherry picked from commit cae9d9d)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
barneygale added a commit that referenced this issue Nov 14, 2024
…H-126767) (#126837)

GH-126766: `url2pathname()`: handle empty authority section. (GH-126767)

Discard two leading slashes from the beginning of a `file:` URI if they
introduce an empty authority section. As a result, file URIs like
`///etc/hosts` are correctly parsed as `/etc/hosts`.
(cherry picked from commit cae9d9d)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
barneygale (contributor, issue author) commented:

Re-opening: we should handle 'localhost' authorities in exactly the same way as empty authorities.
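A POSIX model that treats an empty authority and a `localhost` authority identically might look like the following. The helper name `sketch_posix_url2pathname()` and the exact, case-sensitive match on `'localhost'` are assumptions made for this sketch, not details taken from CPython; the input is again assumed to be the part of the URI after `file:`.

```python
from urllib.parse import unquote

# Assumption: authorities treated as "this machine", matched case-sensitively.
LOCAL_AUTHORITIES = ('', 'localhost')


def sketch_posix_url2pathname(url_path: str) -> str:
    """Hypothetical model of url2pathname() on POSIX after both fixes."""
    if url_path.startswith('//'):
        # Split '//authority/rest' into the authority and the remainder.
        authority, slash, rest = url_path[2:].partition('/')
        if authority in LOCAL_AUTHORITIES:
            # Drop '//' plus the authority, keeping the slash that roots the path.
            url_path = slash + rest
    # Decode percent-escapes such as '%20'.
    return unquote(url_path)


print(sketch_posix_url2pathname('///etc/hosts'))           # /etc/hosts
print(sketch_posix_url2pathname('//localhost/etc/hosts'))  # /etc/hosts
```

Other authorities (for example `//example.com/share`) are returned unchanged by this sketch; it makes no attempt to model how `url2pathname()` treats them.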

barneygale reopened this Nov 22, 2024
barneygale added a commit to barneygale/cpython that referenced this issue Nov 22, 2024
Discard any 'localhost' authority from the beginning of a `file:` URI. As a
result, file URIs like `//localhost/etc/hosts` are correctly decoded as
`/etc/hosts`.
barneygale added a commit that referenced this issue Nov 22, 2024
Discard any 'localhost' authority from the beginning of a `file:` URI. As a
result, file URIs like `//localhost/etc/hosts` are correctly decoded as
`/etc/hosts`.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 22, 2024
…onGH-127129)

Discard any 'localhost' authority from the beginning of a `file:` URI. As a
result, file URIs like `//localhost/etc/hosts` are correctly decoded as
`/etc/hosts`.
(cherry picked from commit ebf564a)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
barneygale added a commit that referenced this issue Nov 22, 2024
…127129) (#127130)

GH-126766: `url2pathname()`: handle 'localhost' authority (GH-127129)

Discard any 'localhost' authority from the beginning of a `file:` URI. As a
result, file URIs like `//localhost/etc/hosts` are correctly decoded as
`/etc/hosts`.
(cherry picked from commit ebf564a)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
barneygale added a commit that referenced this issue Nov 22, 2024
…127129) (#127131)

GH-126766: `url2pathname()`: handle 'localhost' authority (GH-127129)

Discard any 'localhost' authority from the beginning of a `file:` URI. As a
result, file URIs like `//localhost/etc/hosts` are correctly decoded as
`/etc/hosts`.
(cherry picked from commit ebf564a)

Co-authored-by: Barney Gale <barney.gale@gmail.com>

2 participants