
urllib.request.pathname2url() mishandles UNC paths #126205

Closed
barneygale opened this issue Oct 30, 2024 · 4 comments
Labels
3.12 (bugs and security fixes), 3.13 (bugs and security fixes), 3.14 (new features, bugs and security fixes), type-bug (An unexpected behavior, bug, or error)


@barneygale
Contributor

barneygale commented Oct 30, 2024

Bug report

Bug description:

When given a Windows UNC path, urllib.request.pathname2url() incorrectly generates a URI that begins with four slashes. The correct number is two, see ref1, ref2.

>>> import urllib.request
>>> urllib.request.pathname2url(r'\\server\share')
'////server/share'
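
A minimal sketch of the conversion the report asks for, assuming only the standard-library `ntpath` and `urllib.parse` modules. `unc_pathname2url` is a hypothetical helper, not CPython's implementation: it splits off the `\\server\share` drive, turns backslashes into forward slashes, and percent-encodes the result, so the URL path starts with exactly two slashes. Like `pathname2url()`, it omits the `file:` prefix. Extended `\\?\` paths are deliberately rejected here.

```python
import ntpath
from urllib.parse import quote


def unc_pathname2url(path: str) -> str:
    """Hypothetical helper (not CPython's code): convert a Windows UNC path
    to a URL path with a two-slash authority, omitting 'file:' like
    pathname2url() does."""
    # Normalize separators so splitdrive sees backslashes only.
    drive, rest = ntpath.splitdrive(path.replace('/', '\\'))
    if not drive.startswith('\\\\') or drive.startswith('\\\\?\\'):
        raise ValueError(f'not a plain UNC path: {path!r}')
    # '\\server\share' + '\dir' -> '//server/share/dir'
    url_path = (drive + rest).replace('\\', '/')
    # Keep '/' literal; percent-encode spaces and other unsafe characters.
    return quote(url_path, safe='/')


print(unc_pathname2url(r'\\server\share'))                # prints //server/share
print(unc_pathname2url(r'\\server\share\My Docs\a.txt'))  # prints //server/share/My%20Docs/a.txt
```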

Furthermore, when given an extended UNC path like \\?\unc\server\share, pathname2url() incorrectly generates a URI that begins with only one slash:

>>> urllib.request.pathname2url(r'\\?\unc\server\share')
'/server/share'
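
One way to handle the extended form is to first rewrite the `\\?\UNC\` prefix (matched case-insensitively, since the report uses lowercase `unc`) back into an ordinary `\\` prefix, then convert as for a plain UNC path. This sketch assumes that rewrite is the desired behaviour; `strip_extended_prefix` and `extended_unc_pathname2url` are hypothetical names, not standard-library functions.

```python
from urllib.parse import quote


def strip_extended_prefix(path: str) -> str:
    """Hypothetical helper: turn a Win32 extended-length path into an ordinary one."""
    # r'\\?\UNC\server\share' -> r'\\server\share' (prefix is 8 characters)
    if path[:8].upper() == '\\\\?\\UNC\\':
        return '\\\\' + path[8:]
    # r'\\?\C:\foo' -> r'C:\foo' (prefix is 4 characters)
    if path[:4] == '\\\\?\\':
        return path[4:]
    return path


def extended_unc_pathname2url(path: str) -> str:
    """Hypothetical helper: URL path (no 'file:') for a possibly extended UNC path."""
    path = strip_extended_prefix(path)
    if not path.startswith('\\\\'):
        raise ValueError(f'not a UNC path: {path!r}')
    # Backslashes become slashes; the result keeps exactly two leading slashes.
    return quote(path.replace('\\', '/'), safe='/')


print(extended_unc_pathname2url(r'\\?\unc\server\share'))  # prints //server/share
```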

CPython versions tested on:

CPython main branch

Operating systems tested on:

Windows

Linked PRs

barneygale added the type-bug, 3.12, 3.13, and 3.14 labels Oct 30, 2024
barneygale added a commit to barneygale/cpython that referenced this issue Oct 30, 2024
File URIs for Windows UNC paths should begin with two slashes, not four.
barneygale added a commit that referenced this issue Oct 30, 2024
File URIs for Windows UNC paths should begin with two slashes, not four.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 31, 2024
…26208)

File URIs for Windows UNC paths should begin with two slashes, not four.
(cherry picked from commit 951cb2c)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 31, 2024
…26208)

File URIs for Windows UNC paths should begin with two slashes, not four.
(cherry picked from commit 951cb2c)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
barneygale added a commit that referenced this issue Oct 31, 2024
#126249)

GH-126205: Fix conversion of UNC paths to file URIs (GH-126208)

File URIs for Windows UNC paths should begin with two slashes, not four.
(cherry picked from commit 951cb2c)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
barneygale added a commit that referenced this issue Oct 31, 2024
#126248)

GH-126205: Fix conversion of UNC paths to file URIs (GH-126208)

File URIs for Windows UNC paths should begin with two slashes, not four.
(cherry picked from commit 951cb2c)

Co-authored-by: Barney Gale <barney.gale@gmail.com>
barneygale added a commit to barneygale/cpython that referenced this issue Nov 23, 2024
… POSIX paths

When handed an absolute Windows path such as `C:\foo` or `//server/share`,
the `urllib.request.pathname2url()` function returns a URL with an
authority section, such as `///C:/foo` or `//server/share` (or before
GH-126205, `////server/share`). Only the `file:` prefix is omitted.

But when handed an absolute POSIX path such as `/etc/hosts`, or a Windows
path of the same form (rooted but lacking a drive), the function returns a
URL without an authority section, such as `/etc/hosts`.

This patch corrects the discrepancy by adding a `//` prefix before
drive-less, rooted paths when generating URLs.
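
The rule described in this commit message can be sketched as a small classifier over Windows-style paths. This is an illustrative reimplementation under stated assumptions, not the patch itself: `sketch_pathname2url` is a hypothetical name, it relies on `ntpath.splitdrive()` to find drives, and it does not handle drive-relative paths such as `C:foo` or extended `\\?\` paths. Drive-letter and UNC paths get an authority section, and so, per the patch, do rooted drive-less paths like `/etc/hosts`.

```python
import ntpath
from urllib.parse import quote


def sketch_pathname2url(path: str) -> str:
    """Hypothetical sketch of the rule in the commit message (not CPython's
    code): return a URL path, without 'file:', for a Windows-style path."""
    drive, rest = ntpath.splitdrive(path)
    drive = drive.replace('\\', '/')
    rest = rest.replace('\\', '/')
    if len(drive) == 2 and drive[1] == ':':
        # Drive-letter path: empty authority, e.g. '///C:/foo'.
        # The drive is not quoted so that ':' stays literal.
        return '///' + drive + quote(rest, safe='/')
    if drive.startswith('//'):
        # UNC path: the server is the authority, e.g. '//server/share'.
        return quote(drive + rest, safe='/')
    if rest.startswith('/'):
        # Rooted but drive-less: add an explicitly empty authority ('//'),
        # so '/etc/hosts' becomes '///etc/hosts'.
        return '//' + quote(rest, safe='/')
    # Relative path: no authority section at all.
    return quote(rest, safe='/')


for p in (r'C:\foo', r'\\server\share', '/etc/hosts', r'foo\bar'):
    print(p, '->', sketch_pathname2url(p))
# prints ///C:/foo, //server/share, ///etc/hosts and foo/bar respectively
```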
@serhiy-storchaka
Member

I do not think the case for \\server\share was a bug. file:////server/share and file://server/share are both acceptable. The latter form is preferable because it is what Windows generates by default, but I would not backport this change.
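
To see how the two spellings differ in practice, a generic URL parser can be pointed at both. This short illustration uses only `urllib.parse.urlsplit()`; the share and file names are made up. In the four-slash form the authority is empty and the server name ends up at the start of the path, while the two-slash form puts the server in the authority.

```python
from urllib.parse import urlsplit

# Two spellings of the same (made-up) network file.
for url in ('file:////server/share/a.txt', 'file://server/share/a.txt'):
    parts = urlsplit(url)
    print(f'{url}: netloc={parts.netloc!r}, path={parts.path!r}')

# prints:
# file:////server/share/a.txt: netloc='', path='//server/share/a.txt'
# file://server/share/a.txt: netloc='server', path='/share/a.txt'
```

Whether downstream code treats these as equivalent therefore depends on whether it special-cases a path that begins with `//`.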

@barneygale
Contributor Author

barneygale commented Nov 24, 2024

> file:////server/share and file://server/share are both acceptable.

The former is considered "incorrect" and "unhealthy" by the Microsoft docs:

> Incorrect: file:////applib/products/a%2Db/ abc%5F9/4148.920a/media/start.swf
>
> Correct: file://applib/products/a-b/abc_9/4148.920a/media/start.swf
>
> The author of this URI was heading in the correct direction. They converted the backslashes in their Windows file path to forward slashes and they percent-encoded characters they thought should be encoded. Although they meant well, there are a couple of problems. First, ‘applib’ is meant to be the host, but is preceded by two extra slashes.

Also: https://learn.microsoft.com/en-us/previous-versions/windows/internet-explorer/ie-developer/platform-apis/ms775098(v=vs.85)#creating-file-schemes-from-file-paths

> There are two kinds of file scheme URIs. The first is the well-formed, or "healthy," URL style that supports query strings, fragments, percent-encoded octets, and so on. The other is basically a DOS file path with "file://" prepended to the front. This latter form is generated when Uri_CREATE_FILE_USE_DOS_PATH is set and should be used only for legacy communication.
>
> Warning: Legacy file scheme URIs should be used only with legacy APIs that will not accept healthy file scheme URIs. Legacy file scheme URIs do not allow percent encoded octets, which can lead to ambiguity. Therefore, legacy file scheme URIs should not be used unless absolutely necessary.
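
The percent-encoding point in this warning can be made concrete with the `My Documents 100%20` path that comes up in the thread. In this sketch the "healthy" URI is built by hand with `urllib.parse.quote()` (an assumption for illustration, not how Windows or CPython builds it), and the "legacy" URI is simply `file://` prepended to the DOS path.

```python
from urllib.parse import quote, unquote

# Path from the thread: the directory name literally contains "%20".
dos_path = r'C:\Windows\My Documents 100%20\file.txt'

# "Healthy" form: forward slashes, empty authority, and unsafe characters
# (including '%' itself) percent-encoded. The drive 'C:' is left unquoted.
drive, rest = dos_path[:2], dos_path[2:].replace('\\', '/')
healthy = 'file:///' + drive + quote(rest)

# "Legacy" form: the DOS path with 'file://' prepended, nothing encoded.
legacy = 'file://' + dos_path

print(healthy)  # prints file:///C:/Windows/My%20Documents%20100%2520/file.txt

# Decoding the healthy form recovers the name exactly...
print(unquote(healthy[len('file:///'):]))  # prints C:/Windows/My Documents 100%20/file.txt
# ...whereas decoding the legacy form turns the literal "%20" into a space.
print(unquote(legacy[len('file://'):]))    # prints C:\Windows\My Documents 100 \file.txt
```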

@barneygale
Contributor Author

I can revert this in 3.12 and 3.13 if you think that's the right thing to do!

@serhiy-storchaka
Member

No, I do not suggest reverting this in 3.12 and 3.13.

The Microsoft docs talk about abominations like file://C:\Windows\My Documents 100%20\file.txt and file://\\server\share\My Documents 100%20\file.txt, so I am not sure their advice applies to URIs with correct slashes and escaping. Both RFC 8089 and the Wikipedia page refer to these forms as more or less equivalent. If percent-escaping, query and fragment are handled differently for these two forms, we have a larger issue.
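
If both spellings are accepted as equivalent, a consumer has to normalize them to the same UNC path. The sketch below is one hypothetical way to do that with `urllib.parse`; `file_url_to_unc` is not a standard-library function, and it only handles URLs that name a network share (treating a `localhost` authority as "not a server", per RFC 8089).

```python
from urllib.parse import urlsplit, unquote


def file_url_to_unc(url: str) -> str:
    """Hypothetical helper: map a file URL that names a network share to a
    UNC path, accepting both the two-slash and four-slash spellings."""
    parts = urlsplit(url)
    if parts.scheme != 'file':
        raise ValueError(f'not a file URL: {url!r}')
    if parts.netloc and parts.netloc.lower() != 'localhost':
        # file://server/share/...: the server is the authority.
        path = '//' + parts.netloc + parts.path
    elif parts.path.startswith('//'):
        # file:////server/share/...: empty authority, server leads the path.
        path = parts.path
    else:
        raise ValueError(f'URL does not name a network share: {url!r}')
    # Decode percent-escapes, then switch to Windows separators.
    return unquote(path).replace('/', '\\')


two = file_url_to_unc('file://server/share/a%20b.txt')
four = file_url_to_unc('file:////server/share/a%20b.txt')
print(two == four, two)  # prints True \\server\share\a b.txt
```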

barneygale added a commit that referenced this issue Nov 25, 2024
… path (#127194)

When handed an absolute Windows path such as `C:\foo` or `//server/share`,
the `urllib.request.pathname2url()` function returns a URL with an
authority section, such as `///C:/foo` or `//server/share` (or before
GH-126205, `////server/share`). Only the `file:` prefix is omitted.

But when handed an absolute POSIX path such as `/etc/hosts`, or a Windows
path of the same form (rooted but lacking a drive), the function returns a
URL without an authority section, such as `/etc/hosts`.

This patch corrects the discrepancy by adding a `//` prefix before
drive-less, rooted paths when generating URLs.
picnixz pushed a commit to picnixz/cpython that referenced this issue Dec 8, 2024
)

File URIs for Windows UNC paths should begin with two slashes, not four.

2 participants