Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib.request.url2pathname() mishandles colons in URLs on Windows #126367

Open
barneygale opened this issue Nov 3, 2024 · 7 comments
Open

urllib.request.url2pathname() mishandles colons in URLs on Windows #126367

barneygale opened this issue Nov 3, 2024 · 7 comments
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes type-bug An unexpected behavior, bug, or error

Comments

@barneygale
Copy link
Contributor

barneygale commented Nov 3, 2024

Bug report

Bug description:

On Windows, urllib.request.url2pathname() has overly simplistic handling of DOS drives in URLs.

Per @eryksun:

>>> from urllib.request import url2pathname
>>> url2pathname('//host/share/spam.txt:eggs')
'T:\\eggs'  # expected: r'\\host\share\spam.txt:eggs'

Also:

>>> url2pathname('///c:/spam.txt:eggs')
OSError: Bad URL: ///c|/spam.txt|eggs  # expected: r'c:\spam.txt:eggs'

CPython versions tested on:

CPython main branch

Operating systems tested on:

Windows

@barneygale barneygale added type-bug An unexpected behavior, bug, or error 3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Nov 3, 2024
@samtstephens
Copy link

Hi, could this be assigned to me? I'm an undergrad working to make an open source contribution to a software library. Thanks!

@barneygale
Copy link
Contributor Author

barneygale commented Nov 4, 2024

Hey, yes that's absolutely fine, thank you for offering! A couple of pointers:

@barneygale
Copy link
Contributor Author

I should also note the Python Dev Guide's Lifecycle of a Pull Request guide :)

@samtstephens
Copy link

Thank you! Will check it out.

@barneygale
Copy link
Contributor Author

Just FYI, the PR I mentioned previously has now landed, so feel free to have a crack at solving this issue! Let me know if you need a hand.

@barneygale
Copy link
Contributor Author

Hey @samtstephens, if you need a hint, you could look at this code in pathlib:

if path[:3] == '///':
# Remove empty authority
path = path[2:]
elif path[:12] == '//localhost/':
# Remove 'localhost' authority
path = path[11:]
if path[:3] == '///' or (path[:1] == '/' and path[2:3] in ':|'):
# Remove slash before DOS device/UNC path
path = path[1:]
if path[1:2] == '|':
# Replace bar with colon in DOS drive
path = path[:1] + ':' + path[2:]

I think it does roughly what we want to happen in nturl2path.url2pathname() :)

@samtstephens
Copy link

Thanks! Me and my partner are trying this out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants