utils.infer_storage_options does not support passwords with hashes #171

Open
c0ffymachyne opened this issue Nov 3, 2019 · 4 comments

@c0ffymachyne

print (fsspec.utils.infer_storage_options('ftp://user:password@host:21/path'))
{'protocol': 'ftp', 'path': '/path', 'host': 'host', 'port': 21, 'username': 'user', 'password': 'password'}

print (fsspec.utils.infer_storage_options('ftp://user:password#@host:21/path'))

Traceback (most recent call last):
  File "filesystem.py", line 481, in <module>
    print (fsspec.utils.infer_storage_options('ftp://user:password#@host:21/path'))
  File "...\lib\site-packages\fsspec\utils.py", line 72, in infer_storage_options
    if parsed_path.port:
  File "...\lib\urllib\parse.py", line 167, in port
    port = int(port, 10)
ValueError: invalid literal for int() with base 10: 'password'
@c0ffymachyne
Author

It seems _splitnetloc treats # as a delimiter. Removing the hash from its delimiter list helps, but I am not sure about the potential implications of doing so. Is there any way to prevent this behavior?
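
To make the failure easier to see, here is a minimal sketch that uses only the standard library's `urllib.parse.urlsplit` (no fsspec involved). It shows where each piece of the URL ends up once the unencoded `#` is read as the start of a fragment. The variable name `port_error` is only illustrative, and the exact wording of the exception message may differ between Python versions.

```python
from urllib.parse import urlsplit

# The unencoded "#" ends the authority part and starts the fragment.
parts = urlsplit("ftp://user:password#@host:21/path")

print(parts.netloc)    # user:password
print(parts.fragment)  # @host:21/path
print(parts.hostname)  # user

# With no "@" left in the netloc, "password" sits where the port belongs,
# so reading .port raises ValueError.
port_error = None
try:
    parts.port
except ValueError as exc:
    port_error = exc
    print("port lookup failed:", exc)
```

So the real host and path land in the fragment, and `user:password` is parsed as `host:port`.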

@martindurant
Member

There is not really a way to do this because, as you remark, some characters have special meaning in the context of a URI. In the case of FTP, you can pass the arguments explicitly:

import fsspec
from fsspec.implementations.ftp import FTPFileSystem

# direct instantiation
fs = FTPFileSystem('host', username='user', password='password#')
# convenience (storage options are passed as keyword arguments)
fs = fsspec.filesystem('ftp', host='host', username='user', password='password#')
# files
of = fsspec.open('ftp://host:21/path', username='user', password='password#')

Other FSs might have slightly different names for their parameters, so please check the docstrings.

@c0ffymachyne
Author

Hm... According to page 2 of RFC 1738 the character "#" is unsafe and should always be encoded.

All unsafe characters must always be encoded within a URL

Is my understanding correct? Would it be possible to make fsspec handle percent-encoded characters in a URL?
After a small modification to infer_storage_options, a test shows that encoding/decoding works fine with:
print(fsspec.utils.infer_storage_options('ftp://user:password%23@host:21/path'))
Modification to infer_storage_options:

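    # requires: from urllib.parse import SplitResult, unquote, urlsplit
    parsed_path = urlsplit(urlpath)
    # percent-decode the netloc (user:password@host:port) before its fields are read
    parsed_path = SplitResult(parsed_path.scheme, unquote(parsed_path.netloc), parsed_path.path, parsed_path.query, parsed_path.fragment)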
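
For comparison, a possible variant of the same idea is sketched below with the standard library only: split the URL first, then percent-decode the username and password individually, so decoded characters can never be re-read as delimiters. The helper name `split_credentials` and the password `pass#word` are hypothetical and used only for illustration; this is not fsspec's implementation.

```python
from urllib.parse import quote, unquote, urlsplit


def split_credentials(url):
    """Hypothetical helper: split the URL first, then decode each field."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "username": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "path": parts.path,
    }


# safe="" makes quote() encode "/", ":" and "@" as well, not only "#".
encoded = quote("pass#word", safe="")
print(encoded)  # pass%23word

options = split_credentials(f"ftp://user:{encoded}@host:21/path")
print(options["password"])  # pass#word
```

On the caller's side, encoding the password with `quote(..., safe="")` before building the URL keeps any reserved character out of the authority part.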

@martindurant
Member

Hm, but people are used to ftp and ssh paths that do not necessarily follow URL conventions. Still, I suppose supporting encoding would be OK.
