Support Network File Sharing Protocol (SMB Protocol) #765

Closed
lucasjamar opened this issue May 12, 2021 · 2 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@lucasjamar
Contributor

Description

I often work with data located on a network drive. If I am working locally, I can easily load such data by specifying the displayed path, e.g.:

companies:
  type: pandas.CSVDataSet
  filepath: O:/department_name/team_name/companies.csv

However, if I want to move my work to the cloud, I need to use the SMB protocol, which fsspec already supports. With plain pandas CSV reading and writing, I do something like this:

df = pd.read_csv("smb://myuser:mypassword@myserver.com:445/department_name/team_name/companies.csv")

However, when I plug something similar into my data catalog:

companies:
  type: pandas.CSVDataSet
  filepath: smb://myuser:mypassword@myserver.com:445/department_name/team_name/companies.csv

I get the following error message:

  File "C:\Users\JAMAR_L\Anaconda3\envs\kedro-alpiq\lib\site-packages\kedro\extras\datasets\pandas\csv_dataset.py", line 121, in __init__
    self._fs = fsspec.filesystem(self._protocol, **_credentials, **_fs_args)
  File "C:\Users\JAMAR_L\Anaconda3\envs\kedro-alpiq\lib\site-packages\fsspec\registry.py", line 244, in filesystem
    return cls(**storage_options)
  File "C:\Users\JAMAR_L\Anaconda3\envs\kedro-alpiq\lib\site-packages\fsspec\spec.py", line 69, in __call__
    obj = super().__call__(*args, **kwargs)
TypeError: __init__() missing 1 required positional argument: 'host'
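
The traceback shows fsspec's SMB filesystem being constructed without a `host`, even though the host is present in the URL. As a minimal sketch of what would need to reach fsspec, the snippet below splits the example URL into connection options and a remote path using only the standard library. The helper name `smb_url_to_fs_options` is hypothetical and is not part of kedro or fsspec; the commented-out `fsspec.filesystem("smb", ...)` call assumes fsspec and its SMB dependency are installed and that a server is reachable.

```python
from urllib.parse import urlsplit


def smb_url_to_fs_options(url: str) -> dict:
    """Split an smb:// URL into connection options and a remote path.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme != "smb":
        raise ValueError(f"expected an smb:// URL, got {url!r}")
    options = {
        "host": parts.hostname,
        "port": parts.port,
        "username": parts.username,
        "password": parts.password,
    }
    # Drop anything the URL did not specify so library defaults still apply.
    options = {key: value for key, value in options.items() if value is not None}
    return {"fs_options": options, "path": parts.path}


url = "smb://myuser:mypassword@myserver.com:445/department_name/team_name/companies.csv"
parsed = smb_url_to_fs_options(url)

# Assumption: with fsspec and an SMB backend installed, the options could be
# forwarded like this (not executed here, since it needs a live server):
#   fs = fsspec.filesystem("smb", **parsed["fs_options"])
#   with fs.open(parsed["path"]) as f:
#       ...
```

The `host` key is the argument the `TypeError` reports as missing, so any fix has to carry it from the URL (or from separate credentials) into the filesystem constructor.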

When I look at the code of kedro.io.core, I notice that the smb protocol is not listed:

HTTP_PROTOCOLS = ("http", "https")
PROTOCOL_DELIMITER = "://"
CLOUD_PROTOCOLS = ("s3", "gcs", "gs", "adl", "abfs")
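
One plausible reading of the error is that a protocol missing from these tuples has the network-location part of its URL dropped before the filesystem is created. The sketch below illustrates that idea only; it is an assumption about the general mechanism, not kedro's actual implementation, and `split_filepath` is a hypothetical name. The three constants are copied from the snippet above.

```python
from urllib.parse import urlsplit

# Constants as quoted from kedro.io.core in this issue.
HTTP_PROTOCOLS = ("http", "https")
PROTOCOL_DELIMITER = "://"
CLOUD_PROTOCOLS = ("s3", "gcs", "gs", "adl", "abfs")


def split_filepath(filepath: str) -> dict:
    """Hypothetical, simplified protocol/path split (not kedro's code)."""
    if PROTOCOL_DELIMITER not in filepath:
        # Plain local paths, including Windows drive paths such as O:/...
        return {"protocol": "file", "path": filepath}
    parts = urlsplit(filepath)
    protocol = parts.scheme
    if protocol in HTTP_PROTOCOLS:
        # HTTP filesystems take the full URL as the path.
        return {"protocol": protocol, "path": filepath}
    if protocol in CLOUD_PROTOCOLS:
        # For object stores the netloc is the bucket or container name.
        return {"protocol": protocol, "path": parts.netloc + parts.path}
    # Any other protocol: the netloc (user, password, host, port) is discarded.
    return {"protocol": protocol, "path": parts.path}


cloud = split_filepath("s3://bucket/companies.csv")
smb = split_filepath(
    "smb://myuser:mypassword@myserver.com:445/department_name/team_name/companies.csv"
)
```

Under this assumption, simply appending "smb" to CLOUD_PROTOCOLS would fold the host into the path rather than pass it as `host`, so SMB may need its own handling.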

Possible Implementation

Has anyone come across this issue before and perhaps found a workaround? If not, I will implement this by adding smb to the list of protocols.

Thanks :)

@merelcht
Member

Hi @lucasjamar, this sounds like a good addition. We'd be happy to accept a PR for this 🙂

@lorenabalan
Contributor

Closing this as per #782 (comment) and related discussion on the PR.