Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tree,remote: add support for WebDAV #4256

Merged
merged 16 commits into from
Jul 29, 2020
Merged

tree,remote: add support for WebDAV #4256

merged 16 commits into from
Jul 29, 2020

Conversation

iksnagreb
Copy link
Contributor

@iksnagreb iksnagreb commented Jul 22, 2020

  • ❗ I have followed the Contributing to DVC checklist.

  • πŸ“– If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.

  • ❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Should fix #1153
Took WebDAVURLInfo from #3647
Relates to iterative/dvc.org#1617

TODO

  • Tests (at least some unit tests to check whether a valid webdavclient3 can be constructed from config should be doable)
  • Provide documentation (after discussing the added configuration options for WebDAV)

setup.py Outdated
@@ -92,12 +92,13 @@ def run(self):
oss = ["oss2==2.6.1"]
ssh = ["paramiko>=2.5.0"]
hdfs = ["pyarrow>=0.17.0"]
webdav = ["webdavclient3==3.14.5"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do >=?

Copy link
Contributor Author

@iksnagreb iksnagreb Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should be safe, the webdavclient3 API seems to be rather fixed. Just used what I got out of pip freeze...

PATH_CLS = WebDAVURLInfo

# Non traversable as walk_files is not implemented
CAN_TRAVERSE = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great if you could implement walk_files, no pressure though.

dvc/config.py Outdated
"user": str,
"password": str,
"ask_password": Bool,
"root": str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably don't need root here, as that url from `REMOTE_COMMON should work, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, probably it is not necessary any more, I used it to specify the location of the WebDAV API endpoint (e.g. /public.php/webdav). But I still have a bit of confusion there...

Since I now terminate the makedirs at the path part of the initial path_info (which comes from the url of REMOTE_COMMON), the root option might be removed to not confuse others as well.

I somehow still wonder if there could be a valid use case for the option... At least there must be a reason for webdavclient3 to offer this option?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot we split that into hostname and root from the url:
Eg: http://owncloud.com/remote.php/webdav -> http://owncloud.com and /remote.php/webdav?

Comment on lines 83 to 87
hostname = (
self.path_info.scheme.replace("webdav", "http")
+ "://"
+ self.path_info.host
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

port is not passed, try something like following, perhaps?

http_info = HTTPURLInfo(self.path_info.url)
hostname = http_info.replace(path="").url

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right, I somehow forgot that there is more in the path_info than scheme and host... Should be fixed now.

Copy link
Member

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @iksnagreb for the PR. Just tried locally, works for the most part (I hit an issue with port though, see above). Also, I am quite not sure with webdav:// urls though (and, also, root can just be deduced from url as I mentioned above). πŸ™‚

Comment on lines 302 to 310
class WebDAVConfigError(DvcException):
def __init__(self, host):
super().__init__(f"Configuration for WebDAV {host} is invalid.")


class WebDAVConnectionError(DvcException):
def __init__(self, host):
super().__init__(f"Unable to connect to WebDAV {host}.")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are only used in webdav.py, so let's move them there. Also i'm not sure we need a special WebDAVConfigError exception. We could just raise ConfigError or at least inherit from it.

# Return constructed client (cached)
return client

# Checks whether file exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these comments are not useful, they are declaring obvious things.

Comment on lines 199 to 204
# Copies file/directory at remote
def copy(self, from_info, to_info):
# Webdav client copy
self._client.copy(from_info.path, to_info.path)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this method is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is probably not needed. I have implemented it, because it comes almost for free as it directly matches the webdavclient3 method (the same goes for move/remove). Shall I remove it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's remove it to not leave dead code around. If we ever need it - it will be trivial to introduce.

# pylint: disable=unused-argument

# Webdav client download
self._client.download(from_info.path, to_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shame this library doesn't support progress bars for download/upload, the ui will not be very helpful :(

We need to at least put a dummy progress bar for this.

# Gets file hash 'etag'
def get_file_hash(self, path_info):
# Use webdav client info method to get etag
etag = self._client.info(path_info.path)["etag"].strip('"')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is etag or something like that part of the webdav standard?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As WebDAV is an extension of HTTP, it should be part of the standard.

Christoph Berganski and others added 5 commits July 27, 2020 15:18
Webdav support is based on https://pypi.org/project/webdavclient3/ and
supports basic download/upload operation, directory creation as well as
existence, file hash and isdir query. Copy, move and remove are also
implemented, though probably not used yet.

WebdavURLInfo is taken from https://github.com/shizacat/dvc/tree/remote-webdav

Fixes iterative#1153
Webdav token auth, certificate and key path and connection timeout are
configurable. Webdav username might be specified or extracted from URL.

Refs iterative#1153
This enables the WebDAV api location (e.g. '/public.php/webdav') to be
part of the remote 'url' configuration instead of beeing specified
separately via the 'root' option. The 'root' option may then be used to
specify real directories at the WebDAV storage, although using it to
set the api location is still possible.

Refs iterative#1153
@iksnagreb iksnagreb marked this pull request as ready for review July 27, 2020 17:59
Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thank you!

I know that we've discussed tests in discord, but as @pmrowla and @skshetry noted, maybe we could use a docker-based fixture for testing this. E.g. see https://github.com/iterative/dvc/blob/master/tests/remotes/azure.py , where we use azurite docker image declared in https://github.com/iterative/dvc/blob/master/tests/docker-compose.yml . Maybe we could use something like https://hub.docker.com/r/bytemark/webdav to test webdav the same way?

But regardless of the docker-based test, you've confirmed that this implementation works for you, so let's merge for now πŸ™‚

@efiop efiop merged commit 2f48bb4 into iterative:master Jul 29, 2020
@skshetry
Copy link
Member

We can also use wsgidav which has a good documentation on how to use it as a library. It requires a wsgi server (eg: gunicorn, etc), but probably we can get away with just wsgiref from the Python's standard library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WebDav: new remote type
4 participants