-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tree,remote: add support for WebDAV #4256
Conversation
setup.py
Outdated
@@ -92,12 +92,13 @@ def run(self): | |||
oss = ["oss2==2.6.1"] | |||
ssh = ["paramiko>=2.5.0"] | |||
hdfs = ["pyarrow>=0.17.0"] | |||
webdav = ["webdavclient3==3.14.5"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we do >=?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, should be safe, the webdavclient3 API seems to be rather fixed. Just used what I got out of pip freeze...
dvc/tree/webdav.py
Outdated
PATH_CLS = WebDAVURLInfo | ||
|
||
# Non traversable as walk_files is not implemented | ||
CAN_TRAVERSE = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be great if you could implement walk_files
, no pressure though.
dvc/config.py
Outdated
"user": str, | ||
"password": str, | ||
"ask_password": Bool, | ||
"root": str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably don't need root
here, as that url
from `REMOTE_COMMON should work, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, probably it is not necessary any more, I used it to specify the location of the WebDAV API endpoint (e.g. /public.php/webdav
). But I still have a bit of confusion there...
Since I now terminate the makedirs
at the path
part of the initial path_info
(which comes from the url
of REMOTE_COMMON
), the root
option might be removed to not confuse others as well.
I somehow still wonder if there could be a valid use case for the option... At least there must be a reason for webdavclient3
to offer this option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot we split that into hostname
and root
from the url
:
Eg: http://owncloud.com/remote.php/webdav
-> http://owncloud.com
and /remote.php/webdav
?
dvc/tree/webdav.py
Outdated
hostname = ( | ||
self.path_info.scheme.replace("webdav", "http") | ||
+ "://" | ||
+ self.path_info.host | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
port
is not passed, try something like following, perhaps?
http_info = HTTPURLInfo(self.path_info.url)
hostname = http_info.replace(path="").url
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right, I somehow forgot that there is more in the path_info
than scheme
and host
... Should be fixed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @iksnagreb for the PR. Just tried locally, works for the most part (I hit an issue with port
though, see above). Also, I am quite not sure with webdav://
urls though (and, also, root
can just be deduced from url
as I mentioned above). π
dvc/exceptions.py
Outdated
class WebDAVConfigError(DvcException): | ||
def __init__(self, host): | ||
super().__init__(f"Configuration for WebDAV {host} is invalid.") | ||
|
||
|
||
class WebDAVConnectionError(DvcException): | ||
def __init__(self, host): | ||
super().__init__(f"Unable to connect to WebDAV {host}.") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are only used in webdav.py, so let's move them there. Also i'm not sure we need a special WebDAVConfigError exception. We could just raise ConfigError or at least inherit from it.
dvc/tree/webdav.py
Outdated
# Return constructed client (cached) | ||
return client | ||
|
||
# Checks whether file exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these comments are not useful, they are declaring obvious things.
dvc/tree/webdav.py
Outdated
# Copies file/directory at remote | ||
def copy(self, from_info, to_info): | ||
# Webdav client copy | ||
self._client.copy(from_info.path, to_info.path) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this method is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is probably not needed. I have implemented it, because it comes almost for free as it directly matches the webdavclient3
method (the same goes for move/remove). Shall I remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's remove it to not leave dead code around. If we ever need it - it will be trivial to introduce.
dvc/tree/webdav.py
Outdated
# pylint: disable=unused-argument | ||
|
||
# Webdav client download | ||
self._client.download(from_info.path, to_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shame this library doesn't support progress bars for download/upload, the ui will not be very helpful :(
We need to at least put a dummy progress bar for this.
# Gets file hash 'etag' | ||
def get_file_hash(self, path_info): | ||
# Use webdav client info method to get etag | ||
etag = self._client.info(path_info.path)["etag"].strip('"') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is etag or something like that part of the webdav standard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As WebDAV is an extension of HTTP, it should be part of the standard.
Webdav support is based on https://pypi.org/project/webdavclient3/ and supports basic download/upload operation, directory creation as well as existence, file hash and isdir query. Copy, move and remove are also implemented, though probably not used yet. WebdavURLInfo is taken from https://github.com/shizacat/dvc/tree/remote-webdav Fixes iterative#1153
Webdav token auth, certificate and key path and connection timeout are configurable. Webdav username might be specified or extracted from URL. Refs iterative#1153
This enables the WebDAV api location (e.g. '/public.php/webdav') to be part of the remote 'url' configuration instead of beeing specified separately via the 'root' option. The 'root' option may then be used to specify real directories at the WebDAV storage, although using it to set the api location is still possible. Refs iterative#1153
The WebDAV 'root' option was rather confusing and should be handled by the initial 'path_info' from the config 'url' option. Context: iterative#4256 (comment) While stripping the path/root from the hostname the port got lost, which is fixed now by simply using the URLInfo 'replace' method as suggested. Context: iterative#4256 (comment) The WebDAV client connection is tested by probing the existence of the root (self.path_info.path). Refs iterative#1153
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Thank you!
I know that we've discussed tests in discord, but as @pmrowla and @skshetry noted, maybe we could use a docker-based fixture for testing this. E.g. see https://github.com/iterative/dvc/blob/master/tests/remotes/azure.py , where we use azurite docker image declared in https://github.com/iterative/dvc/blob/master/tests/docker-compose.yml . Maybe we could use something like https://hub.docker.com/r/bytemark/webdav to test webdav the same way?
But regardless of the docker-based test, you've confirmed that this implementation works for you, so let's merge for now π
We can also use |
β I have followed the Contributing to DVC checklist.
π If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
β I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)
Thank you for the contribution - we'll try to review it as soon as possible. π
Should fix #1153
Took WebDAVURLInfo from #3647
Relates to iterative/dvc.org#1617
TODO