Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url path for webdav push gets duplicated in dvc versions >=2.9.0 #7288

Closed
conrad-stork opened this issue Jan 19, 2022 · 8 comments · Fixed by #7488
Closed

url path for webdav push gets duplicated in dvc versions >=2.9.0 #7288

conrad-stork opened this issue Jan 19, 2022 · 8 comments · Fixed by #7488
Assignees
Labels
A: data-sync Related to dvc get/fetch/import/pull/push bug Did we break something? fs: webdav Related to the Webdav filesystem p1-important Important, aka current backlog of things to do

Comments

@conrad-stork
Copy link

Bug Report

Description

We have identified a bug(?) introduced from version 2.8.3 to 2.9.0. When pushing dvc data to a webdav server the path of the url gets duplicated. For example: If we push to webdavs://servername.com/path/to/location it pushes to webdavs://servername.com/path/to/location/path/to/location (creating all folders of the second "path/to/location" on the server). As it works with dvc version 2.8.3 we think that this might be a bug within dvc version 2.9.0 and our setup might not be too interesting for you. However, here is a rough overview of the webdav server setup: We are running an apache2 webdav server as our remote storage for our dvc projects. This webdav server is behind a reverse proxy (nginx).

Reproduce

Using dvc version <=2.8.3 everything works as expected. dvc version >=2.9.0 it does not work anymore.

Expected

Version 2.9.* should work the same as 2.8.*

Environment information

The environment information might not be too interesting, but I can add more information, if needed

Output of dvc doctor:

$ dvc doctor

version 2.8.3 fails withtin wsl, as reported elsewhere (https://issueexplorer.com/issue/iterative/dvc/7012)

Additional Information (if any):

Hope the report is detailed enough. Will add information if necessary.

@pared
Copy link
Contributor

pared commented Jan 22, 2022

@conrad-stork Hello!
Could you provide some information on how to reproduce this particular problem?

@conrad-stork
Copy link
Author

@pared Yes I will try to explain it as accurate as possible. If something missing let me know. Concrete questions would help, because its not clear what you know anyways.

We are running a nginx server as a reverse proxy for all our web applications that we run in-house. This nginx server forwards the incoming requests to the web applications that are build in docker containers. The most important part of the nginx config looks the following:

server {
        ... some authentication settings ...
        location /dvc {
                proxy_pass http://127.0.0.1:"port";
                proxy_set_header Host $host;

                client_max_body_size 100m;
        }
}

If you need more details regarding the nginx server configuration, please let me know what exactly you need.

So every request coming to out server/dvc is redirected to out locally running webdav within a docker container on port "port". Therefore we have set up an apache2 server (default settings except of the vhost) which provides the webdav server for our dvc storing location. The vhost config of the apache2 server is quiet easy:

DavLockDB /path to/DavLock

<VirtualHost *:80>
    ServerAdmin admin@localhost
    ServerName localhost
    ServerAlias localhost
    DocumentRoot /path to/webdav
    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined
    DirectorySlash Off

    <Directory /path to webdav>
        DAV On
        AuthType Basic
        AuthName "Restricted Files"
        AuthBasicProvider file
        AuthUserFile "path to passwd/passwds"
        Require user some_users
    </Directory>

</VirtualHost>

Pushing now data from somewhere in the network to our server/dvc/Path/To/Location works fine with version 2.8.3. From version on 2.9.0 the push works also fine, but the location on the server gets messed up to server/dvc/Path/To/Location/dvc/Path/To/Location (so everything behind server gets duplicated).

For me it looks like at some point where the server location is build (I am not familiar with any part of the code but the path on the server is given relatively and not absolute anymore?) something gets wrong. Might be that the forwarding of the request has something to do with it?

Let me know if I can help you somehow, also a short talk might be possible if it makes things easier.

Thanks in advance!
Best
Conrad

@pared
Copy link
Contributor

pared commented Jan 27, 2022

@conrad-stork ok, I've been playing with my own webdav account and it seems you are right, when I switch between different versions, the path on the server (for cache) is different. Investigating...

@pared
Copy link
Contributor

pared commented Jan 27, 2022

did a bit of bisecting and it seems our path change has been the culprit:
9aec8f1

@pared
Copy link
Contributor

pared commented Jan 27, 2022

@conrad-stork temporary workaround is to downgrade to 2.8.3.

@pared
Copy link
Contributor

pared commented Jan 27, 2022

Did a brief research and the problem is that before 9aec8f1 _strip_protocol returned whole PathInfo for WebDav:

dvc/dvc/fs/__init__.py

Lines 116 to 122 in 4175f9f

path_info = cls.PATH_CLS(
cls._strip_protocol(url) # pylint:disable=protected-access
)
extras = cls._get_kwargs_from_urls(url) # pylint:disable=protected-access
conf = {**extras, **remote_conf} # remote config takes priority
return cls, conf, path_info

Before this change we used fs.base one:

dvc/dvc/fs/base.py

Lines 73 to 74 in 4175f9f

def _strip_protocol(cls, path: str):
return path

In 9aec8f1 we introduce webdav-specific implementation:

dvc/dvc/fs/webdav.py

Lines 38 to 41 in 9aec8f1

def _strip_protocol(cls, path: str) -> str:
from fsspec.utils import infer_storage_options
return infer_storage_options(path)["path"].lstrip("/")

Before 9aec8f1 we used to provide the whole path to transfer method. Now the odb is partially responsible for building the path. Need to investigate it further.

@pared pared added bug Did we break something? p1-important Important, aka current backlog of things to do labels Jan 28, 2022
@dberenbaum dberenbaum added this to DVC Jan 28, 2022
@dberenbaum dberenbaum moved this to Backlog in DVC Jan 28, 2022
@dtrifiro dtrifiro added the release-blocker Blocks a release label Feb 1, 2022
@efiop efiop assigned efiop and pared Feb 1, 2022
@efiop
Copy link
Contributor

efiop commented Feb 7, 2022

Another part is that we don't use any webdav subpaths in our tests, only using root https://github.com/iterative/dvc-webdav/blob/main/dvc_webdav/tests/cloud.py#L11 , hence why we didn't catch it too.

Removing release-blocker for now, since it was broken for a few releases already, so no reason to block the current one.

@efiop efiop removed the release-blocker Blocks a release label Feb 7, 2022
@efiop
Copy link
Contributor

efiop commented Feb 8, 2022

For the record: @skshetry confirmed that we need to use full urls for webdav, so we need to adjust dvc/fs/webdav.py accordingly.

@efiop efiop assigned skshetry and unassigned efiop and pared Feb 22, 2022
@daavoo daavoo added fs: webdav Related to the Webdav filesystem A: data-sync Related to dvc get/fetch/import/pull/push labels Feb 22, 2022
Repository owner moved this from Backlog to Done in DVC Mar 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: data-sync Related to dvc get/fetch/import/pull/push bug Did we break something? fs: webdav Related to the Webdav filesystem p1-important Important, aka current backlog of things to do
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants