pull specific_file.dvc: SCM-Error when dvc import without access in same repository #10309

Closed
conrad-stork-basf opened this issue Feb 19, 2024 · 13 comments · Fixed by #10368, iterative/dvc-data#510, iterative/dvc-data#517, iterative/dvc-data#518 or #10369
Labels: A: data-sync (Related to dvc get/fetch/import/pull/push), bug (Did we break something?), p1-important (Important, aka current backlog of things to do)

Comments

@conrad-stork-basf

Bug Report

dvc pull specific_file.dvc: SCM-Error when dvc import without access in same repository

Description

We have two repositories (called data_repo and analyze_repo in this example):

data_repo: contains the data.
analyze_repo: contains some analysis functions and, importantly, a dvc import from data_repo. It also creates a specific_file.dvc file, which tracks all the results of the analysis functions.

In our use case, we clone analyze_repo from Git over HTTPS with a token that only has access to analyze_repo. Later in the workflow, we need to fetch the data tracked by specific_file.dvc by calling dvc pull specific_file.dvc. But because the token has no access to data_repo, this throws an SCM error:

Collecting |0.00 [00:00,    ?entry/s]
2024-02-19 15:05:00,353 ERROR: failed to pull data from the cloud - SCM error: Failed to clone repo 'git@[...]data_repo.git' to '/tmp/tmpxya28_0ldvc-clone': Authentication failed for: 'git@[...]:22': Permission denied for user git on host [...]
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/backend/dulwich/asyncssh_vendor.py", line 295, in _run_command
    conn = await asyncssh.connect(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/asyncssh/connection.py", line 8269, in connect
    return await asyncio.wait_for(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 452, in wait_for
    return await fut
           ^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/asyncssh/connection.py", line 436, in _connect
    await options.waiter
asyncssh.misc.PermissionDenied: Permission denied for user git on host [...]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 260, in clone
    repo = clone_from()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/porcelain.py", line 546, in clone
    return client.clone(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/client.py", line 753, in clone
    result = self.fetch(path, target, progress=progress, depth=depth)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/client.py", line 839, in fetch
    result = self.fetch_pack(
             ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/client.py", line 1149, in fetch_pack
    proto, can_read, stderr = self._connect(b"upload-pack", path)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/client.py", line 1798, in _connect
    con = self.ssh_vendor.run_command(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/usr/local/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/backend/dulwich/asyncssh_vendor.py", line 308, in _run_command
    raise AuthError(f"{username}@{host}:{port or 22}") from exc
scmrepo.exceptions.AuthError: Authentication failed for: 'git@[...]:22'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/dvc/scm.py", line 152, in clone
    git = Git.clone(url, to_path, progress=pbar.update_git, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/__init__.py", line 154, in clone
    backend.clone(url, to_path, bare=bare, mirror=mirror, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 268, in clone
    raise CloneError(url, to_path) from exc
scmrepo.exceptions.CloneError: Failed to clone repo 'git@[...]data_repo.git' to '/tmp/tmpxya28_0ldvc-clone'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/dvc/commands/data_sync.py", line 35, in run
    stats = self.repo.pull(
            ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/__init__.py", line 59, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/pull.py", line 30, in pull
    processed_files_count = self.fetch(
                            ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/__init__.py", line 59, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/fetch.py", line 167, in fetch
    data = collect(
           ^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc_data/index/collect.py", line 101, in collect
    fsid = data.fs.fsid
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/fs/dvc.py", line 558, in fsid
    return self.fs.fsid
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/fs/dvc.py", line 223, in fsid
    self.repo.url or self.repo.root_dir,
    ^^^^^^^^^
  File "/usr/local/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/fs/dvc.py", line 198, in repo
    repo = self._make_repo(**self._repo_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/fs/dvc.py", line 275, in _make_repo
    with Repo.open(uninitialized=True, **kwargs) as repo:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/__init__.py", line 296, in open
    return open_repo(url, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/open_repo.py", line 60, in open_repo
    return _external_repo(url, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/open_repo.py", line 23, in _external_repo
    path = _cached_clone(url, rev)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/open_repo.py", line 134, in _cached_clone
    clone_path, shallow = _clone_default_branch(url, rev)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/funcy/decorators.py", line 47, in wrapper
    return deco(call, *dargs, **dkwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/funcy/flow.py", line 246, in wrap_with
    return call()
           ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/funcy/decorators.py", line 68, in __call__
    return self._func(*self._args, **self._kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/open_repo.py", line 198, in _clone_default_branch
    git = clone(url, clone_path)
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/scm.py", line 157, in clone
    raise CloneError("SCM error") from exc
dvc.scm.CloneError: SCM error

Reproduce

This should be reproducible via the following steps:

  1. Create data_repo with data.
  2. Create analyze_repo with some scripts that create specific_file.dvc.
  3. Within analyze_repo, run dvc import on the data in data_repo.
  4. Create an access token for analyze_repo.
  5. Use a blank environment (for example, a blank Docker container) to clone analyze_repo via https://TOKEN@analyze_repo.git.
  6. Try to pull specific_file.dvc via dvc pull specific_file.dvc.
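
The DVC-specific steps above can be sketched as a small command plan. This is only an illustration under assumptions: the repository URLs are placeholders passed in by the caller, the import target path `data` is hypothetical, and the helper names `reproduction_plan` and `render` are made up for this sketch. Nothing is executed; the plan is only rendered as text.

```python
from __future__ import annotations

import shlex


def reproduction_plan(
    data_repo_url: str, analyze_repo_url: str
) -> list[tuple[str, list[str]]]:
    """Return (working directory, command) pairs; nothing is executed."""
    return [
        # Step 3: inside analyze_repo, import the data tracked in data_repo.
        # "data" is a hypothetical path inside data_repo.
        ("analyze_repo", ["dvc", "import", data_repo_url, "data"]),
        # Step 5: in a clean environment, clone analyze_repo with the
        # token that has access to analyze_repo only.
        (".", ["git", "clone", analyze_repo_url, "analyze_repo"]),
        # Step 6: pull only the analysis results.
        ("analyze_repo", ["dvc", "pull", "specific_file.dvc"]),
    ]


def render(plan: list[tuple[str, list[str]]]) -> str:
    """Render the plan as shell lines that could be reviewed and run by hand."""
    return "\n".join(
        f"(cd {shlex.quote(cwd)} && {shlex.join(cmd)})" for cwd, cmd in plan
    )
```

Calling `print(render(reproduction_plan(...)))` with your own two URLs prints the three commands in order; the last one is the call that fails in this report.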

Expected

We would expect dvc pull specific_file.dvc to work, and only a plain dvc pull (of everything) to fail with the exception above. Alternatively, only the .dvc files that the token is allowed to pull are pulled, and all others fail with a warning.

Environment information

A blank Docker container was used to reproduce this.

Output of dvc doctor:

DVC version: 3.45.0 (pip)
-------------------------
Platform: Python 3.11.7 on Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 3.13.0
        dvc_objects = 5.0.0
        dvc_render = 1.0.1
        dvc_task = 0.3.0
        scmrepo = 3.1.0
Supports:
        http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        webdav (webdav4 = 0.9.8),
        webdavs (webdav4 = 0.9.8)
Config:
        Global: /root/.config/dvc
        System: /etc/xdg/dvc
@dberenbaum
Collaborator

Or only dvc files that are allowed to pull are pulled all others fail with a warning.

This should be the case for dvc pull --allow-missing, but I can confirm that it also fails to pull anything in this case where an import repo is inaccessible. It looks like there is a bug where dvc doesn't continue to pull other outputs if it fails to access the import repo.

Do you need access to that specific file in this case? There is no way for dvc to access it if it can't reach the data repo, but it's possible to use DVC Studio to access artifacts in these situations.
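
To make the expected behavior concrete, here is a minimal sketch of "skip what cannot be reached, pull the rest". It is not DVC's implementation: `SourceUnavailable`, `pull_targets`, and `fake_fetch` are hypothetical names used only for illustration, and the `allow_missing` parameter merely mirrors the idea behind `dvc pull --allow-missing`.

```python
from __future__ import annotations

import logging
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


class SourceUnavailable(Exception):
    """Hypothetical error: the source behind a target cannot be reached."""


def pull_targets(
    targets: Iterable[str],
    fetch_one: Callable[[str], None],
    allow_missing: bool = False,
) -> tuple[list[str], list[str]]:
    """Fetch each target independently and return (pulled, skipped)."""
    pulled: list[str] = []
    skipped: list[str] = []
    for target in targets:
        try:
            fetch_one(target)
        except SourceUnavailable as exc:
            if not allow_missing:
                # Without allow_missing, the first unreachable source aborts.
                raise
            # With allow_missing, warn and continue with the other targets.
            logger.warning("skipping %s: %s", target, exc)
            skipped.append(target)
        else:
            pulled.append(target)
    return pulled, skipped


def fake_fetch(target: str) -> None:
    """Stand-in fetcher: the import from data_repo is unreachable."""
    if target == "data.dvc":
        raise SourceUnavailable("cannot clone data_repo")
```

With `pull_targets(["data.dvc", "specific_file.dvc"], fake_fetch, allow_missing=True)`, the unreachable import is reported as skipped and specific_file.dvc is still pulled, which is the behavior described above as expected.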

@dberenbaum added the A: data-sync, bug, and p1-important labels Feb 19, 2024
@dberenbaum added this to DVC Feb 19, 2024
@dberenbaum moved this to Backlog in DVC Feb 19, 2024
@conrad-stork-basf
Author

Thanks for confirming. My current workaround is to simply run rm -rf on the imported folder, since I no longer need the data in this setup. Then it works again. Perhaps not the smoothest workaround, but it does its job 😉 It is likely not suitable for cases in which the repo has to push that data again or push something else, but that is not the case for us.

Thanks for the help, and best,
Conrad

@dberenbaum
Collaborator

@skshetry Can you pick this up and decide whether to merge iterative/dvc-data#510 or go with a different approach?

@skshetry pinned this issue Mar 14, 2024
@skshetry
Member

Looks like iterative/dvc-data#510 does not fix the problem. The error will still get raised on index.diff().

@skshetry
Member

skshetry commented Mar 22, 2024

Created iterative/dvc-data#517.

cc @conrad-stork-basf, could you please try installing from the PR and see if it fixes the problem? Here's how you can install it:

pip install "dvc-data @ git+https://github.com/iterative/dvc-data@refs/pull/517/merge"

@conrad-stork-basf
Author

Yes, I can try it. As I think the PR is already merged, I installed it via:

git clone https://github.com/iterative/dvc-data.git
pip install dvc-data/

The installed version should then be 3.15.0?

However, I still get the same error as above. Do I also need to update something else?

Thanks and Best
Conrad

@skshetry
Member

skshetry commented Mar 23, 2024

Can you share the full stack trace? Also, since that PR is merged, can you try installing from pypi instead?

pip uninstall dvc-data
pip install dvc-data==3.15.0

@skshetry reopened this Mar 23, 2024
@conrad-stork-basf
Author

Sure!
Installing from PyPI does not work yet, as our internal repository has not yet mirrored PyPI. I can try this on Monday.
I have also added some more logs. These are the commands we are running:

git clone [...] repo
git clone https://github.com/iterative/dvc-data.git
pip install dvc-data/
cd repo
dvc remote modify [correct settings for remote]
dvc pull -r remote -v specific_file

And this is the corresponding log:

Cloning into 'repo'
Cloning into 'dvc-data'...
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://[...]/repository/python/simple
Processing ./dvc-data
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: dictdiffer>=0.8.1 in /usr/local/lib/python3.11/site-packages (from dvc-data==3.15.0) (0.9.0)
Requirement already satisfied: pygtrie>=2.3.2 in /usr/local/lib/python3.11/site-packages (from dvc-data==3.15.0) (2.5.0)
Requirement already satisfied: dvc-objects<6,>=4.0.1 in /usr/local/lib/python3.11/site-packages (from dvc-data==3.15.0) (5.1.0)
Requirement already satisfied: fsspec>=2024.2.0 in /usr/local/lib/python3.11/site-packages (from dvc-data==3.15.0) (2024.2.0)
Requirement already satisfied: diskcache>=5.2.1 in /usr/local/lib/python3.11/site-packages (from dvc-data==3.15.0) (5.6.3)
Requirement already satisfied: attrs>=21.3.0 in /usr/local/lib/python3.11/site-packages (from dvc-data==3.15.0) (23.2.0)
Requirement already satisfied: sqltrie<1,>=0.11.0 in /usr/local/lib/python3.11/site-packages (from dvc-data==3.15.0) (0.11.0)
Requirement already satisfied: tqdm<5,>=4.63.1 in /usr/local/lib/python3.11/site-packages (from dvc-data==3.15.0) (4.66.2)
Requirement already satisfied: funcy>=1.14 in /usr/local/lib/python3.11/site-packages (from dvc-data==3.15.0) (2.0)
Requirement already satisfied: orjson in /usr/local/lib/python3.11/site-packages (from sqltrie<1,>=0.11.0->dvc-data==3.15.0) (3.9.15)
Building wheels for collected packages: dvc-data
  Building wheel for dvc-data (pyproject.toml): started
  Building wheel for dvc-data (pyproject.toml): finished with status 'done'
  Created wheel for dvc-data: filename=dvc_data-3.15.0-py3-none-any.whl size=71625 sha256=c0408066e0c879ae86abeeb9912448843d6e29242d0425a7939e3046d18608af
  Stored in directory: /tmp/pip-ephem-wheel-cache-dxywec08/wheels/f3/54/5c/b6e937f58e665afdb5b595d4525e2c7bcd3f93130d84cb31ff
Successfully built dvc-data
Installing collected packages: dvc-data
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dvc 3.48.4 requires dvc-data<3.15,>=3.13, but you have dvc-data 3.15.0 which is incompatible.
Successfully installed dvc-data-3.15.0
2024-03-23 08:37:15,307 DEBUG: v3.48.4 (pip), CPython 3.11.7 on Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.35
2024-03-23 08:37:15,307 DEBUG: command: /usr/local/bin/dvc pull -r remote -v data
2024-03-23 08:37:15,836 DEBUG: Checking if stage 'data' is in 'dvc.yaml'
2024-03-23 08:37:16,121 DEBUG: failed to load ('data',) from storage local (/app/repo/.dvc/cache/files/md5) - [Errno 2] No such file or directory: '/app/repo/.dvc/cache/files/md5/2a/12743bc97b37916c3f97f26c77622a.dir'
Traceback (most recent call last):
  File "/app/.local/lib/python3.11/site-packages/dvc_data/index/index.py", line 611, in _load_from_storage
    _load_from_object_storage(trie, entry, storage)
  File "/app/.local/lib/python3.11/site-packages/dvc_data/index/index.py", line 547, in _load_from_object_storage
    obj = Tree.load(storage.odb, root_entry.hash_info, hash_name=storage.odb.hash_name)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.local/lib/python3.11/site-packages/dvc_data/hashfile/tree.py", line 193, in load
    with obj.fs.open(obj.path, "r") as fobj:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc_objects/fs/base.py", line 324, in open
    return self.fs.open(path, mode=mode, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc_objects/fs/local.py", line 131, in open
    return open(path, mode=mode, encoding=encoding)  # noqa: SIM115
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/app/repo/.dvc/cache/files/md5/2a/12743bc97b37916c3f97f26c77622a.dir'

2024-03-23 08:37:19,790 DEBUG: Creating external repo git@[...other project...].git@8a1f41ecf2b1393e436ea9e5b04f8b42878212a2
2024-03-23 08:37:19,790 DEBUG: erepo: git clone 'git@[...other project...].git' to a temporary dir
2024-03-23 08:37:20,769 DEBUG: skipping index collection for data with invalid fsid - SCM error: Failed to clone repo 'git@[...other project...].git' to '/tmp/tmp6ljqj9cadvc-clone': Authentication failed for: 'git@gitlab[...]:22': Permission denied for user git on host gitlab[...]
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/backend/dulwich/asyncssh_vendor.py", line 295, in _run_command
    conn = await asyncssh.connect(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/asyncssh/connection.py", line 8269, in connect
    return await asyncio.wait_for(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 452, in wait_for
    return await fut
           ^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/asyncssh/connection.py", line 436, in _connect
    await options.waiter
asyncssh.misc.PermissionDenied: Permission denied for user git on host gitlab[...]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 260, in clone
    repo = clone_from()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/porcelain.py", line 546, in clone
    return client.clone(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/client.py", line 753, in clone
    result = self.fetch(path, target, progress=progress, depth=depth)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/client.py", line 839, in fetch
    result = self.fetch_pack(
             ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/client.py", line 1149, in fetch_pack
    proto, can_read, stderr = self._connect(b"upload-pack", path)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/client.py", line 1798, in _connect
    con = self.ssh_vendor.run_command(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/usr/local/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/backend/dulwich/asyncssh_vendor.py", line 308, in _run_command
    raise AuthError(f"{username}@{host}:{port or 22}") from exc
scmrepo.exceptions.AuthError: Authentication failed for: 'git@gitlab[...]:22'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/dvc/scm.py", line 152, in clone
    git = Git.clone(url, to_path, progress=pbar.update_git, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/__init__.py", line 154, in clone
    backend.clone(url, to_path, bare=bare, mirror=mirror, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 268, in clone
    raise CloneError(url, to_path) from exc
scmrepo.exceptions.CloneError: Failed to clone repo 'git@gitlab[...other project...].git' to '/tmp/tmp6ljqj9cadvc-clone'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/.local/lib/python3.11/site-packages/dvc_data/index/collect.py", line 101, in collect
    fsid = data.fs.fsid
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/fs/dvc.py", line 558, in fsid
    return self.fs.fsid
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/fs/dvc.py", line 223, in fsid
    self.repo.url or self.repo.root_dir,
    ^^^^^^^^^
  File "/usr/local/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/fs/dvc.py", line 198, in repo
    repo = self._make_repo(**self._repo_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/fs/dvc.py", line 275, in _make_repo
    with Repo.open(uninitialized=True, **kwargs) as repo:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/__init__.py", line 297, in open
    return open_repo(url, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/open_repo.py", line 60, in open_repo
    return _external_repo(url, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/open_repo.py", line 23, in _external_repo
    path = _cached_clone(url, rev)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/open_repo.py", line 134, in _cached_clone
    clone_path, shallow = _clone_default_branch(url, rev)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/funcy/decorators.py", line 47, in wrapper
    return deco(call, *dargs, **dkwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/funcy/flow.py", line 246, in wrap_with
    return call()
           ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/funcy/decorators.py", line 68, in __call__
    return self._func(*self._args, **self._kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/open_repo.py", line 198, in _clone_default_branch
    git = clone(url, clone_path)
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/scm.py", line 157, in clone
    raise CloneError("SCM error") from exc
dvc.scm.CloneError: SCM error

@skshetry
Member

Hi. You have installed dvc-data in /usr/local/, but the stack trace shows from /app/.local, so it seems it's not picking up what you installed.
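
One way to check which copy of a package the interpreter actually picks up is to ask Python directly. The following is a minimal sketch using only the standard library; `describe_install` is a hypothetical helper name, and the module/distribution pair `dvc_data` / `dvc-data` is taken from the paths and pip output in this thread.

```python
from __future__ import annotations

import importlib.metadata
import importlib.util


def describe_install(module_name: str, dist_name: str) -> dict:
    """Report where a top-level module would be imported from and the
    version of the installed distribution, without importing the module."""
    spec = importlib.util.find_spec(module_name)
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {
        "module": module_name,
        # For a package, origin is the path of its __init__.py.
        "location": spec.origin if spec is not None else None,
        "version": version,
    }
```

Running `print(describe_install("dvc_data", "dvc-data"))` with the same interpreter that runs dvc shows whether the location is under /usr/local or /app/.local, and which version is reported there.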

@conrad-stork-basf
Author

You are completely right. I first installed it locally and only then noticed that I had done so. Afterwards I installed it in /usr/local/, and the same error arises. As I need to cut the private information out of the log, I was lazy and just posted the one that I had already redacted. Here is the correct log. Sorry for the confusion; now it should be correct.

Cloning into 'repo'...
2024-03-23 09:37:17,349 DEBUG: v3.48.4 (pip), CPython 3.11.7 on Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.35
2024-03-23 09:37:17,349 DEBUG: command: /usr/local/bin/dvc pull -r remote -v data
2024-03-23 09:37:17,763 DEBUG: Checking if stage 'data' is in 'dvc.yaml'
2024-03-23 09:37:17,988 DEBUG: failed to load ('data',) from storage local (/app/repo/.dvc/cache/files/md5) - [Errno 2] No such file or directory: '/app/repo/.dvc/cache/files/md5/2a/12743bc97b37916c3f97f26c77622a.dir'
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/dvc_data/index/index.py", line 611, in _load_from_storage
    _load_from_object_storage(trie, entry, storage)
  File "/usr/local/lib/python3.11/site-packages/dvc_data/index/index.py", line 547, in _load_from_object_storage
    obj = Tree.load(storage.odb, root_entry.hash_info, hash_name=storage.odb.hash_name)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc_data/hashfile/tree.py", line 193, in load
    with obj.fs.open(obj.path, "r") as fobj:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc_objects/fs/base.py", line 324, in open
    return self.fs.open(path, mode=mode, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc_objects/fs/local.py", line 131, in open
    return open(path, mode=mode, encoding=encoding)  # noqa: SIM115
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/app/repo/.dvc/cache/files/md5/2a/12743bc97b37916c3f97f26c77622a.dir'

2024-03-23 09:37:21,358 DEBUG: Creating external repo git@gitlab[...other repo...].git@8a1f41ecf2b1393e436ea9e5b04f8b42878212a2
2024-03-23 09:37:21,358 DEBUG: erepo: git clone 'git@gitlab[...other repo...].git' to a temporary dir
2024-03-23 09:37:22,490 DEBUG: skipping index collection for data with invalid fsid - SCM error: Failed to clone repo 'git@gitlab[...other repo...].git' to '/tmp/tmpvk3p3hcudvc-clone': Authentication failed for: 'git@gitlab[...]:22': Permission denied for user git on host gitlab[...]
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/backend/dulwich/asyncssh_vendor.py", line 295, in _run_command
    conn = await asyncssh.connect(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/asyncssh/connection.py", line 8269, in connect
    return await asyncio.wait_for(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 452, in wait_for
    return await fut
           ^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/asyncssh/connection.py", line 436, in _connect
    await options.waiter
asyncssh.misc.PermissionDenied: Permission denied for user git on host gitlab[...]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 260, in clone
    repo = clone_from()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/porcelain.py", line 546, in clone
    return client.clone(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/client.py", line 753, in clone
    result = self.fetch(path, target, progress=progress, depth=depth)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/client.py", line 839, in fetch
    result = self.fetch_pack(
             ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/client.py", line 1149, in fetch_pack
    proto, can_read, stderr = self._connect(b"upload-pack", path)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dulwich/client.py", line 1798, in _connect
    con = self.ssh_vendor.run_command(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/usr/local/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/backend/dulwich/asyncssh_vendor.py", line 308, in _run_command
    raise AuthError(f"{username}@{host}:{port or 22}") from exc
scmrepo.exceptions.AuthError: Authentication failed for: 'git@gitlab[...]:22'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/dvc/scm.py", line 152, in clone
    git = Git.clone(url, to_path, progress=pbar.update_git, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/__init__.py", line 154, in clone
    backend.clone(url, to_path, bare=bare, mirror=mirror, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 268, in clone
    raise CloneError(url, to_path) from exc
scmrepo.exceptions.CloneError: Failed to clone repo 'git@gitlab[...other repo...].git' to '/tmp/tmpvk3p3hcudvc-clone'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/dvc_data/index/collect.py", line 101, in collect
    fsid = data.fs.fsid
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/fs/dvc.py", line 558, in fsid
    return self.fs.fsid
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/fs/dvc.py", line 223, in fsid
    self.repo.url or self.repo.root_dir,
    ^^^^^^^^^
  File "/usr/local/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/fs/dvc.py", line 198, in repo
    repo = self._make_repo(**self._repo_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/fs/dvc.py", line 275, in _make_repo
    with Repo.open(uninitialized=True, **kwargs) as repo:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/__init__.py", line 297, in open
    return open_repo(url, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/open_repo.py", line 60, in open_repo
    return _external_repo(url, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/open_repo.py", line 23, in _external_repo
    path = _cached_clone(url, rev)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/open_repo.py", line 134, in _cached_clone
    clone_path, shallow = _clone_default_branch(url, rev)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/funcy/decorators.py", line 47, in wrapper
    return deco(call, *dargs, **dkwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/funcy/flow.py", line 246, in wrap_with
    return call()
           ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/funcy/decorators.py", line 68, in __call__
    return self._func(*self._args, **self._kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/repo/open_repo.py", line 198, in _clone_default_branch
    git = clone(url, clone_path)
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dvc/scm.py", line 157, in clone
    raise CloneError("SCM error") from exc
dvc.scm.CloneError: SCM error
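
For readers following the log: the three tracebacks above are joined by "The above exception was the direct cause of the following exception", which Python prints when an exception is raised with `raise ... from exc`. Below is a minimal sketch of how such a chain forms and how to walk it back to the root cause. The classes and functions are hypothetical stand-ins that only borrow names from the log; they are not the real scmrepo or dvc code.

```python
# Hypothetical stand-ins named after the exceptions in the log above;
# these are NOT the real scmrepo/dvc classes.
class AuthError(Exception):
    pass


class CloneError(Exception):
    pass


def connect():
    # Simulates the SSH authentication failure at the bottom of the chain.
    raise AuthError("Authentication failed")


def clone():
    try:
        connect()
    except AuthError as exc:
        # `from exc` sets __cause__, which is what makes Python print
        # "The above exception was the direct cause of the following exception".
        raise CloneError("SCM error") from exc


def cause_chain(exc):
    """Return exception class names from the outermost error to the root cause."""
    names = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__cause__
    return names


try:
    clone()
except CloneError as err:
    chain = cause_chain(err)
```

Read this way, the last exception in the log (`dvc.scm.CloneError`) is only the outermost wrapper; the authentication failure at the start of the chain is the root cause.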

@skshetry
Member

skshetry commented Mar 23, 2024

@conrad-stork-basf, is that all of the log? Can you try without -v? The traceback is now logged as a debug message, so it may appear with --verbose, but it does not indicate a failure.

You can set DVC_SHOW_TRACEBACK to show the actual traceback without using -v:

export DVC_SHOW_TRACEBACK=true
dvc pull ...

(I understand how it can be confusing, and I will change it to not print the traceback.)
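
To illustrate why a traceback can show up under -v even though the command succeeds, here is a minimal sketch using only Python's standard `logging` module. A caught exception logged at debug level with `exc_info=True` prints a full traceback when the logger is at DEBUG level and stays silent otherwise. The names `open_import`, `collect`, and `make_logger` are hypothetical and are not taken from dvc-data.

```python
import io
import logging


def make_logger(level):
    """Build an isolated logger that writes to an in-memory stream."""
    stream = io.StringIO()
    logger = logging.getLogger(f"sketch.{level}")
    logger.setLevel(level)
    logger.propagate = False
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(stream))
    return logger, stream


def open_import():
    # Hypothetical stand-in for opening an imported repo without access.
    raise PermissionError("no access to imported repo")


def collect(logger):
    try:
        open_import()
    except PermissionError:
        # The error is handled; the traceback is only attached to a debug record.
        logger.debug("skipping import", exc_info=True)
        return "skipped"
    return "collected"


quiet_logger, quiet_out = make_logger(logging.INFO)      # like running without -v
verbose_logger, verbose_out = make_logger(logging.DEBUG)  # like running with -v

quiet_result = collect(quiet_logger)
verbose_result = collect(verbose_logger)
```

In both runs `collect` returns normally; only the verbose logger's output contains a traceback, which is the kind of message that can be mistaken for a failure.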

skshetry added a commit to iterative/dvc-data that referenced this issue Mar 23, 2024
This might mislead the user to believe it's an error. See

iterative/dvc#10309 (comment)
skshetry added a commit to iterative/dvc-data that referenced this issue Mar 23, 2024
* collect: do not log exc_info

This might mislead the user to believe it's an error. See

iterative/dvc#10309 (comment)

@conrad-stork-basf
Author

You got me again. It is working. I always get confused by these error messages. So in the future I will set the DVC_SHOW_TRACEBACK variable and run dvc without -v.

Thanks a lot! Issue is solved.
Thanks and Best
Conrad

@skshetry
Member

skshetry commented Mar 23, 2024

Closing. I have removed the traceback message and released dvc-data==3.15.1. Sorry for the confusion. :)
