Running import-url on S3 URL gives error message "requires existing cache on 's3' remote" #4261

@nimrand

Bug Report

We have a corpus (i.e., a directory of text documents) that is stored in a company-wide S3 data store at s3://my-company-research-data/data/corpus.

When I run dvc import-url s3://my-company-research-data/data/corpus ./local/path, I get an error:

ERROR: failed to import s3://duolingo-research-data/det/COCA. You could also try downloading it manually, and adding it with dvc add. - Current operation was unsuccessful because 's3://my-company-research-data/data/corpus' requires existing cache on 's3' remote. See <https://man.dvc.org/config#cache> for information on how to set up remote cache.

Per this thread, this appears to be a bug.
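In the meantime, the error message itself suggests downloading the data manually and adding it with dvc add. A minimal sketch of that workaround follows, assuming the AWS CLI is installed and has credentials for the bucket. The helper names (build_workaround_commands, run_workaround) are hypothetical and are not part of DVC.

```python
import shlex
import subprocess
from typing import List


def build_workaround_commands(s3_url: str, local_path: str) -> List[List[str]]:
    """Return the two commands for the manual download-then-add workaround."""
    return [
        # Recursively copy everything under the S3 prefix to a local directory.
        # Assumes the AWS CLI is installed and configured.
        ["aws", "s3", "cp", "--recursive", s3_url, local_path],
        # Track the downloaded directory with DVC.
        ["dvc", "add", local_path],
    ]


def run_workaround(s3_url: str, local_path: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_workaround_commands(s3_url, local_path):
        print(" ".join(shlex.quote(arg) for arg in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run only: shows the commands without touching S3 or the repo.
    run_workaround("s3://my-company-research-data/data/corpus", "./local/path")
```

Unlike import-url, dvc add does not record the S3 URL as a dependency, so the data would not be refreshed by dvc update. This is a stopgap, not a fix for the bug.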

Please provide information about your setup

Output of dvc version:

$ dvc version

DVC version: 1.1.11
Python version: 3.7.3
Platform: Darwin-18.7.0-x86_64-i386-64bit
Binary: False
Package: pip
Supported remotes: http, https, s3
Repo: dvc, git

Additional Information (if any):

When I run the same command with --verbose, this is what I get:

2020-07-22 17:20:16,566 DEBUG: fetched: [(3,)]                                  
2020-07-22 17:20:16,935 DEBUG: Removing output 'challenge/common/data/COCA-corpus' of stage: 'COCA-corpus.dvc'.
Importing 's3://duolingo-research-data/det/COCA' -> 'challenge/common/data/COCA-corpus'
2020-07-22 17:20:16,936 DEBUG: Computed stage: 'COCA-corpus.dvc' md5: 'fda33f7e862514b4c924b2692aff808d'
2020-07-22 17:20:16,936 DEBUG: 'md5' of stage: 'COCA-corpus.dvc' changed.
2020-07-22 17:20:17,523 DEBUG: fetched: [(0,)]
2020-07-22 17:20:17,524 ERROR: failed to import s3://duolingo-research-data/det/COCA. You could also try downloading it manually, and adding it with `dvc add`. - Current operation was unsuccessful because 's3://duolingo-research-data/det/COCA' requires existing cache on 's3' remote. See <https://man.dvc.org/config#cache> for information on how to set up remote cache.
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/command/imp_url.py", line 18, in run
    no_exec=self.args.no_exec,
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/repo/__init__.py", line 34, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/repo/imp_url.py", line 54, in imp_url
    stage.run()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/funcy/decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/stage/decorators.py", line 35, in rwlocked
    return call()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/funcy/decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/stage/__init__.py", line 424, in run
    sync_import(self, dry, force)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/stage/imports.py", line 29, in sync_import
    stage.save_deps()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/stage/__init__.py", line 387, in save_deps
    dep.save()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/output/base.py", line 267, in save
    self.info = self.save_info()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/output/base.py", line 191, in save_info
    return self.tree.save_info(self.path_info)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/tree/base.py", line 314, in save_info
    self.PARAM_CHECKSUM: self.get_hash(path_info, tree=tree, **kwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/tree/base.py", line 282, in get_hash
    hash_ = self.get_dir_hash(path_info, tree, **kwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/tree/base.py", line 296, in get_dir_hash
    raise RemoteCacheRequiredError(path_info)
dvc.exceptions.RemoteCacheRequiredError: Current operation was unsuccessful because 's3://duolingo-research-data/det/COCA' requires existing cache on 's3' remote. See <https://man.dvc.org/config#cache> for information on how to set up remote cache.
------------------------------------------------------------

Labels: bug