dvc push: Unexpected error when pushing to Google Cloud storage or S3 #10206

Closed
turkanis opened this issue Dec 28, 2023 · 12 comments · Fixed by #10208
Assignees
Labels
bug Did we break something? p0-critical Critical issue. Needs to be fixed ASAP.

Comments

@turkanis

turkanis commented Dec 28, 2023

Bug Report

dvc push: "Unexpected error" when pushing to Google Cloud storage or S3

Reproduce

dvc init
dvc remote add -d s3 s3://bucket  # or gcs gs://bucket
dvc import-url https://data.dvc.org/get-started/data.xml
dvc push -v

output (s3):

2023-12-27 19:56:42,605 DEBUG: v3.36.1 (pip), CPython 3.9.18 on Linux-5.15.139-93.147.amzn2.x86_64-x86_64-with-glibc2.26
2023-12-27 19:56:42,605 DEBUG: command: /path/bin/dvc push -v
Collecting                                                                                                                           |0.00 [00:00,    ?entry/s]
Pushing                                                                                                                              |0.00 [00:00,     ?file/s]
Collecting my.bucket/key on s3                                                                                                |3.00 [00:00, 4.84entry/s]
2023-12-27 19:56:43,676 ERROR: unexpected error
Traceback (most recent call last):
  File "/path/lib/python3.9/site-packages/dvc/cli/__init__.py", line 211, in main
    ret = cmd.do_run()
  File "/path/lib/python3.9/site-packages/dvc/cli/command.py", line 27, in do_run
    return self.run()
  File "/path/lib/python3.9/site-packages/dvc/commands/data_sync.py", line 64, in run
    processed_files_count = self.repo.push(
  File "/path/lib/python3.9/site-packages/dvc/repo/__init__.py", line 65, in wrapper
    return f(repo, *args, **kwargs)
  File "/path/lib/python3.9/site-packages/dvc/repo/push.py", line 144, in push
    push_transferred, push_failed = ipush(
  File "/path/lib/python3.9/site-packages/dvc_data/index/push.py", line 101, in push
    old = build(data.path, data.fs)
  File "/path/lib/python3.9/site-packages/dvc_data/index/build.py", line 90, in build
    for entry in build_entries(path, fs, ignore=ignore):
  File "/path/lib/python3.9/site-packages/dvc_data/index/build.py", line 55, in build_entries
    walk_iter = fs.walk(path, detail=detail)
  File "/path/lib/python3.9/site-packages/dvc_http/__init__.py", line 162, in walk
    raise NotImplementedError
NotImplementedError

2023-12-27 19:56:43,752 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2023-12-27 19:56:43,755 DEBUG: Removing '/path/.MHVNkr3eAijD7Q5aau3NRK.tmp'
2023-12-27 19:56:43,755 DEBUG: Removing '/path/.MHVNkr3eAijD7Q5aau3NRK.tmp'
2023-12-27 19:56:43,757 DEBUG: Removing '/path/.MHVNkr3eAijD7Q5aau3NRK.tmp'
2023-12-27 19:56:43,757 DEBUG: Removing '/path/bkw-9036/.dvc/cache/files/md5/.mnnSioPUuXvRUCqUV2ug87.tmp'
2023-12-27 19:56:43,777 DEBUG: Version info for developers:
DVC version: 3.36.1 (pip)
-------------------------
Platform: Python 3.9.18 on Linux-5.15.139-93.147.amzn2.x86_64-x86_64-with-glibc2.26
Subprojects:
	dvc_data = 3.3.0
	dvc_objects = 3.0.0
	dvc_render = 1.0.0
	dvc_task = 0.3.0
	scmrepo = 2.0.2
Supports:
	gs (gcsfs = 2023.12.2.post1),
	http (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
	s3 (s3fs = 2023.12.2, boto3 = 1.33.13)
Config:
	Global: /home/jdt/.config/dvc
	System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme1n1p1
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/nvme1n1p1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/9d9135fb99d9d827364c4dc5a42cdc60

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-12-27 19:56:43,781 DEBUG: Analytics is enabled.
2023-12-27 19:56:43,860 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpccxiwrmd', '-v']
2023-12-27 19:56:43,871 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpccxiwrmd', '-v'] with pid 22406

output (gcs):

2023-12-27 19:47:22,768 DEBUG: v3.36.1 (pip), CPython 3.9.18 on Linux-5.15.139-93.147.amzn2.x86_64-x86_64-with-glibc2.26
2023-12-27 19:47:22,769 DEBUG: command: /path/bin/dvc push -v
Collecting                                                                                                                           |0.00 [00:00,    ?entry/s]
Pushing                                                                                                                              |0.00 [00:00,     ?file/s]
Collecting bucket/path on gs                                                                                          |3.00 [00:01, 2.84entry/s]
2023-12-27 19:47:24,328 ERROR: unexpected error
Traceback (most recent call last):
  File "/path/lib/python3.9/site-packages/dvc/cli/__init__.py", line 211, in main
    ret = cmd.do_run()
  File "/path/lib/python3.9/site-packages/dvc/cli/command.py", line 27, in do_run
    return self.run()
  File "/path/lib/python3.9/site-packages/dvc/commands/data_sync.py", line 64, in run
    processed_files_count = self.repo.push(
  File "/path/lib/python3.9/site-packages/dvc/repo/__init__.py", line 65, in wrapper
    return f(repo, *args, **kwargs)
  File "/path/lib/python3.9/site-packages/dvc/repo/push.py", line 144, in push
    push_transferred, push_failed = ipush(
  File "/path/lib/python3.9/site-packages/dvc_data/index/push.py", line 101, in push
    old = build(data.path, data.fs)
  File "/path/lib/python3.9/site-packages/dvc_data/index/build.py", line 90, in build
    for entry in build_entries(path, fs, ignore=ignore):
  File "/path/lib/python3.9/site-packages/dvc_data/index/build.py", line 55, in build_entries
    walk_iter = fs.walk(path, detail=detail)
  File "/path/lib/python3.9/site-packages/dvc_http/__init__.py", line 162, in walk
    raise NotImplementedError
NotImplementedError

2023-12-27 19:47:24,370 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2023-12-27 19:47:24,371 DEBUG: Removing '/path/.fJ4uXqQznknWmbrzzUTXLQ.tmp'
2023-12-27 19:47:24,371 DEBUG: Removing '/path/.fJ4uXqQznknWmbrzzUTXLQ.tmp'
2023-12-27 19:47:24,371 DEBUG: Removing '/path/.fJ4uXqQznknWmbrzzUTXLQ.tmp'
2023-12-27 19:47:24,371 DEBUG: Removing '/path/bkw-9036/.dvc/cache/files/md5/.M6iwnJkjQgKzg54kN6chVi.tmp'
2023-12-27 19:47:24,377 DEBUG: Version info for developers:
DVC version: 3.36.1 (pip)
-------------------------
Platform: Python 3.9.18 on Linux-5.15.139-93.147.amzn2.x86_64-x86_64-with-glibc2.26
Subprojects:
	dvc_data = 3.3.0
	dvc_objects = 3.0.0
	dvc_render = 1.0.0
	dvc_task = 0.3.0
	scmrepo = 2.0.2
Supports:
	gs (gcsfs = 2023.12.2.post1),
	http (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.9.1, aiohttp-retry = 2.8.3)
Config:
	Global: /home/jdt/.config/dvc
	System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme1n1p1
Caches: local
Remotes: gs
Workspace directory: ext4 on /dev/nvme1n1p1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/9d9135fb99d9d827364c4dc5a42cdc60

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-12-27 19:47:24,379 DEBUG: Analytics is enabled.
2023-12-27 19:47:24,445 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpk_30nnlt', '-v']
2023-12-27 19:47:24,455 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpk_30nnlt', '-v'] with pid 15755

Expected

Successful push

Environment information

DVC version: 3.36.1 (pip)
-------------------------
Platform: Python 3.9.18 on Linux-5.15.139-93.147.amzn2.x86_64-x86_64-with-glibc2.26
Subprojects:
	dvc_data = 3.3.0
	dvc_objects = 3.0.0
	dvc_render = 1.0.0
	dvc_task = 0.3.0
	scmrepo = 2.0.2
Supports:
	gs (gcsfs = 2023.12.2.post1),
	http (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
	s3 (s3fs = 2023.12.2, boto3 = 1.33.13)
Config:
	Global: /home/jdt/.config/dvc
	System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme1n1p1
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/nvme1n1p1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/c9c73dbc105eb09a15137f49a60e6a5b

Additional Information (if any):

@efiop
Contributor

efiop commented Dec 28, 2023

Looks like we are accidentally trying to push to that import-url's fs. Taking a look...
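
For readers following the traceback, the failure can be reproduced in miniature. This is only a sketch: `HTTPLikeFS`, `build_entries`, and `try_build` are hypothetical stand-ins, not DVC's actual classes. It mirrors the call chain shown in the traceback, where index building calls `fs.walk` on a filesystem whose `walk` raises `NotImplementedError`.

```python
class HTTPLikeFS:
    """Hypothetical filesystem that can fetch single URLs but cannot list them."""

    def walk(self, path, detail=False):
        # Same behavior as the last frame of the traceback: walking is unsupported.
        raise NotImplementedError


def build_entries(path, fs):
    """Collect entries under `path`; mirrors the fs.walk call in the traceback."""
    return list(fs.walk(path, detail=True))


def try_build(path, fs):
    """Return the collected entries, or None if the filesystem cannot be walked."""
    try:
        return build_entries(path, fs)
    except NotImplementedError:
        return None


outcome = try_build("https://data.dvc.org/get-started", HTTPLikeFS())
print(outcome)  # None
```

The point of the sketch is that the error comes from treating the import source as a push target at all, not from the S3 or GCS remote.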

@efiop efiop self-assigned this Dec 28, 2023
@efiop efiop added the bug Did we break something? label Dec 28, 2023
@efiop efiop added this to DVC Dec 28, 2023
@github-project-automation github-project-automation bot moved this to Backlog in DVC Dec 28, 2023
@efiop efiop added the p0-critical Critical issue. Needs to be fixed ASAP. label Dec 28, 2023
@efiop
Contributor

efiop commented Dec 28, 2023

Ok, with index migration we need storage.read_only that we can use during index building to say that this remote is read-only. This is similar to odb-only read_only that we used to have in the past. Already preparing a fix.
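
A minimal sketch of the idea, for illustration only: `Storage`, its `read_only` field, and `push` below are hypothetical stand-ins, not dvc-data's actual classes or signatures. It assumes each storage known to the index carries a flag, and that push skips flagged storages instead of walking them.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Hypothetical stand-in for one storage known to the data index."""

    name: str
    read_only: bool = False
    objects: set = field(default_factory=set)


def push(local_objects, storages):
    """Copy missing objects to every writable storage; skip read-only ones."""
    transferred = {}
    for storage in storages:
        if storage.read_only:
            # Import sources are only read from, never indexed or written to.
            continue
        missing = set(local_objects) - storage.objects
        storage.objects |= missing
        transferred[storage.name] = len(missing)
    return transferred


source = Storage("https://data.dvc.org", read_only=True)
remote = Storage("s3://bucket")
result = push({"data.xml"}, [source, remote])
print(result)  # {'s3://bucket': 1}
```

With the flag set, the import source is never touched and only the configured remote receives data.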

@skshetry
Member

Is it about read-only? Or, is it about traversable/non-traversable remotes? Our implementation of http is read+write.

@efiop
Contributor

efiop commented Dec 28, 2023

@skshetry We are trying to push to http that we've import-url-ed from and which is not even set up as a remote, so the former.

@efiop
Contributor

efiop commented Dec 28, 2023

@turkanis Could you try upstream dvc, please? https://dvc.org/doc/install/pre-release Should be fixed now.

@turkanis
Author

@efiop

Thanks for your quick response.

With the pre-release version, I no longer get an error when I run dvc push. However, no push to remote storage occurs. Instead, I get this output:

Collecting                                        |0.00 [00:00,    ?entry/s]
Pushing
Everything is up to date.

Is this the expected behavior? It is not what I expected. If, instead, I add the --to-remote option to dvc import-url, the file is added to remote storage, and I can download it with dvc pull.

To summarize, without --to-remote, the file is copied to the project directory and a .dvc tracking file is created, but the data never gets written to remote storage. Running dvc import-url with --to-remote followed by dvc pull gives me the behavior I was expecting initially from dvc import-url.

@efiop
Contributor

efiop commented Dec 29, 2023

@turkanis If you've pushed already and there are no new files to push then it is expected that it is not pushing anything.

--to-remote works in a different way and will indeed do something every time, because it is assuming that you are sure that that file is not present in the remote yet.

@turkanis
Author

@efiop

In the scenario I described, I had never pushed before, and there were no objects in remote storage.

@skshetry
Member

@turkanis, you might be facing #10194.

@turkanis
Author

@efiop

I think I have not described the situation well. Here are the 2 scenarios that occur. In both cases, some of the behavior seems wrong to me. Both start out with nothing in remote storage:

Scenario I

git init
dvc init
dvc remote add -d gcs gs://bucket/dvc-test
dvc import-url https://data.dvc.org/get-started/data.xml

	# creates data.xml
	# creates data.xml.dvc
	# adds entry in .gitignore

dvc push

	# nothing written to remote storage

Scenario II

git init
dvc init
dvc remote add -d gcs gs://bucket/dvc-test
dvc import-url --to-remote https://data.dvc.org/get-started/data.xml

	# creates data.xml.dvc
	# no entry added to .gitignore
	
dvc pull

	# creates data.xml
	# git shows data.xml as untracked

Is this expected behavior? What I want is the imported file to be written to remote storage and to have a .gitignore entry created automatically.

Thanks for your help!
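
As a possible manual stopgap for the missing .gitignore entry in Scenario II, the entry could be added by a small script. This is a sketch under assumptions, not DVC functionality: `ensure_gitignored` is a hypothetical helper, and the entry format `/data.xml` is assumed to match what DVC writes in Scenario I.

```python
from pathlib import Path
import tempfile


def ensure_gitignored(repo_dir, entry):
    """Append `entry` to repo_dir/.gitignore unless it is already listed.

    Returns True if the file was modified.
    """
    gitignore = Path(repo_dir) / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if entry in lines:
        return False
    lines.append(entry)
    gitignore.write_text("\n".join(lines) + "\n")
    return True


# Demonstrated on a throwaway directory rather than a real repository.
with tempfile.TemporaryDirectory() as repo:
    first = ensure_gitignored(repo, "/data.xml")
    second = ensure_gitignored(repo, "/data.xml")
    content = (Path(repo) / ".gitignore").read_text()

print(first, second, repr(content))  # True False '/data.xml\n'
```

The helper is idempotent, so running it after every dvc pull would not duplicate the entry.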

@efiop
Contributor

efiop commented Dec 30, 2023

@turkanis Thanks for a detailed description! First one will be solved in #10194 (I left a comment there) and the second one looks like a separate bug. Could you create an issue for the latter one, please?

@turkanis
Author

turkanis commented Jan 3, 2024

@efiop

I have created #10217

BradyJ27 pushed a commit to BradyJ27/dvc that referenced this issue Apr 22, 2024

3 participants