GitLab registry HTTP: Migrating from dvc 2 to dvc 3 and pushing leads to Bad Request #10553

Closed
NicoLivesey opened this issue Sep 11, 2024 · 6 comments
Labels: A: data-sync (Related to dvc get/fetch/import/pull/push), awaiting response (we are waiting for your reply, please respond! :))

Comments

@NicoLivesey

Bug Report

Description

I have been trying for a few months to migrate from dvc 2 to dvc 3 without success.
We use a GitLab package registry as remote storage and communicate with it through HTTP (as explained here). We authenticate with a personal access token, and everything worked fine on DVC 2. I successfully upgraded our CI to DVC 3 to leverage the dvc.yaml features when pulling data and models, but the upgrade failed for the local environment and for pushing new DVC-tracked files.
I migrated the whole cache and all .dvc files to the DVC 3 format with dvc cache migrate --dvc-files, but when I push I get 400 Bad Request. I think this is related to the object URL changing from /cache/81/<the-number> to /cache/files/md5/81/<the-number>.
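The path change described above can be illustrated with a small sketch. It assumes the DVC 2 remote layout <first 2 hex chars>/<rest> versus the DVC 3 layout files/md5/<first 2 hex chars>/<rest>, which matches the URLs reported here. As an unverified assumption about this particular setup, it also supposes the HTTP remote URL ends at a GitLab generic package name, so GitLab expects exactly <package_version>/<file_name> after it and rejects file names containing "/". All helper names are hypothetical.

```python
"""Compare DVC 2 and DVC 3 remote object paths for an md5 hash.

Assumption (not confirmed in this issue): the HTTP remote URL ends at a
GitLab generic package, i.e. .../packages/generic/<package_name>, so the
path DVC appends must be exactly <package_version>/<file_name>.
"""


def dvc2_object_path(md5: str) -> str:
    # DVC 2 layout: <first two hex chars>/<remaining chars>
    return f"{md5[:2]}/{md5[2:]}"


def dvc3_object_path(md5: str) -> str:
    # DVC 3 layout: files/md5/<first two hex chars>/<remaining chars>
    return f"files/md5/{md5[:2]}/{md5[2:]}"


def fits_generic_package(suffix: str) -> bool:
    # Hypothetical check: the generic package endpoint
    # (.../<package_name>/<package_version>/<file_name>) leaves room for
    # exactly two path segments after the package name.
    return len(suffix.split("/")) == 2


if __name__ == "__main__":
    md5 = "81" + "0" * 30  # placeholder hash, not a real object
    for name, fn in [("dvc2", dvc2_object_path), ("dvc3", dvc3_object_path)]:
        suffix = fn(md5)
        print(name, suffix, fits_generic_package(suffix))
```

Under these assumptions, the DVC 2 path maps cleanly onto version 81 and a single file name, while the DVC 3 path has two extra segments that plausibly explain a 400 response.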

Right now I keep using DVC 2 in my local environment and DVC 3 in CI, but I would really like to upgrade everything to DVC 3.

Reproduce

  1. In a DVC 2 repository with an HTTP remote storage
  2. dvc cache migrate --dvc-files
  3. dvc push

Expected

New files are pushed successfully to the remote.

Environment information

DVC version: 3.54.1 (pip)
-------------------------
Platform: Python 3.12.4 on macOS-12.6.1-arm64-arm-64bit
Subprojects:
	dvc_data = 3.16.5
	dvc_objects = 5.1.0
	dvc_render = 1.0.2
	dvc_task = 0.4.0
	scmrepo = 3.3.7
Supports:
	http (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.10.5, aiohttp-retry = 2.8.3)
Config:
	Global: ...
	System: ...
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: ...
@shcheklein
Member

@sisp I think you might have more information on this.

PTAL here iterative/dvc-http#50 (comment)

I don't think we support GitLab in DVC 3 in this way atm :(

@shcheklein added the "awaiting response" label Sep 11, 2024
@shcheklein changed the title from "Migrating from dvc 2 to dvc 3 and pushing on HTTP remote storage leads to Bad Request" to "GitLab registry HTTP: Migrating from dvc 2 to dvc 3 and pushing leads to Bad Request" Sep 11, 2024
@shcheklein added the "A: data-sync" label Sep 11, 2024
@sisp
Contributor

sisp commented Sep 11, 2024

I stopped using GitLab's generic package registry for the same reason, and also because the dvc-http plugin lacks garbage collection support. After thorough consideration, I believe the best way forward is in fact #6148. I've prototyped representing a basic filesystem with OCI and have looked into creating a corresponding fsspec plugin. oras-py seems to be the only (maintained) OCI client library for Python, and it's based on requests rather than aiohttp, which most (high-performance) fsspec plugins use. Hence, the first step seems to be implementing a small OCI client library using aiohttp, at least for the needed OCI registry endpoints and not necessarily as a separate package. Unless I've overlooked something ...
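To make the OCI idea above a bit more concrete, here is a minimal sketch of how a single DVC object could be pushed as an OCI blob. It follows the OCI Distribution Spec's two-step blob upload (POST to /v2/<name>/blobs/uploads/, then PUT to the returned Location with a digest query parameter) plus the HEAD existence check, but it only builds (method, URL) pairs, so no HTTP client is involved; an aiohttp-based client as proposed above would be the part that sends them. All function names are hypothetical, and a real fsspec plugin would also need manifests or tags to make blobs listable and garbage-collectable, which this sketch omits.

```python
"""Sketch: describing the OCI requests needed to push one blob.

Assumptions (not from this thread): the registry implements the OCI
Distribution Spec and we use its two-step "POST then PUT" blob upload.
Only (method, URL) pairs are built here; nothing is sent.
"""
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def oci_digest(data: bytes) -> str:
    # OCI addresses blobs as "<algorithm>:<hex>", usually sha256, so the
    # md5 that DVC uses cannot serve directly as the blob key.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def start_upload_request(registry: str, repo: str) -> tuple[str, str]:
    # Step 1: POST opens an upload session; the registry replies 202
    # with a Location header that step 2 uses.
    return "POST", f"{registry}/v2/{repo}/blobs/uploads/"


def finish_upload_request(location: str, digest: str) -> tuple[str, str]:
    # Step 2: PUT the bytes to Location with ?digest=<digest>, keeping any
    # query parameters the registry already placed in Location.
    parts = urlsplit(location)
    query = parse_qsl(parts.query)
    query.append(("digest", digest))
    return "PUT", urlunsplit(parts._replace(query=urlencode(query)))


def blob_exists_request(registry: str, repo: str, digest: str) -> tuple[str, str]:
    # HEAD on the blob URL returns 200 if present and 404 otherwise, which
    # maps onto the "already on the remote?" check before a push.
    return "HEAD", f"{registry}/v2/{repo}/blobs/{digest}"
```

Keeping request construction separate from I/O like this would let the same logic back either an aiohttp or a requests transport.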

@NicoLivesey
Author

Thanks a lot for your quick response. My understanding is that if I have an alternative to the GitLab package registry, I should move to it. I will check all your links, @sisp, thank you!

@sisp
Contributor

sisp commented Sep 12, 2024

My understanding is that if I have an alternative to gitlab package registry I should move to it.

Not sure if I understand you right. DVC v3 is no longer compatible with the unofficial use of GitLab's generic package registry via the dvc-http plugin. I currently don't know of any working alternative integration with GitLab, but I believe that adding an OCI plugin for DVC would restore GitLab integration via its container registry. As a side benefit, OCI registries are not limited to GitLab; they are widely adopted.

@NicoLivesey
Author

I mean, I could use another remote storage such as S3? (It is a bit painful in my case, but that's not related to this issue, haha.)

@sisp
Contributor

sisp commented Sep 12, 2024

Yes, migrating to, e.g., S3 should be no problem, but you lose the seamless integration with GitLab, which is what I'm after at least.
