skopeo sync: Support incremental updates to dir storage #1237

Open
lfarkas opened this issue Mar 31, 2021 · 21 comments
Labels
kind/feature (A request for, or a PR adding, new functionality), stale-issue

Comments

@lfarkas

lfarkas commented Mar 31, 2021

I'm trying to copy a full registry with skopeo and keep it in sync, but it's not possible, or at least it only works once.

for i in $(podman search --format "{{.Name}}" "$REGISTRY/"); do
	skopeo sync --scoped --all --src docker --dest dir "$i" "$LOCAL_DIR/"
done

Unfortunately it works on the first run but not on the second. It would be very useful to have an overwrite option and a delete option; then we could overwrite a local directory and keep it up to date with a given registry. Of course I can write a more complicated shell script and it'll work, but IMHO this would be a very useful feature.

@vrothberg
Member

Thanks for reaching out. What is the exact error you are getting? Which version of Skopeo are you using?

@lfarkas
Author

lfarkas commented Mar 31, 2021

1.2.2
Refusing to overwrite destination directory

@vrothberg changed the title from "can't copy a full registry" to "skopeo sync: cannot overwrite local directory" on Apr 1, 2021
@vrothberg
Member

Thanks! Yes, skopeo sync refuses to overwrite existing directories on the host to prevent accidental data corruption. I think we can make the behavior conditional by adding a new flag (e.g., --dest-overwrite={true,false}).

@rhatdan, @flavio WDYT?

@flavio
Contributor

flavio commented Apr 1, 2021

Yes, it would be nice to have that, but I'm somehow concerned about the amount of code to introduce.

IMHO a "real" sync would probably require at least:

  • Ability to skip the download of the data that isn't changed
  • Ability to overwrite data that changed
  • Ability to add new data
  • Ability to remove data that is no longer around

It feels like we're reimplementing rsync 😅
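
As a rough illustration of the four requirements listed above: given a listing of image names and digests for the source and another for the destination, every image falls into exactly one of add, skip, overwrite, or remove. The function name `plan_sync` and the "name digest" listing format are made up for this sketch; skopeo does not offer such a command, and how the two listings would be produced is left open.

```sh
# plan_sync SRC_LIST DEST_LIST
# Hypothetical helper, not part of skopeo. Each argument is a text file
# with one "<image-name> <digest>" pair per line. Prints one action per
# image, sorted, so the result is stable across awk implementations.
plan_sync() {
	awk '
		# Lines from the first file describe the source registry.
		FILENAME == ARGV[1] { if (NF >= 2) src[$1] = $2; next }
		# Lines from the second file describe the destination.
		NF >= 2 { dest[$1] = $2 }
		END {
			for (name in src) {
				if (!(name in dest))
					print "add", name
				# Appending "" forces a string comparison of the digests.
				else if ((dest[name] "") == (src[name] ""))
					print "skip", name
				else
					print "overwrite", name
			}
			for (name in dest)
				if (!(name in src))
					print "remove", name
		}
	' "$1" "$2" | sort
}
```

Calling `plan_sync src.txt dest.txt` only prints the plan; acting on it (copying, deleting) is the part that would need the extra code discussed in this comment.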

@lfarkas
Author

lfarkas commented Apr 1, 2021

Otherwise, what's the current use case for sync?

  • I'd like to sync a whole registry, e.g. registry.example.com/
  • or a part of it, e.g. registry.example.com/foo/
  • or a given image with all versions: registry.example.com/foo/bar
  • or a given version, e.g. registry.example.com/foo/bar:1.2.3

But in all cases, if I save it to a directory, I'd like the resulting directory to be 100% consistent with the registry image (or with the part of the registry that I sync).

anyway this is also the case when the destination is a docker registry.

@vrothberg
Member

anyway this is also the case when the destination is a docker registry.

No. This only happens for the dir: transport.

@vrothberg
Member

It feels like we're reimplementing rsync 😅

🤣 I think we should keep it simple. The destination will be removed and recreated (i.e., full overwrite)?

@lfarkas
Author

lfarkas commented Apr 2, 2021

For me, even that would be better than the current behaviour.

@rhatdan
Member

rhatdan commented Apr 2, 2021

@vrothberg You must be a better typist than me. --dest-overwrite={true,false}

How about --force?

@vrothberg
Member

I love typing but --force sounds good as well :)

@github-actions

github-actions bot commented Jun 3, 2021

A friendly reminder that this issue had no activity for 30 days.

@bduffany

bduffany commented Aug 23, 2021

my 2c is that making sync do a full overwrite (behind a --force flag or not) would be non-intuitive -- the word sync implies that it should only update the parts that are different.

If a full overwrite would be equivalent to rm -rf DEST || true followed by skopeo copy, then I think it's better for clients to do that themselves rather than invoke a sync command which (confusingly) does that for them.
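
For reference, the do-it-yourself variant described above could look roughly like the following. The function name `resync_dir` and the `SKOPEO` override variable are invented for this sketch; the skopeo flags are the ones from the original report. It also assumes that losing the old copy before the new one has been downloaded is acceptable.

```sh
# resync_dir SOURCE DEST_DIR
# Hypothetical wrapper, not part of skopeo: throw away DEST_DIR, recreate
# it empty, then run a fresh "skopeo sync" into it.
# SKOPEO is an assumed override hook (e.g. for a dry run); default: skopeo.
resync_dir() {
	# Refuse obviously dangerous destinations before running rm -rf.
	case $2 in
		'' | /)
			echo "resync_dir: refusing to remove '$2'" >&2
			return 1
			;;
	esac
	rm -rf -- "$2" || return 1
	mkdir -p -- "$2" || return 1
	"${SKOPEO:-skopeo}" sync --scoped --all --src docker --dest dir "$1" "$2"
}
```

Everything is downloaded again on each run, which is exactly the cost that an incremental sync would avoid.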

@mtrmac
Contributor

mtrmac commented Aug 23, 2021

I'm trying to copy a full registry with skopeo and keep it in sync.

I don’t think that’s quite a use case. A directory tree of image blobs is not useful in itself. What is that set of files used for? Copied back to yet another registry? Something else?

@mtrmac
Contributor

mtrmac commented Aug 23, 2021

Thanks! Yes, skopeo sync refuses to overwrite existing directories on the host to prevent accidental data corruption. I think we can make the behavior conditional by adding a new flag (e.g., --dest-overwrite={true,false}).

… and then c/image/directory.newImageDestination would just notice a dir:-formatted directory with the right version number and erase all contents, and nothing would be gained right now. So without more work around c/image, that flag would be no better than rm -rf $dest; skopeo sync, AFAICS.

@mtrmac
Contributor

mtrmac commented Aug 23, 2021

I love typing but --force sounds good as well :)

IMHO --force is a trap; users use it to override one sanity check but end up overriding others they didn’t mean to override (or that will have been added much later in a new version).

Anyway, let’s figure out 1) what is the need, and 2) what do we want to do about that need, before tinkering with the UI of an unknown feature.

@mtrmac
Contributor

mtrmac commented Aug 23, 2021

Yes, it would be nice to have that, but I'm somehow concerned about the amount of code to introduce.

It feels like we're reimplementing rsync 😅

Yeah, I’m torn. Skopeo originally was a noddy wrapper around c/image with basically nothing to worry about or design; skopeo sync is very different from that idea. OTOH skopeo sync has also been very popular and useful for some users.

Pragmatically, I think clean PRs are welcome, and if contributors or drive-by users want to take it much further than originally anticipated (and the maintainers have the bandwidth to keep up with those PRs), that’s great. (By “clean PRs” I mean not to hand off quick hacks and technical-debt to others to maintain.)

Alternatively, some users may be much better served with calling c/image directly from a much larger program, e.g. a build system / pipeline that does already have a database of artifacts and their known locations. (Or Skopeo would be forked if the maintainers couldn’t do a good enough job, of course.)

E.g. skopeo sync has already caused a contribution of c/image/copy.Options.OptimizeDestinationImageAlreadyExists, which made sync much more efficient and potentially helps other c/image users as well.

@mtrmac changed the title from "skopeo sync: cannot overwrite local directory" to "skopeo sync: Support incremental updates to dir storage" on Jan 31, 2022

@wrender

wrender commented Nov 3, 2022

It seems like a necessary feature to me. Force overwrite seems like a good interim solution. I agree though, the terminology "sync" makes you think it is actually syncing.

@mtrmac added the kind/feature label (A request for, or a PR adding, new functionality) on Dec 7, 2022
@FruityWelsh

--force would seem overloaded in my mind. To be honest I would assume it would do things like ignore tls warnings, or ignore failed images, etc.

@cevich
Member

cevich commented Sep 13, 2023

Bump: Ran into this fairly hard when working on containers/podman#19796

Scenario: You need to sync a huge number of images across multiple registry namespaces. It breaks somewhere in the middle or right at the end. Or, something it previously sync'd has become corrupted for some reason or another.

Could skopeo sync be made to do some minimal checking on the destination, and if it's borkd in some obvious way, clobber and re-sync it?

I would also be in support of some kind of --force or --overwrite solution, though less than ideal performance-wise, it would guarantee the "latest" stuff is actually synchronized.
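
One way such "minimal checking" might look from the outside, as a sketch only: it assumes the dir: layout keeps a manifest.json next to blob files named by the hex form of their SHA-256 digest, and that `sha256sum` (GNU coreutils) is available. The function name `dir_image_looks_intact` is made up, and this is not something skopeo does today.

```sh
# dir_image_looks_intact DIR
# Hypothetical check, not part of skopeo. Succeeds if DIR has a non-empty
# manifest.json and every file whose name is a 64-character hex string
# hashes to that name. Assumes blobs are stored under their hex digest.
dir_image_looks_intact() {
	[ -s "$1/manifest.json" ] || return 1
	for blob in "$1"/*; do
		name=${blob##*/}
		# Skip anything that is not named like a SHA-256 hex digest
		# (manifest.json, version, signatures, an unmatched glob).
		case $name in
			*[!0-9a-f]*) continue ;;
		esac
		[ "${#name}" -eq 64 ] || continue
		actual=$(sha256sum "$blob" | cut -d' ' -f1)
		[ "$actual" = "$name" ] || return 1
	done
	return 0
}
```

A caller could re-sync only the directories for which this returns non-zero. Note that it catches truncated or modified blobs but not blobs that are missing altogether; detecting those would require parsing manifest.json.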



9 participants