This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

Linear scan of a repository with lots (200+) of tags causes pull IO timeout #614

Open
bshi opened this issue Oct 7, 2014 · 11 comments

@bshi (Contributor) commented Oct 7, 2014

This was originally reported at GoogleCloudPlatform#22

It seems that when performing "docker pull foo/bar:sometag", the registry performs a linear scan of ALL tags in "foo/bar". When backed by object storage systems like GCS, this can take a long time, and it has broken image distribution for us.
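To make the cost concrete, here is a minimal sketch of the kind of scan described above. It is not the actual docker-registry code; the driver methods and the "tag_*" file layout are assumed for illustration.

```python
# Hypothetical sketch of the tag scan, NOT the real docker-registry code.
# Assumes a v1-style storage driver exposing list_directory() and
# get_content(), and tag files named "tag_<name>".

def get_all_tags(driver, namespace, repo):
    prefix = "repositories/%s/%s" % (namespace, repo)
    tags = {}
    for path in driver.list_directory(prefix):          # one LIST request
        name = path.rsplit("/", 1)[-1]
        if name.startswith("tag_"):
            # One GET per tag file; with 200+ tags on GCS/S3 the
            # per-request latency adds up and the pull can time out.
            tags[name[len("tag_"):]] = driver.get_content(path)
    return tags

# Even though "docker pull foo/bar:sometag" only needs one tag, the scan
# above still touches every tag_* object in the repository.
```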

@bshi (Contributor, Author) commented Oct 22, 2014

As part of this investigation, I discovered that "docker pull foo/bar:sometag" will incur a hit to the API endpoint that lists all tags for 'foo/bar'. This seems a bit wasteful. @dmp42 - as I'm not too familiar with the details of 'docker pull', perhaps you know offhand whether this is indeed unnecessary work and whether it's worth filing a bug in docker?
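For reference, a hedged sketch of the request in question, using requests; the endpoint path follows the v1 registry API, while the registry URL and helper name are made up for illustration.

```python
# Illustrative only: resolving one tag still fetches the repository's
# full tag list from the v1 API.

import requests

REGISTRY = "https://registry.example.com"   # placeholder

def resolve_tag(namespace, repo, tag):
    # GET /v1/repositories/<namespace>/<repo>/tags returns a JSON object
    # mapping every tag name to an image id ...
    all_tags = requests.get(
        "%s/v1/repositories/%s/%s/tags" % (REGISTRY, namespace, repo)).json()
    # ... even though the client only needs the single entry for `tag`.
    return all_tags[tag]
```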

@dmp42 (Contributor) commented Oct 22, 2014

@bshi I don't think reporting this is worth the effort - energy IMO is better focused on registry v2 development.

@wking (Contributor) commented Oct 22, 2014

On Tue, Oct 21, 2014 at 07:06:37PM -0700, Bo Shi wrote:

As part of this investigation, I discovered that "docker pull foo/bar:sometag" will incur a hit to the API endpoint that lists all tags for 'foo/bar'.

I haven't looked at the client-side code (at least recently enough to remember), but I'd expect you'd need to do this for the new alias detection (moby/moby#8141), unless you had a separate endpoint for "give me all aliases for $NAMESPACE/$REPO:$TAG". We could actually support something like that efficiently if we had something like refcount tracking (#606, #409), since I was recording the referring ([namespace, repository, tag], descendant_id) entry for each ancestor image id [1,2]. We'd just have to iterate through the referrers and return tags matching $NAMESPACE/$REPO where descendant_id == image_id. With my new (proposed) atomic/streaming storage separation [3], accessing the referrers is probably a single hit to your atomic storage (to the image-references entry for the requested image).
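A rough sketch of the lookup described here, assuming a referrer index keyed by ancestor image id; all names are illustrative and not part of the actual proposal.

```python
# Illustrative referrer index: for each ancestor image id, record which
# (namespace, repository, tag) points at it and through which descendant.

from collections import defaultdict

referrers = defaultdict(set)   # image_id -> {(namespace, repo, tag, descendant_id)}

def record_reference(namespace, repo, tag, descendant_id, ancestry):
    # ancestry is the descendant's full ancestor chain, including itself.
    for ancestor_id in ancestry:
        referrers[ancestor_id].add((namespace, repo, tag, descendant_id))

def aliases(namespace, repo, image_id):
    # "Give me all aliases for this image in this repo": iterate the
    # referrers for image_id and keep tags whose descendant_id == image_id.
    return sorted(tag for ns, r, tag, descendant_id in referrers[image_id]
                  if ns == namespace and r == repo and descendant_id == image_id)
```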

@dmp42 (Contributor) commented Oct 28, 2014

One of the terribly inefficient things right now is that we not only ls but also read the (tag) file contents.
That second part is going away.

Drivers will still need to provide an efficient ls (and we can alleviate part of the pain by caching the result).
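As an illustration of the caching idea, here is a minimal TTL cache around a driver's listing call; the wrapper class and TTL are a sketch, not the registry's implementation.

```python
# Minimal TTL cache around a driver's list_directory(); illustrative only.

import time

class CachedLister(object):
    def __init__(self, driver, ttl=60):
        self.driver = driver
        self.ttl = ttl
        self._cache = {}                 # prefix -> (timestamp, listing)

    def list_directory(self, prefix):
        now = time.time()
        hit = self._cache.get(prefix)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]                # serve the cached listing
        listing = list(self.driver.list_directory(prefix))   # expensive LIST on GCS/S3
        self._cache[prefix] = (now, listing)
        return listing
```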

@dmp42 (Contributor) commented Nov 4, 2014

@bshi (and other GCS people?) - the new Go drivers API #643 is going final and will be merged soon. The time is right to voice concerns :-)

@bshi (Contributor, Author) commented Nov 4, 2014

Skimmed the discussion in #643 - it seems like you guys are aware of and thinking about the issue of unbounded looping over driver interface methods. One other concern is the underlying storage consistency model and what the registry expects of the drivers. S3 doesn't even have consistent (:P) consistency models across S3 regions.

@dmp42 (Contributor) commented Nov 5, 2014

Consistency... we think about it, a lot :-) cc @stevvooe

dmp42 added this to the 1.0 milestone Nov 5, 2014
@stevvooe (Contributor) commented Nov 6, 2014

@bshi Consistency and coordination are definitely something we are thinking about. Unfortunately, many of the storage backends lack consistency and don't have any kind of transactional coordination. The new registry will likely require some sort of coordination layer to mitigate that. Watch out for upcoming proposals, as your input will be appreciated.

@wking (Contributor) commented Nov 6, 2014

On Thu, Nov 06, 2014 at 10:48:58AM -0800, Stephen Day wrote:

The new registry will likely require some sort of coordination layer to mitigate that.

I'd just keep the mutable (i.e. not content-addressable) stuff in a storage backend that does support transactions [1]. Then you can offload the coordination to that storage engine (e.g. let Redis handle the transaction implementation and just use MULTI/EXEC/DISCARD). Then you don't need to figure out a way to handle transaction locking between multiple registries that are sharing the same backing storage.
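A sketch of that approach with redis-py; the key schema is made up for illustration, and MULTI/EXEC happen inside the transactional pipeline.

```python
# Illustrative only: keep mutable tag -> image-id pointers in Redis and let
# Redis provide the transaction (MULTI/EXEC via a transactional pipeline).

import redis

r = redis.StrictRedis(host="localhost", port=6379)

def set_tag(namespace, repo, tag, image_id):
    key = "tags:%s/%s" % (namespace, repo)     # made-up key schema
    with r.pipeline(transaction=True) as pipe:
        pipe.hset(key, tag, image_id)
        pipe.execute()

def get_tag(namespace, repo, tag):
    return r.hget("tags:%s/%s" % (namespace, repo), tag)
```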

@rlister commented May 29, 2015

Just as a fun data-point, I started to run into this problem right around 1700 tags for a repo, backed with S3. We are doing continuous builds, so I am able to work around it by periodically deleting large repos and re-pushing as needed.

@duyanghao commented Jun 24, 2016

I have a similar problem @bshi @dmp42. My storage backend is s3-ceph, so has this problem been solved yet? It is urgent!

I guess the problem is in the "/v1/repositories/repo/tags" API: it uses Python gevent to pull every tag file from the storage backend, read each one, and return the result to Docker, which takes too much time. Maybe there is a way to implement that API more efficiently; I am trying to do that.
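For context, a rough sketch of what such a handler appears to do; this is not the actual registry code, and the driver methods and file layout are assumed as in the earlier sketch.

```python
# Illustrative sketch of a gevent-backed tags listing: every tag file is
# fetched from the storage backend, even when the client wants one tag.

from gevent.pool import Pool

def list_tags(driver, repo_path, concurrency=20):
    tag_paths = [p for p in driver.list_directory(repo_path)
                 if p.rsplit("/", 1)[-1].startswith("tag_")]
    pool = Pool(concurrency)
    contents = pool.map(driver.get_content, tag_paths)   # one GET per tag
    return dict((p.rsplit("/tag_", 1)[-1], c)
                for p, c in zip(tag_paths, contents))
```

Concurrency hides some latency, but the total number of object-store requests still grows linearly with the number of tags.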
