Add Pull-through caching #1299
pulp_container/app/models.py
Outdated
"""
TODO: Add permissions.
"""
TYPE = "container"
This needs a new identifier. (almost certain)
6555d70 to c4028e0
d4ab757 to 64640d5
@ipanova, @mdellweg, would you mind reviewing this PR? Focus on the underlying logic. Things to consider:
64640d5 to 1534909
Oh, the reason why I did not include sending head requests beforehand is that "docker-content-digest" is not a required header and is probably not present in other non-docker registries.
Can't you dispatch the add_content task from the content app? In the end, the content app does not even need to wait for it to finish, right?
Sorry, wrong button
I believe the problem was due to the asynchronous context. I could not dispatch the task because of it.
@ipanova and I concluded that we should preserve the idea of adding content to a repository (the exact opposite of what we are doing in other plugins). The 4th bullet point is no longer a concern if we assume that there is a user who does not have cached layers on his system and will eventually download all pending blobs (this leads to committing the repository version). Repositories/distributions, created from special distributions, will be visible to users because we allow the pull operation. Besides that, we identified two flaws in the current implementation:
Things to work on next:
86fb607 to c50e570
af9443b to 88d8c84
pulp_container/app/registry.py
Outdated
digest = response.headers.get("docker-content-digest")
if tag.tagged_manifest.digest != digest:
    downloader = remote.get_downloader(url=tag_url)
you already have it on line 208
In that case, we check for the digest with the HEAD request.
That's OK, but you don't need to write remote.get_downloader(url=tag_url) twice.
pulp_container/app/registry.py
Outdated
media_type = determine_media_type(manifest_data, response)
if media_type not in (MEDIA_TYPE.MANIFEST_LIST, MEDIA_TYPE.INDEX_OCI):
    await self.save_manifest_and_blobs(
        digest, manifest_data, media_type, remote, repository, saved_artifact
The digest you got from the docker-content-digest header is not reliable because that is not a required header. You should resort to calculate_digest, as in the except branch on line 176.
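A minimal sketch of the point above, assuming a plain SHA-256 over the raw manifest bytes is acceptable. The helper name `resolve_manifest_digest` is hypothetical and is not the project's own digest function; signed schema1 manifests need their signature block handled before hashing, which this sketch does not attempt.

```python
import hashlib
from typing import Mapping, Tuple


def resolve_manifest_digest(
    raw_manifest: bytes, headers: Mapping[str, str]
) -> Tuple[str, bool]:
    """Return (computed_digest, header_matches) for a downloaded manifest.

    Hypothetical helper: the computed digest is treated as the source of
    truth because the Docker-Content-Digest header is optional.
    """
    computed = "sha256:" + hashlib.sha256(raw_manifest).hexdigest()
    # Plain dicts are case-sensitive, so normalise the header names first.
    lowered = {name.lower(): value for name, value in headers.items()}
    advertised = lowered.get("docker-content-digest")
    # header_matches is False both when the header is absent and when it disagrees.
    return computed, advertised == computed
```

The second element of the result only tells the caller whether the registry advertised the same value; the stored digest always comes from the body that was actually downloaded.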
pulp_container/app/registry.py
Outdated
async def save_manifest_and_blobs(
    self, digest, manifest_data, media_type, remote, repository, artifact
):
    config_digest = manifest_data["config"]["digest"]
how are you sure this is not a schema1 manifest?
I am not. 😭
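One way to guard the `manifest_data["config"]["digest"]` lookup is sketched below. It assumes the parsed manifest is a plain dict; the function name `get_config_digest` is hypothetical. Schema1 manifests declare `"schemaVersion": 1` and list their layers under `"fsLayers"`, so they carry no `"config"` section.

```python
from typing import Any, Dict, Optional


def get_config_digest(manifest_data: Dict[str, Any]) -> Optional[str]:
    """Return the config blob digest, or None when the manifest has none.

    Hypothetical helper for illustration only.
    """
    # Schema1 manifests have no "config" section at all.
    if manifest_data.get("schemaVersion") == 1:
        return None
    config = manifest_data.get("config")
    if not isinstance(config, dict):
        # Covers malformed manifests and manifest lists, which have no config.
        return None
    return config.get("digest")
```

A caller can then branch on `None` instead of hitting a `KeyError` in the middle of saving blobs.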
pulp_container/app/registry.py
Outdated
try:
    manifest_data = json.loads(raw_data)
except json.decoder.JSONDecodeError:
    raise PathNotResolved(digest)
should be path here
@@ -318,7 +486,54 @@ async def get_by_digest(self, request):
    "Docker-Content-Digest": ca_content.digest,
}
except ObjectDoesNotExist:
    raise PathNotResolved(path)
    distribution = await distribution.acast()
you need to do something about this code being repeated 3 times in 3 places
I extracted a new class.
    manifest = repository.pending_manifests.get(digest=pk)
    manifest.touch()
except models.Manifest.DoesNotExist:
    pass
Why not raise the error?
There is still a chance that the fired pull-through download task has not finished while the user is already trying to get a newly listed manifest. We do not pre-record all listed manifests and their blobs in the content app. So, the manifest does not exist in pending_manifests and is still not associated with any repository.
I thought this is the desired behavior. Since this is a pull-through-cache repository, it should match exactly the content remotely. So if a tag was removed remotely I would not know why we should keep it locally.
But, using How can I forcefully tell the sync pipeline to not remove the existing tag? Also, how do we know if the tag was removed from the remote registry if the user never asks for it and thus we never realize that?
@lubosmj since we using
f5dab49 to cbd654e
pulp_container/app/registry.py
Outdated
    else:
        raise PathNotResolved(tag_name)
else:
    if distribution.remote_id and distribution.pull_through_distribution_id:
Add the cast call here.
Explicitly state that inside this if branch we are working through the pull-through distribution.
pulp_container/app/registry.py
Outdated
        extra_data={"headers": V2_ACCEPT_HEADERS, "http_method": "head"}
    )
except ClientResponseError:
    raise PathNotResolved(path)
Instead, return the existing tag. The tag will just not be refreshed.
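The suggested fallback could look roughly like the sketch below. Everything here is hypothetical: `UpstreamUnavailable` stands in for the `ClientResponseError` raised by the failed HEAD request, and `pick_digest` is an illustrative name, not a function in the PR.

```python
from typing import Callable, Optional, Tuple


class UpstreamUnavailable(Exception):
    """Hypothetical stand-in for a failed HEAD request to the remote registry."""


def pick_digest(
    cached_digest: Optional[str],
    fetch_remote_digest: Callable[[], Optional[str]],
) -> Tuple[Optional[str], bool]:
    """Return (digest_to_serve, needs_refresh)."""
    try:
        remote_digest = fetch_remote_digest()
    except UpstreamUnavailable:
        # The remote cannot be reached: serve the cached tag, just not refreshed.
        return cached_digest, False
    if remote_digest is None or remote_digest == cached_digest:
        # No usable digest from the remote, or nothing has changed.
        return cached_digest, False
    return remote_digest, True
```

When nothing is cached and the remote is unreachable, the result is `(None, False)`, and the caller would still have to answer with a not-found error.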
pulp_container/app/registry.py
Outdated
    "Docker-Distribution-API-Version": "registry/2.0",
}
return web.Response(text=raw_manifest, headers=headers)
else:
Maybe, mention that we parse "blobs" and initialize a remote artifact here.
    # it is necessary to pass this information back to the client
    raise HTTPTooManyRequests()
else:
    raise PathNotResolved(self.path)
Leave a TODO comment about possible changes in the future. Right now, we are masking error messages that might be useful to the client.
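To make the masking explicit, the current behavior can be pictured as a small status mapping. This is a sketch under assumptions: the names `PASSED_THROUGH` and `client_status_for` are hypothetical, and the real code raises exception classes rather than returning integers.

```python
# Upstream statuses that are relayed to the client unchanged (assumption:
# only 429 Too Many Requests, as in the snippet under review).
PASSED_THROUGH = {429}


def client_status_for(upstream_status: int) -> int:
    """Map an upstream registry status code to the status sent to the client."""
    if upstream_status in PASSED_THROUGH:
        return upstream_status
    # TODO: every other upstream error is currently masked as 404, which hides
    # messages (e.g. 401 or 5xx) that might be useful to the client.
    return 404
```

Extending `PASSED_THROUGH` later would be the single place to stop masking a given error.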
pulp_container/app/registry.py
Outdated
manifest = Manifest(
    digest=digest,
    schema_version=2,
Get the schema version from media_type.
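A lookup table is one way to derive the schema version instead of hard-coding `schema_version=2`. The media type strings below are the standard Docker and OCI ones; the names `SCHEMA_VERSION_BY_MEDIA_TYPE` and `schema_version_for` are hypothetical and do not come from the PR.

```python
SCHEMA_VERSION_BY_MEDIA_TYPE = {
    # Docker schema1, unsigned and signed
    "application/vnd.docker.distribution.manifest.v1+json": 1,
    "application/vnd.docker.distribution.manifest.v1+prettyjws": 1,
    # Docker schema2 manifest and manifest list
    "application/vnd.docker.distribution.manifest.v2+json": 2,
    "application/vnd.docker.distribution.manifest.list.v2+json": 2,
    # OCI image manifest and index, which both declare schemaVersion 2
    "application/vnd.oci.image.manifest.v1+json": 2,
    "application/vnd.oci.image.index.v1+json": 2,
}


def schema_version_for(media_type: str) -> int:
    """Return the manifest schema version implied by a media type."""
    try:
        return SCHEMA_VERSION_BY_MEDIA_TYPE[media_type]
    except KeyError:
        # Fail loudly instead of silently assuming schema version 2.
        raise ValueError(f"unsupported manifest media type: {media_type!r}") from None
```

Raising on an unknown media type keeps an unexpected manifest format from being stored with a guessed version.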
tag = models.Tag(name=pk, tagged_manifest=manifest)
try:
    tag.save()
except IntegrityError:
    tag = models.Tag.objects.get(name=tag.name, tagged_manifest=manifest)
    tag.touch()
Add the tag to the repository via an immediate task.
pulp_container/app/registry_api.py
Outdated
@@ -1207,12 +1325,18 @@ def head(self, request, path, pk=None):

 def get(self, request, path, pk):
     """Return a signature identified by its sha256 checksum."""
-    _, _, repository_version = self.get_drv_pull(path)
+    _, repository, repository_version = self.get_drv_pull(path)
Maybe, revert this change.
@@ -1302,6 +1382,103 @@ def destroy(self, request, pk, **kwargs):
        return OperationPostponedResponse(async_result, request)


class ContainerPullThroughDistributionViewSet(DistributionViewSet, RolesMixin):
TODO: add a comment about inheriting the private flag from the pull-through cache distribution.
pulp_container/app/registry_api.py
Outdated
    **remote_data,
)

cache_distribution, _ = models.ContainerDistribution.objects.get_or_create(
TODO in the future? Propagate the permissions and private flag from the pull-through distribution to this distribution.
pre-configure a new repository and sync it to facilitate the retrieval of the actual content. This
speeds up the whole process of shipping containers from its early management stages to distribution.
Similarly to on-demand syncing, the feature also **reduces external network dependencies**, and
ensures a more reliable container deployment system in production environments.
Distributions are public by default.
We walked through the PR on the call; @lubosmj has a few things to update and can merge afterwards.
b53d1d4 to d096e0d
d096e0d to af01caa
af01caa to f57edfe
closes #507