Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved image upload process #810

Merged
merged 1 commit into from
Jun 24, 2022
Merged

Improved image upload process #810

merged 1 commit into from
Jun 24, 2022

Conversation

ipanova
Copy link
Member

@ipanova ipanova commented May 30, 2022

[noissue]
Required PR: pulp/pulpcore#2779

@ipanova ipanova marked this pull request as draft May 30, 2022 10:31
@ipanova ipanova force-pushed the upload-chunked branch 4 times, most recently from b164c80 to 1de5f18 Compare May 31, 2022 11:31
@ipanova ipanova force-pushed the upload-chunked branch 2 times, most recently from f1f4632 to b2ee794 Compare June 14, 2022 11:53
@ipanova ipanova requested a review from mdellweg June 14, 2022 11:53
@ipanova ipanova marked this pull request as ready for review June 14, 2022 11:53
@mdellweg mdellweg changed the title Debug upload Improved image upload process Jun 14, 2022
Copy link
Member

@mdellweg mdellweg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure i understand the change properly, but i have the feeling, that one would be able to "steal" an artifact here by claiming to upload a blob with a certain checksum, not showing any chunks to the server and then committing said upload object.

pulp_container/app/registry_api.py Show resolved Hide resolved
pulp_container/app/registry_api.py Show resolved Hide resolved
pulp_container/app/registry_api.py Show resolved Hide resolved
pulp_container/app/registry_api.py Outdated Show resolved Hide resolved
@ipanova
Copy link
Member Author

ipanova commented Jun 14, 2022

I'm not sure i understand the change properly, but i have the feeling, that one would be able to "steal" an artifact here by claiming to upload a blob with a certain checksum, not showing any chunks to the server and then committing said upload object.

I would not be so worried about artifact stealing because we have many other windows when it can happen. For example in the sync pipeline. Artifact can be stolen by the time it reaches Content stage.
This PR change becomes clear and simple once you grasp the docker api specs. Docker client uses it a weird way - there is monolithic upload and chunked upload. Monolithic upload starts with POST by creating upload and then proceeds with PUT where one big chunk would be uploaded.
Chunk upload starts with POST by creating upload and then proceeds with PATCH where each PATCH contains a chunk with the range header. Then it finishes with PUT where it is allowed to whether submit last chunk or just empty body. This the signal of 'i have finished my chunked upload'.
Docker uses a hybrid of these 2 approaches. It uses chunked upload, but sends just one chunk and does not seem to use range header at all :)
Because of this hybrid approach we just perform extra actions like - create a chunk, save it to storage, retrieve all upload chunks( always one ) read it back from storage, assemble artifact from 1 chunk ( i.e. again write to storage)

@ipanova ipanova force-pushed the upload-chunked branch 2 times, most recently from 2de3faf to dde1826 Compare June 14, 2022 15:24
@mdellweg
Copy link
Member

I do not think there is a way to steal artifacts by using uploads in the current implementation. (I agree, until we have all the necessary queryset-scoping on the artifacts the holes are wide open in other areas, but we intend to close them.)
Creating the blob when the big chunk arrives is probably not possible, because that is really the duty of that last PUT call. So i would try to keep the artifact attached to the upload with on_delete=PROTECT.
Another approach may be to see if we can rename the single chunk in the storage backend without moving data around to turn it into an artifact.

@ipanova
Copy link
Member Author

ipanova commented Jun 15, 2022

I don't understand your last comment. This PR does not address or solve the artifact stealing issues at all - it focuses on performance. The data is not read nor written twice. This is important because even calculating the digest of the same data adds time. More importantly how object storage is involved here because users pay money for every write/read request to the bucket.

@mdellweg
Copy link
Member

I don't understand your last comment. This PR does not address or solve the artifact stealing issues at all - it focuses on performance. The data is not read nor written twice. This is important because even calculating the digest of the same data adds time. More importantly how object storage is involved here because users pay money for every write/read request to the bucket.

What i wanted to say is that it introduces a new way to steal an artifact. One more we that need to think about later if we don't solve it right away.

@ipanova ipanova force-pushed the upload-chunked branch 4 times, most recently from 1240c22 to 3875a6c Compare June 23, 2022 10:37
pulp_container/app/models.py Outdated Show resolved Hide resolved
@ipanova ipanova force-pushed the upload-chunked branch 4 times, most recently from 91f57cb to 3ca9027 Compare June 23, 2022 15:58
@ipanova ipanova requested a review from mdellweg June 23, 2022 16:00
Copy link
Member

@mdellweg mdellweg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last concern i found.

pulp_container/app/models.py Show resolved Hide resolved
pulp_container/app/registry_api.py Show resolved Hide resolved
@ipanova ipanova merged commit 29b83b5 into pulp:main Jun 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants