
Improve image upload process #797

Closed
ipanova opened this issue May 25, 2022 · 0 comments

ipanova commented May 25, 2022

I've noticed that we are performing unnecessary steps that hurt performance, increase network traffic, and drive up S3/Azure costs when object storage is used.

It appears that a regular docker/podman push does not use chunked upload; at least I did not find any Range headers in the calls.
https://github.com/pulp/pulp_container/blob/main/pulp_container/app/registry_api.py#L624

This means that a 1 GB layer is uploaded as a single chunk.
In that case we do an extra read of the uploaded data just to turn it into a chunk, which is then sent to storage.
Later we retrieve that data again to assemble the chunks. Here it is a single chunk that is read back and assembled into an artifact, and the artifact (the same binary data as the chunk) is sent to storage again.
As a result the upload takes longer because of the extra reads, and we write the same bytes to storage twice, i.e. we occupy more space, which is billed on object storage like S3. On top of that, every request (GET, PUT, etc.) is billed as well.
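The double-handling described above can be sketched roughly like this (the storage helpers are hypothetical stand-ins, not the actual pulpcore API; they only count the operations that would be billed on S3/Azure):

```python
import hashlib
import io

# Simulated backing store: every put()/get() here would be a paid
# PUT/GET request against S3/Azure in a real deployment.
storage = {}

def put(key: str, data: bytes) -> None:
    storage[key] = data  # one paid PUT, plus billed stored bytes

def get(key: str) -> bytes:
    return storage[key]  # one paid GET

layer = b"x" * 1024  # stands in for a 1 GB layer body

# Current flow as described in this issue:
# 1. read the uploaded body and persist it as a "chunk"
put("chunks/0", io.BytesIO(layer).read())
# 2. later, read the chunk back, assemble it, and persist the artifact
assembled = get("chunks/0")
digest = hashlib.sha256(assembled).hexdigest()
put(f"artifacts/{digest}", assembled)

# The same bytes now live in storage twice (chunk + artifact)
# until the chunk is cleaned up.
assert storage["chunks/0"] == storage[f"artifacts/{digest}"]
```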

For a 1 GB layer, it takes 4.44 minutes to read the uploaded data, create a chunk out of it, and send it to storage.
https://github.com/pulp/pulp_container/blob/main/pulp_container/app/registry_api.py#L638
https://github.com/pulp/pulpcore/blob/main/pulpcore/app/models/upload.py#L32 Need to look into this in more detail, but it seems we even read the data twice here, unnecessarily. I am not sure why we create the ContentFile twice.
It takes another 30 seconds to read that data back, assemble the chunks, initialize and validate an artifact, and send that artifact to storage.

TL;DR: when the upload is performed in a single chunk, create an artifact directly from it and send that to storage.
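A minimal sketch of that shortcut, assuming a monolithic (single-chunk) upload; `save_artifact` and the `storage` dict are hypothetical names used only for illustration:

```python
import hashlib
import io

# Simulated object store: one dict entry == one stored object.
storage = {}

def save_artifact(body: io.BufferedIOBase) -> str:
    """Stream the uploaded body straight into an artifact:
    one read, one write, no intermediate chunk object
    (hypothetical helper, not the pulpcore API)."""
    data = body.read()
    digest = hashlib.sha256(data).hexdigest()
    storage[f"artifacts/{digest}"] = data  # single paid PUT
    return digest

# A monolithic push sends the whole layer in one request body.
layer = b"x" * 1024  # stands in for a 1 GB layer
digest = save_artifact(io.BytesIO(layer))

# Only the artifact is stored; no chunk copy ever exists.
assert storage == {f"artifacts/{digest}": layer}
```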

@ipanova ipanova self-assigned this May 25, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 14, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 14, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 14, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 14, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 14, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 14, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 15, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 21, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 21, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 21, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 23, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 23, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 23, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 23, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 23, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 23, 2022
ipanova added a commit to ipanova/pulp_container that referenced this issue Jun 24, 2022