Enable chunked uploads (#150)
* add arg to enable blob chunking and allow custom chunk sizes
* test.sh: expose ORAS_PORT
* provider: update location on every chunk PATCH
* GHA: make space for large files

Signed-off-by: Brian Cook <bcook@redhat.com>
Signed-off-by: Isabella do Amaral <idoamara@redhat.com>
isinyaaa authored Sep 4, 2024
1 parent caf8db5 commit dfc2415
Showing 8 changed files with 169 additions and 44 deletions.
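
For orientation, here is a minimal usage sketch of the feature this commit enables, based on the parameters added to push() and upload_blob() below. The hostname, repository, and file name are hypothetical placeholders, and chunk_size is shown only to illustrate the new default from oras/defaults.py:

import oras.client
import oras.defaults

# Hypothetical local registry and target reference, for illustration only
client = oras.client.OrasClient(hostname="localhost:5000", insecure=True)
response = client.push(
    files=["large-artifact.bin"],                # hypothetical file
    target="localhost:5000/myorg/artifact:v1",   # hypothetical target
    do_chunked=True,                             # new flag added in this commit
    chunk_size=oras.defaults.default_chunksize,  # 16 MiB default, also new
)
assert response.status_code in (200, 201, 202)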
97 changes: 66 additions & 31 deletions .github/workflows/main.yaml
@@ -1,47 +1,82 @@
name: Oras Python Tests
on:
pull_request: []
pull_request:

jobs:
formatting:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Check Spelling
uses: crate-ci/typos@7ad296c72fa8265059cc03d1eda562fbdfcd6df2 # v1.9.0
with:
files: ./docs ./README.md
- uses: actions/checkout@v4
- name: Check Spelling
uses: crate-ci/typos@7ad296c72fa8265059cc03d1eda562fbdfcd6df2 # v1.9.0
with:
files: ./docs ./README.md

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.11
- name: Lint Oras Python
run: |
python --version
python3 -m pip install pre-commit
python3 -m pip install black
make develop
make lint
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.11
- name: Lint Oras Python
run: |
python --version
python3 -m pip install pre-commit
python3 -m pip install black
make develop
make lint
test-oras-py:
runs-on: ubuntu-latest
services:
registry:
image: ghcr.io/oras-project/registry:latest
ports:
- 5000:5000
- 5000:5000
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.11
- name: Test Oras Python
env:
registry_host: localhost
registry_port: ${{ job.services.registry.ports[5000] }}
REGISTRY_STORAGE_DELETE_ENABLED: "true"
run: |
make install
make test
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.11
- name: Make space for large files
run: |
sudo rm -rf /usr/share/dotnet
sudo rm -rf /usr/local/lib/android
sudo rm -rf /opt/ghc
sudo apt-get remove -y firefox || true
sudo apt-get remove -y google-chrome-stable || true
sudo apt purge openjdk-* || echo "OpenJDK is not installed"
sudo apt remove --autoremove openjdk-* || echo "OpenJDK is not installed"
sudo apt purge oracle-java* || echo "Oracle Java is not installed"
sudo apt remove --autoremove adoptopenjdk-* || echo "Adopt open JDK is not installed"
sudo apt-get remove -y ant || echo "ant is not installed"
sudo rm -rf /opt/hostedtoolcache/Java_Adopt_jdk || true
sudo apt-get remove -y podman || echo "Podman is not installed"
sudo apt-get remove -y buildah || echo "Buildah is not installed"
sudo apt-get remove -y esl-erlang || echo "erlang is not installed"
sudo rm -rf /opt/google
sudo rm -rf /usr/share/az* /opt/az || true
sudo rm -rf /opt/microsoft
sudo rm -rf /opt/hostedtoolcache/Ruby
sudo apt-get remove -y swift || echo "swift is not installed"
sudo apt-get remove -y swig || echo "swig is not installed"
sudo apt-get remove -y texinfo || echo "texinfo is not installed"
sudo apt-get remove -y texlive || echo "texlive is not installed"
sudo apt-get remove -y r-base-core r-base || echo "R is not installed"
sudo rm -rf /opt/R
sudo rm -rf /usr/share/R
sudo rm -rf /opt/*.zip
sudo rm -rf /opt/*.tar.gz
sudo rm -rf /usr/share/*.zip
sudo rm -rf /usr/share/*.tar.gz
sudo rm -rf /opt/hhvm
sudo rm -rf /opt/hostedtoolcache/CodeQL
sudo rm -rf /opt/hostedtoolcache/node
sudo apt-get autoremove
- name: Test Oras Python
env:
registry_host: localhost
registry_port: ${{ job.services.registry.ports[5000] }}
REGISTRY_STORAGE_DELETE_ENABLED: "true"
run: |
make install
make test
1 change: 1 addition & 0 deletions .gitignore
@@ -6,3 +6,4 @@ oras.egg-info/
env
__pycache__
.python-version
.venv
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -14,6 +14,7 @@ and **Merged pull requests**. Critical items to know are:
The versions coincide with releases on pip. Only major versions will be released as tags on Github.

## [0.0.x](https://github.com/oras-project/oras-py/tree/main) (0.0.x)
- re-enable chunked upload (0.2.1)
- refactor of auth to be provided by backend modules (0.2.0)
- bugfix maintain requests's verify valorization for all invocations, augment basic auth header to existing headers
- Allow generating a Subject from a pre-existing Manifest (0.1.30)
3 changes: 3 additions & 0 deletions oras/defaults.py
@@ -38,6 +38,9 @@ class registry:
# DefaultBlocksize default size of each slice of bytes read in each write through in gunzip and untar.
default_blocksize = 32768

# DefaultChunkSize default size of each chunk when uploading chunked blobs.
default_chunksize = 16777216 # 16MB

# what you get for a blank digest, so we don't need to save and recalculate
blank_hash = "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

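As a quick sanity check on the constant above, using nothing beyond the standard library: with the 16 MiB default, a blob is split into roughly ceil(size / chunk_size) PATCH requests.

import math

default_chunksize = 16777216  # 16 MiB, mirroring oras/defaults.py above

def n_chunks(blob_size: int, chunk_size: int = default_chunksize) -> int:
    # Number of PATCH requests a chunked upload of blob_size bytes needs
    return max(1, math.ceil(blob_size / chunk_size))

print(n_chunks(16 * 1024**3))  # the 16GB test file used below -> 1024 chunks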
48 changes: 39 additions & 9 deletions oras/provider.py
@@ -251,19 +251,25 @@ def upload_blob(
container: container_type,
layer: dict,
do_chunked: bool = False,
chunk_size: int = oras.defaults.default_chunksize,
) -> requests.Response:
"""
Prepare and upload a blob.
Sizes > 1024 are uploaded via a chunked approach (post, patch+, put)
and <= 1024 is a single post then put.
Large artifacts can be uploaded via a chunked approach (post, patch+, put)
to registries that support it. Larger chunks generally give better throughput.
Set do_chunked=True for chunked upload.
:param blob: path to blob to upload
:type blob: str
:param container: parsed container URI
:type container: oras.container.Container or str
:param layer: dict from oras.oci.NewLayer
:type layer: dict
:param do_chunked: if true do chunked blob upload. This allows upload of larger oci artifacts.
:type do_chunked: bool
:param chunk_size: chunk size in bytes for the chunked upload.
:type chunk_size: int
"""
blob = os.path.abspath(blob)
container = self.get_container(container)
@@ -274,7 +280,12 @@ def upload_blob(
if not do_chunked:
response = self.put_upload(blob, container, layer)
else:
response = self.chunked_upload(blob, container, layer)
response = self.chunked_upload(
blob,
container,
layer,
chunk_size=chunk_size,
)

# If we have an empty layer digest and the registry didn't accept, just return dummy successful response
if (
@@ -571,6 +582,7 @@ def chunked_upload(
blob: str,
container: oras.container.Container,
layer: dict,
chunk_size: int = oras.defaults.default_chunksize,
) -> requests.Response:
"""
Upload via a chunked upload.
@@ -581,9 +593,12 @@
:type container: oras.container.Container or str
:param layer: dict from oras.oci.NewLayer
:type layer: dict
:param chunk_size: chunk size in bytes
:type chunk_size: int
"""
# Start an upload session
headers = {"Content-Type": "application/octet-stream", "Content-Length": "0"}
headers.update(self.headers)

upload_url = f"{self.prefix}://{container.upload_blob_url()}"
r = self.do_request(upload_url, "POST", headers=headers)
@@ -596,24 +611,27 @@
# Read the blob in chunks, for each do a patch
start = 0
with open(blob, "rb") as fd:
for chunk in oras.utils.read_in_chunks(fd):
if not chunk:
break

for chunk in oras.utils.read_in_chunks(fd, chunk_size=chunk_size):
end = start + len(chunk) - 1
content_range = "%s-%s" % (start, end)
headers = {
"Content-Range": content_range,
"Content-Length": str(len(chunk)),
"Content-Type": "application/octet-stream",
}
headers.update(self.headers)

# Important to update with auth token if acquired
# TODO call to auth here
start = end + 1
self._check_200_response(
self.do_request(session_url, "PATCH", data=chunk, headers=headers)
r := self.do_request(
session_url, "PATCH", data=chunk, headers=headers
)
)
session_url = self._get_location(r, container)
if not session_url:
raise ValueError(f"Issue retrieving session url: {r.json()}")

# Finally, issue a PUT request to close blob
session_url = oras.utils.append_url_params(
@@ -682,6 +700,8 @@ def push(
annotation_file: Optional[str] = None,
manifest_annotations: Optional[dict] = None,
subject: Optional[str] = None,
do_chunked: bool = False,
chunk_size: int = oras.defaults.default_chunksize,
) -> requests.Response:
"""
Push a set of files to a target
@@ -700,6 +720,10 @@
:type manifest_annotations: dict
:param target: target location to push to
:type target: str
:param do_chunked: if true do chunked blob upload
:type do_chunked: bool
:param chunk_size: chunk size in bytes
:type chunk_size: int
:param subject: optional subject reference
:type subject: oras.oci.Subject
"""
@@ -759,7 +783,13 @@
logger.debug(f"Preparing layer {layer}")

# Upload the blob layer
response = self.upload_blob(blob, container, layer)
response = self.upload_blob(
blob,
container,
layer,
do_chunked=do_chunked,
chunk_size=chunk_size,
)
self._check_200_response(response)

# Do we need to cleanup a temporary targz?
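To make the flow in chunked_upload() easier to follow, here is a stripped-down sketch of the same POST / PATCH+ / PUT sequence written with plain requests. It omits the auth, error handling, and relative-Location resolution that self.do_request and _get_location provide in the provider, and the base_url, repo, and digest values are assumptions for illustration:

import requests

def chunked_upload_sketch(blob_path: str, base_url: str, repo: str, digest: str,
                          chunk_size: int = 16 * 1024 * 1024) -> requests.Response:
    # 1. POST opens an upload session; the registry answers with a Location header.
    r = requests.post(
        f"{base_url}/v2/{repo}/blobs/uploads/",
        headers={"Content-Type": "application/octet-stream", "Content-Length": "0"},
    )
    session_url = r.headers["Location"]  # may be relative in real registries

    # 2. PATCH each chunk with its byte range, and follow the Location returned
    #    by every PATCH, which is the behavior this commit adds to the provider.
    start = 0
    with open(blob_path, "rb") as fd:
        while chunk := fd.read(chunk_size):
            end = start + len(chunk) - 1
            r = requests.patch(
                session_url,
                data=chunk,
                headers={
                    "Content-Range": f"{start}-{end}",
                    "Content-Length": str(len(chunk)),
                    "Content-Type": "application/octet-stream",
                },
            )
            session_url = r.headers.get("Location", session_url)
            start = end + 1

    # 3. PUT with the digest closes the blob.
    sep = "&" if "?" in session_url else "?"
    return requests.put(
        f"{session_url}{sep}digest={digest}",
        headers={"Content-Length": "0", "Content-Type": "application/octet-stream"},
    )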
57 changes: 56 additions & 1 deletion oras/tests/test_provider.py
@@ -3,6 +3,7 @@
__license__ = "Apache-2.0"

import os
import subprocess
from pathlib import Path

import pytest
@@ -13,7 +14,7 @@
import oras.provider
import oras.utils

here = os.path.abspath(os.path.dirname(__file__))
here = Path(__file__).resolve().parent


@pytest.mark.with_auth(False)
@@ -62,6 +63,60 @@ def test_annotated_registry_push(tmp_path, registry, credentials, target):
)


@pytest.mark.with_auth(False)
def test_chunked_push(tmp_path, registry, credentials, target):
"""
Basic tests for oras chunked push
"""
# Direct access to registry functions
client = oras.client.OrasClient(hostname=registry, insecure=True)
artifact = os.path.join(here, "artifact.txt")

assert os.path.exists(artifact)

res = client.push(files=[artifact], target=target, do_chunked=True)
assert res.status_code in [200, 201, 202]

files = client.pull(target, outdir=tmp_path)
assert str(tmp_path / "artifact.txt") in files
assert oras.utils.get_file_hash(artifact) == oras.utils.get_file_hash(files[0])

# large file upload
base_size = oras.defaults.default_chunksize * 1024 # 16GB
tmp_chunked = here / "chunked"
try:
subprocess.run(
[
"dd",
"if=/dev/null",
f"of={tmp_chunked}",
"bs=1",
"count=0",
f"seek={base_size}",
],
)

res = client.push(
files=[tmp_chunked],
target=target,
do_chunked=True,
)
assert res.status_code in [200, 201, 202]

files = client.pull(target, outdir=tmp_path / "download")
download = str(tmp_path / "download/chunked")
assert download in files
assert oras.utils.get_file_hash(str(tmp_chunked)) == oras.utils.get_file_hash(
download
)
finally:
tmp_chunked.unlink()

# File that doesn't exist
with pytest.raises(FileNotFoundError):
res = client.push(files=[tmp_path / "none"], target=target)


def test_parse_manifest(registry):
"""
Test parse manifest function.
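A side note on the dd invocation in the test above: with if=/dev/null, count=0, and seek set to the target size, it creates a sparse file, so the 16GB fixture reports its full size without allocating 16GB of data blocks on the runner. A rough Python equivalent, shown only as an illustration, would be:

def make_sparse_file(path: str, size: int) -> None:
    # truncate() extends the file to `size` bytes without writing data blocks,
    # which most filesystems store as a sparse file
    with open(path, "wb") as fd:
        fd.truncate(size)

make_sparse_file("chunked", 16 * 1024**3)  # ~16GB logical size, as in the test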
2 changes: 1 addition & 1 deletion oras/version.py
@@ -2,7 +2,7 @@
__copyright__ = "Copyright The ORAS Authors."
__license__ = "Apache-2.0"

__version__ = "0.2.0"
__version__ = "0.2.1"
AUTHOR = "Vanessa Sochat"
EMAIL = "vsoch@users.noreply.github.com"
NAME = "oras"
4 changes: 2 additions & 2 deletions scripts/test.sh
@@ -1,10 +1,10 @@
#!/bin/bash

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd $DIR/../

# Ensure envars are defined - expected registry port and host
export ORAS_PORT=5000
export ORAS_PORT=${ORAS_PORT:-5000}
export ORAS_HOST=localhost
export ORAS_REGISTRY=${ORAS_HOST}:${ORAS_PORT}
export ORAS_USER=myuser
