Implement persistent bucket fixtures for integration tests #301

Open · wants to merge 17 commits into master (changes from 13 commits)
1 change: 1 addition & 0 deletions changelog.d/+test_with_persistent_bucket.changed.md
@@ -0,0 +1 @@
Improve internal testing infrastructure by updating integration tests to use persistent buckets.

As indicated in #301 (comment), this should go under the infrastructure category, not changed, as it does not change the exposed API of the b2 CLI tool. The CI & tests are not even part of the releases.

10 changes: 8 additions & 2 deletions test/integration/cleanup_buckets.py
@@ -7,10 +7,16 @@
# License https://www.backblaze.com/using_b2_code.html
#
######################################################################
from .persistent_bucket import get_or_create_persistent_bucket


def test_cleanup_buckets(b2_api):
# this is not a test, but it is intended to be called
# via pytest because it reuses fixtures which have everything
# set up
pass # b2_api calls b2_api.clean_buckets() in its finalizer
# set up.
pass
# The persistent bucket is cleared manually now and not
# when tests tear down, as otherwise we'd lose the main benefit
# of a persistent bucket, whose identity is shared across tests.
persistent_bucket = get_or_create_persistent_bucket(b2_api)
b2_api.clean_bucket(persistent_bucket)

won't this break concurrently run tests?

This is why lifecycle rules were suggested for cleanup instead in the first place.
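For reference, a purely lifecycle-rule-based cleanup would lean on bucket settings rather than explicit deletes; the sketch below reuses the rule shape this PR later attaches in get_or_create_persistent_bucket, with illustrative day values:

# sketch: let B2 lifecycle rules expire test files instead of deleting them from test code
lifecycle_rules = [
    {
        "daysFromUploadingToHiding": 1,  # hide files one day after upload
        "daysFromHidingToDeleting": 1,   # delete them one day after they are hidden
        "fileNamePrefix": "",            # apply to every file in the bucket
    }
]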

Collaborator Author

It cleans the buckets once, before the tests run.

OK, so what will happen if someone opens a second PR while one is being tested?

Collaborator Author

The same thing that would happen if no changes were introduced: such a scenario disrupts the test bucket lifecycle, persistent or not. The question is whether it is frequent enough to warrant addressing.

The bucket cleanup process only removed stale buckets, not all of them, so previously we did support concurrent GHA jobs. This change breaks that, and for no reason AFAIK, since the solution is simply to leave that bucket alone forever.
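To illustrate what "stale" means here, a rough, hypothetical sketch of that kind of check is below; the prefix and age threshold are assumptions, not the actual logic in helpers.py:

import time

def _looks_stale(bucket_name: str, created_ms: int, max_age_hours: int = 1) -> bool:
    # hypothetical: only test-created buckets older than a threshold get removed,
    # so buckets belonging to a concurrently running CI job are left alone
    is_test_bucket = bucket_name.startswith("test-")  # assumed test-bucket prefix
    age_ms = time.time() * 1000 - created_ms
    return is_test_bucket and age_ms > max_age_hours * 3600 * 1000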

22 changes: 22 additions & 0 deletions test/integration/conftest.py
@@ -16,6 +16,7 @@
import subprocess
import sys
import tempfile
import uuid
from os import environ, path
from tempfile import TemporaryDirectory

@@ -31,6 +32,10 @@

from ..helpers import b2_uri_args_v3, b2_uri_args_v4
from .helpers import NODE_DESCRIPTION, RNG_SEED, Api, CommandLine, bucket_name_part, random_token
from .persistent_bucket import (
PersistentBucketAggregate,
get_or_create_persistent_bucket,
)

logger = logging.getLogger(__name__)

@@ -402,3 +407,20 @@ def b2_uri_args(apiver_int):
return b2_uri_args_v4
else:
return b2_uri_args_v3


# -- Persistent bucket fixtures --
@pytest.fixture
def unique_subfolder():
subfolder = f"test-{uuid.uuid4().hex[:8]}"
yield subfolder


@pytest.fixture
def persistent_bucket(unique_subfolder, b2_api) -> PersistentBucketAggregate:
"""
Since all consumers of the `bucket_name` fixture expect a new bucket to be created,
we need to mirror this behavior by appending a unique subfolder to the persistent bucket name.
"""
persistent_bucket = get_or_create_persistent_bucket(b2_api)
yield PersistentBucketAggregate(persistent_bucket.name, unique_subfolder)
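
A minimal sketch of how a test would consume this fixture; the b2_tool fixture and the exact upload command below are assumptions based on the surrounding suite, not part of this diff:

def test_upload_to_persistent_bucket(b2_tool, persistent_bucket):
    # each test writes under its own unique subfolder, so tests sharing the
    # persistent bucket do not interfere with one another
    b2_tool.should_succeed(
        [
            'file', 'upload', persistent_bucket.bucket_name,
            'README.md', f'{persistent_bucket.subfolder}/README.md',
        ]
    )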
5 changes: 2 additions & 3 deletions test/integration/helpers.py
@@ -188,7 +188,6 @@ def _should_remove_bucket(self, bucket: Bucket) -> tuple[bool, str]:
def clean_buckets(self, quick=False):
# even with use_cache=True, if cache is empty API call will be made
buckets = self.api.list_buckets(use_cache=quick)
print('Total bucket count:', len(buckets))
remaining_buckets = []
for bucket in buckets:
should_remove, why = self._should_remove_bucket(bucket)
@@ -539,9 +538,9 @@ def reauthorize(self, check_key_capabilities=False):
} - private_preview_caps - set(auth_dict['allowed']['capabilities'])
assert not missing_capabilities, f'it appears that the raw_api integration test is being run with a non-full key. Missing capabilities: {missing_capabilities}'

def list_file_versions(self, bucket_name):
def list_file_versions(self, bucket_name, path=''):
return self.should_succeed_json(
['ls', '--json', '--recursive', '--versions', *self.b2_uri_args(bucket_name)]
['ls', '--json', '--recursive', '--versions', *self.b2_uri_args(bucket_name, path)]
)

def cleanup_buckets(self, buckets: dict[str, dict | None]) -> None:
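A hedged usage sketch of the extended helper above, restricting the listing to a single test's namespace inside the persistent bucket (fixture names follow the diffs in this PR):

file_versions = b2_tool.list_file_versions(
    persistent_bucket.bucket_name,  # the shared, persistent bucket
    persistent_bucket.subfolder,    # only versions under this test's subfolder
)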
69 changes: 69 additions & 0 deletions test/integration/persistent_bucket.py
@@ -0,0 +1,69 @@
######################################################################
#
# File: test/integration/persistent_bucket.py
#
# Copyright 2024 Backblaze Inc. All Rights Reserved.
#
# License https://www.backblaze.com/using_b2_code.html
#
######################################################################
import hashlib
import os
from dataclasses import dataclass
from functools import cached_property
from test.integration.helpers import BUCKET_NAME_LENGTH, Api

import backoff
from b2sdk.v2 import Bucket
from b2sdk.v2.exception import DuplicateBucketName, NonExistentBucket

PERSISTENT_BUCKET_NAME_PREFIX = "constst"


@dataclass
class PersistentBucketAggregate:
bucket_name: str
subfolder: str

@cached_property
def virtual_bucket_name(self):
return f"{self.bucket_name}/{self.subfolder}"


@backoff.on_exception(backoff.expo, Exception, max_tries=3, max_time=10)

What is the backoff here for? Doesn't the retry mechanism built into b2sdk work properly?

def delete_files(bucket: Bucket, subfolder: str):

Suggested change
def delete_files(bucket: Bucket, subfolder: str):
def delete_files(bucket: Bucket, subfolder: str | None = None):

seems like these two methods should be combined
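
A sketch of what the combined helper could look like if the suggestion above is applied; treating an omitted subfolder as "the whole bucket" is an assumption, not something the PR implements:

@backoff.on_exception(backoff.expo, Exception, max_tries=3, max_time=10)
def delete_files(bucket: Bucket, subfolder: str | None = None):
    # with subfolder=None, ls() walks the entire bucket, so one helper covers
    # both per-test cleanup and full-bucket cleanup
    for file_version, _ in bucket.ls(recursive=True, folder_to_list=subfolder or ''):
        bucket.delete_file_version(file_version.id_, file_version.file_name)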

Seems to me like these calls are going to:
a) slow down tests (so +++ cost on the development side & GHA)
b) incur extra cost on the B2 API side - but I guess that doesn't matter much for us

Simply waiting for lifecycle rules to do their thing means some extra B2 storage cost, but it is much faster since we have to do nothing.

Collaborator Author

Even though the execution time and the number of buckets used dropped compared to the previous state, I was clearly overcorrecting by clearing the subfolders between cases, trying to fully mimic the behavior of recreated buckets. That was redundant.

for file_version, _ in bucket.ls(recursive=True, folder_to_list=subfolder):
bucket.delete_file_version(file_version.id_, file_version.file_name)


def get_persistent_bucket_name(b2_api: Api) -> str:
bucket_base = os.environ.get("GITHUB_REPOSITORY_ID", b2_api.account_id)
bucket_hash = hashlib.sha256(bucket_base.encode()).hexdigest()
return f"{PERSISTENT_BUCKET_NAME_PREFIX}-{bucket_hash}" [:BUCKET_NAME_LENGTH]


@backoff.on_exception(
backoff.expo,
DuplicateBucketName,
max_tries=3,
jitter=backoff.full_jitter,
)
def get_or_create_persistent_bucket(b2_api: Api) -> Bucket:
bucket_name = get_persistent_bucket_name(b2_api)
try:
bucket = b2_api.api.get_bucket_by_name(bucket_name)
except NonExistentBucket:
bucket = b2_api.api.create_bucket(
bucket_name,
bucket_type="allPublic",
lifecycle_rules=[
{
"daysFromHidingToDeleting": 1,
"daysFromUploadingToHiding": 1,
"fileNamePrefix": "",
}
],
)
# add the new bucket name to the list of bucket names
b2_api.bucket_name_log.append(bucket_name)
return bucket
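
An illustrative sketch of how these pieces fit together: the bucket name is deterministic per repository (or per account when GITHUB_REPOSITORY_ID is unset), so every CI run reuses the same bucket, while each test works inside its own subfolder namespace:

bucket = get_or_create_persistent_bucket(b2_api)  # created once, then reused across runs
aggregate = PersistentBucketAggregate(bucket.name, "test-1a2b3c4d")
# e.g. "constst-<sha256 of the repository id>/test-1a2b3c4d", with the bucket
# name truncated to BUCKET_NAME_LENGTH
print(aggregate.virtual_bucket_name)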