metadata: fix tests not running, fix failing non-running tests, fix validate base images exist #33127
Conversation
@@ -151,7 +151,7 @@ def validate_metadata_base_images_in_dockerhub(
 ) -> ValidationResult:
     metadata_definition_dict = metadata_definition.dict()

-    image_address = get(metadata_definition_dict, "data.connectorOptions.baseImage")
+    image_address = get(metadata_definition_dict, "data.connectorBuildOptions.baseImage")
❗ this fixes the behavior of the validator - previously it wasn't checking the right field, which means we have not actually been running this validation. We should probably do a run on all connectors on this branch / after this is merged to make sure that no connectors currently reference invalid base images.
nice catch! I don't think running full validation is necessary: if the value of this field is incorrect the connector build will fail, so we'd probably catch this error on publish failure anyway.
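For context on why the wrong path silently disabled the check: `get` does a path lookup into the metadata dict, so the old path simply returned None for every connector. A minimal sketch of the failure mode, assuming the validator treats a missing base image as valid and that `get` is the pydash-style path helper imported in this module:

from pydash.objects import get  # assumed to be the path-lookup helper the validator uses

def validate_metadata_base_images_in_dockerhub_sketch(metadata_definition_dict: dict):
    # Old path: "data.connectorOptions.baseImage" does not exist in any metadata file,
    # so this always returned None and the DockerHub lookup below never ran.
    image_address = get(metadata_definition_dict, "data.connectorOptions.baseImage")
    if image_address is None:
        # Connectors without a base image are treated as valid (assumption), which is
        # why the validation silently passed for everything.
        return True, None
    # ... the real validator goes on to parse the address and query DockerHub ...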
     # If folder_name has subdirectories, os.walk will return a list of tuples,
     # one for folder_name and one for each of its subdirectories.
     fixture_files = []
     for root, dirs, files in os.walk(file_path):
-        return [os.path.join(root, file_name) for file_name in files]
+        fixture_files.extend(os.path.join(root, file_name) for file_name in files)
     return fixture_files
This fixes an issue where cleaning up our fixtures by adding subdirectories led to no files being found for the fixture: we never looked in the subdirectories for files, because we returned early after grabbing the (non-nested) files from the top-level directory itself.
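For reference, a sketch of the full fixed helper - the file_path computation and imports are assumptions, while the loop body follows the diff above:

import os
from typing import List

def list_all_paths_in_fixture_directory(folder_name: str) -> List[str]:
    # Assumed: fixture directories live next to this test helper module.
    file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), folder_name)

    # If folder_name has subdirectories, os.walk returns one (root, dirs, files) tuple
    # for folder_name and one for each subdirectory, so accumulate across every tuple
    # instead of returning on the first one.
    fixture_files = []
    for root, _dirs, files in os.walk(file_path):
        fixture_files.extend(os.path.join(root, file_name) for file_name in files)
    return fixture_files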
 @pytest.fixture(scope="session")
 def valid_metadata_upload_files() -> List[str]:
-    return list_all_paths_in_fixture_directory("metadata_upload/valid")
+    files = list_all_paths_in_fixture_directory("metadata_upload/valid")
+    assert len(files) > 0, "No files found in metadata_upload/valid"
This would have caught the issue of tests not being run. The test iterated over an empty list and therefore reported green.

If we instead used pytest.mark.parametrize to get similar behavior, we could still run into issues where the test is green while it's skipping things, but it would be a lot more obvious, since we'd get a separate listed test and pass/fail for each test case. It would likely be overkill for test_upload_metadata_to_gcs_valid_metadata, since we already have 12(?) different test cases we run for every valid metadata file, but for the invalid ones, which are just one test, it could work.
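For illustration, a parametrized variant for the invalid files could look roughly like this - a sketch only, with the test name and body made up and list_all_paths_in_fixture_directory assumed importable:

import pytest

# Parametrization happens at collection time, so each fixture file becomes its own
# listed test case; an empty fixture directory then shows up as zero collected tests
# (or an empty-parameter-set warning) instead of a silently green loop.
@pytest.mark.parametrize("metadata_file_path", list_all_paths_in_fixture_directory("metadata_upload/invalid"))
def test_upload_invalid_metadata_fails(metadata_file_path):
    ...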
Also note that in the case where the dir has both files and subdirs with files, this would not catch that we missed the subdirs' files. Maybe there's a better way to check this, but I wanted to add some sanity checking.
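One possible extra sanity check along those lines (just a sketch, not something in this PR): cross-check the walked files against a recursive glob, so a missed subdirectory file shows up as a set mismatch:

from pathlib import Path

def assert_no_fixture_files_missed(fixture_dir: str, fixture_files: list) -> None:
    # rglob("*") sees every nested file, so anything the os.walk-based helper missed
    # (e.g. a file hiding inside a subdirectory) makes the two sets differ.
    expected = {str(path) for path in Path(fixture_dir).rglob("*") if path.is_file()}
    assert set(fixture_files) == expected, f"Missing fixture files: {expected - set(fixture_files)}"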
@@ -13,7 +13,7 @@ data:
   normalizationConfig:
     normalizationIntegrationType: postgres
     normalizationRepository: airbyte/exists-2
-    dockerImageTag: 6.6.6 # tag does not exist
+    normalizationTag: 6.6.6 # tag does not exist
This previously failed for the wrong reason - a ValidationError because the normalizationTag field is missing. We want it to fail because the image itself does not exist.
         metadata_file_path,
         validator_opts=ValidatorOptions(docs_path=DOCS_PATH),
     )
+    print(f"Upload raised {exc_info.value}")
Used the "Testing upload of ..." print and this print to help debug and make sure that tests were actually raising the errors I expected them to. Example output:
Testing upload of valid metadata file with invalid docker image: /Users/ella/airbytehq/airbyte/airbyte-ci/connectors/metadata_service/lib/tests/fixtures/metadata_upload/invalid/tag_nonexistent/metadata_normalization_image_tag_does_not_exist.yaml
Running validator: validate_all_tags_are_keyvalue_pairs
Running validator: validate_at_least_one_language_tag
Running validator: validate_major_version_bump_has_breaking_change_entry
Running validator: validate_docs_path_exists
Running validator: validate_metadata_base_images_in_dockerhub
Running validator: validate_metadata_images_in_dockerhub
Checking that the following images are on dockerhub: [('airbyte/exists-2', '6.6.6'), ('airbyte/exists-3', '0.0.1'), ('airbyte/exists-1', '0.0.1'), ('airbyte/exists-4', '0.0.1')]
Upload raised Metadata file /Users/ella/airbytehq/airbyte/airbyte-ci/connectors/metadata_service/lib/tests/fixtures/metadata_upload/invalid/tag_nonexistent/metadata_normalization_image_tag_does_not_exist.yaml is invalid for uploading: Validation error: Image airbyte/exists-2:6.6.6 does not exist in DockerHub
Testing upload of valid metadata file with invalid docker image: /Users/ella/airbytehq/airbyte/airbyte-ci/connectors/metadata_service/lib/tests/fixtures/metadata_upload/invalid/valid_overrides_but_image_nonexistent/metadata_main_repo_does_not_exist_but_is_overrode.yaml
Running validator: validate_all_tags_are_keyvalue_pairs
Running validator: validate_at_least_one_language_tag
Running validator: validate_major_version_bump_has_breaking_change_entry
Running validator: validate_docs_path_exists
Running validator: validate_metadata_base_images_in_dockerhub
Running validator: validate_metadata_images_in_dockerhub
Checking that the following images are on dockerhub: [('airbyte/exists-2', '0.0.1'), ('airbyte/exists-3', '0.0.1'), ('airbyte/does-not-exist-1', '0.0.1'), ('airbyte/exists-4', '0.0.1')]
Upload raised Metadata file /Users/ella/airbytehq/airbyte/airbyte-ci/connectors/metadata_service/lib/tests/fixtures/metadata_upload/invalid/valid_overrides_but_image_nonexistent/metadata_main_repo_does_not_exist_but_is_overrode.yaml is invalid for uploading: Validation error: Image airbyte/does-not-exist-1:0.0.1 does not exist in DockerHub
This output only shows up when I add the -s flag, so I don't think this will clog up our CI/airbyte-ci logs.
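For context, a sketch of how that print fits into the test - the exception type, the test name, and everything not visible in the diff are assumptions:

import pytest

def test_upload_invalid_metadata_raises(metadata_file_path: str):
    print(f"Testing upload of valid metadata file with invalid docker image: {metadata_file_path}")
    with pytest.raises(ValueError) as exc_info:  # concrete exception type is an assumption
        upload_metadata_to_gcs(
            metadata_file_path,
            validator_opts=ValidatorOptions(docs_path=DOCS_PATH),
        )
    # Only visible with pytest -s; confirms the test fails for the expected validator error.
    print(f"Upload raised {exc_info.value}")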
+    # Mock dockerhub
+    mocker.patch("metadata_service.validators.metadata_validator.is_image_on_docker_hub", side_effect=stub_is_image_on_docker_hub)
Now that we are properly running the base image docker check on validate, not only on upload, we need to mock dockerhub in this test too
def stub_is_image_on_docker_hub(image_name: str, version: str, digest: Optional[str] = None, retries: int = 0, wait_sec: int = 30) -> bool:
    image_exists = all(["exists" in image_name, version not in MOCK_VERSIONS_THAT_DO_NOT_EXIST, digest is None or "exists" in digest])
    return image_exists
add stub for digests so that we can test that it fails if the digest is invalid, in addition to the other ways the image could fail.
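A few illustrative calls against the new stub; the exact contents of MOCK_VERSIONS_THAT_DO_NOT_EXIST below are an assumption based on the version discussion further down:

MOCK_VERSIONS_THAT_DO_NOT_EXIST = ["99.99.99", "0.0.0"]  # assumed values

# Image, tag and digest all carry an "exists" marker -> stub reports the image as present.
assert stub_is_image_on_docker_hub("airbyte/exists-1", "0.0.1", "sha256:shathatexists") is True
# Digest without "exists" -> stub reports the image as missing, even if image/tag look fine.
assert stub_is_image_on_docker_hub("airbyte/exists-1", "0.0.1", "sha256:bad-digest") is False
# Version flagged as non-existent -> stub reports the image as missing.
assert stub_is_image_on_docker_hub("airbyte/exists-1", "0.0.0") is False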
@@ -6,7 +6,7 @@ data:
   hosts:
     - zopim.com
   connectorBuildOptions:
-    baseImage: docker.io/airbyte/python-connector-base:1.1.0@sha256:bd98f6505c6764b1b5f99d3aedc23dfc9e9af631a62533f60eb32b1d3dbab20c
+    baseImage: docker.io/airbyte/base-repo-exists:1.1.0@sha256:shathatexists
Changed all of these to the format we use to indicate whether the piece should or shouldn't exist.

I think we should probably refactor the way we do this for both images and SHAs (and probably docs too) so that anything exists by default unless DOESNOTEXIST or something similar is in it. That way, the files can look more like real metadata and require fewer edits when we make new examples. But I'd do that later, as it would touch a bunch of files.
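A sketch of what that exists-by-default stub might look like (purely illustrative, not part of this PR):

from typing import Optional

def stub_everything_exists_unless_marked(image_name: str, version: str, digest: Optional[str] = None, **_kwargs) -> bool:
    # Everything "exists" by default; a fixture opts out by embedding DOESNOTEXIST in the
    # image name, the version, or the digest, so example metadata files can look like real
    # metadata and need fewer edits when new fixtures are added.
    return not any("DOESNOTEXIST" in part for part in (image_name, version, digest or ""))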
The valid_overrides_but_base_image_nonexistent folder name was confusing, as it referred to the high-level dockerImageTag not existing. Now that we also have base image validations, I changed it to valid_overrides_but_main_image_nonexistent and renamed the filenames within accordingly.
This test failed for the wrong reason - it was supposed to catch that there is no breaking change entry for version 2.0.0, but it's actually incorrectly formatted.
New test to cover the case I uncovered where a breaking change test was failing for the wrong reason
Just a breaking change format test that I don't think we have covered!
New test: if there's no sha, it's invalid. Our validator seems to enforce this in:

try:
    image_name, tag_with_sha_prefix, digest = image_address.split(":")
    # As we query the DockerHub API we need to remove the docker.io prefix
    image_name = image_name.replace("docker.io/", "")
except ValueError:
    return False, f"Image {image_address} is not in the format <image>:<tag>@<sha>"
Nice
LGTM, minor comments, thanks for fixing this!
airbyte-ci/connectors/metadata_service/lib/tests/fixtures/__init__.py (outdated, resolved)
Nice
-def stub_is_image_on_docker_hub(image_name: str, version: str) -> bool:
-    return "exists" in image_name and version not in MOCK_VERSIONS_THAT_DO_NOT_EXIST
+def stub_is_image_on_docker_hub(image_name: str, version: str, digest: Optional[str] = None, retries: int = 0, wait_sec: int = 30) -> bool:
+    image_exists = all(["exists" in image_name, version not in MOCK_VERSIONS_THAT_DO_NOT_EXIST, digest is None or "exists" in digest])
Can we make the "does not exist" case use a 0.0.0 version? It might be more robust - I can easily make a typo like doesnotexists.
I'll look into a better does-not-exist version number combo (I think we still need one specific to breaking changes) before merging this.

On the "exists" - I agree. Mentioned a suggestion here, but will put that off to a separate PR.
Went with 99.99.99 and 0.0.0. Not perfect, but they definitely stick out more than 6.6.6. Fixed the SHAs in this one and will fix the images separately.
…alidate base images exist (airbytehq#33127) Co-authored-by: erohmensing <erohmensing@users.noreply.github.com>
Close #33197

An entire rabbit hole of little fixes.

These fixes are all very coupled, and I couldn't split them while keeping CI green. I've written comments in the diff which explain why each change was made.

A summary of the rabbit hole (if I remember correctly / can grok my commit history): test_upload_metadata_to_gcs_invalid_docker_images "ran", but didn't actually run on any files - the fixture was empty! Plus some other nonsense I ran into along the way.