Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metadata: fix tests not running, fix failing non-running tests, fix validate base images exist #33127

Merged
merged 23 commits into from
Dec 6, 2023

Conversation

erohmensing
Copy link
Contributor

@erohmensing erohmensing commented Dec 5, 2023

Close #33197

An entire rabbit hole of little fixes.

These fixes are all very coupled, and I couldn't split them while keeping CI green. I've written comments in the diff which explains why each change was made.

A summary of the rabbit hole (if I remember correctly/can grok my commit history:

  1. Noticed that test_upload_metadata_to_gcs_invalid_docker_images "ran", but didn't actually run on any files. The fixture was empty!
  2. Fixed the issue and got our fixtures back. The test itself doesn't run
  3. Fixed an error in the test, now they are running. An old test case is failing for a different reason than it should
  4. Fix that. Run into some weird errors! It's because some test cases are succeeding when they should be failing.
  5. Add bettor error handling for when test cases succeed when they should fail.
  6. Look into those that are succeeding when they should fail. None of the invalid base_image checks are invalid! Looks like we're not checking the correct property when looking for the base image.
  7. Fix that
  8. Fix the stub that got out of sync with the method it stubs, add extra stubbing for the digest and change the invalid data files accordingly
  9. Get docker auth errors, realize that we now need to stub docker in validate in addition to upload, since we are calling the dockerhub api in pre-upload validations
  10. move the invalid base image checks to the validate fixtures since they should fail early.

Plus some other nonsense I ran into along the way

Copy link

vercel bot commented Dec 5, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Dec 6, 2023 9:15pm

@erohmensing erohmensing force-pushed the ella/fix-base-image-metadata-tests branch from dcd634e to c0a2868 Compare December 5, 2023 22:57
@erohmensing erohmensing changed the base branch from ella/fix-metadata-tests to ella/mock-doc-url December 5, 2023 22:57
@erohmensing erohmensing force-pushed the ella/fix-base-image-metadata-tests branch from 8569163 to eb5b9cc Compare December 5, 2023 23:50
Base automatically changed from ella/mock-doc-url to master December 6, 2023 00:08
@erohmensing erohmensing force-pushed the ella/fix-base-image-metadata-tests branch from eb5b9cc to 74e3bcd Compare December 6, 2023 00:10
@@ -151,7 +151,7 @@ def validate_metadata_base_images_in_dockerhub(
) -> ValidationResult:
metadata_definition_dict = metadata_definition.dict()

image_address = get(metadata_definition_dict, "data.connectorOptions.baseImage")
image_address = get(metadata_definition_dict, "data.connectorBuildOptions.baseImage")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❗ this fixes the behavior of the validator - previously it wasn't checking the right tag. This means that we have not been running this validation, and should probably do a run on all connectors on this branch/after this is merged to make sure that no connectors currently reference invalid base images.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch! I don't think running full validation is necessary: if the value of this field is incorrect the connector build will fail, so we'd probably catch this error on publish failure anyway.

Comment on lines +10 to +16

# If folder_name has subdirectories, os.walk will return a list of tuples,
# one for folder_name and one for each of its subdirectories.
fixture_files = []
for root, dirs, files in os.walk(file_path):
return [os.path.join(root, file_name) for file_name in files]
fixture_files.extend(os.path.join(root, file_name) for file_name in files)
return fixture_files
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes an issue where cleaning up our fixtures by adding subdirectories actually led to no files being found for the fixture - we weren't looking in the subdirectories for files because we early returned after grabbing the (non-nested) files from the directory itself.



@pytest.fixture(scope="session")
def valid_metadata_upload_files() -> List[str]:
return list_all_paths_in_fixture_directory("metadata_upload/valid")
files = list_all_paths_in_fixture_directory("metadata_upload/valid")
assert len(files) > 0, "No files found in metadata_upload/valid"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would have caught the issue of tests not being run. The test iterated over an empty list and therefore said it was green.

If we instead use pytest.mark.parameterize to get similar behavior, we could still run into issues where the test is green when it's skipping things, but it'd be a lot more obvious since we get a different test listed and pass/fail for each test case.

It would likely be overkill for test_upload_metadata_to_gcs_valid_metadata since we already have 12(?) different test cases we run for every valid metadata file, but for the invalid ones which are just one test, it could work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that in the case that the dir has both files and subdirs with files, this would not validate that we haven't missed the subdir's files. Maybe there's a better way to check this, but I wanted to add some sanity checking.

@@ -13,7 +13,7 @@ data:
normalizationConfig:
normalizationIntegrationType: postgres
normalizationRepository: airbyte/exists-2
dockerImageTag: 6.6.6 # tag does not exist
normalizationTag: 6.6.6 # tag does not exist
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This previously failed for the wrong reason - a ValidationError because the normalizationTag field is missing. We want it to fail because the image itself does not exist.

metadata_file_path,
validator_opts=ValidatorOptions(docs_path=DOCS_PATH),
)
print(f"Upload raised {exc_info.value}")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used the "testing for " and this print to help debug to make sure that tests were actually raising the errors i expected them to. example output:


Testing upload of valid metadata file with invalid docker image: /Users/ella/airbytehq/airbyte/airbyte-ci/connectors/metadata_service/lib/tests/fixtures/metadata_upload/invalid/tag_nonexistent/metadata_normalization_image_tag_does_not_exist.yaml
Running validator: validate_all_tags_are_keyvalue_pairs
Running validator: validate_at_least_one_language_tag
Running validator: validate_major_version_bump_has_breaking_change_entry
Running validator: validate_docs_path_exists
Running validator: validate_metadata_base_images_in_dockerhub
Running validator: validate_metadata_images_in_dockerhub
Checking that the following images are on dockerhub: [('airbyte/exists-2', '6.6.6'), ('airbyte/exists-3', '0.0.1'), ('airbyte/exists-1', '0.0.1'), ('airbyte/exists-4', '0.0.1')]
Upload raised Metadata file /Users/ella/airbytehq/airbyte/airbyte-ci/connectors/metadata_service/lib/tests/fixtures/metadata_upload/invalid/tag_nonexistent/metadata_normalization_image_tag_does_not_exist.yaml is invalid for uploading: Validation error: Image airbyte/exists-2:6.6.6 does not exist in DockerHub

Testing upload of valid metadata file with invalid docker image: /Users/ella/airbytehq/airbyte/airbyte-ci/connectors/metadata_service/lib/tests/fixtures/metadata_upload/invalid/valid_overrides_but_image_nonexistent/metadata_main_repo_does_not_exist_but_is_overrode.yaml
Running validator: validate_all_tags_are_keyvalue_pairs
Running validator: validate_at_least_one_language_tag
Running validator: validate_major_version_bump_has_breaking_change_entry
Running validator: validate_docs_path_exists
Running validator: validate_metadata_base_images_in_dockerhub
Running validator: validate_metadata_images_in_dockerhub
Checking that the following images are on dockerhub: [('airbyte/exists-2', '0.0.1'), ('airbyte/exists-3', '0.0.1'), ('airbyte/does-not-exist-1', '0.0.1'), ('airbyte/exists-4', '0.0.1')]
Upload raised Metadata file /Users/ella/airbytehq/airbyte/airbyte-ci/connectors/metadata_service/lib/tests/fixtures/metadata_upload/invalid/valid_overrides_but_image_nonexistent/metadata_main_repo_does_not_exist_but_is_overrode.yaml is invalid for uploading: Validation error: Image airbyte/does-not-exist-1:0.0.1 does not exist in DockerHub

This output is only from when I add the -s flag. So I don't think this will clog up our CI/airbyte-ci logs

Comment on lines +354 to +355
# Mock dockerhub
mocker.patch("metadata_service.validators.metadata_validator.is_image_on_docker_hub", side_effect=stub_is_image_on_docker_hub)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we are properly running the base image docker check on validate, not only on upload, we need to mock dockerhub in this test too

Comment on lines 25 to 27
def stub_is_image_on_docker_hub(image_name: str, version: str, digest: Optional[str] = None, retries: int = 0, wait_sec: int = 30) -> bool:
image_exists = all(["exists" in image_name, version not in MOCK_VERSIONS_THAT_DO_NOT_EXIST, digest is None or "exists" in digest])
return image_exists
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add stub for digests so that we can test that it fails if the digest is invalid, in addition to the other ways the image could fail.

@@ -6,7 +6,7 @@ data:
hosts:
- zopim.com
connectorBuildOptions:
baseImage: docker.io/airbyte/python-connector-base:1.1.0@sha256:bd98f6505c6764b1b5f99d3aedc23dfc9e9af631a62533f60eb32b1d3dbab20c
baseImage: docker.io/airbyte/base-repo-exists:1.1.0@sha256:shathatexists
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed all of these to the format we use to either indicate that the piece should exist or not.

I think we should probably refactor the way we do this for both images and SHAs (and also probably docs) such that anything exists by default, unless DOESNOTEXIST or something similar is in it. That way, the files can look more like metadata and require less edits when we make new examples. But I'd do that later as it would touch a bunch of files.

@erohmensing erohmensing marked this pull request as ready for review December 6, 2023 01:22
@erohmensing erohmensing requested a review from a team December 6, 2023 01:22
@erohmensing erohmensing force-pushed the ella/fix-base-image-metadata-tests branch from 8c9fff4 to 8fe5560 Compare December 6, 2023 01:23
@erohmensing erohmensing changed the title fix tests that weren't running on master, then fix tests that were failing metadata: fix tests not running, fix failing non-running tests, fix validate base images exist Dec 6, 2023
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the valid_overrides_but_base_image_nonexistent folder was confusing as it referred to the high-level dockerImageTag not existing. Now that we also have base image validations, changed it to valid_overrides_but_main_image_nonexistent and the filenames within accordingly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test failed for the wrong reason - it was supposed to clock that there is no breaking change for version 2.0.0, but its actually incorrectly formatted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New test to cover the case I uncovered where a breaking change test was failing for the wrong reason

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a breaking change format test that i don't think we have covered!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New test: if there's no sha, it's invalid. Our validator seems to enforce this in

    try:
        image_name, tag_with_sha_prefix, digest = image_address.split(":")
        # As we query the DockerHub API we need to remove the docker.io prefix
        image_name = image_name.replace("docker.io/", "")
    except ValueError:
        return False, f"Image {image_address} is not in the format <image>:<tag>@<sha>"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Copy link
Contributor

@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor comments, thanks for fixing this!

@@ -151,7 +151,7 @@ def validate_metadata_base_images_in_dockerhub(
) -> ValidationResult:
metadata_definition_dict = metadata_definition.dict()

image_address = get(metadata_definition_dict, "data.connectorOptions.baseImage")
image_address = get(metadata_definition_dict, "data.connectorBuildOptions.baseImage")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch! I don't think running full validation is necessary: if the value of this field is incorrect the connector build will fail, so we'd probably catch this error on publish failure anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

def stub_is_image_on_docker_hub(image_name: str, version: str) -> bool:
return "exists" in image_name and version not in MOCK_VERSIONS_THAT_DO_NOT_EXIST
def stub_is_image_on_docker_hub(image_name: str, version: str, digest: Optional[str] = None, retries: int = 0, wait_sec: int = 30) -> bool:
image_exists = all(["exists" in image_name, version not in MOCK_VERSIONS_THAT_DO_NOT_EXIST, digest is None or "exists" in digest])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the "does not exist" case use 0.0.0 version.
It might be more robust. I can easily make a typo like doesnotexists

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into a better does not exist version number combo (i think we still need one specific to breaking changes) before merging this.

on the "exists" - I agree. Mentioned a suggestion here, but will put that off to a separate pr

Copy link
Contributor Author

@erohmensing erohmensing Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

went with 99.99.99 and 0.0.0. Not perfect but definitely sticks out more than 6.6.6

Fixed the shas in this one and will fix the images separately

@erohmensing erohmensing force-pushed the ella/fix-base-image-metadata-tests branch from 8fe5560 to b513577 Compare December 6, 2023 20:15
@erohmensing erohmensing enabled auto-merge (squash) December 6, 2023 21:16
@erohmensing erohmensing merged commit 62b2020 into master Dec 6, 2023
17 checks passed
@erohmensing erohmensing deleted the ella/fix-base-image-metadata-tests branch December 6, 2023 21:22
rishabh-cldcvr pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Dec 14, 2023
…alidate base images exist (airbytehq#33127)

Co-authored-by: erohmensing <erohmensing@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix metadata lib tests
2 participants