metadata: fix tests not running, fix failing non-running tests, fix validate base images exist #33127

erohmensing · 2023-12-05T22:56:44Z

An entire rabbit hole of little fixes.

These fixes are all tightly coupled, and I couldn't split them while keeping CI green. I've written comments in the diff that explain why each change was made.

A summary of the rabbit hole (if I remember correctly/can grok my commit history):

Noticed that test_upload_metadata_to_gcs_invalid_docker_images "ran", but didn't actually run on any files. The fixture was empty!
Fixed the issue and got our fixtures back. The test itself doesn't run.
Fixed an error in the test; now the tests are running. An old test case is failing for a different reason than it should.
Fixed that. Ran into some weird errors! It's because some test cases are succeeding when they should be failing.
Added better error handling for when test cases succeed when they should fail.
Looked into those that are succeeding when they should fail. None of the invalid base_image cases are being flagged as invalid! Looks like we're not checking the correct property when looking for the base image.
Fixed that.
Fixed the stub that got out of sync with the method it stubs, added extra stubbing for the digest, and changed the invalid data files accordingly.
Got docker auth errors; realized that we now need to stub docker in validate in addition to upload, since we are calling the dockerhub API in pre-upload validations.
Moved the invalid base image checks to the validate fixtures since they should fail early.

Plus some other nonsense I ran into along the way

erohmensing · 2023-12-05T22:56:59Z

Current dependencies on/for this PR:

master
- PR fix references to metadata_service changed files #33125
  - PR fix failing metadata tests on master #33126
    - PR change mock doc url to look more mock-like #33128
      - PR metadata: fix tests not running, fix failing non-running tests, fix validate base images exist #33127 👈
        
        PR re-organize metadata test fixtures #33134

This stack of pull requests is managed by Graphite.

erohmensing · 2023-12-06T00:43:25Z

airbyte-ci/connectors/metadata_service/lib/metadata_service/validators/metadata_validator.py

@@ -151,7 +151,7 @@ def validate_metadata_base_images_in_dockerhub(
 ) -> ValidationResult:
    metadata_definition_dict = metadata_definition.dict()

-    image_address = get(metadata_definition_dict, "data.connectorOptions.baseImage")
+    image_address = get(metadata_definition_dict, "data.connectorBuildOptions.baseImage")


❗ This fixes the behavior of the validator: previously it wasn't reading the right key, so the base image was never found. This means that we have not been running this validation, and we should probably do a run on all connectors on this branch (or after this is merged) to make sure that no connectors currently reference invalid base images.

nice catch! I don't think running full validation is necessary: if the value of this field is incorrect the connector build will fail, so we'd probably catch this error on publish failure anyway.
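To illustrate why the wrong key went unnoticed, here is a minimal sketch of a dotted-path lookup that returns None for a missing key instead of raising. `get_path` is a hypothetical stand-in written for this example, not the `get` helper the validator imports, and the metadata dict is a trimmed-down illustration rather than a real connector definition.

```python
from typing import Any, Optional


def get_path(data: dict, dotted_path: str) -> Optional[Any]:
    """Hypothetical stand-in for `get`: walk a dotted path, return None if any key is missing."""
    current: Any = data
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Trimmed-down metadata with only the field under discussion.
metadata = {
    "data": {
        "connectorBuildOptions": {
            "baseImage": "docker.io/airbyte/base-repo-exists:1.1.0@sha256:shathatexists"
        }
    }
}

# The old path silently yields None, so there is nothing to validate.
old_lookup = get_path(metadata, "data.connectorOptions.baseImage")
# The corrected path finds the image address.
new_lookup = get_path(metadata, "data.connectorBuildOptions.baseImage")

print(old_lookup)
print(new_lookup)
```

If the validator treats a missing base image as "nothing to check" (an assumption here), the old path would make every file pass this check.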

erohmensing · 2023-12-06T00:44:37Z

airbyte-ci/connectors/metadata_service/lib/tests/fixtures/__init__.py

+
+    # If folder_name has subdirectories, os.walk will return a list of tuples,
+    # one for folder_name and one for each of its subdirectories.
+    fixture_files = []
    for root, dirs, files in os.walk(file_path):
-        return [os.path.join(root, file_name) for file_name in files]
+        fixture_files.extend(os.path.join(root, file_name) for file_name in files)
+    return fixture_files


This fixes an issue where cleaning up our fixtures by adding subdirectories led to no files being found for the fixture. We weren't looking in the subdirectories because we returned early after grabbing the (non-nested) files from the directory itself.
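A small, self-contained reproduction of the early-return behavior, using a temporary directory that contains only a subdirectory with one file. The function names and the directory layout are made up for this sketch; only the `os.walk` loop shapes mirror the diff above.

```python
import os
import tempfile
from typing import List


def list_paths_early_return(directory: str) -> List[str]:
    # Old shape: returns on the first os.walk tuple, i.e. only the top-level files.
    for root, _dirs, files in os.walk(directory):
        return [os.path.join(root, name) for name in files]
    return []  # directory missing or unreadable


def list_paths_recursive(directory: str) -> List[str]:
    # New shape: accumulate files from the directory and every subdirectory.
    paths: List[str] = []
    for root, _dirs, files in os.walk(directory):
        paths.extend(os.path.join(root, name) for name in files)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    # Layout: <tmp>/tag_nonexistent/metadata.yaml, with no files at the top level.
    nested = os.path.join(tmp, "tag_nonexistent")
    os.makedirs(nested)
    with open(os.path.join(nested, "metadata.yaml"), "w") as handle:
        handle.write("data: {}\n")

    early = list_paths_early_return(tmp)
    recursive = list_paths_recursive(tmp)

print(len(early), len(recursive))  # 0 1
```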

erohmensing · 2023-12-06T00:48:01Z

airbyte-ci/connectors/metadata_service/lib/tests/fixtures/__init__.py



 @pytest.fixture(scope="session")
 def valid_metadata_upload_files() -> List[str]:
-    return list_all_paths_in_fixture_directory("metadata_upload/valid")
+    files = list_all_paths_in_fixture_directory("metadata_upload/valid")
+    assert len(files) > 0, "No files found in metadata_upload/valid"


This would have caught the issue of tests not being run. The test iterated over an empty list and therefore said it was green.

If we instead use pytest.mark.parametrize to get similar behavior, we could still run into issues where the test is green when it's skipping things, but it'd be a lot more obvious since we get a separate test listed, with its own pass/fail, for each test case.

It would likely be overkill for test_upload_metadata_to_gcs_valid_metadata since we already have 12(?) different test cases we run for every valid metadata file, but for the invalid ones which are just one test, it could work.

Also note that in the case that the dir has both files and subdirs with files, this would not validate that we haven't missed the subdir's files. Maybe there's a better way to check this, but I wanted to add some sanity checking.
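The "green on an empty list" problem can be sketched without any test framework. In the example below, `check_every_file`, `require_non_empty`, and `always_fails` are hypothetical helpers written for illustration; the assertion message matches the one added in the diff above.

```python
from typing import Callable, List, Optional


def check_every_file(paths: List[str], check: Callable[[str], None]) -> int:
    """Run `check` on each path and return how many files were checked."""
    checked = 0
    for path in paths:
        check(path)
        checked += 1
    return checked


def require_non_empty(paths: List[str], label: str) -> List[str]:
    """Guard in the spirit of the fixture assertion: fail loudly on an empty fixture."""
    assert len(paths) > 0, f"No files found in {label}"
    return paths


def always_fails(path: str) -> None:
    raise AssertionError(f"{path} should have been rejected")


# With an empty fixture the loop body never runs, so even a check that
# always fails cannot make the "test" fail.
checked = check_every_file([], always_fails)

# The guard turns the same situation into an explicit failure.
guard_message: Optional[str] = None
try:
    require_non_empty([], "metadata_upload/valid")
except AssertionError as error:
    guard_message = str(error)

print(checked, guard_message)
```

As noted above, the guard only proves that some files were found; it does not prove that files in every subdirectory were picked up.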

erohmensing · 2023-12-06T00:51:29Z

...metadata_upload/invalid/tag_nonexistent/metadata_normalization_image_tag_does_not_exist.yaml

@@ -13,7 +13,7 @@ data:
  normalizationConfig:
    normalizationIntegrationType: postgres
    normalizationRepository: airbyte/exists-2
-    dockerImageTag: 6.6.6 # tag does not exist
+    normalizationTag: 6.6.6 # tag does not exist


This previously failed for the wrong reason - a ValidationError because the normalizationTag field is missing. We want it to fail because the image itself does not exist.

erohmensing · 2023-12-06T00:58:13Z

airbyte-ci/connectors/metadata_service/lib/tests/test_gcs_upload.py

+                metadata_file_path,
+                validator_opts=ValidatorOptions(docs_path=DOCS_PATH),
+            )
+        print(f"Upload raised {exc_info.value}")


Used the "testing for " message and this print to help debug and make sure that tests were actually raising the errors I expected them to. Example output:

Testing upload of valid metadata file with invalid docker image: /Users/ella/airbytehq/airbyte/airbyte-ci/connectors/metadata_service/lib/tests/fixtures/metadata_upload/invalid/tag_nonexistent/metadata_normalization_image_tag_does_not_exist.yaml
Running validator: validate_all_tags_are_keyvalue_pairs
Running validator: validate_at_least_one_language_tag
Running validator: validate_major_version_bump_has_breaking_change_entry
Running validator: validate_docs_path_exists
Running validator: validate_metadata_base_images_in_dockerhub
Running validator: validate_metadata_images_in_dockerhub
Checking that the following images are on dockerhub: [('airbyte/exists-2', '6.6.6'), ('airbyte/exists-3', '0.0.1'), ('airbyte/exists-1', '0.0.1'), ('airbyte/exists-4', '0.0.1')]
Upload raised Metadata file /Users/ella/airbytehq/airbyte/airbyte-ci/connectors/metadata_service/lib/tests/fixtures/metadata_upload/invalid/tag_nonexistent/metadata_normalization_image_tag_does_not_exist.yaml is invalid for uploading: Validation error: Image airbyte/exists-2:6.6.6 does not exist in DockerHub

Testing upload of valid metadata file with invalid docker image: /Users/ella/airbytehq/airbyte/airbyte-ci/connectors/metadata_service/lib/tests/fixtures/metadata_upload/invalid/valid_overrides_but_image_nonexistent/metadata_main_repo_does_not_exist_but_is_overrode.yaml
Running validator: validate_all_tags_are_keyvalue_pairs
Running validator: validate_at_least_one_language_tag
Running validator: validate_major_version_bump_has_breaking_change_entry
Running validator: validate_docs_path_exists
Running validator: validate_metadata_base_images_in_dockerhub
Running validator: validate_metadata_images_in_dockerhub
Checking that the following images are on dockerhub: [('airbyte/exists-2', '0.0.1'), ('airbyte/exists-3', '0.0.1'), ('airbyte/does-not-exist-1', '0.0.1'), ('airbyte/exists-4', '0.0.1')]
Upload raised Metadata file /Users/ella/airbytehq/airbyte/airbyte-ci/connectors/metadata_service/lib/tests/fixtures/metadata_upload/invalid/valid_overrides_but_image_nonexistent/metadata_main_repo_does_not_exist_but_is_overrode.yaml is invalid for uploading: Validation error: Image airbyte/does-not-exist-1:0.0.1 does not exist in DockerHub

This output only appears when I pass the -s flag, so I don't think it will clog up our CI/airbyte-ci logs.
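Beyond printing, a test can check why something failed, not only that it failed. The sketch below is framework-free and uses hypothetical names throughout (`failure_message` and the two fake upload functions); the "field required" message is invented for illustration, while the DockerHub message is copied from the output above.

```python
from typing import Callable


def failure_message(action: Callable[[], None]) -> str:
    """Run `action` and return the message of the ValueError it raises; complain if it succeeds."""
    try:
        action()
    except ValueError as error:
        return str(error)
    raise AssertionError("Expected a failure, but the action succeeded")


def upload_with_missing_field() -> None:
    # Invented message standing in for a schema validation error.
    raise ValueError("field required: normalizationTag")


def upload_with_missing_image() -> None:
    # Message shape taken from the example output above.
    raise ValueError("Image airbyte/exists-2:6.6.6 does not exist in DockerHub")


expected = "does not exist in DockerHub"

# Both uploads fail, but only one fails for the reason the fixture is named after.
wrong_reason = expected in failure_message(upload_with_missing_field)
right_reason = expected in failure_message(upload_with_missing_image)

print(wrong_reason, right_reason)  # False True
```

A test that only asserts "an error was raised" would accept both cases; comparing against the expected message separates them.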

erohmensing · 2023-12-06T00:59:57Z

airbyte-ci/connectors/metadata_service/lib/tests/test_gcs_upload.py

+    # Mock dockerhub
+    mocker.patch("metadata_service.validators.metadata_validator.is_image_on_docker_hub", side_effect=stub_is_image_on_docker_hub)


Now that we are properly running the base image docker check on validate, not only on upload, we need to mock dockerhub in this test too
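A rough sketch of what patching with `side_effect` does, using the standard library's `unittest.mock` rather than the `mocker` fixture from the diff. `validator_module`, `validate`, and the simplified two-argument stub are hypothetical stand-ins; the point is only that calls made during validation are routed to the stub while the patch is active.

```python
from types import SimpleNamespace
from unittest import mock


def real_is_image_on_docker_hub(image_name: str, version: str) -> bool:
    raise RuntimeError("would call the DockerHub API")


# Hypothetical stand-in for the module that holds the function being patched.
validator_module = SimpleNamespace(is_image_on_docker_hub=real_is_image_on_docker_hub)


def validate(image_name: str, version: str) -> bool:
    # Looks the function up on the module at call time, so a patch takes effect.
    return validator_module.is_image_on_docker_hub(image_name, version)


def stub_is_image_on_docker_hub(image_name: str, version: str) -> bool:
    # Simplified stub: an image "exists" if its name says so.
    return "exists" in image_name


with mock.patch.object(
    validator_module, "is_image_on_docker_hub", side_effect=stub_is_image_on_docker_hub
) as patched:
    found = validate("airbyte/exists-1", "0.0.1")
    missing = validate("airbyte/does-not-exist-1", "0.0.1")
    call_count = patched.call_count

print(found, missing, call_count)  # True False 2
```

Outside the `with` block the original function is restored, so an unpatched test would again attempt the real call.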

erohmensing · 2023-12-06T01:02:14Z

airbyte-ci/connectors/metadata_service/lib/tests/test_gcs_upload.py

+def stub_is_image_on_docker_hub(image_name: str, version: str, digest: Optional[str] = None, retries: int = 0, wait_sec: int = 30) -> bool:
+    image_exists = all(["exists" in image_name, version not in MOCK_VERSIONS_THAT_DO_NOT_EXIST, digest is None or "exists" in digest])
+    return image_exists


Added a stub for digests so that we can test that validation fails if the digest is invalid, in addition to the other ways the image check could fail.
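For reference, here is the stub from the diff made runnable on its own, with a few sample calls. The contents of `MOCK_VERSIONS_THAT_DO_NOT_EXIST` are an assumption for this sketch (the fixture above uses 6.6.6 as a non-existent tag), and the digest strings are made-up examples that follow the "exists" naming convention.

```python
from typing import Optional

# Assumed contents; the real list lives in the test module.
MOCK_VERSIONS_THAT_DO_NOT_EXIST = ["6.6.6"]


def stub_is_image_on_docker_hub(
    image_name: str,
    version: str,
    digest: Optional[str] = None,
    retries: int = 0,
    wait_sec: int = 30,
) -> bool:
    # retries and wait_sec are accepted only to match the stubbed signature.
    image_exists = all(
        [
            "exists" in image_name,
            version not in MOCK_VERSIONS_THAT_DO_NOT_EXIST,
            digest is None or "exists" in digest,
        ]
    )
    return image_exists


print(stub_is_image_on_docker_hub("airbyte/exists-2", "0.0.1"))  # True
print(stub_is_image_on_docker_hub("airbyte/exists-2", "6.6.6"))  # False
print(stub_is_image_on_docker_hub("airbyte/does-not-exist-1", "0.0.1"))  # False
print(stub_is_image_on_docker_hub("airbyte/base-repo-exists", "1.1.0", "shathatexists"))  # True
print(stub_is_image_on_docker_hub("airbyte/base-repo-exists", "1.1.0", "shathatdoesnotexist"))  # False
```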

erohmensing · 2023-12-06T01:06:53Z

...rs/metadata_service/lib/tests/fixtures/metadata_upload/valid/metadata_base_image_exists.yaml

@@ -6,7 +6,7 @@ data:
    hosts:
      - zopim.com
  connectorBuildOptions:
-    baseImage: docker.io/airbyte/python-connector-base:1.1.0@sha256:bd98f6505c6764b1b5f99d3aedc23dfc9e9af631a62533f60eb32b1d3dbab20c
+    baseImage: docker.io/airbyte/base-repo-exists:1.1.0@sha256:shathatexists


Changed all of these to the format we use to indicate whether each piece should exist or not.

I think we should probably refactor the way we do this for both images and SHAs (and also probably docs) such that anything exists by default, unless DOESNOTEXIST or something similar is in it. That way, the files can look more like real metadata and require fewer edits when we make new examples. But I'd do that later, as it would touch a bunch of files.
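One possible shape for the "exists by default" idea, purely as a sketch of the suggestion and not code from this PR. The function name `stub_exists_by_default` and the exact marker string are assumptions.

```python
from typing import Optional

# Hypothetical marker; the comment above only suggests "DOESNOTEXIST or something similar".
DOES_NOT_EXIST_MARKER = "DOESNOTEXIST"


def stub_exists_by_default(image_name: str, version: str, digest: Optional[str] = None) -> bool:
    """Treat every image as existing unless any part carries the marker."""
    parts = [image_name, version, digest or ""]
    return not any(DOES_NOT_EXIST_MARKER in part for part in parts)


# A realistic-looking address passes without any special naming.
realistic = stub_exists_by_default(
    "airbyte/python-connector-base",
    "1.1.0",
    "sha256:bd98f6505c6764b1b5f99d3aedc23dfc9e9af631a62533f60eb32b1d3dbab20c",
)
# Only fixtures that opt in with the marker are reported as missing.
marked_missing = stub_exists_by_default("airbyte/python-connector-base", "1.1.0", "sha256:DOESNOTEXIST")

print(realistic, marked_missing)  # True False
```

With this shape, valid fixtures could keep real image names and digests, and only the invalid fixtures would need editing.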

erohmensing · 2023-12-06T01:26:50Z

..._overrides_but_image_nonexistent/metadata_main_image_tag_does_not_exist_but_is_overrode.yaml

The valid_overrides_but_base_image_nonexistent folder name was confusing, as it referred to the high-level dockerImageTag not existing. Now that we also have base image validations, I changed it to valid_overrides_but_main_image_nonexistent and renamed the files within accordingly.

erohmensing · 2023-12-06T01:27:15Z

...s/metadata_validate/invalid/metadata_major_version_no_breaking_change_entry_for_version.yaml

This test failed for the wrong reason: it was supposed to catch that there is no breaking change entry for version 2.0.0, but the file was actually incorrectly formatted.

erohmensing · 2023-12-06T01:27:27Z

...ests/fixtures/metadata_validate/invalid/metadata_breaking_change_versions_under_releases.yml

New test to cover the case I uncovered where a breaking change test was failing for the wrong reason

erohmensing · 2023-12-06T01:27:34Z

...ib/tests/fixtures/metadata_validate/invalid/metadata_breaking_changes_not_under_releases.yml

Just a breaking change format test that I don't think we had covered!

erohmensing · 2023-12-06T01:27:51Z

..._service/lib/tests/fixtures/metadata_validate/invalid/metadata_invalid_base_image_no_sha.yml

New test: if there's no sha, it's invalid. Our validator seems to enforce this in:

    try:
        image_name, tag_with_sha_prefix, digest = image_address.split(":")
        # As we query the DockerHub API we need to remove the docker.io prefix
        image_name = image_name.replace("docker.io/", "")
    except ValueError:
        return False, f"Image {image_address} is not in the format <image>:<tag>@<sha>"
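To see why an address without a sha is rejected, here is a minimal sketch of the same split-and-unpack pattern. `split_base_image_address` is a hypothetical wrapper written for this example; it returns None where the validator snippet above returns an error tuple.

```python
from typing import Optional, Tuple


def split_base_image_address(image_address: str) -> Optional[Tuple[str, str, str]]:
    """Split <image>:<tag>@sha256:<digest> on ':'; return None if there are not exactly three parts."""
    try:
        image_name, tag_with_sha_prefix, digest = image_address.split(":")
    except ValueError:
        # Raised by the tuple unpacking when the number of parts is not three.
        return None
    image_name = image_name.replace("docker.io/", "")
    return image_name, tag_with_sha_prefix, digest


with_sha = split_base_image_address("docker.io/airbyte/base-repo-exists:1.1.0@sha256:shathatexists")
without_sha = split_base_image_address("docker.io/airbyte/base-repo-exists:1.1.0")

print(with_sha)  # ('airbyte/base-repo-exists', '1.1.0@sha256', 'shathatexists')
print(without_sha)  # None
```

The address with a digest contains two colons and yields three parts, while the address without one yields only two, so the unpacking fails and the image is reported as badly formatted.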

alafanechere

LGTM, minor comments, thanks for fixing this!

alafanechere · 2023-12-06T13:58:15Z

airbyte-ci/connectors/metadata_service/lib/metadata_service/validators/metadata_validator.py

@@ -151,7 +151,7 @@ def validate_metadata_base_images_in_dockerhub(
 ) -> ValidationResult:
    metadata_definition_dict = metadata_definition.dict()

-    image_address = get(metadata_definition_dict, "data.connectorOptions.baseImage")
+    image_address = get(metadata_definition_dict, "data.connectorBuildOptions.baseImage")


nice catch! I don't think running full validation is necessary: if the value of this field is incorrect the connector build will fail, so we'd probably catch this error on publish failure anyway.

airbyte-ci/connectors/metadata_service/lib/tests/fixtures/__init__.py

alafanechere · 2023-12-06T14:13:59Z

..._service/lib/tests/fixtures/metadata_validate/invalid/metadata_invalid_base_image_no_sha.yml

alafanechere · 2023-12-06T14:16:25Z

airbyte-ci/connectors/metadata_service/lib/tests/test_gcs_upload.py

-def stub_is_image_on_docker_hub(image_name: str, version: str) -> bool:
-    return "exists" in image_name and version not in MOCK_VERSIONS_THAT_DO_NOT_EXIST
+def stub_is_image_on_docker_hub(image_name: str, version: str, digest: Optional[str] = None, retries: int = 0, wait_sec: int = 30) -> bool:
+    image_exists = all(["exists" in image_name, version not in MOCK_VERSIONS_THAT_DO_NOT_EXIST, digest is None or "exists" in digest])


Can we make the "does not exist" case use a 0.0.0 version?
It might be more robust. I can easily make a typo like doesnotexists.

I'll look into a better "does not exist" version number combo (I think we still need one specific to breaking changes) before merging this.

On the "exists": I agree. Mentioned a suggestion here, but will put that off to a separate PR.

Went with 99.99.99 and 0.0.0. Not perfect, but it definitely sticks out more than 6.6.6.

Fixed the shas in this one and will fix the images separately.

This was referenced Dec 5, 2023

fix references to metadata_service changed files #33125

Merged

fix failing metadata tests on master #33126

Merged

erohmensing mentioned this pull request Dec 5, 2023

change mock doc url to look more mock-like #33128

Merged

erohmensing force-pushed the ella/fix-base-image-metadata-tests branch from dcd634e to c0a2868 Compare December 5, 2023 22:57

erohmensing changed the base branch from ella/fix-metadata-tests to ella/mock-doc-url December 5, 2023 22:57

erohmensing force-pushed the ella/mock-doc-url branch from 2e0d3fd to e8a3b64 Compare December 5, 2023 23:50

erohmensing force-pushed the ella/fix-base-image-metadata-tests branch from 8569163 to eb5b9cc Compare December 5, 2023 23:50

Base automatically changed from ella/mock-doc-url to master December 6, 2023 00:08

erohmensing force-pushed the ella/fix-base-image-metadata-tests branch from eb5b9cc to 74e3bcd Compare December 6, 2023 00:10

erohmensing requested review from bnchrch and alafanechere December 6, 2023 01:11

erohmensing marked this pull request as ready for review December 6, 2023 01:22

erohmensing requested a review from a team December 6, 2023 01:22

erohmensing force-pushed the ella/fix-base-image-metadata-tests branch from 8c9fff4 to 8fe5560 Compare December 6, 2023 01:23

erohmensing changed the title ~~fix tests that weren't running on master, then fix tests that were failing~~ metadata: fix tests not running, fix failing non-running tests, fix validate base images exist Dec 6, 2023

erohmensing mentioned this pull request Dec 6, 2023

re-organize metadata test fixtures #33134

Merged

alafanechere approved these changes Dec 6, 2023


erohmensing added 9 commits December 6, 2023 14:15

add debugging info

349721c

fix invalid data fixture and assert that our fixtures have contents

88debfe

fix debugging expected errors

a5821f6

add 2 tests and fix one which didn't represent what the filename indicated

d18fef1

fix incorrect arg name

89bc218

fix incorrect property name

75a09b1

add more clarity

2250e1f

clarify even more

e695919

use correct path to retrieve base image

2bc38c0

erohmensing and others added 12 commits December 6, 2023 14:15

fix stub

483da70

add retries to the regular docker hub checks. why the hell not

0a67bc0

fix stub for base images

367ca98

fix base images for stubbing

fe76b12

mock dockerhub for valid files check

0cc9dd3

move base image tests to validate, not upload. add new one for if there is no sha

2f60529

better error info if no GCS creds, and patch dockerhub in test_validate command test

a9b95cf

Automated Commit - Formatting Changes

6e305de

remove redundant comment

75590dc

fix error messages for if validation succeeds when it shouldnt have

eafc72e

refactor the checking it failed correctly

2a836f0

use pytest.fail

b513577

erohmensing force-pushed the ella/fix-base-image-metadata-tests branch from 8fe5560 to b513577 Compare December 6, 2023 20:15

erohmensing added 2 commits December 6, 2023 14:58

use better mock versions

534c7e2

assume shas are valid by default

2ecbbc7

erohmensing enabled auto-merge (squash) December 6, 2023 21:16

erohmensing merged commit 62b2020 into master Dec 6, 2023
17 checks passed

erohmensing deleted the ella/fix-base-image-metadata-tests branch December 6, 2023 21:22
