Connector CI: Upload complete.json file on pipeline complete #27051
Conversation
from ci_connector_ops.pipelines.bases import StepResult

async def run_steps(
I had to move this for circular import reasons 😢
@@ -67,6 +67,8 @@ def __init__(
    slack_webhook: Optional[str] = None,
    reporting_slack_channel: Optional[str] = None,
    pull_request: PullRequest = None,
    ci_report_bucket: Optional[str] = None,
    ci_gcs_credentials: Optional[str] = None,
I moved pipeline report logic and creds to the base pipeline
nit: Do we want to persist reports for metadata pipelines?
I believe dagger metadata pipelines running in the CI will soon be removed as metadata validation will happen in test.
I'm thinking we persist it for all CI; if we standardize it, we can always answer usage questions with Metabase.
@@ -416,24 +420,7 @@ async def __aexit__(
    suffix = f"{connector_name}/{connector_version}/output.json"
    file_path_key = f"{self.report_output_prefix}/{suffix}"

    local_report_path = Path(local_reports_path_root + file_path_key)

    await local_report_path.parents[0].mkdir(parents=True, exist_ok=True)
I moved report saving to the report model.
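For context, a minimal sketch of what the relocated save logic could look like on the report model. The `save` name comes from the diff further down; everything else (the field names, and the direct google-cloud-storage upload in place of the PR's dagger-based upload) is an assumption:

```python
# Sketch only — field names and the direct GCS client are assumptions;
# the actual PR uploads through a dagger container instead.
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

from google.cloud import storage


@dataclass
class Report:
    ci_report_bucket: Optional[str]  # None on local runs
    content: dict = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(self.content)

    async def save(self, file_path_key: str, local_root: str = "pipeline_reports/") -> None:
        # Always write the report to the local filesystem...
        local_path = Path(local_root + file_path_key)
        local_path.parent.mkdir(parents=True, exist_ok=True)
        local_path.write_text(self.to_json())
        # ...and upload it to the CI bucket when one is configured.
        if self.ci_report_bucket:
            bucket = storage.Client().bucket(self.ci_report_bucket)
            bucket.blob(file_path_key).upload_from_filename(str(local_path))
```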
raise ValueError(f"Could not convert context state: {state} to step status")


# HACK: This is to avoid wrapping the whole pipeline in a dagger pipeline to avoid instability just prior to launch
@alafanechere This was the big setback.
To use the GCS uploads at the end of the pipeline, we needed dagger.
This left us with a few weird options:
1. Refactor to our utopia where all steps are pipelines (large effort)
2. Refactor our connector pipelines to be steps (large effort, and in the wrong direction)
3. Add extra argument(s) to all our run_connector_*_pipeline functions (felt verbose)
4. Bolt on a generic "report complete" pipeline at the end of the run that uses a hack to repurpose an existing connector pipeline
I went with 4 because it was fast, but more importantly it didn't change the logic of our pipelines in a large way, which is something I wanted to avoid as we are hoping to launch these very soon. See the sketch below.
Let me know your thoughts!
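In outline, option 4 could look something like this: a short, standalone dagger invocation bolted onto the end of the run. This is a sketch, not the PR's actual code; the bucket, key, image, and function names are all assumptions, and the real pipeline would mount the CI service account rather than rely on ambient credentials:

```python
# Illustrative sketch of option 4 — a generic "report complete" step bolted on
# after the connector pipelines finish. All names here are assumptions.
import json
import sys

import anyio
import dagger


async def upload_complete_report(bucket: str, key: str, success: bool) -> int:
    complete_json = json.dumps({"pipeline_complete": True, "success": success})
    async with dagger.Connection(dagger.Config(log_output=sys.stderr)) as client:
        return await (
            client.container()
            .from_("google/cloud-sdk:slim")
            # The real pipeline would mount the CI service account before this step.
            .with_exec(["sh", "-c", f"echo '{complete_json}' > /complete.json && gsutil cp /complete.json gs://{bucket}/{key}"])
            .exit_code()
        )


if __name__ == "__main__":
    exit_code = anyio.run(upload_complete_report, "ci-report-bucket", "pipelines/complete.json", True)
    sys.exit(exit_code)
```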
It looks like an elegant approach and not a hack to me 😄 But let me know what you think of my suggestion about discarding complete.json and betting on live reporting from dagger.
"--ci-gcs-credentials", | ||
help="The service account to use during CI.", | ||
type=click.STRING, | ||
required=False, # Not required for pre-release or local pipelines |
Do we have logic to validate that this is not null when we're on a pre-release?
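For what it's worth, one way to enforce that with a click callback. This is a sketch: the `--is-local` / `--is-pre-release` flags and the callback are hypothetical; only the `--ci-gcs-credentials` option itself comes from the diff above:

```python
import click


def validate_ci_gcs_credentials(ctx: click.Context, param: click.Parameter, value):
    """Fail fast when credentials are missing but the pipeline needs to upload reports."""
    if value is None and not (ctx.params.get("is_local") or ctx.params.get("is_pre_release")):
        raise click.UsageError("--ci-gcs-credentials is required outside local and pre-release pipelines.")
    return value


@click.command()
# is_eager=True makes these flags parse before the credentials callback runs.
@click.option("--is-local", is_flag=True, default=False, is_eager=True)
@click.option("--is-pre-release", is_flag=True, default=False, is_eager=True)
@click.option(
    "--ci-gcs-credentials",
    help="The service account to use during CI.",
    type=click.STRING,
    required=False,  # Not required for pre-release or local pipelines
    callback=validate_ci_gcs_credentials,
)
def publish(is_local: bool, is_pre_release: bool, ci_gcs_credentials):
    click.echo("options validated")
```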
connector_name = self.report.pipeline_context.connector.technical_name
connector_version = self.report.pipeline_context.connector.version
git_revision = self.report.pipeline_context.git_revision
git_branch = self.report.pipeline_context.git_branch.replace("/", "_")
suffix = f"{connector_name}/{git_branch}/{connector_version}/{git_revision}.json"
local_report_path = Path(local_reports_path_root + suffix)
await local_report_path.parents[0].mkdir(parents=True, exist_ok=True)
await local_report_path.write_text(self.report.to_json())
if self.report.should_be_saved:
    s3_key = self.s3_report_key + suffix
    report_upload_exit_code = await remote_storage.upload_to_s3(
        self.dagger_client, str(local_report_path), s3_key, os.environ["TEST_REPORTS_BUCKET_NAME"]
    )
    if report_upload_exit_code != 0:
        self.logger.error("Uploading the report to S3 failed.")

suffix = f"{connector_name}/{connector_version}/output.json"
file_path_key = f"{self.report_output_prefix}/{suffix}"
If you moved the report saving logic to the report model, I think this should also be migrated there.
suffix = f"{connector_name}/{connector_version}/output.json"
file_path_key = f"{self.report_output_prefix}/{suffix}"

await self.report.save(file_path_key)
Suggested change:
- await self.report.save(file_path_key)
+ await self.report.share(file_path_key)
What do you think about exposing a "mega" function on the report side that:
- Writes the report to remote storage if on CI
- Writes the report to the local file system if local
- Posts a comment on the PR if on CI
- Prints out the report to the console
- Sends a slack message if on CI (on publish only, but it could be used for connector health reports)
It makes me think that we could revamp the connector health report logic on slack and just report to slack at runtime, one message per connector.
We'd totally defer what needs to be reported, and when, to the Report class.
Let me know what you think about orchestrating slack messages from Dagger instead of Dagster. The dagger pipeline could also retrieve the previous build statuses from GCS.
I believe this approach would totally remove the need for a "complete.json" file and downstream aggregation and processing on Dagster. The connector badge could also be uploaded from Dagger... Looks like a nice shortcut ATM. Wdyt?
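As a sketch of the dispatch that suggestion implies (every helper below is a hypothetical stub; only the rules come from the list above):

```python
# Sketch of the proposed "mega" share method — all helpers are hypothetical stubs.
from dataclasses import dataclass


@dataclass
class Report:
    is_ci: bool
    is_publish_pipeline: bool
    content: str = "{}"

    async def share(self, file_path_key: str) -> None:
        self._print_to_console()
        if self.is_ci:
            await self._upload_to_remote_storage(file_path_key)
            await self._post_pr_comment()
            if self.is_publish_pipeline:
                await self._send_slack_message()  # could later carry connector health reports
        else:
            self._save_locally(file_path_key)

    def _print_to_console(self) -> None:
        print(self.content)

    async def _upload_to_remote_storage(self, key: str) -> None: ...
    async def _post_pr_comment(self) -> None: ...
    async def _send_slack_message(self) -> None: ...
    def _save_locally(self, key: str) -> None: ...
```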
@alafanechere I like this direction!
The only thing we lose is history:
- We won't have ✅❌✅✅✅❌
- Our badges and connector-specific status report history looking back
Which might make this a no-go.
We have a timestamp on the key prefix, so we could `ls` the bucket with the prefix, order the results by timestamp, and pick the last 6 results.
But I believe this logic is better suited for aggregation use cases in Dagster, and I like having one big artifact produced daily that persists all our connectors' test results.
I'll let you decide whether this is a beneficial shortcut compared to a Dagster job implementation that we could do later.
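A sketch of that lookup, assuming timestamped key prefixes that sort lexicographically (the function, bucket, and prefix names are illustrative):

```python
# Sketch: list report keys under a connector's prefix and keep the 6 most recent.
from google.cloud import storage


def last_build_statuses(bucket_name: str, prefix: str, n: int = 6) -> list[str]:
    client = storage.Client()
    blobs = client.list_blobs(bucket_name, prefix=prefix)
    # Zero-padded timestamps in the key sort lexicographically, i.e. chronologically.
    return sorted(blob.name for blob in blobs)[-n:]
```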
Force-pushed from 48895e4 to f52ea3b.
Approving so as not to block you; would love to see the report.share method come to life (even if it's not posting slack messages), but this can come in a subsequent PR.
What
Add a complete.json file for when a connector pipeline completes