Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connector CI: Upload complete.json file on pipeline complete #27051

Merged
merged 4 commits into from
Jun 6, 2023

Conversation

bnchrch
Copy link
Contributor

@bnchrch bnchrch commented Jun 6, 2023

What

Add a complete.json file for when a connector pipeline completes

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Octavia Squidington III seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@bnchrch bnchrch changed the title Bnchrch/dagger/nightly report finished step Connector CI: Upload complete.json file on pipeline complete Jun 6, 2023
from ci_connector_ops.pipelines.bases import StepResult


async def run_steps(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to move this for circular import reasons 😢

@@ -67,6 +67,8 @@ def __init__(
slack_webhook: Optional[str] = None,
reporting_slack_channel: Optional[str] = None,
pull_request: PullRequest = None,
ci_report_bucket: Optional[str] = None,
ci_gcs_credentials: Optional[str] = None,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved pipeline report logic and creds to the base pipeline

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Do we want to persist reports for metadata pipelines?
I believe dagger metadata pipelines running in the CI will soon be removed as metadata validation will happen in test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im thinking we persist it for all CI, if we standardize it then we can always answer usage questions with metabase

@@ -416,24 +420,7 @@ async def __aexit__(
suffix = f"{connector_name}/{connector_version}/output.json"
file_path_key = f"{self.report_output_prefix}/{suffix}"

local_report_path = Path(local_reports_path_root + file_path_key)

await local_report_path.parents[0].mkdir(parents=True, exist_ok=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I move report saving to the report model

raise ValueError(f"Could not convert context state: {state} to step status")


# HACK: This is to avoid wrapping the whole pipeline in a dagger pipeline to avoid instability just prior to launch
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alafanechere This was the big set back.

To use the GCS uploads at the end of the pipeline we needed dagger.

This left us with a few weird options

  1. Refactor to our utopia where all steps are pipelines (large effort)
  2. Refactor our connector pipelines to be steps (large effort and in the wrong direction)
  3. Add extra argument(s) to all our run_connector_*_pipeline functions (felt verbose)
  4. bolt on a generic report complete pipeline at the end of the run that uses a hack to repurpose an existing connector pipeline

I went with 4 because it was fast, but more importantly it didnt change the logic of our pipelines in a large way. Which is something I wanted to avoid as we are hoping to launch these very soon.

Let me know your thoughts!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like an elegant approach and not a hack to me 😄 But let me know what you think of my suggestion about discarding complete.json and bet on live reporting from dagger

Base automatically changed from bnchrch/dagger/update-nightlies to master June 6, 2023 16:37
"--ci-gcs-credentials",
help="The service account to use during CI.",
type=click.STRING,
required=False, # Not required for pre-release or local pipelines
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
required=False, # Not required for pre-release or local pipelines
required=False, # Not required for pre-release or local pipelines

Do we have a logic to validate this is not null when we're on a pre-release?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -67,6 +67,8 @@ def __init__(
slack_webhook: Optional[str] = None,
reporting_slack_channel: Optional[str] = None,
pull_request: PullRequest = None,
ci_report_bucket: Optional[str] = None,
ci_gcs_credentials: Optional[str] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Do we want to persist reports for metadata pipelines?
I believe dagger metadata pipelines running in the CI will soon be removed as metadata validation will happen in test.

Comment on lines 417 to 423
connector_name = self.report.pipeline_context.connector.technical_name
connector_version = self.report.pipeline_context.connector.version
git_revision = self.report.pipeline_context.git_revision
git_branch = self.report.pipeline_context.git_branch.replace("/", "_")
suffix = f"{connector_name}/{git_branch}/{connector_version}/{git_revision}.json"
local_report_path = Path(local_reports_path_root + suffix)
await local_report_path.parents[0].mkdir(parents=True, exist_ok=True)
await local_report_path.write_text(self.report.to_json())
if self.report.should_be_saved:
s3_key = self.s3_report_key + suffix
report_upload_exit_code = await remote_storage.upload_to_s3(
self.dagger_client, str(local_report_path), s3_key, os.environ["TEST_REPORTS_BUCKET_NAME"]
)
if report_upload_exit_code != 0:
self.logger.error("Uploading the report to S3 failed.")

suffix = f"{connector_name}/{connector_version}/output.json"
file_path_key = f"{self.report_output_prefix}/{suffix}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you moved the report saving logic in the report model I think this should also be migrated there.

suffix = f"{connector_name}/{connector_version}/output.json"
file_path_key = f"{self.report_output_prefix}/{suffix}"

await self.report.save(file_path_key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
await self.report.save(file_path_key)
await self.report.share(file_path_key)

What do you think about exposing a "mega" function on the report side that:

  • Write report to remote storage if ci
  • Write report to local file system if local
  • Post a comment on the PR if ci
  • Print out the report to the console
  • Send a slack message if ci - on publish only but could be used for connector health reports.

It makes me think that we could revamp the connector health report logic on slack and just report to slack at runtime, one message per connector.

We'd totally defer what needs to be reported and when to the Report class.

Let me know what you think about orchestrating slack messages from Dagger instead of Dagster. The dagger pipeline could also retrieve the previous build statuses from GCS.
I believe this approach would totally remove the need for a "complete.json" file and downstream aggretion and processing on Dagster. The connector badge could also be uploaded from Dagger... Looks like a nice shortcut ATM. Wdyt?

Copy link
Contributor Author

@bnchrch bnchrch Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alafanechere I like this direction!

The only thing we loose is history

  1. We wont have ✅❌✅✅✅❌
  2. Our badges and connector specific status report history looking back

Which might make this a no go

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have timestamp on the key prefix, so we could ls the bucket with the prefix, order results by timestamp and pick the last 6 results.
But I believed this is a logic that's better suited for aggregation usecases in Dagster. And I like to have one big artifact produced daily that persists all our connectors tests results.
I'll let you decide if this is a beneficial shortcut compared to a Dagster job implementation that we could do later.

raise ValueError(f"Could not convert context state: {state} to step status")


# HACK: This is to avoid wrapping the whole pipeline in a dagger pipeline to avoid instability just prior to launch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like an elegant approach and not a hack to me 😄 But let me know what you think of my suggestion about discarding complete.json and bet on live reporting from dagger

@bnchrch bnchrch force-pushed the bnchrch/dagger/nightly-report-finished-step branch from 48895e4 to f52ea3b Compare June 6, 2023 21:06
Copy link
Contributor

@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to not block you, would love to see the report.share method come to life (even if it's not posting slack messages), but this can come in a subsequent PR.

@bnchrch bnchrch enabled auto-merge (squash) June 6, 2023 22:06
@bnchrch bnchrch merged commit 8bc8414 into master Jun 6, 2023
@bnchrch bnchrch deleted the bnchrch/dagger/nightly-report-finished-step branch June 6, 2023 22:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants