Connector CI: Update nightlies to write to GCS #26929
Conversation
@@ -357,6 +360,18 @@ def metadata(self) -> dict:
     def docker_image_from_metadata(self) -> str:
         return f"{self.metadata['dockerRepository']}:{self.metadata['dockerImageTag']}"

+    @property
+    def ci_gcs_credentials_secret(self) -> Secret:
+        # TODO (ben): Update this to be in use ANYWHERE we use a service account.
@alafanechere I didn't want to add yet another service account env var. It felt like we were going down a bad path there.
I'm thinking we should enforce one service account for the entire pipeline. WDYT?
@alafanechere Here's a draft of that change: #26931
Yes, it'd be much better to have a single SA for airbyte-ci connectors that can do everything it needs to do: metadata upload, report upload, etc.
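As a rough illustration of that direction, the context could expose one pipeline-wide service account as a Dagger secret. The env var name, class shape, and client attribute below are assumptions for the sketch, not the PR's actual code:

```python
import os

from dagger import Client, Secret


class ConnectorContext:
    """Sketch of a pipeline context exposing one CI-wide service account as a Dagger secret."""

    def __init__(self, dagger_client: Client):
        self.dagger_client = dagger_client

    @property
    def ci_gcs_credentials_secret(self) -> Secret:
        # Hypothetical single env var holding the service account key used for
        # metadata upload, report upload, etc.
        sa_key = os.environ["CI_GCS_CREDENTIALS"]
        # set_secret registers the plaintext with the Dagger engine and returns a Secret
        # that steps can consume via with_secret_variable / with_mounted_secret.
        return self.dagger_client.set_secret("ci_gcs_credentials", sa_key)
```

Any step that needs GCS access would then mount this single secret instead of pulling in a step-specific env var.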
workflow_dispatch:
  inputs:
    runs-on:
      type: string
      default: dev-large-runner
      required: true
    test-connectors-options:
-     default: --concurrency=5 --release-stage=generally_available --release-stage=beta
+     default: --concurrency=5 --release-stage=generally_available
@alafanechere I was noticing some flakiness even when only running GA connectors.
I'm thinking we start the nightlies off with GA only, then once they match the current nightlies we expand to beta.
I was able to successfully run a couple of nightlies on both GA and beta, so I think we should keep it as it is now.
@@ -31,7 +28,7 @@ jobs:
  METADATA_SERVICE_BUCKET_NAME: prod-airbyte-cloud-connector-metadata-service
  SPEC_CACHE_BUCKET_NAME: io-airbyte-cloud-spec-cache
  SPEC_CACHE_GCS_CREDENTIALS: ${{ secrets.SPEC_CACHE_SERVICE_ACCOUNT_KEY_PUBLISH }}
- TEST_REPORTS_BUCKET_NAME: "airbyte-connector-build-status"
+ TEST_REPORTS_BUCKET_NAME: "airbyte-connector-integration-reports"
We might want to reuse this bucket for non-connector use cases? Suggested change:
- TEST_REPORTS_BUCKET_NAME: "airbyte-connector-integration-reports"
+ TEST_REPORTS_BUCKET_NAME: "airbyte-ci-reports"
cmd = ctx.invoked_subcommand

path_values = [
    cmd,
👍
  AWS_SECRET_ACCESS_KEY: ${{ secrets.STATUS_API_AWS_SECRET_ACCESS_KEY }}
  AWS_DEFAULT_REGION: "us-east-2"
- TEST_REPORTS_BUCKET_NAME: "airbyte-connector-build-status"
+ TEST_REPORTS_BUCKET_NAME: "airbyte-connector-integration-reports"
Suggested change:
- TEST_REPORTS_BUCKET_NAME: "airbyte-connector-integration-reports"
+ TEST_REPORTS_BUCKET_NAME: "airbyte-ci-reports"

I believe we could use this bucket for other use cases. I'd suggest using a common root path like connectors/ to group all connector-related reports.
@@ -55,11 +55,46 @@ def validate_environment(is_local: bool, use_remote_secrets: bool):
     )


+def render_report_output_prefix(ctx: click.Context) -> str:
I believe this function could be a method of the Report class, given you'd pass the invoked_command to our PipelineContext class.
I left this in as I think it's a good idea for airbyte-ci to set the top-level path for all airbyte-ci related outputs.
This lets us standardize the path in both the pipelines and any utilities we might build.
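For reference, a rough sketch of what such a prefix helper could look like. The component order follows the example output path in the PR description (command / ci context / branch / epoch timestamp / git revision); the context keys and the "manual" default are assumptions, not the actual implementation:

```python
from datetime import datetime

import click


def render_report_output_prefix(ctx: click.Context) -> str:
    """Build the top-level path under which all outputs of this airbyte-ci invocation are written."""
    cmd = ctx.invoked_subcommand
    # Assumed keys on ctx.obj; "manual" stands in for locally triggered runs.
    ci_context = ctx.obj.get("ci_context", "manual")
    # Slashes in branch names would create extra path levels, so flatten them.
    git_branch = ctx.obj["git_branch"].replace("/", "_")
    git_revision = ctx.obj["git_revision"]
    pipeline_start_timestamp = int(datetime.utcnow().timestamp())

    path_values = [cmd, ci_context, git_branch, str(pipeline_start_timestamp), git_revision]
    return "/".join(path_values)
```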
@property
def test_report_bucket(self) -> str:
    bucket = os.environ.get("TEST_REPORTS_BUCKET_NAME")  # TODO (ben): move to config
Shall we move it to the context instance? (I believe that's what you mean by config.)
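One way to read that suggestion, sketched under the assumption that the context class owns its configuration (the constructor shape is illustrative):

```python
import os
from typing import Optional


class ConnectorContext:
    """Sketch: resolve the report bucket once when the context is built, not inside a property."""

    def __init__(self, test_report_bucket: Optional[str] = None):
        # Fall back to the env var so existing workflows keep working; a future config
        # object or CLI option could pass the bucket explicitly instead.
        self.test_report_bucket = test_report_bucket or os.environ.get("TEST_REPORTS_BUCKET_NAME")
```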
git_branch = self.report.pipeline_context.git_branch.replace("/", "_")
suffix = f"{connector_name}/{git_branch}/{connector_version}/{git_revision}.json"
local_report_path = Path(local_reports_path_root + suffix)
connector_name = self.connector.technical_name
Just to get the proper reference: I made sure to use the context attached to the report to retrieve this metadata.
connector_name = self.connector.technical_name
connector_version = self.connector.version

suffix = f"{connector_name}/{connector_version}/output.json"
file_path_key = f"{self.report_output_prefix}/{suffix}"

local_report_path = Path(local_reports_path_root + file_path_key)
nit: shall we create a dedicated method that would generate the GCS and local paths?
I'm thinking here that we mandate that the relative path stays the same remote vs. local. Then the things that are configurable are:
- What bucket to write to, if remote
- What root folder to write to, if local (tmp)

That keeps the path to the report consistent between both.
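A minimal sketch of that idea: one relative key shared by the remote object name and the local file path, with only the bucket and the local root configurable. Attribute names beyond those visible in the diff (and the local root default) are illustrative:

```python
from pathlib import Path


class ConnectorReport:
    """Sketch: one relative key, reused as both the GCS object name and the local file path."""

    def __init__(self, report_output_prefix: str, connector_name: str, connector_version: str):
        self.report_output_prefix = report_output_prefix
        self.connector_name = connector_name
        self.connector_version = connector_version

    @property
    def file_path_key(self) -> str:
        # The relative path is identical remotely and locally.
        return f"{self.report_output_prefix}/{self.connector_name}/{self.connector_version}/output.json"

    def remote_key(self) -> str:
        # Remote: only the bucket is configurable; the key never changes.
        return self.file_path_key

    def local_path(self, local_reports_path_root: str = "/tmp/airbyte/reports/") -> Path:
        # Local: only the root folder is configurable; the relative path never changes.
        return Path(local_reports_path_root) / self.file_path_key
```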
if self.report.should_be_saved:
    s3_key = self.s3_report_key + suffix
    report_upload_exit_code = await remote_storage.upload_to_s3(
For a soft rollout, could we keep uploading reports to both S3 and GCS?
@alafanechere as I understand it, this is because we don't want to break the dashboard, correct?
I think those break with the new paths no matter what.
So perhaps we just move over to GCS.
I'd like to keep the "old" path for S3 to not break the dashboard.
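For the soft-rollout option, the report could be written to GCS in addition to the existing S3 upload under its old key. The sketch below uses the google-cloud-storage client purely as an illustration; the actual pipeline may perform the upload differently (e.g. through a container with gcloud), and the helper name and parameters are assumptions:

```python
from pathlib import Path

from google.cloud import storage


def upload_report_to_gcs(local_report_path: Path, bucket_name: str, file_path_key: str) -> None:
    """Upload the local report file to GCS under the same relative key used locally."""
    # Credentials are picked up from GOOGLE_APPLICATION_CREDENTIALS / default application credentials.
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(file_path_key)
    blob.upload_from_filename(str(local_report_path))


# Soft-rollout idea: keep the existing upload_to_s3 call (with its old key) as-is,
# and additionally call upload_report_to_gcs with the new prefix-based key.
```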
What
We need our nightly builds to output test results to a different path.

How
Example output file from a local run:
https://storage.googleapis.com/airbyte-connector-integration-reports/test/manual/bnchrch_dagger_update-nightlies/1685688402/f63bd2708013d1cec9d9051b4c9b4328b9bcafff/source-marketo/1.1.0/output.json
closes: #26397