-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Airbyte-ci: Generate binary files for airbyte-ci #31930
Airbyte-ci: Generate binary files for airbyte-ci #31930
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
…eneration-pyinstaller
…eneration-pyinstaller
9396814
to
d64044b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I like that this is an experiment too, so we can slowly roll this out
ci_gcs_credentials_secret: dagger.Secret, | ||
ci_artifact_bucket_name: str, | ||
image: str, | ||
output_filename: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... what will the output_filename
be? Ideally, it's something tike airbyte-ci-{arch}-{version}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally it would have just been airbyte-ci-debian
or airbyte-ci-macos
however I updated it to be airbyte-ci-amd64-1.2.3
and airbyte-ci-arm64-1.2.3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm chock full of questions today 🤠
@@ -123,7 +123,7 @@ At this point you can run `airbyte-ci` commands. | |||
|
|||
| Option | Default value | Mapped environment variable | Description | | |||
| --------------------------------------- | ------------------------------- | ----------------------------- | ------------------------------------------------------------------------------------------- | | |||
| `--no-tui` | | | Disables the Dagger terminal UI. | | |||
| `--enable-dagger-run/--disable-dagger-run` | `--enable-dagger-run`` | | Disables the Dagger terminal UI. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I was worried about merging airbyte CI and airbyte CI internal but this works
@@ -408,6 +408,7 @@ This command runs the Python tests for a airbyte-ci poetry package. | |||
## Changelog | |||
| Version | PR | Description | | |||
| ------- | ---------------------------------------------------------- | --------------------------------------------------------------------------------------------------------- | | |||
| 2.6.0 | [#31974](https://github.com/airbytehq/airbyte/pull/31974) | Merge airbyte-ci-internal into airbyte-ci | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ was this necessary to get the binaries working? because of the scripts, or what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we should mention the new command!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it was. Basically because the airbyte-ci
entry point relied on airbyte-ci-internal
being a separate command. With a binary we can (technically should) only build 1 so we had to merge 2 into 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, makes sense
# by checking if sys.argv contains the dagger run command | ||
# e.g. `dagger run aibyte-ci connectors publish` | ||
|
||
called_with_dagger_run = os.getenv("_DAGGER_WRAP_APPLIED") == "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code does not seem to check what the comment says it does
ci_gcs_credentials: str, | ||
ci_job_key: str, | ||
s3_build_cache_access_key_id: str, | ||
s3_build_cache_secret_key: str, | ||
show_dagger_logs: bool, | ||
): # noqa D103 | ||
ctx.ensure_object(dict) | ||
|
||
|
||
if enable_dagger_run and not is_current_process_wrapped_by_dagger_run(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what case do we start out not wrapped by dagger run, if we only have the one entrypoint now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I take it back, looks like this is now the entrypoint, so we'll almost always not be wrapped in dagger run and we do the wrapping here. In which case... is this pretty much only to test that someone didn't append dagger run
themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're the only ones adding that env var I'm not sure how that works either 🤔
|
||
|
||
if enable_dagger_run and not is_current_process_wrapped_by_dagger_run(): | ||
main_logger.info("Re-Running airbyte-ci with dagger run.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this log might be a bit confusing to users when it does this each time by default 🤔
@@ -88,16 +87,12 @@ def check_dagger_cli_install() -> str: | |||
return check_dagger_cli_install() | |||
return dagger_path | |||
|
|||
|
|||
def main(): | |||
set_working_directory_to_root() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woohoo, no longer doing this in 2 places!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting changes because I think a minor code restructuration would help in maintaining this.
I'm also not sure about the release flow you have in mind.
- Is it supposed to be a manual release that we run locally?
- Would it be a reasonable effort in the scope of this PR to create a workflow that would call
airbyte-ci release
command onpyproject.toml
file change ofairbyte-ci
on the master branch? (release on merge)
PLATFORMS_TO_BUILD = { | ||
"amd64": "python:3.10-slim", | ||
"arm64": "arm64v8/python:3.10-slim", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
container = await ( | ||
dagger_client | ||
.container() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.container() | |
.container(platform=dagger.Platform(f"linux/{platform_name}")) | |
.from("python:3.10-slim") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah! I did not know this! Perfect!
from pipelines.dagger.actions.remote_storage import upload_to_gcs | ||
|
||
PLATFORMS_TO_BUILD = { | ||
"amd64": "python:3.10-slim", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be "smart" to use or python connector base image as it really has poetry
?
Might not be a good idea as our base image is made to build connectors..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Ideally I would like to leave it separate, but I do like the speed gains of not having to install poetry. Sure why not!
.with_exec([f"./dist/{artifact_name}", "--version"]) | ||
.with_exec(["ls", "-la", "dist"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are "checks" command, can we:
- await on container (without the check command): it will trigger the bin generation: if the bin gen fails an ExecError will be thrown
- await on container.with_exec(). If the command exec fails an ExecError will be thrown but we'll know that it's these specific check command that return a 1 status code.
- Upload: upload is actually executed when we know that the bin gen succedded.
It'll help avoid lazy execution surprises and simplify debugging.
"arm64": "arm64v8/python:3.10-slim", | ||
} | ||
|
||
DIRECTORIES_TO_MOUNT = [".git", "airbyte-ci"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we mount .git
?
Is it because when we check the bin we run --version
which has to be run inside our git
repo?
I would suggest not mounting the .git
dir at binary generation time if it's not required, and run the checks inside platform specific "check" containers following the bin generation.
On these "check" containers we'd mount:
- The bin
- The airbyte repo, not using
.git
but using the directory representation of our repo with Daggerclient.git
method..
It will make thisairbyte-ci release
command run faster locally as.git
is pretty fat locally.
CURRENT_VERSION = importlib.metadata.version("pipelines") | ||
|
||
|
||
async def create_airbyte_ci_release( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd appreciate three different function to split responsability:
generate_binary(airbyte_repo: Directory, platform: dagger.Platform)
-> runs the pyinstaller commandcheck_binary(airbyte_ci_binary: File, platform: dagger.Platform)
-> mount the binary in platform specific containers with the airbyte repo and run sanity checks. Returns a boolean wether the check succeeds or notupload_binary(airbyte_ci_binary: File, platform: dagger.Platform)
uploads the binary to GCS
run_releases
would call these function for each platform and would first retrieve the current airbyte-repo with dagger_client.git
await upload_to_gcs( | ||
dagger_client=dagger_client, | ||
file_to_upload=binary_file, | ||
key=file_path, | ||
bucket=ci_artifact_bucket_name, | ||
gcs_credentials=ci_gcs_credentials_secret, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we explicitely log the public GCS url of the uploaded binaries?
"--onefile", | ||
"pipelines/cli/airbyte_ci.py", | ||
]) | ||
.with_exec([f"./dist/{artifact_name}", "--version"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we check that the --version
output matches the CURRENT_VERSION
?
name: Connector Ops CI - Experimental Airbyte CI Release | ||
|
||
# Note this is a workflow simply to test out if github can build using macos | ||
|
||
concurrency: | ||
group: ${{ github.workflow }}-${{ github.ref }} | ||
cancel-in-progress: true | ||
|
||
on: | ||
workflow_dispatch: | ||
jobs: | ||
build: | ||
runs-on: ${{ matrix.os }} | ||
strategy: | ||
fail-fast: false | ||
matrix: | ||
os: ['ubuntu-latest', 'macos-latest'] | ||
|
||
steps: | ||
- uses: actions/checkout@v2 | ||
- uses: actions/setup-python@v2 | ||
with: | ||
python-version: 3.10 | ||
|
||
- run: curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python - | ||
|
||
- run: cd airbyte-ci/pipelines/airbyte_ci | ||
- run: poetry install --with dev | ||
- run: poetry run pyinstaller --collect-all pipelines --collect-all beartype --collect-all dagger --hidden-import strawberry --name airbyte-ci-${{ matrix.os }} --onefile pipelines/cli/airbyte_ci.py | ||
- uses: actions/upload-artifact@v2 | ||
with: | ||
path: dist/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this workflow if it's not workfing for macos-latest
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! Sorry I couldve explained this better. The reason I have this github action is to test if githubs macos-latest image is successful in building an macos image (given that our amd64 is not)
…eneration-pyinstaller
@alafanechere Ready for your rereview! This has been updated to use github actions exclusively and Ive deleted the You can build and download the macos binary by
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm only requesting changes for the GHA workflow, I'd like to make sure we can't override a released version from a branch
uses: google-github-actions/upload-cloud-storage@v1 | ||
with: | ||
path: airbyte-ci/connectors/pipelines/dist/${{ steps.get_artifact_name.outputs.artifact_name}} | ||
destination: dev-airbyte-cloud-connector-metadata-service/airbyte-ci/releases/${{ steps.get_artifact_name.outputs.artifact_name}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the destination bucket / path a workflow input?
Would be nice to be able to write to safely write to pre-releases
to make sure an upload from a PR without a bumped airbyte-ci version never overwrites a previous artifact. An alternative way to avoid accidental overwrite would be to add a commit id postfix.
airbyte-ci/.gitignore
Outdated
@@ -30,7 +30,6 @@ MANIFEST | |||
# Usually these files are written by a python script from a template | |||
# before PyInstaller builds the exe, so as to inject date/other infos into it. | |||
*.manifest | |||
*.spec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you share why it's not gitignored anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this was an artifact from attempting to use spec files as the way to define pyinstaller configuration.
will revert
@@ -56,7 +57,7 @@ def display_welcome_message() -> None: | |||
) | |||
|
|||
|
|||
def check_up_to_date() -> bool: | |||
def check_up_to_date(throw_as_error=False) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
I'd suggest making this function always succeed (returns true or false)
And implement the raising logic at the caller level.
It'd be better to throw a ClickException
as it has a show
method to cleanly output errors to stderr
.
I think taking this approach would help build more environment checks in a consolidated way.
@@ -206,6 +212,38 @@ def check_local_docker_configuration(): | |||
) | |||
|
|||
|
|||
def is_dagger_run_enabled_by_default() -> bool: | |||
dagger_run_by_default = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we store it in a constant variable at module level?
@@ -72,6 +72,10 @@ def get_dagger_cli_version(dagger_path: Optional[str]) -> Optional[str]: | |||
|
|||
|
|||
def check_dagger_cli_install() -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: shall we rename this to get_dagger_path
that would wrap subfunctions:
- Checking if the current dagger cli version is the correct one
- Installing it if needed
- Returning the current dagger cli path
@functools.cache | ||
def get_git_repo() -> git.Repo: | ||
"""Retrieve the git repo.""" | ||
return git.Repo(search_parent_directories=True) | ||
|
||
|
||
@functools.cache | ||
def get_git_repo_path() -> str: | ||
"""Retrieve the git repo path.""" | ||
return get_git_repo().working_tree_dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💎
Can we ensure the current git repo is the airbyte one?
We would check the remote url and fail if it's not airbyte?
remote_url = repo.remotes[0].config_reader.get("url")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we explain airbyte-ci developers how to build the binary locally in this file?
def call_current_command_with_dagger_run(): | ||
mark_dagger_wrap() | ||
if (os.environ.get("AIRBYTE_ROLE") == "airbyter") or (os.environ.get("CI") == "True"): | ||
os.environ[DAGGER_CLOUD_TOKEN_ENV_VAR_NAME_VALUE[0]] = DAGGER_CLOUD_TOKEN_ENV_VAR_NAME_VALUE[1] | ||
|
||
exit_code = 0 | ||
if len(sys.argv) > 1 and any([arg in ARGS_DISABLING_TUI for arg in sys.argv]): | ||
command = ["airbyte-ci-internal"] + [arg for arg in sys.argv[1:] if arg != "--no-tui"] | ||
else: | ||
dagger_path = check_dagger_cli_install() | ||
command = [dagger_path, "run", "airbyte-ci-internal"] + sys.argv[1:] | ||
dagger_path = check_dagger_cli_install() | ||
command = [dagger_path, "run"] + sys.argv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this change breaks the feature of running airbyte-ci
from nested airbyte subfolders if the binary is not in the shell path:
e.g running:
cd airbyte-ci/connectors/pipelines
./dist/airbyte-ci connectors --name=source-faker test
Output:
INFO pipelines.cli.dagger_run: Running command: ['/Users/augustin/bin/dagger', 'run', './dist/airbyte-ci', 'connectors', '--name=source-faker', 'test'] dagger_run.py:113
█ [0.00s] ERROR ./dist/airbyte-ci connectors --name=source-faker test
This is not happening after I added airbyte-ci
to my path (which the local install process should do)
3279e55
to
f72546d
Compare
ab3a5b1
to
325fd2b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bnchrch I made some improvements to the workflow:
- Make sure we always checkout the same commit for ubuntu and macOS build (by setting
ref
in the checkout action) - Upload pre-release binaries with the commit sha, to avoid accidental overwrites of “pre-releases” when folks are iterating on different branches
- Pre-release binaries are uploaded to the dev bucket
- Version + latest binaries are uploaded to the prod bucket (only uploaded when the workflow runs on master)
- On master:
- Upload a per version artifact and a latest artifact: this can be helpful if we’d like to pin CI / users to a specific version
- I made a minor filename / path changes:
- I changed the GCS paths to
airbyte-ci/releases/<os>/<latest/version/sha>/airbyte-ci
- I removed the
-latest
prefix on os to avoid confusing postfixes
- I changed the GCS paths to
- Print uploaded artifact public url in the workflow for convenience
- Change fetch-depth back to default to not fetch unrelated branches
You can check how the new workflow looks on this run
✅ Local install and run still works |
Goal
Provide binaries for use in CI and locally
Anti Goal(s)
How
poetry run poe build-release-binary
which generates a standalone binary wrapping dependencies and pythonairbyte-ci
and removes `airbyte-ci-internalGotchas/Learnings/Deadends
Later PRs