
Refactor CI workflow - bring back reasonable CI runtime #6461

Merged: 1 commit merged on Apr 3, 2023


@snazy snazy (Member) commented Apr 1, 2023

Unifies most main + PR workflows into a single workflow.

The unified CI workflow consists of 2 "stages":

  • Checks - test, intTest, NesQuEIT, etc.
  • Finalize - a "success" dummy job for PRs + a "save to github-cache" job for push-to-main
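A minimal sketch of how the two stages could map onto GitHub Actions jobs, assuming `needs:` gates the finalize stage on all check jobs and `github.event_name` separates PRs from pushes to main. The job names (`test`, `int-test`, `ci-success`, `save-cache`), the Java setup step and the Gradle task invocations are hypothetical placeholders, not the actual Nessie workflow.

```yaml
# Hypothetical two-stage layout; job names and steps are illustrative only.
name: CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  # Stage 1 "Checks": independent jobs that GitHub runs in parallel.
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-java@v3
        with:
          distribution: temurin
          java-version: '11'
      - run: ./gradlew test

  int-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-java@v3
        with:
          distribution: temurin
          java-version: '11'
      - run: ./gradlew intTest   # assumed Gradle task name

  # Stage 2 "Finalize": starts only after every check job succeeded.
  ci-success:
    # Dummy "success" job for PRs.
    if: github.event_name == 'pull_request'
    needs: [test, int-test]
    runs-on: ubuntu-latest
    steps:
      - run: echo "All checks passed"

  save-cache:
    # Only pushes to main update the shared GitHub cache.
    if: github.event_name == 'push'
    needs: [test, int-test]
    runs-on: ubuntu-latest
    steps:
      - run: echo "Save the updated Gradle build cache here (placeholder)"
```

Because an `if:` condition without a status function implicitly includes `success()`, neither finalize job runs unless all jobs listed in `needs:` passed. A single dummy job like `ci-success` is also a convenient target for one required status check, though that motivation is an assumption here.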

Checks are split into multiple jobs, which brings CI workflow runtimes back to under 20 minutes for small-ish changes and up to 35 minutes in the worst case (assuming no job is delayed by GitHub's concurrent job limits).

Utilizes the Gradle build cache as much as possible. The updated build cache of each job in the checks stage is saved as an artifact (with the minimum retention period). The updated build cache is pushed back to GitHub's cache once the checks have finished successfully.
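One way to wire this up, sketched under assumptions: each check job restores the last saved cache with `actions/cache/restore`, uploads its updated local Gradle build cache (assumed to live in Gradle's default `~/.gradle/caches/build-cache-1`) via `actions/upload-artifact` with `retention-days: 1`, and a finalize job downloads all of those artifacts, merges them and saves the result with `actions/cache/save`. The cache key scheme, the artifact names and the `cp`-based merge are hypothetical, not taken from the PR.

```yaml
# Hypothetical fragment; key names, artifact names and paths are illustrative.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Restore the most recent Gradle build cache
        uses: actions/cache/restore@v3
        with:
          path: ~/.gradle/caches/build-cache-1
          key: gradle-build-cache-${{ github.sha }}
          # Falls back to the newest entry whose key starts with this prefix.
          restore-keys: gradle-build-cache-
      - run: ./gradlew test
      - name: Keep the updated build cache as a short-lived artifact
        uses: actions/upload-artifact@v3
        with:
          name: build-cache-test
          path: ~/.gradle/caches/build-cache-1
          retention-days: 1   # the minimum allowed value

  save-cache:
    if: github.event_name == 'push'
    needs: [test]
    runs-on: ubuntu-latest
    steps:
      - name: Download the build-cache artifacts of all check jobs
        uses: actions/download-artifact@v3
        with:
          path: build-caches  # each artifact lands in build-caches/<artifact name>
      - name: Merge them into one local build cache
        run: |
          mkdir -p ~/.gradle/caches/build-cache-1
          cp -r build-caches/*/. ~/.gradle/caches/build-cache-1/
      - name: Push the merged cache back to GitHub's cache
        uses: actions/cache/save@v3
        with:
          path: ~/.gradle/caches/build-cache-1
          key: gradle-build-cache-${{ github.sha }}
```

GitHub cache entries cannot be overwritten, so a per-commit key combined with a key prefix in `restore-keys` lets every push to main save a fresh entry while later jobs still pick up the most recent one.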

Java CI runs against Java 11. The workflow can easily be adapted to add Java 17 for some (or all) jobs, but that would effectively double the number of concurrent jobs, and GitHub imposes a hard limit on those.
Spark + Deltalake tests always run against Java 11, as configured in the build scripts.
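For illustration, assuming a job installs its JDK with `actions/setup-java`, adding Java 17 could be as small as extending a matrix. Every matrix entry becomes its own job, which is where the doubling of concurrent jobs comes from. The job name and the `java-version` matrix variable are hypothetical.

```yaml
# Hypothetical fragment; the job and matrix variable names are illustrative.
jobs:
  test:
    name: test (Java ${{ matrix.java-version }})
    strategy:
      matrix:
        # Java 11 only; appending '17' would spawn a second, parallel job.
        java-version: ['11']
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-java@v3
        with:
          distribution: temurin
          java-version: ${{ matrix.java-version }}
      - run: ./gradlew test
```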

Codecov was not added to the new workflow, because it had not been working for quite a while or produced wrong results.

Build logs and reports are not archived. Test results and relevant logs are available via Gradle build scans.

Windows + macOS workflows are not included in the unified workflow.

There is also another Gradle cache action (burrunan/gradle-cache-action), which uses GitHub's cache like a remote Gradle cache. However, that puts too much load (requests) on GitHub's cache, which in turn throttles our CI and responds with HTTP 429 (Too Many Requests). See burrunan/gradle-cache-action#66.

Fixes #6365

@snazy snazy added pr-docker Smoke test Docker images pr-helm Helm chart testing pr-integrations NesQuEIT (Iceberg, Spark, Flink, Presto) pr-native run native test labels Apr 1, 2023
@snazy snazy (Member, Author) commented Apr 1, 2023

Sample workflow runs:

@snazy snazy force-pushed the unified-workflow-gradle-action branch from 276478e to 83c19dd on April 1, 2023 10:47
dimas-b previously approved these changes Apr 3, 2023
Member

@dimas-b dimas-b left a comment


Test results and relevant logs are available via Gradle build scans.

Shall we add this to the README, perhaps?

matrix:
-  python-version: ['3.7', '3.8', '3.9', '3.10']
+  python-version: ['3.8', '3.9', '3.10', '3.7']
Member


nit: why reorder?

Member Author


3.8 starts first (because it has "all the tox checks"): if too many GitHub workflow jobs are running, you get that result before 3.9, 3.10 and 3.7.


# Unifies main + PR workflow.
#
# The unified CI workflow consists of 4 "stages":
Member


nit: there are only 3 list items below 🤷

Unifies most main + PR workflows into a single workflow.

The unified CI workflow consists of 2 "stages":
* Checks - test, intTest, NesQuEIT, etc
* Finalize - a "success" dummy job + a "save to github-cache" job

Utilizes the Gradle build cache for all stages. The updated build cache
of each job in the checks stage is saved as an artifact (with the minimum
retention period). The updated build cache is pushed back to GitHub's
cache when the checks have finished successfully.

Java CI runs against Java 11 and Java 17, where it is meaningful.
(Spark + Deltalake tests always run against Java 11, so they are not run
against Java 17.) Some checks also run against the latest Java version.

Codecov was not added to the new workflow, because it had not been
working for quite a while or produced wrong results.

Build logs and reports are not archived. Test results and relevant logs
are available via Gradle build scans.

Windows + macOS workflows are not included in the unified workflow.

There is also another [Gradle cache
action](https://github.com/burrunan/gradle-cache-action), which uses
GitHub's cache like a remote Gradle cache. However, that puts too
much load (requests) on GitHub's cache, which in turn throttles our
CI and responds with HTTP/429 (Too many requests). See [this
issue](burrunan/gradle-cache-action#66).

Fixes projectnessie#6365
@snazy snazy force-pushed the unified-workflow-gradle-action branch from 83c19dd to 47d2aca on April 3, 2023 14:49
@snazy snazy merged commit 3bd3911 into projectnessie:main Apr 3, 2023
@snazy snazy deleted the unified-workflow-gradle-action branch April 3, 2023 14:49
Development

Successfully merging this pull request may close these issues.

Refactor CI workflows to speed up turn-around times
2 participants