Developer Experience - CI/CD #10217

Open
sam-heilbron opened this issue Oct 18, 2024 · 4 comments

Comments

@sam-heilbron
Contributor

Do you have a suggestion for code improvement or tracking existing technical debt? Please describe.

This issue is the result of a conversation between a number of engineers who contribute to various projects within Solo. It is meant to capture a broader story around CI/CD improvements, and it also attempts to expose smaller tasks that individual engineers can tackle.

This issue is meant to focus solely on the CI/CD experience. Please search other issues for "Developer Experience" to find other areas that we hope to improve.

When contributing to the Gloo repository, there are some challenges in getting from an opened Pull Request to a merge on the main branch (and potentially other LTS branches). As an Open Source project, we should attempt to align our standards to mirror other projects, to provide a consistent developer experience and encourage contributions.

Describe the solution you'd like

1. Fix Changelog-bot becoming out of date on Forks
When a release occurs, the expected changelog folder advances to the next version. PRs from forks that were previously passing CI then become out of date. We need to find a way to ensure (before Solution 2) that forks can easily stay up to date.

2. Standardize a pattern of using forks to contribute Pull Requests
Using forks is a common pattern of other open source projects. We want to continue to encourage community input and collaboration, and to do so, we should follow a common standard.

This flow should be well documented for anyone to follow, and the team maintaining the project should also abide by this approach.

It is likely that when this is introduced, a maintainer will have to manually use /test to trigger the tests defined in cloudbuild.

3. Introduce a Merge Queue
As we increase the number of unique contributors to the repository, we will increasingly hit challenges with PRs being merged serially, with each merge causing all other PRs to re-run CI before they can merge. A Merge Queue (MQ) is intended to help alleviate this pain.

We have historically intentionally not done this due to flakes existing in the repository. MQ can exacerbate flakes in a repository, as a single flake will eject all items in the queue. We are nearing a point where this is less of a concern, and in fact introducing the MQ may encourage best practices around quickly triaging and resolving flakes.
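To make the flake concern concrete, here is a rough sketch of how a per-run flake rate compounds across a queue. It assumes every queued PR's CI run flakes independently with the same probability, and that a single flake ejects everything queued, as described above. The rates used are illustrative only, not measured from this repository.

```python
def queue_survival_probability(flake_rate: float, queue_length: int) -> float:
    """Probability that no CI run in the queue flakes.

    Assumption (hypothetical model): runs are independent and share one
    flake rate, so the queue survives only if every run passes cleanly.
    """
    if not 0.0 <= flake_rate <= 1.0:
        raise ValueError("flake_rate must be between 0 and 1")
    if queue_length < 0:
        raise ValueError("queue_length must be non-negative")
    return (1.0 - flake_rate) ** queue_length


if __name__ == "__main__":
    # Illustrative rates only; substitute numbers observed in CI.
    for rate in (0.01, 0.05, 0.10):
        for length in (1, 5, 10):
            p = queue_survival_probability(rate, length)
            print(f"flake rate {rate:.0%}, queue of {length}: {p:.1%} clean")
```

Under this model, even a modest flake rate makes long queues unlikely to pass untouched, which is why triaging flakes quickly matters more once an MQ is in place.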

Before making this change, please sync with a subset of the team to determine if collectively they believe CI is in a good place, and collaborate with @rpunia1 who introduced this in other Solo repositories.

4. Decouple artifact building from tests.
At the moment, each of our e2e tests is responsible for building images and helm charts, loading them into a kind cluster, and then executing e2e tests against them.
At the same time, we rely on our cloudbuild.yaml to build images and helm charts.

This has a couple of side-effects:

  1. We are required to maintain a notoriously painful-to-update cloudbuild
  2. The images that we publish on release and PRs (using cloudbuild) do not necessarily mirror the images that we build for tests
  3. We are duplicating effort to build images+charts C+1 times per PR, where C=number of unique clusters for our e2e tests. We could really just rely on a single cluster
  4. By consolidating our build pipeline, we could easily build a nightly, or on-demand pipeline to publish assets
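The duplicated effort in point 3 can be written out as simple arithmetic. This is only a sketch of the count described above: `e2e_clusters` stands for C, and the "consolidated" case assumes a single shared build feeds both the tests and the published artifacts.

```python
def builds_per_pr(e2e_clusters: int, consolidated: bool = False) -> int:
    """Number of image+chart builds triggered by one PR.

    Current state: one build per e2e cluster plus one in cloudbuild (C + 1).
    Consolidated (the proposal, as an assumption): one shared build.
    """
    if e2e_clusters < 0:
        raise ValueError("e2e_clusters must be non-negative")
    return 1 if consolidated else e2e_clusters + 1


def redundant_builds(e2e_clusters: int) -> int:
    """Builds that a single consolidated pipeline would avoid per PR."""
    return builds_per_pr(e2e_clusters) - builds_per_pr(e2e_clusters, consolidated=True)
```

For any C, the saving is C builds per PR, so the benefit grows with every e2e cluster that is added.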

5. Introduce a directive to fix Code Generation on a PR
When code generation fails in CI, it should be easy for a developer to trigger the regeneration in CI itself, rather than re-running everything locally.
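One way such a directive could be recognized is by scanning PR comments for slash commands, in the same spirit as the existing `/test`. The sketch below is an assumption about the shape of that parsing, not how the current bot works; `/codegen` is a hypothetical directive name chosen for illustration.

```python
import re
from typing import List

# "/test" is mentioned in this issue; "/codegen" is a hypothetical name
# for the proposed directive.
KNOWN_DIRECTIVES = {"test", "codegen"}

# A directive must occupy a whole line, e.g. "/codegen".
_DIRECTIVE_RE = re.compile(r"^/([a-z][a-z0-9-]*)$")


def parse_directives(comment_body: str) -> List[str]:
    """Return the known directives found in a PR comment, in order."""
    found = []
    for line in comment_body.splitlines():
        match = _DIRECTIVE_RE.match(line.strip())
        if match and match.group(1) in KNOWN_DIRECTIVES:
            found.append(match.group(1))
    return found
```

Requiring the directive to stand alone on its line avoids triggering CI when someone merely mentions a command in a sentence.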

Additional Context

No response

@sam-heilbron
Contributor Author

@danehans @nikolasmatt @SirNexus you all were the most vocal in the conversation (though please also tag anyone else that you think would have interest or input). Can you take a look? Feel free to make any edits to capture the proposals we discussed at the offsite, or additional ones that you have in mind which would affect CI/CD specifically.

@danehans
Contributor

danehans commented Oct 25, 2024

@sam-heilbron thanks for capturing the discussion points from our meeting on this issue.

As an Open Source project, we should attempt to align our standards to mirror other projects, to provide a consistent developer experience and encourage contributions.

IMHO, we should be more specific by stating that the project should align with CNCF standards. This doc may be a good reference to help guide this initiative. cc: @linsun

@yuval-k
Contributor

yuval-k commented Nov 26, 2024

It would be nice if, similar to how the changelog bot moves the changelog to the new folder when a release happens, we could just put the changelog in the /changelog folder and have the changelog bot move it to the correct version to begin with.

that way we don't need to check which version was released before creating the changelog
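A minimal sketch of that move step, under some assumptions: pending entries are YAML files dropped directly into the changelog folder, versioned entries live in subfolders, and the caller already knows the target version. The function name and layout are hypothetical, not the bot's actual behavior.

```python
import shutil
from pathlib import Path
from typing import List


def file_pending_changelogs(changelog_dir: Path, target_version: str) -> List[Path]:
    """Move top-level changelog entries into the folder for target_version.

    Assumptions (hypothetical layout): pending entries are *.yaml files
    directly under changelog_dir; released entries sit in subfolders.
    """
    target_dir = changelog_dir / target_version
    target_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    # glob("*.yaml") is non-recursive, so already-versioned entries are skipped.
    for entry in sorted(changelog_dir.glob("*.yaml")):
        if not entry.is_file():
            continue
        destination = target_dir / entry.name
        if destination.exists():
            raise FileExistsError(f"{destination} already exists")
        shutil.move(str(entry), str(destination))
        moved.append(destination)
    return moved
```

With something like this, a contributor would never need to know which version is next; only the bot does.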

@yuval-k
Contributor

yuval-k commented Nov 26, 2024

An alternative is to put our changelogs in /changelog, and have the changelog doc tool only grab the ones added in commits after the latest release. That way we won't need the bot moving files around either (I believe that's how Istio does it).
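As a rough sketch of this alternative, the doc tool could ask git which files under the changelog folder were added since the latest release tag. The tag name, folder name, and function names here are assumptions for illustration; only the `git diff` flags are standard.

```python
import subprocess
from typing import List


def added_changelog_args(latest_release_tag: str, changelog_dir: str = "changelog") -> List[str]:
    """Build the git command listing files added since the release tag."""
    return [
        "git", "diff",
        "--name-only",        # print paths only
        "--diff-filter=A",    # keep only files that were added
        f"{latest_release_tag}..HEAD",
        "--", changelog_dir,
    ]


def changelogs_since_release(
    latest_release_tag: str, repo_dir: str = ".", changelog_dir: str = "changelog"
) -> List[str]:
    """Run git and return the changelog paths added after the release."""
    result = subprocess.run(
        added_changelog_args(latest_release_tag, changelog_dir),
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

Because the set of entries is derived from history, nothing has to be moved after a release and forks would not go stale when the version changes.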


4 participants