Proposal to make enterprise use the latest core code at all times #62

Open: wants to merge 1 commit into `main`
docs/adrs/0012-enterprise-and-oss-process-coupling.md: 169 additions, 0 deletions
@@ -0,0 +1,169 @@
# 12. Couple the Enterprise and OSS development process

Date: 2022-08-12

## Status

Proposed

## Context

Weave Gitops OSS and Weave Gitops Enterprise are in many ways set up
as though they are two independent projects - Enterprise uses stable
releases of OSS, much like it uses stable releases of Flux or Cobra.

However, that doesn't reflect reality: the two projects are tightly
coupled in several ways. For one thing, Enterprise uses a lot of
library code from OSS, and there may be scope for this to grow. For
another, there's an increasing commercial desire to be able to move
capabilities between the two codebases cheaply.

This disconnect between the architecture and the engineering reality
wastes time and frustrates both stakeholders and engineers. It's
difficult to quantify exactly how much time is wasted, so this ADR
instead describes two scenarios that are both common and healthy in a
product before version 1.0.

The first scenario is an Enterprise developer who spots a JS
component in OSS that needs just a one-line fix to be perfect for the
developer's current ticket. They are faced with two choices:

* Make a pull request to OSS, wait for the next OSS release, and then
finish their ticket.
* Copy-paste the whole component into a "vendored" copy in Enterprise.

In the second scenario, a developer working on an OSS feature that
uses code shared with Enterprise wants to make an API change. They
have three options:

* Spend time making the change backwards compatible so Enterprise
doesn't need to change, leaving two implementations of the feature
inside OSS.
* Try to remember to submit a pull request to Enterprise the next time
OSS makes a release.
* Leave it for the Enterprise developers to figure out and fix.

In both cases, the developer has to choose between accruing tech debt
at a frightening rate and slowing down the Enterprise process.

## Decision

We will accept the fact that the two code bases are tightly
coupled, and embrace it.

Review comment (@enekofb, Contributor, Aug 23, 2022):

As suggested in Slack (https://weaveworks.slack.com/archives/C03QNK53W68/p1661252951461039?thread_ts=1660574525.864539&cid=C03QNK53W68), why don't we give it a go using a branch, `main-oss`, that follows `main`?

We would use the same process, but add one step:

* before raising a PR with OSS changes, rebase changes from `main` to `main-oss`

Some metrics we could gather to determine whether this is a good move:

* average time until OSS changes are available in `main-oss`
* number of manual interventions required during merges
* number of manual interventions required during releases
* releasability of the resulting branch after bringing in changes (measured by acceptance testing or similar)

The Enterprise `main` branch will start tracking the OSS `main`
branch, in both JavaScript and Go. This will be accomplished with a
GitHub Action that creates or updates a PR in Enterprise with the
latest OSS `main`. That PR kicks off Enterprise's CI, and if it fails,
the developer who changed OSS is notified and can fix the breakage
directly, while the context is still fresh in their mind.
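
A minimal sketch of the decision such an action could make on each push to OSS `main`. It assumes a single long-lived tracking PR on a hypothetical `track-latest-oss` branch (the name used in the review discussion on this ADR), and that the most recent tag reachable from the OSS commit is a plain `vX.Y.Z` release. The pseudo-version helper only illustrates what Go records for an untagged commit; a real workflow would let `go get github.com/weaveworks/weave-gitops@<sha>` compute it. All names here are illustrative, not an existing tool, and only the Go side is covered.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical branch name for the single, long-lived Enterprise PR that
# tracks OSS main: the action creates it once and then keeps updating it.
TRACKING_BRANCH = "track-latest-oss"


@dataclass
class TrackingPR:
    """Minimal (hypothetical) model of an already-open tracking PR."""
    number: int
    oss_sha: str  # OSS commit the PR currently points at


def go_pseudo_version(latest_tag: str, sha: str, committed_at: datetime) -> str:
    """Pseudo-version Go assigns to an untagged commit after `latest_tag`.

    Assumes `latest_tag` is a plain vX.Y.Z release (no pre-release suffix),
    which gives the form vX.Y.(Z+1)-0.<UTC timestamp>-<12-char hash>.
    """
    major, minor, patch = (int(part) for part in latest_tag.lstrip("v").split("."))
    stamp = committed_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"v{major}.{minor}.{patch + 1}-0.{stamp}-{sha[:12]}"


def plan_bump(oss_sha: str, open_pr: Optional[TrackingPR]) -> str:
    """Decide what the action does for a new OSS main commit.

    It only ever opens or updates a PR; merging stays with humans.
    """
    if open_pr is None:
        return "create"  # no tracking PR yet: open one on TRACKING_BRANCH
    if open_pr.oss_sha == oss_sha:
        return "noop"  # already pointing at this commit
    return "update"  # push the new pin to TRACKING_BRANCH; Enterprise CI re-runs


if __name__ == "__main__":
    sha = "cafe1234deadbeef0123"  # made-up commit hash
    when = datetime(2022, 8, 23, 12, 0, tzinfo=timezone.utc)
    print(go_pseudo_version("v0.10.0", sha, when))  # v0.10.1-0.20220823120000-cafe1234dead
    print(plan_bump(sha, TrackingPR(number=1234, oss_sha="beaf5467")))  # update
```

Keeping one tracking PR, rather than one PR per OSS commit, means a CI failure always points at the latest OSS state and there is only one place for the OSS developer to push a fix.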

This tries to illustrate the proposed flow when OSS developer Oscar works on a
new feature that changes APIs Enterprise is using: by the time Enterprise
developer Enya next wants to upgrade OSS, Oscar has already pushed an API migration.
```mermaid
sequenceDiagram
participant Oscar
participant OSS as Gitops OSS
participant Enterprise as Gitops Enterprise
participant Enya
Oscar->>OSS: Make PR with new feature
OSS->>Oscar: Success ✅
Oscar->>OSS: Merge
OSS->>Enterprise: Open PR to bump OSS
Enterprise->>Oscar: Build failed ⛔
Oscar->>Enterprise: Use new feature
Enterprise->>Oscar: Success ✅
Enya->>Enterprise: Approve ✅
Oscar->>Enterprise: Merge
```

This means that OSS's release process needs to similarly kick off an
Enterprise release process - when OSS forks a branch to release, that
triggers another action that does the same for Enterprise. When OSS
decides to approve the release, the Enterprise PR is updated with the
stable tags for OSS.
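
A sketch of the release-side action, under the same caveats. It assumes, as one of the review comments on this ADR suggests, that OSS cuts branches named like `release/v0.10.0` and that Enterprise mirrors the branch name. While the OSS release is under test, Enterprise pins the exact commit on the OSS release branch; once OSS publishes, the pin switches to the stable tag. `github.com/weaveworks/weave-gitops` is the OSS Go module path; the function and the branch convention are hypothetical, and the JavaScript package would need an equivalent step.

```python
# Go module path of Weave GitOps OSS, as required from Enterprise.
OSS_MODULE = "github.com/weaveworks/weave-gitops"


def enterprise_release_pin(oss_branch: str, oss_head_sha: str, published: bool) -> tuple[str, str]:
    """Map an OSS release branch to (Enterprise branch, `go get` target).

    While OSS is still testing, pin the exact commit on the OSS release
    branch; after OSS publishes the release, pin the stable tag instead.
    """
    prefix = "release/"
    if not oss_branch.startswith(prefix):
        raise ValueError(f"not a release branch: {oss_branch!r}")
    version = oss_branch[len(prefix):]  # e.g. "v0.10.0"
    target = version if published else oss_head_sha
    # Hypothetical convention: Enterprise uses the same branch name as OSS.
    return oss_branch, f"{OSS_MODULE}@{target}"


if __name__ == "__main__":
    # During QA: pinned to the release-branch commit.
    print(enterprise_release_pin("release/v0.10.0", "cafe1234", published=False))
    # After OSS publishes: pinned to the stable tag.
    print(enterprise_release_pin("release/v0.10.0", "cafe1234", published=True))
```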

This tries to illustrate the proposed release process. As it is
Enterprise that depends on OSS, Oscar must launch the process, but as
soon as it has begun, Enya can start testing the release in
Enterprise. When OSS has finished publishing its release, Enya can
simply release from the PR that OSS generated automatically.
```mermaid
sequenceDiagram
participant Oscar
participant OSS as Gitops OSS
participant Enterprise as Gitops Enterprise
participant Enya
Oscar->>OSS: Kick off release process
OSS->>OSS: Opens PR with updated versions
OSS->>Enterprise: Make release test PR to update versions
Oscar->>OSS: Approve ✅
Enya->>OSS: Approve ✅
OSS->>OSS: Merge PR and publish release
OSS->>Enterprise: Make a PR to update OSS, make release
Enya->>Enterprise: Make further changes
Enya->>Enterprise: Approve ✅
Enterprise->>Enterprise: Release & merge
```

Review comment on the step `OSS->>Enterprise: Make release test PR to update versions`:

Not sure if we'd need this one, as we'd hopefully already be up to date?

Reply from the PR author:

🤔 Yes. Probably?

My thinking was: OSS creates a branch release/v0.10.0 to start making a release. As Enterprise's main branch tracks OSS main, we'd need to freeze Enterprise at the same version as OSS is releasing, so we'd create a release/v0.10.0 in Enterprise that points to the exact versions in release/v0.10.0 from OSS. Do QA, fix any last problems, release, merge. That PR can then live on for a couple of days if necessary, while development continues in main.

But you seem to be thinking of it the other way round (release from main, stack changes in PRs), which I don't have a problem with either.

Reply:

> But you seem to be thinking of it the other way round

Yeah, that is just habit / vibe.

It feels like people might forget to merge to the release branch, so we'd need to cherry-pick / manage a bit more. The alternative is devs having to base their work off track-latest-oss (if we block it from being merged to main for a day or two), as they want the new things, otherwise the code won't compile.

In both cases, the automation doesn't go as far as changing
Enterprise directly: it only opens pull requests for human review.

## Consequences

This is intended to be a pragmatic decision that can be implemented
very quickly and helps unblock our developers today. It recognizes
that the boundary between OSS and Enterprise isn't technical but
rather driven by commercial strategy, and as a result doesn't sit
right for the engineers working in it every day. It's easy to come up
with better proposals, but they would take more work, time, and
decisions. This proposal can be implemented now, and would let most
developers continue working the way they do today, just faster.

The main upside is that changes in OSS will be available to use in
Enterprise in about 15 minutes, compared to weeks today. This is the
kind of improvement that's so big that it's not just about "closing
tickets faster" - it would get rid of entire classes of problems
and sources of frustration to do with migrations and API stability. If
a developer wants to change an API that's used in both, they can fix
both and have them merged before lunch.
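
The "15 minutes versus weeks" claim is easy to track once the automation exists, and it matches the "average time OSS is available" metric suggested in review. A minimal sketch, assuming you can export, for each OSS change, the time it merged into OSS `main` and the time it became usable in Enterprise; whether "usable" means "tracking PR updated" or "bump merged into Enterprise `main`" is a choice this sketch leaves open, and collecting the timestamps is out of scope.

```python
from datetime import datetime, timedelta
from statistics import mean


def average_lag(pairs: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time from an OSS merge to the change being usable in Enterprise.

    Each pair is (merged into OSS main, usable in Enterprise).
    """
    if not pairs:
        raise ValueError("no merges to measure")
    seconds = [(usable - merged).total_seconds() for merged, usable in pairs]
    return timedelta(seconds=mean(seconds))


if __name__ == "__main__":
    # Made-up sample: one change took 10 minutes, another 20.
    sample = [
        (datetime(2022, 8, 23, 9, 0), datetime(2022, 8, 23, 9, 10)),
        (datetime(2022, 8, 23, 10, 0), datetime(2022, 8, 23, 10, 20)),
    ]
    print(average_lag(sample))  # 0:15:00
```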

As a result, this would bring some of the benefits we could get from
setting up a monorepo for both OSS and Enterprise, but it could be
done very quickly, with very few changes to existing workflows.

One downside is that this holds the developer working on OSS
responsible for making sure their changes work with Enterprise. This
is a bit of a roadblock for developers primarily working on OSS;
however, it's not unreasonable to ask developers to make sure the
product still works with their changes. You wouldn't expect a
developer primarily working on the backend to merge API changes that
break the frontend and move on without further conversation.

The biggest challenge is that it would couple releases between OSS and
Enterprise. Today, Enterprise's main branch can always be released
using a stable OSS release. With this change, Enterprise would have to
choose between releasing with an unstable OSS version, rolling back or
backporting changes so they work with the last stable OSS release, or
asking OSS to make a stable release. However, it's worth pointing out
that this won't prevent Enterprise from making a customer-specific
release; the cost of doing so would just be slightly higher, because
it needs to verify both OSS and Enterprise functionality.
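
Whether Enterprise `main` can currently be released on a stable OSS version can be checked mechanically from the OSS version recorded in Enterprise's `go.mod`. A sketch, assuming the check only needs to tell a tagged release from a Go pseudo-version; the regular expression is a simplified form of the pseudo-version rule in the Go modules reference (it ignores `+build` metadata and expects a 12-character lowercase hash), and `release_blocker` is a hypothetical gate, not an existing tool.

```python
import re
from typing import Optional

# Simplified Go pseudo-version pattern covering the three forms:
#   vX.0.0-<ts>-<hash>, vX.Y.Z-pre.0.<ts>-<hash>, vX.Y.(Z+1)-0.<ts>-<hash>
# where <ts> is a 14-digit UTC timestamp and <hash> a 12-char commit prefix.
PSEUDO_VERSION = re.compile(r"^v\d+\.(0\.0-|\d+\.\d+-([^+]*\.)?0\.)\d{14}-[0-9a-f]{12}$")


def is_pseudo_version(version: str) -> bool:
    return PSEUDO_VERSION.match(version) is not None


def release_blocker(oss_version: str) -> Optional[str]:
    """Explain why Enterprise can't release on a stable OSS version, if it can't."""
    if is_pseudo_version(oss_version):
        return (
            f"OSS is pinned to unreleased commit {oss_version}: release with it, "
            "roll back/backport to the last stable OSS release, or ask OSS to release"
        )
    return None  # pinned to a tagged release: nothing blocks on the OSS side


if __name__ == "__main__":
    print(release_blocker("v0.10.0"))  # None
    print(release_blocker("v0.10.1-0.20220823120000-cafe1234dead"))
```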

Review comment:

Yes, a potential interim solution might be to update an existing PR in this case.

Scenario:

* WGE is ready to release in 3 days or so for a customer.
* WG does a release, v0.10.0, as they have a good cadence.
* A PR is created and merged in WGE to use v0.10.0.
* WG feature work immediately continues, and a WebGL-powered graph lands in main as cafe1234.
* WGE PR 1234 is created to update weave-gitops to cafe1234.
* WGE PR 1234 is not merged yet, as WGE is not ready to release; ETA 2 days.
* WGE PR 1235 is created and worked on, which updates to the WebGL graphs and is based off PR 1234.
* A WG bug fix for the WebGL-powered graphs lands in main as beaf5467.
* WGE PR 1234 is updated to update weave-gitops to beaf5467.
* WGE does a release of 0.10.0.
* WGE waits 12 hours to see if anything breaks.
* WGE merges PR 1234 and PR 1235.

PR 1234 would basically become a next-weave-gitops branch, wouldn't it? Now we have multiple branches.

Reply from the PR author:

Yes, if there's a "recent-ish" WG release, it's all good. I was thinking specifically of the case where WG's cadence is out of step with WGE: if WG can't release fast enough for $quality_reasons and WGE can't wait for $customer_reasons, then WGE is a bit stuck. But perhaps that's mainly a problem if there's missing communication and inconsistent quality thresholds.

Reply:

Yes, there is a need for more coordination in the WGE teams too; with a predictable 2-week cadence, each team can plan its manual release testing etc. a bit better. So if we align with WG's cycle, that would work well.

However, these release challenges are a new iteration of an old pain
point: both Enterprise and OSS lack a process for longer-lived stable
branches that are separate from the main feature development branch,
so any release includes everything that has landed in main. It is
assumed that both projects will need to solve that. Discussions about
how to do that are already happening, but are out of scope for this
ADR. Until then, both OSS and Enterprise should agree to try to do a
stable release every 2 weeks, whether or not there are new features,
as long as there are no blocking bugs. This synchronization only
happens between the release management functions of the two projects,
instead of all developers being subject to this synchronization
overhead.
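
The two-week agreement is simple enough that the release managers of both projects can compute the same dates independently. A sketch, assuming a hypothetical shared anchor date (any past release day both projects agree on) and treating "no blocking bugs" as a simple count of open blockers:

```python
from datetime import date, timedelta

CADENCE = timedelta(days=14)  # "a stable release every 2 weeks"


def next_release(anchor: date, today: date) -> date:
    """First release date on or after `today`, on a 2-week cycle from `anchor`.

    `anchor` is a hypothetical release day that both projects agree on.
    """
    if today <= anchor:
        return anchor
    periods = -(-(today - anchor).days // CADENCE.days)  # ceiling division
    return anchor + periods * CADENCE


def should_release(blocking_bugs: int) -> bool:
    """Release whether or not there are new features, unless something blocks."""
    return blocking_bugs == 0


if __name__ == "__main__":
    anchor = date(2022, 8, 15)  # made-up shared anchor
    print(next_release(anchor, date(2022, 8, 23)))  # 2022-08-29
    print(should_release(blocking_bugs=0))  # True
```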