[PROPOSAL] Changes to Minor Release Process for the OpenSearch Bundle #150

Closed
CEHENKLE opened this issue Mar 29, 2023 · 23 comments
Labels: bake-time (marks pull requests that are pending a minimum waiting period before they are merged), discuss (issues calling for discussion)

Comments

@CEHENKLE
Member

CEHENKLE commented Mar 29, 2023

What/Why

What are you proposing?

The current OpenSearch bundle[1] release process looks like this:

  1. We follow a release train model for minor releases. At the beginning of the year we publish two dates for each train that year: A Code Freeze date and (about 1 week later) a Release Date.
  2. For each release we cut a release ticket per repo that has milestones in it. Those milestones work backwards from the release date.
  3. After the code freeze date we generate a build and do performance testing and integration testing, as well as update documentation and release notes and write a blog post.
  4. On the release date we build the artifacts and release them out on all platforms.

Some issues with this process:

  1. Repos are waiting until the last minute to add features (with large merges happening in the last hours). This means we either have to move the freeze date and the release date, or it means we only move the freeze date and then we don't have enough time to test.
  2. Some of the things that we need, like release notes, don't get generated until the last minute, and require lots of cat herding to obtain.
  3. Because features are arriving late, it's very hard for the documentation team to be able to turn around updates in a week.
  4. Churn on features and docs make it difficult to finalize blog content.
  5. Again, because things are coming in so late, the community doesn't have enough visibility into the features/changes that are going into releases.
  6. Several of the issues we've found post freeze are not integration issues, but are instead standard quality misses. This implies teams are not testing until the code freeze date.
  7. Perhaps related, we've found that one week is not enough time to find and fix bugs.

Proposed change:

To address these issues, I propose we make four significant changes:

  • We continue to plan a release every 6 weeks for the year, but we only announce the start of the release process, not the actual release date.
  • We add "entrance criteria" that must be met to join the release and "exit criteria" that must be met for the artifacts to be released.
  • If a plug-in doesn't meet the entrance criteria we will take the previous version, bump its version and use that for the release. They will not be able to add features to the release if they miss the start of the window.
  • Every day we generate a release candidate. When the exit criteria are met by a release candidate, we announce the release date (~2 days later) and publish artifacts on that date. If we cannot pass the exit criteria by 2 weeks after the start of the window, we will cancel the release and hold changes until the next window.
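The daily-candidate rule in the last bullet can be sketched as a small loop. This is only an illustration under stated assumptions: `run_release_window`, `WindowOutcome`, and the `candidate_passes` callback are hypothetical names, the 2-day announcement lead is taken as exactly 2 days, and the 2-week limit is treated as inclusive.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Optional

MAX_WINDOW = timedelta(weeks=2)    # cancel if no candidate passes within 2 weeks
ANNOUNCE_LEAD = timedelta(days=2)  # assumed: release exactly 2 days after a passing candidate


@dataclass
class WindowOutcome:
    status: str                    # "released" or "cancelled"
    release_date: Optional[date]   # None when the release is cancelled


def run_release_window(window_start: date,
                       candidate_passes: Callable[[date], bool]) -> WindowOutcome:
    """Walk the window one day at a time, one release candidate per day.

    candidate_passes(day) is a hypothetical callback that returns True when
    the candidate built on that day meets every exit criterion.
    """
    day = window_start
    while day <= window_start + MAX_WINDOW:
        if candidate_passes(day):
            return WindowOutcome("released", day + ANNOUNCE_LEAD)
        day += timedelta(days=1)
    # No candidate met the exit criteria in time: hold changes for the next window.
    return WindowOutcome("cancelled", None)
```

For example, a window starting on May 1 whose first passing candidate is built on May 4 would release on May 6, while a window in which no candidate ever passes ends as "cancelled".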

Why these changes?
I believe these changes will put the right emphasis on quality, rather than our current system, which is more date driven. I looked at putting in more gates and checks to try to make sure we were ready, but I think setting expectations on what has to happen before you join the release puts in the right checks and balances. I would like to get to the point where we run a tool that looks for things like release notes, branches being cut, etc., and publishes a readiness checklist. If repos are ready, we automatically pick up the latest version.
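A readiness tool of that kind might look roughly like the sketch below. It is not an existing tool: `RepoStatus`, `readiness_checklist`, and `format_checklist` are hypothetical names, and the two checks (release notes present, release branch cut) are simply the two examples named above. How each flag would be gathered from a repository is left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class RepoStatus:
    name: str
    release_notes_present: bool  # assumed to be collected elsewhere, e.g. by a CI job
    release_branch_cut: bool


def readiness_checklist(repos: Iterable[RepoStatus]) -> Dict[str, List[str]]:
    """Return {repo name: missing items}; an empty list means the repo is ready."""
    report = {}
    for repo in repos:
        missing = []
        if not repo.release_notes_present:
            missing.append("release notes")
        if not repo.release_branch_cut:
            missing.append("release branch")
        report[repo.name] = missing
    return report


def format_checklist(report: Dict[str, List[str]]) -> str:
    """Render the report as one checkbox line per repo, sorted by name."""
    lines = []
    for name, missing in sorted(report.items()):
        mark = "x" if not missing else " "
        detail = "" if not missing else " (missing: " + ", ".join(missing) + ")"
        lines.append(f"[{mark}] {name}{detail}")
    return "\n".join(lines)
```

A repo whose list of missing items is empty would be picked up at its latest version; any other repo would show up unchecked with the reason next to it.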

Next steps?

Please see the related PR for more details about process, particularly the exit and entrance criteria. I thought it'd be easier to review it line by line there than in an issue. It's also possible this information should move over to the build repo and it will definitely require updates to the issue template.

This issue will stay open until April 14th, and if the process is agreed upon, would be used starting with the 2.8 release of OpenSearch.

Thank you!


[1] The OpenSearch bundle contains OpenSearch core and its plugins, as well as OpenSearch Dashboards and its plugins. It does not include Data Prepper or clients. We release as a bundle because of the tight coupling between the cores and their plugins. There's a good RFC about plans to change that. You should check it out :)

@CEHENKLE added the bake-time and discuss labels and removed the untriaged label Mar 29, 2023
@CEHENKLE
Member Author

CEHENKLE commented Mar 29, 2023

Hey! Getting this up has been on my list of things to do for quite a while, so even though I'm heading out for a week, I wanted to publish (CHECK AND RUN HENKLE RIDES AGAIN!). I'll be looking in on the issue occasionally from the road, but I'll fully engage when I get back in a week. Happy debating :)

@anirudha

Ship it!

@dblock
Member

dblock commented Mar 29, 2023

+100 on the idea that we won't be fixing a hard date and iterating until we have a viable release candidate.

@peternied
Member

Overall really big fan of this proposal - nits:

If a plug-in doesn't meet the entrance criteria we will take the previous version, bump its version and use that for the release. They will not be able to add features to the release if they miss the start of the window.

This does not feel enforceable, there are changes like opensearch-project/OpenSearch#6841 that are not internally backward compatible with plugins.

We add "entrance criteria" that must be met to join the release and "exit criteria" that must be met for the artifacts to be released.

How and who validates these criteria? Big bet - what if we didn't add criteria unless it could be validated via a mechanism (GitHub Action/Jenkins workflow)?

@gaiksaya
Member

One of the entrance criteria can be code coverage. Adding features is great, but catching bugs during the release cycle is kind of late and opens up the "hot fixes" can. Having code coverage as one of the entrance criteria would ensure product quality. Also, this can be validated via a mechanism (GitHub Action/Jenkins workflow).
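As a rough sketch of what such an automated check could compare, assuming the coverage percentages for the previous release and the current branch are already available from whatever coverage tool a repo uses (`coverage_gate` and its `tolerance` parameter are hypothetical):

```python
def coverage_gate(baseline_pct: float, current_pct: float, tolerance: float = 0.0) -> bool:
    """Pass when coverage has not dropped by more than `tolerance` percentage points."""
    return current_pct >= baseline_pct - tolerance
```

A workflow step could fail the build when this returns False. The optional tolerance is a design choice to absorb small measurement noise, not something the proposal specifies.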

@bbarani
Member

bbarani commented Mar 29, 2023

If we cannot pass the exit criteria by 2 weeks after the start of the window, we will cancel the release and hold changes until the next window.

@CEHENKLE Does it make sense to cancel a release if one of the plugins doesn't meet the exit criteria? I would think that we should pull that plugin out OR revert it to its previous version in that scenario to meet the exit criteria of the OpenSearch product.

We need to also think about automating the entrance and exit criteria rather than relying on good intentions.

@CEHENKLE
Member Author

@bbarani I've tried to set the exit criteria up so that it's 99% not the plugins. The only thing the plugins should be doing is validation/bug fixes at that point. If they found a truly awful bug that couldn't be fixed in the window, we would either remove them or cancel the release (I think).

@jmazanec15
Member

If a plug-in doesn't meet the entrance criteria we will take the previous version, bump its version and use that for the release. They will not be able to add features to the release if they miss the start of the window.

This does not feel enforceable, there are changes like opensearch-project/OpenSearch#6841 that are not internally backward compatible with plugins.

I'd like to discuss more details on the fallback version bump plan, given that this proposal strongly hinges on it. What will happen if a plugin has bug fixes alongside an unprepared feature? As @peternied mentioned, how will breaking changes be dealt with, whether they come from core or from a dependency plugin? What branching strategy should be taken? Assuming for the sake of example that the next minor version release is 2.8, should plugins:

  1. create a 2.8 branch that is "stable" after 2.7 is released and only merge in changes from 2.x that meet the entrance criteria to this branch?
  2. merge stable changes into 2.x and merge features into 2.x from feature branches after they meet entrance criteria?
  3. merge all changes into the 2.x branch and, at release time, if the plugin does not meet entrance criteria, have someone create the 2.8 branch and cherry-pick only the minimal set of changes needed for compatibility?

One problem I foresee with this proposal is that it requires action at the window start deadline (i.e. the above fallback needs to be executed/tested). I think, with the limited resources each plugin has, deadlines such as these are hard to enforce and cause several of the problems mentioned with the current process. To avoid this, I think it makes more sense to set up an expectation that each plugin has a stable branch that meets the entrance criteria ready shortly after the previous minor version release or after any breaking changes are introduced. With this, we would also need to set up some expectation on when the last breaking change can be checked in before window start. That way, when the window deadline hits, there is no need for some fallback action to take place.

@davidlago

💯 to the proposal, this move makes a lot of sense.

  • If a plug-in doesn't meet the entrance criteria we will take the previous version, bump its version and use that for the release. They will not be able to add features to the release if they miss the start of the window.

Along similar lines to what @jmazanec15 points out above, we use releases to keep up with CVE remediation. If a plugin does not meet the entrance criteria, we will still need to add CVE remediations alongside those version bumps from their previous versions.

@bbarani
Member

bbarani commented Apr 6, 2023

💯 to the proposal, this move makes a lot of sense.

  • If a plug-in doesn't meet the entrance criteria we will take the previous version, bump its version and use that for the release. They will not be able to add features to the release if they miss the start of the window.

Along similar lines to what @jmazanec15 points out above, we use releases to keep up with CVE remediation. If a plugin does not meet the entrance criteria, we will still need to add CVE remediations alongside those version bumps from their previous versions.

@davidlago We should still be able to meet our patching policy if we can remediate within 60 days, so if someone misses a release they can pick up the fixes in the next one. If they are not able to meet the 60-day cadence, or if the CVE is critical, we can always schedule a patch release to meet the date.

Also, we should encourage everyone to merge CVE fixes into the branches of the previously released version (e.g. 2.6.0), assuming a patch version is always in play for any released minor version. That way, if we end up using that branch to cut the release branch for a specific plugin (because the plugin is not ready and needs to be reverted to its previous version during the release cycle), the CVEs would still be patched in the upcoming version.
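The 60-day rule described above reduces to a small date check. The sketch below assumes an expected date for the next release is known, which under this proposal would only be an estimate; `needs_patch_release` is a hypothetical helper, not part of any OpenSearch tooling.

```python
from datetime import date, timedelta

REMEDIATION_WINDOW = timedelta(days=60)  # the 60-day patching policy mentioned above


def needs_patch_release(cve_reported: date, expected_next_release: date,
                        critical: bool) -> bool:
    """Return True when waiting for the next regular release would miss the policy.

    A critical CVE always triggers a patch release; otherwise a patch release
    is needed only if the next release lands after the 60-day deadline.
    """
    deadline = cve_reported + REMEDIATION_WINDOW
    return critical or expected_next_release > deadline
```

A CVE reported on April 1 has a deadline of May 31, so a release expected on May 15 would be enough, while one expected on June 15 would call for a patch release.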

@bbarani
Member

bbarani commented Apr 6, 2023

If a plug-in doesn't meet the entrance criteria we will take the previous version, bump its version and use that for the release. They will not be able to add features to the release if they miss the start of the window.

This does not feel enforceable, there are changes like opensearch-project/OpenSearch#6841 that are not internally backward compatible with plugins.

Id like to discuss more details on the fallback version bump plan, given that this proposal strongly hinges on it. What will happen if a plugin has bug fixes alongside an unprepared feature? As @peternied mentioned, how will breaking changes be dealt with - whether they be from core, or a dependency plugin? What branching strategy should be taken? Assume for the sake of example the next minor version release is 2.8., should plugins:

  1. create a 2.8 branch that is "stable" after 2.7 is released and only merge in changes from 2.x that meet the entrance criteria to this branch?
  2. should stable changes be merged into 2.x and features get merged into 2.x from feature branches after they meet entrance criteria?
  3. should all changes be merged into 2.x branch and at release time, if the plugin does not meet entrance criteria, someone will create the 2.8 branch and just cherry-pick the minimal set of changes needed for compatibility

One problem I foresee with this proposal is that it requires action at the window start deadline (i.e. the above fallback needs to be executed/tested). I think, with the limited resources each plugin has, deadlines such as these are hard to enforce and cause several of the problems mentioned with the current process. To avoid this, I think it makes more sense to setup an expectation that each plugin has a stable branch that meets the entrance criteria ready shortly after the previous minor version release/any breaking changes are introduced. With this, we would also need to setup some expectation on when the last breaking change can be checked in before window start. That way, when this window deadline hits, there is no need for some fallback action to take place.

These are the possible options (that I can think of) to successfully revert a plugin to previous version in case of any last minute findings that would put a release in jeopardy.

  • Multiple plugins have tight dependencies on one another creating a tight coupling during the release process.

    • Example: A plugin that extends new functionality implemented in the Job Scheduler plugin needs to revert those changes if Job Scheduler reverts the enhancement during the release cycle.
    • Recommendation:
      • Using feature flags for enabling and disabling a feature not ready to be shipped with a release
      • Work with the upstream and roll back all the changes made to the dependent plugins without trying to fix the issue during the release window (if it's going to delay the release by some X number of days)
      • On-board to distribution build CI sooner to surface potential gaps
  • Release branches are created but the bug fixes are taking multiple days during release window causing delay to the release.

    • Recommendation:
      • Use feature flags for enabling and disabling a feature to be shipped with a release
      • Rollback the features with bugs that cannot be fixed within stipulated amount of time during release cycle
      • Create the release branch using the branch used for the previous release and add it to the build manifest to generate release candidate
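To make the feature-flag recommendation concrete, here is a minimal, generic sketch of a flag read from the environment. It does not reflect how OpenSearch or its plugins actually implement feature flags; the `FEATURE_` prefix, the `feature_enabled` helper, and the accepted truthy values are all assumptions for illustration.

```python
import os
from typing import Mapping, Optional


def feature_enabled(name: str, default: bool = False,
                    env: Optional[Mapping[str, str]] = None) -> bool:
    """Look up FEATURE_<NAME> in the environment; unset flags fall back to `default`."""
    env = os.environ if env is None else env
    raw = env.get(f"FEATURE_{name.upper()}")
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def schedule_job(env: Optional[Mapping[str, str]] = None) -> str:
    # The unfinished code path ships with the release but stays off by default.
    if feature_enabled("new_scheduler", env=env):
        return "new scheduler path"
    return "existing scheduler path"
```

With the flag off by default, a feature that is not ready can ship disabled instead of being rolled back during the release window.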

@jmazanec15
Member

@bbarani I think for last-minute findings it makes sense to look into reverting/quick bug fixes and taking action to ensure the entrance criteria catch these findings for the next release. I'm not sure I'm in favor of feature flags, because I think it will mean shipping dead code. That being said, I don't think this proposal is necessarily geared towards last-minute findings after the release window starts; it is more about the state of the plugins when the release window starts.

I think that the majority of the entrance criteria can be checked before the change is checked into a stable/prod-ready branch as opposed to when the release window starts. For the proposed entrance criteria, (1, 3, 5, 7) should be able to be checked before the change is merged. Release notes could be semi-ready if plugins follow a changelog, which really just leaves sanity testing - which should probably be covered by tests and manual testing time at check in as well.

Entrance criteria for ref

  1. Documentation draft PRs are up and in tech review for all component changes.
  2. Sanity testing is done for all components.
  3. Code coverage has not decreased (all new code has tests).
  4. Release notes are ready and available for all components.
  5. Roadmap is up-to-date (information is available to create release highlights).
  6. Release ticket is cut, and there's a forum post announcing the start of the window.
  7. Any necessary security reviews are complete.

Requiring code changes around a release deadline can cause more issues and may lead to more delays, even if they are just reverts. So I am wondering why not enforce entrance criteria checks on merges to stable/prod-ready branches? And then, put some expectations around breaking changes being checked in before release date and how quickly downstream projects need to respond to breaking changes?

@dblock pinned this issue Apr 13, 2023
@prudhvigodithi
Member

prudhvigodithi commented Apr 14, 2023

The proposal looks great; this will ensure better quality for the product being released. If we plan a release every 6 weeks, we can assume about 8 releases per year. Since this proposal only covers minor versions, we should also think about patch and major versions within this time frame. Should we follow the same entrance criteria for patch and major releases too? They look generic and would improve the quality of the product as a whole when released.

Requiring code changes around a release deadline can cause more issues and may lead to more delays, even if they are just reverts. So I am wondering why not enforce entrance criteria checks on merges to stable/prod-ready branches? And then, put some expectations around breaking changes being checked in before release date and how quickly downstream projects need to respond to breaking changes?

Hey @jmazanec15, we should already have merge checks (GH workflows) that enforce certain criteria per repo before the code is merged, and they are the same for all branches (though the tests could differ based on the version).

For handling the dependency for upstream project management:

  1. We should make sure patches are generic enough to keep up with. If a plugin acts as a dependency for other plugins, the need for the patch should be explained, that is, what bugs it fixes or what features are being pushed. Since the idea here covers minor releases, changes to a plugin should not be a breaking change for dependent plugins.

  2. Patches should be split into manageable, functional pieces so they can be comprehended, examined, and individually approved or rejected. With this, it's easy to address a patch if it causes a breaking change.

If there is any other breaking change from a plugin, it should be pushed to the major branch and not backported to the minor/patch branch, and then shipped later in a major release of the product.

@ashwin-pc
Member

This is definitely a step forward, but I don't think it completely addresses the 6th problem.

Several of the issues we've found post freeze are not integration issues, but are instead standard quality misses. This implies teams are not testing until the code freeze date.

Looking at it from the lens of a feature developer on why large merges are pushed in at the last minute, it mostly boils down to the fact that teams don't want to miss the release train since it's a version they have committed certain features to. In the new approach I'm concerned that this will still be the case, albeit a little harder since there is now only an approximate date for when the release process will start. But now to make sure that the feature does not miss the window, I (hypothetically) as a feature developer will start to shore up all the necessary entrance criteria requirements while still making quality misses since no one but me today is really sanity testing these changes.

One suggestion is to have the sanity testing be done by the release team based on the sanity testing criteria provided by the feature team. This way we can independently audit the quality of the feature before accepting it in the release.

@seanneumann

Love this proposal. Little unsure of the "if a plug-in doesn't meet the entrance criteria we will take the previous version, bump its version and use that for the release. They will not be able to add features to the release if they miss the start of the window", but that has been discussed quite a bit.

I'll also double down on a future where we are not bundled (aka not so tightly coupled). In addition to reading Protobuf in OpenSearch, I also recommend reading the OpenSearch Dashboards proposal Decoupling the rigid dependency.

Onward.

@krishna-ggk

krishna-ggk commented Apr 17, 2023

This is a great proposal, but I had a few questions similar to what others brought up.

We add "entrance criteria" that must be met to join the release and "exit criteria" that must be met for the artifacts to be released.

I echo @peternied's thoughts on having mechanisms to enforce these criteria.

Further, taking a step back, I notice a couple of subtle problems with respect to build sanity that could potentially be attacked upfront:

  1. Plugin integration tests in plugin repos don't run with all plugins enabled.
  2. Many plugins aren't in the manifest for the latest minor version until the last minute. For example, with 2.7, plugins were added to the manifest from Feb 24 through Apr 15!

Have we considered starting the new minor version distribution build with all plugins included in the manifest right from the start, with integ tests running with all plugins enabled, instead of incrementally adding plugins to the manifest?

There is definitely a time-window at the start where version-bump etc are required for this overall build to start passing, but this forcing function can be good.

Given that overall distribution build failures are considered higher priority, this can potentially act as a forcing function for plugin teams to maintain working builds all the time instead of just during release. Further, it actively discourages plugin teams from staging changes for the next version until the release window starts. Finally, this model should also help flag integration issues upfront.
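One way to surface the late-manifest problem early is a check that lists which expected plugins are still absent from the manifest. The sketch below assumes the manifest has already been parsed into a dictionary with a `components` list whose entries carry a `name` key; that shape, the helper name, and the sample component names are assumptions for illustration rather than a description of the actual build tooling.

```python
from typing import Iterable, List, Mapping


def missing_from_manifest(manifest: Mapping, expected_components: Iterable[str]) -> List[str]:
    """Return the expected component names that the manifest does not list, sorted."""
    present = {component["name"] for component in manifest.get("components", [])}
    return sorted(set(expected_components) - present)


# Hypothetical parsed manifest for a new minor version.
manifest = {
    "build": {"name": "OpenSearch", "version": "2.8.0"},
    "components": [
        {"name": "OpenSearch"},
        {"name": "job-scheduler"},
    ],
}
expected = ["OpenSearch", "job-scheduler", "k-NN", "security"]
print(missing_from_manifest(manifest, expected))
```

Run on a regular cadence from the day the new version opens, a non-empty result would flag plugins that have not yet onboarded instead of leaving that discovery to the release window.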

@bbarani
Member

bbarani commented May 16, 2023

Have we considered starting new minor version distribution build with all plugins included in the manifest right from the start with them running integ tests with all plugins enabled instead of incrementally adding plugins to manifest?

@krishna-ggk Yes, we do have a plan to start creating a new minor version with all plugins included in the manifest immediately after shipping a version, and to run automated tests on a regular cadence. @gaiksaya is leading the effort to automate the building and testing part; it's just that we cannot start building with all the plugins until the version bump PR (to the next version) is merged.

Given that overall distribution build failures are considered higher priority, this potentially can act as forcing function for plugin teams to maintain working builds all the time instead of just during release. Further it actively discourages plugin teams from staging changes for next version until release window start. Finally this model should also help flag integration issues upfront.

As you have mentioned, every team needs to treat build failures as a high-severity event in order for us to successfully build, test, and release artifacts in a continuous manner. We want to surface and bubble up any issues sooner rather than later, and we can achieve that only if we on-board all repos participating in a release to the build process sooner.

@wbeckler

wbeckler commented Jun 12, 2023

Would this proposal be different if dashboards and core had separate release cadences? To separate the cadences, dashboards could be released with multi-core-version compatibility, and dashboard plugins could have multi-core-plugin-version compatibility.

Would this reduce the surface area and integration complexity of bundle releases so much that we wouldn't need to change the minor release process for the OpenSearch bundle?

@bbarani
Member

bbarani commented Jun 15, 2023

@wbeckler I don't expect any change to this proposal even if Dashboards and core had separate release cadences. This still gives us the flexibility to publish OpenSearch and OpenSearch Dashboards as separate product versions (if needed), but every product release will have defined entry and exit criteria.

@bbarani
Member

bbarani commented Jun 21, 2023

We will be finalizing this proposal on July 15 2023 (originally June 30 2023) unless there are any other concerns or feedback on this proposed release process. We will move to this release model for a future OpenSearch release (exact release version is TBD). Please provide your feedback and inputs before June 30 2023.

@reta

reta commented Jun 22, 2023

Like it, would (hopefully) also remove some pressure (driven by date) from the contributors, +1 to it

@bbarani
Member

bbarani commented Jun 22, 2023

Love this proposal. Little unsure of the "if a plug-in doesn't meet the entrance criteria we will take the previous version, bump its version and use that for the release. They will not be able to add features to the release if they miss the start of the window", but that has been discussed quite a bit.

Yes, the plan is to not delay an OpenSearch release just because a plug-in doesn't meet the entrance criteria, but rather to move ahead with the previous version of that plug-in and have that feature moved to the next upcoming release.

@bbarani
Member

bbarani commented Jul 18, 2023

Closing this issue as the comments have been addressed. We will start following this release model for the upcoming OpenSearch minor release. Please feel free to comment / re-open if needed. CC: @CEHENKLE @dblock
