
Create a "nightly" instance of Playground that gets updated every day with the daily build. #129

Closed
11 tasks done
CEHENKLE opened this issue Aug 15, 2023 · 12 comments
Labels
enhancement New feature or request

Comments

@CEHENKLE
Member

CEHENKLE commented Aug 15, 2023

Is your feature request related to a problem? Please describe

I'd like to be able to try new features sooner, as they're being built in OpenSearch. Therefore, I'd like a "nightly" version of Playground that is rebuilt with every successful build: first with the 2.x build, but ultimately with main as well.

Describe the solution you'd like

Whenever there's a successful nightly build, we automatically replace the current nightly Playground deployment. We will need a way to see which version it is pointing at.

Acceptance Criteria:

  1. Support the upcoming 2.x version.
  2. OS and OSD get deployed nightly on a regular basis using the latest builds.
  3. Provide generic anonymous (read-only) access to OpenSearch Dashboards.
  4. Display the commits related to a build ID.
  5. The instance should be publicly accessible.

Action Items

@CEHENKLE CEHENKLE added enhancement New feature or request untriaged Issues that have not yet been triaged labels Aug 15, 2023
@bbarani
Member

bbarani commented Aug 17, 2023

@CEHENKLE Thanks for opening an issue. Should we eventually create a playground instance for all actively supported versions, i.e. 1.x, 2.x and the main branch? Also, I assume it's for both OpenSearch and OpenSearch Dashboards?

CC: @opensearch-project/engineering-effectiveness

@bbarani bbarani removed the untriaged Issues that have not yet been triaged label Aug 17, 2023
@gaiksaya
Member

gaiksaya commented Aug 29, 2023

Hi @CEHENKLE ,

A couple of questions in addition to the above, before we can get started:

  • We recently started continuing the build even if a few of the non-crucial components fail to build. When we deploy the nightly, do you expect all components to be present, or is it enough that the cluster is up and running even if it is missing something like the sql plugin because it did not pass the build? This applies to both OpenSearch and OpenSearch Dashboards.
  • In case of consistent failures for days or weeks due to crucial components, the playground won't be updated until they are fixed. Is that okay?

Crucial components today: Both OS and OSD core engines, job-scheduler and common-utils.
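The deploy-or-skip rule implied by these questions can be sketched as a simple gate: deploy the nightly only if every crucial component built, and report any non-crucial failures instead of blocking on them. This is a minimal sketch under assumptions: the component names and the `status` mapping are hypothetical, not the actual build manifest format, and a single combined view of OS and OSD components is assumed for simplicity.

```python
from typing import Dict, Iterable, List

# Hypothetical component names; the comment lists the OS and OSD core engines,
# job-scheduler and common-utils as crucial.
CRUCIAL_COMPONENTS = ("OpenSearch", "OpenSearch-Dashboards", "job-scheduler", "common-utils")


def is_deployable(status: Dict[str, bool], crucial: Iterable[str] = CRUCIAL_COMPONENTS) -> bool:
    """True if every crucial component built successfully.

    `status` maps component name -> whether its build succeeded. Non-crucial
    components (for example, sql) may fail without blocking the deployment.
    """
    return all(status.get(name, False) for name in crucial)


def failed_components(status: Dict[str, bool]) -> List[str]:
    """Components that failed, so the deployed nightly can report what is missing."""
    return sorted(name for name, ok in status.items() if not ok)
```

With this rule, a build missing only the sql plugin would still be deployed, while a common-utils failure would leave the previous nightly in place until it is fixed.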

@gaiksaya gaiksaya self-assigned this Aug 30, 2023
@bbarani
Member

bbarani commented Sep 18, 2023


  • We recently introduced continuing the build even if few of the non-crucial components failed to build. When we want to deploy the nightly, do you expect all components to be present or it is just enough that cluster is up and running even though it is missing something like sql plugin because it did not pass the build? Applies to both OpenSearch and OpenSearch dashboards

My 2 cents: we should just deploy the latest snapshot artifact with whatever plugins it contains (we already have logic to fail builds when crucial plugins fail). It would be great if we could annotate the build and test details corresponding to the deployed artifact (either via an iframe or through indexing) so users know which plugins are available in that build.

  • In case of consistent failures for days/weeks due to crucial components, the playground won't be updated until they are fixed. Is that okay?

Surfacing the build information along with the build date used for a specific deployment would help to understand the long pending failures.
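As an illustration of surfacing the build date, a deployment could show a status line that flags when the deployed artifact is older than expected, which is the visible symptom of long-pending failures. This is a hypothetical sketch: the function name, the one-day threshold and the message wording are assumptions, not part of any existing tooling.

```python
from datetime import date


def build_banner(build_date: date, today: date, max_age_days: int = 1) -> str:
    """Status line shown next to the deployed nightly.

    If the deployed artifact is older than `max_age_days`, recent nightly
    builds have probably failed, so the banner says how stale it is.
    """
    age = (today - build_date).days
    if age <= max_age_days:
        return f"Nightly build from {build_date.isoformat()} (up to date)"
    return (
        f"Nightly build from {build_date.isoformat()} is {age} days old; "
        "recent nightly builds may have failed"
    )
```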

@prudhvigodithi
Member

We could even explore exposing all the build details, plugin information, and the Jenkins build URL under a separate path.
Example: https://nightly.playground.opensearch.org as the main Dashboards page, and https://nightly.playground.opensearch.org/buildDetails with the information mentioned above.

Related Jenkins Example: https://build.ci.opensearch.org/systemInfo
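A /buildDetails page like the one proposed could be backed by a small JSON document that also covers the acceptance criterion of showing the commits behind a build ID. The sketch below is hypothetical: the field names (`build_id`, `ref`, `commit_id`, `jenkins_url`) are assumptions for illustration, not the schema of any existing OpenSearch build manifest.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Component:
    name: str       # e.g. "OpenSearch" or a plugin name
    ref: str        # branch the component was built from, e.g. "2.x"
    commit_id: str  # commit included in this build


@dataclass
class BuildDetails:
    version: str
    build_id: str
    build_date: str
    jenkins_url: str
    components: List[Component] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested Component dataclasses.
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Serving this document as-is keeps the page trivial to render, and the same JSON could be indexed into the cluster if the indexing route mentioned above is preferred.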

@gaiksaya
Member

gaiksaya commented Oct 5, 2023

Approaches

1. Use opensearch-cluster-cdk

Use opensearch-cluster-cdk as the mechanism to deploy the cluster. This code base can already deploy the nightly built artifacts of both OpenSearch and OpenSearch Dashboards. Any missing functionality, such as a customized opensearch.yml or security permissions, can be contributed to the code base.


Pros

  • Well established, tested and actively used code base readily available to use.
  • Used to deploy other publicly available OpenSearch and OpenSearch Dashboards clusters (e.g., the data-store cluster)
  • Needs minimal changes
  • Reproducible for the community
  • Has future scope to onboard other distributions (if at all required)

Cons

  • As of now, this setup only supports the tarball distribution
  • Development would depend on development in opensearch-cluster-cdk
  • We would need to manage the cluster ourselves, including data, permissions, and deployments
  • A new workflow/pipeline is needed for active deployments

2. Onboard to existing playground framework

The GitHub repository dashboards-anywhere hosts the multiple playground instances that exist today; they can be tracked here: https://github.com/opensearch-project/dashboards-anywhere/tree/main/config/playground/helm
The code base uses EKS, Terraform and Helm together to form the end cluster. Deployments are handled automatically by GitHub Actions workflows.

Since this uses Helm charts on the backend, all we need is a container image (Docker Hub/ECR) pushed to staging daily, which we already have.

Pros

  • Actively used code base for current playground set up.
  • Deployment is taken care of by the repository GitHub Actions
  • Maintenance of the code base is a shared effort
  • Detailed on-boarding instructions

Cons

  • Extensive manual setup is required. We might need to spend some time understanding the infrastructure and its source code.
  • We would not be using a direct distribution (e.g., Docker images or tarballs) but deploying via Helm, so we should expect Helm- and EKS-related issues. However, the features and functionality of the core software should not be affected.
  • Since we are not the code owners of the repository, development would depend on getting our PRs merged (we do have the option to become maintainers)
  • Semi-centralized code base
  • Needs improved credential handling in GitHub Actions

@gaiksaya
Member

gaiksaya commented Oct 5, 2023

I would like some input on these approaches from @Flyingliuhub @dblock @bbarani @AMoo-Miki. Please feel free to tag anyone else who you think can provide valuable input.
Thanks!

@AMoo-Miki

AMoo-Miki commented Oct 9, 2023

Thanks Sayali. Both of these are great options and as you pointed out have great pros and probably painful cons.

Considering that extensive manual setup is needed now (and will probably be needed again when certain updates happen), and that we could face challenges we cannot fix ourselves, the existing playground framework sounds more challenging to set up. I also wonder if this pipeline would make it harder for us to do crazy things like manipulating the installations.

I am working on a proposal for the Playground to use a modified security plugin that will create read/write metadata indices for each visitor, allowing them to login anonymously but experience a fully functioning Dashboards. I suspect the modifications to the plugin would be applied as a post-install patch. Similarly, we might want to have the latest version of OUI included in the nightly builds; patching post-install would be better than building specific images for playground that are different from the images we release nightly.

While being forced to maintain the infrastructure could be a pain, I feel it would be good for us to learn of the pains and solve them for the users.

Considering the freedom to customize (without building different images) and the ability to learn, the opensearch-cluster-cdk sounds more attractive to me.

@gaiksaya
Member

gaiksaya commented Oct 9, 2023

Thanks @AMoo-Miki. If there are going to be customizations and additional installations, I agree opensearch-cluster-cdk gives us that edge. A couple of questions before I proceed with drafting a design for this:

  1. Can I know what role does OUI play in this?
  2. When you say customized security plugin, I believe you mean customized permissions rather than the default ones?

@AMoo-Miki

AMoo-Miki commented Oct 11, 2023

  1. Can I know what role does OUI play in this?

OUI is a vital component of OSD which has its own release cycle. Any change in OUI has the potential to change the UX of OSD. For example, the recent UX changes to OSD were almost completely driven by OUI and we had to resort to setting up our own endpoints for nightly builds to showcase the changes.

  2. When you say customized security plugin, I believe you mean customized permissions rather than the default ones?

My idea is much crazier than that: it patches the built artifacts of security-dashboards-plugin to allow for randomly suffixed .kibana/.opensearch_dashboards metadata stores.
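To make the idea concrete, a per-visitor metadata store name could be derived by appending a random suffix to the base index name. This is only a hypothetical sketch of the naming step: the actual patch to security-dashboards-plugin is not described here, and the function name and suffix length are assumptions. Hex is used because OpenSearch index names must be lowercase.

```python
import secrets

# Base metadata store names mentioned in the comment.
BASE_INDICES = (".kibana", ".opensearch_dashboards")


def visitor_index(base: str, suffix_bytes: int = 4) -> str:
    """Return a per-visitor metadata index name with a random lowercase hex suffix."""
    if base not in BASE_INDICES:
        raise ValueError(f"unexpected metadata index: {base}")
    return f"{base}_{secrets.token_hex(suffix_bytes)}"
```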

In my vision:

  1. OUI has two artifacts: (a) the consumable code and (b) its docs site which has its own playground
  2. Every night, an attempt is made to build OUI, OSD, OS, and plugins; upon successful completion, these nightly artifacts are made public through the normal channels.
  3. The latest nightly artifacts for OSD and its plugins are deployed to a staging environment and any custom patches are applied; these include a patch to use the nightly artifact of OUI, as well as security plugin's patch for random suffixes.
    1. If OSD fails to start due to an OUI incompatibility, the last known working nightly of OUI will be patched in and an issue will be raised on OUI to fix the problem. The nightly artifact for OUI will be marked as broken.
    2. If OSD fails to start due to a plugin, the plugin will be swapped with the last known working nightly and an issue will be raised with them.
  4. The latest nightly artifact of OS and its plugins are deployed to the staging environment and any custom patches are applied; I don't have thoughts on any right now.
    1. If OSD fails to start due to an OS incompatibility, the last known working nightly of OS will be spun up and OSD will point to it.
    2. If OSD fails to start again, the problem is with OSD; an issue will be raised on OSD to fix the problem and the nightly will be marked broken. The last known working OSD will be used instead with the latest nightly of OS.
    3. If OSD does start up, the problem is with OS; an issue will be raised on OS to fix the problem and the nightly will be marked broken.
    4. If OS fails to start due to a plugin, yada yada yada!
  5. The latest working artifacts from the above steps are deployed to playground; nightly OUI docs are deployed to OUI's website.
  6. Custom data is populated to showcase all of the capabilities.
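The recurring "fall back to the last known working nightly and mark the new one broken" step can be sketched per component as below. This is a simplified, hypothetical sketch: `starts` stands in for whatever health check the pipeline would use (e.g. "did OSD come up with this artifact"), and it does not model the cross-diagnosis between OS and OSD in step 4.

```python
from typing import Callable, List, Optional, Set


def pick_working(
    candidates: List[str],
    starts: Callable[[str], bool],
    broken: Set[str],
) -> Optional[str]:
    """Return the newest candidate build that starts, marking failures as broken.

    candidates: nightly build IDs for one component, newest first.
    starts: hypothetical health check for a given build ID.
    broken: collects build IDs that failed to start (an issue would be raised for each).
    """
    for build_id in candidates:
        if build_id in broken:
            continue  # already known to be broken, skip without re-testing
        if starts(build_id):
            return build_id
        broken.add(build_id)
    return None  # nothing works; keep the current deployment
```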

We might also want to keep the previous night's deployments active on a different port or fleet to be able to quickly switch if we find something horribly wrong in the morning.
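Keeping the previous night's deployment alive amounts to an active/standby pair that is rotated on each deploy and swapped back on rollback. A minimal, hypothetical sketch, where the slot names and build IDs are placeholders rather than real ports or fleets:

```python
from dataclasses import dataclass


@dataclass
class Slots:
    active: str   # build ID currently serving traffic
    standby: str  # previous night's build, kept running for quick rollback

    def promote(self, new_build: str) -> None:
        """Deploy a new nightly: the current active build becomes the standby."""
        self.standby, self.active = self.active, new_build

    def rollback(self) -> None:
        """Switch traffic back to the previous night's build."""
        self.active, self.standby = self.standby, self.active
```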

PS, looking at these, you might feel custom scripts would be easier to build than a cdk; if you do, you are not alone :D

@gaiksaya
Member

Moving this issue to the opensearch-devops repository, as we are planning to host the codebase there.
Thanks!

@gaiksaya gaiksaya transferred this issue from opensearch-project/opensearch-build Oct 28, 2023
@github-actions github-actions bot added the untriaged Issues that have not yet been triaged label Oct 28, 2023
@gaiksaya gaiksaya removed the untriaged Issues that have not yet been triaged label Oct 28, 2023
@gaiksaya
Member

Please see the high-level design posted here: #130
I will be splitting the description of this issue into smaller issues.
Thanks!

@gaiksaya
Member

Closing this issue, as nightly playgrounds have been working successfully for the last 2-3 releases: https://playground.nightly.opensearch.org/
Upcoming enhancements such as #153 will be followed up in that issue.


5 participants