High availability for Sigstore services #814

Open
ndegory opened this issue Aug 16, 2024 · 2 comments
Labels
enhancement New feature or request

Comments


ndegory commented Aug 16, 2024

Description

What needs to be improved

Documentation on how to run the Sigstore services with high availability.

Context

The Helm charts in sigstore/helm-charts default to a single replica for all services. Some charts allow setting replicaCount for the main service, but there are no guidelines on how to make the service highly available.

As an example, the Rekor chart has a replicaCount for the Rekor server, but the chart also deploys MySQL and Redis without any option to run them with more than one replica. For these dependencies, HA is more complicated than changing the replicaCount in the Deployment or StatefulSet. The same can be said for Trillian: its MySQL dependency doesn't allow an HA configuration.

What should be done

  • Update the documentation with the current options for raising replica counts (is this reliable, or should we leave them at 1?)
  • Enhance the Helm charts to allow an HA configuration for the dependencies (MySQL, Redis)
ndegory added the enhancement label Aug 16, 2024
Contributor

vipulagarwal commented Aug 16, 2024

Hi @ndegory, thanks for opening this issue. Yes, AFAIK Sigstore does not have documentation on the scalability of any component. @ianhundere has done some great work adding tolerations, nodeSelector and affinity to all the Helm charts, which helps with scalability and highly available setups.

The following components are stateless and easy to scale using replicaCount and affinity/anti-affinity settings:

  • Fulcio
  • CTLog
  • Rekor
  • Trillian LogServer
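
As a rough illustration of the replicaCount plus anti-affinity approach for these stateless components, the sketch below builds a values override in Python and prints it as JSON (JSON is valid YAML, so the output should be usable as a Helm values file). The key path `server`, the label value `fulcio`, and the function name `ha_overrides` are assumptions made for illustration; the real key paths and pod labels have to be taken from each chart's values.yaml.

```python
import json


def ha_overrides(component_key, app_label, replicas=3):
    """Build a values override that spreads replicas across nodes.

    component_key and app_label are hypothetical; the real key path and
    pod labels depend on the chart being configured.
    """
    anti_affinity = {
        "podAntiAffinity": {
            # Hard rule: never schedule two matching pods on the same node.
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {
                        "matchLabels": {"app.kubernetes.io/name": app_label}
                    },
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }
    return {component_key: {"replicaCount": replicas, "affinity": anti_affinity}}


if __name__ == "__main__":
    # "server" and "fulcio" are placeholder names, not confirmed chart keys.
    print(json.dumps(ha_overrides("server", "fulcio"), indent=2))
```

The hard anti-affinity rule needs at least as many schedulable nodes as replicas; the softer `preferredDuringSchedulingIgnoredDuringExecution` form is the usual alternative on small clusters.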

For Trillian LogSigner, we need to set up etcd for leader election, and this information is available in detail here
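
To show why the signer needs leader election when more than one replica runs, here is a toy Python model of a time-limited lease: only the current holder may act, and another replica takes over once the lease expires. This is a single-process sketch for intuition only. It is not how etcd or Trillian implement election, and the class name `LeaseElection` and the replica names are made up for the example.

```python
class LeaseElection:
    """Toy single-process model of a TTL lease; not etcd's actual protocol."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, now):
        """Grant or renew the lease if it is free, expired, or already ours."""
        lease_free = self.holder is None or now >= self.expires_at
        if lease_free or self.holder == candidate:
            self.holder = candidate
            self.expires_at = now + self.ttl
            return True
        return False


if __name__ == "__main__":
    election = LeaseElection(ttl_seconds=5)
    # signer-0 becomes leader, signer-1 is refused while the lease is live.
    print(election.try_acquire("signer-0", now=0))
    print(election.try_acquire("signer-1", now=1))
    # signer-0 stops renewing; after the TTL passes signer-1 takes over.
    print(election.try_acquire("signer-1", now=6))
```

In a real deployment the lease state lives in etcd rather than in one process, which is what lets the standby replica notice that the leader has gone away.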

The Trillian Helm chart (Rekor's dependency) supports spinning up a MySQL instance automatically, but I believe the intention was to make it as easy as possible to get Sigstore up and running for testing and development. The burden of scaling MySQL is on the user, and most production setups use hosted MySQL on public clouds. The same applies to Redis for Rekor.

We could do better by leveraging existing popular MySQL and Redis Helm charts, adding them as dependencies within the Sigstore helm-charts and making them directly available to operators of private Sigstore deployments. We could also add features like HPA (Horizontal Pod Autoscaler) that help with high availability.
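
For reference, a minimal sketch of what an HPA for one of the stateless services could look like, built as a Python dict following the Kubernetes `autoscaling/v2` schema and printed as JSON. The Deployment name `rekor-server`, the replica bounds, and the CPU target are assumptions for illustration; the charts do not necessarily create a Deployment with that name.

```python
import json


def hpa_manifest(deployment, min_replicas=2, max_replicas=6, cpu_percent=70):
    """Return an autoscaling/v2 HorizontalPodAutoscaler for a Deployment."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            # Points the autoscaler at the workload it should resize.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Scale on average CPU utilization across the pods.
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_percent,
                        },
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # "rekor-server" is a placeholder Deployment name.
    print(json.dumps(hpa_manifest("rekor-server"), indent=2))
```

Keeping `minReplicas` at 2 or more is what contributes to availability here; the autoscaling itself mainly addresses load. CPU-based scaling also assumes the pods declare CPU requests and that a metrics source such as metrics-server is installed.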

Author

ndegory commented Aug 16, 2024

Thank you @vipulagarwal for the overview of the HA options. I don't think it would be worth adding more dependencies. I'll try disabling mysql and redis in the values file and pointing to a MySQL cloud service and an HA deployment of Redis.
If that works, I'll submit a documentation PR.
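
A sketch of what that override could look like, written in Python to show how a user values file layers on top of chart defaults (maps are merged key by key, so only the changed fields need to be listed). Every key name below (`mysql.enabled`, `mysql.hostname`, `redis.enabled`, `redis.hostname`) and both hostnames are hypothetical placeholders; the actual keys must be checked against the Rekor and Trillian charts' values.yaml.

```python
import copy
import json


def deep_merge(base, override):
    """Recursively merge override into a copy of base; override wins."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical chart defaults: bundled single-replica MySQL and Redis.
DEFAULTS = {
    "mysql": {"enabled": True, "hostname": "", "port": 3306},
    "redis": {"enabled": True, "hostname": "", "port": 6379},
}

# Hypothetical user overrides: turn the bundled services off and point
# at externally managed, highly available endpoints instead.
OVERRIDES = {
    "mysql": {"enabled": False, "hostname": "mysql.example.internal"},
    "redis": {"enabled": False, "hostname": "redis.example.internal"},
}

if __name__ == "__main__":
    print(json.dumps(deep_merge(DEFAULTS, OVERRIDES), indent=2))
```

Ports and any other untouched defaults carry over from the chart, which keeps the override file short.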
