Use Longhorn for storage #13

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged · 3 commits · Mar 28, 2025
76 changes: 76 additions & 0 deletions docs/hardware_ready/ADRs/Longhorn_as_storage_solution.md
@@ -0,0 +1,76 @@
---
title: "Longhorn_as_storage_solution"
date: "2025-03-18"
---


| status: | date: | decision-makers: |
| --- | --- | --- |
| approved | 2025-03-18 | Sofus Albertsen |


## Context and Problem Statement

### Do I need it?

Even though Kubernetes (and its Container Storage Interface) is production-ready for persistence and stateful workloads, keeping your cluster stateless has several advantages:

* Dead-simple disaster recovery (and duplication) of your cluster: everything is defined as code, so (re)creating a cluster is as easy as running your setup scripts once more and waiting for everything to come up.
* Backup and restore have never been simple, and Kubernetes does not solve this for you.

At its core, we want stateless [cloud native applications](https://kodekloud.com/blog/cloud-native-principles-explained/).
Remember the distinction between the need for persistence and for ephemeral storage: your caching service needs ephemeral storage, but does not need backup/restore of that data. Such workloads are a perfect fit for Kubernetes.
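The ephemeral case needs no storage solution at all; a plain `emptyDir` volume is enough. A minimal sketch (the pod name and image are illustrative, not from this project):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache               # illustrative name
spec:
  containers:
    - name: redis
      image: redis:7
      volumeMounts:
        - name: scratch
          mountPath: /data
  volumes:
    - name: scratch
      emptyDir: {}          # node-local scratch space, deleted with the pod; nothing to back up
```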

Often, your database sits on special hardware and is cared for by specialized staff.
Keep it that way, and connect to the database from your cluster.
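One way to connect from the cluster without bringing the database inside it is an `ExternalName` Service, so applications keep using a stable in-cluster DNS name. A sketch, assuming a hypothetical external hostname:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres                        # in-cluster DNS name your apps resolve
spec:
  type: ExternalName
  externalName: db.example.internal     # hypothetical external database host
```

Swapping the database host later then means changing one Service, not every application.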

### What are the criteria for choosing?

* **Performance Requirements:** What are the expected Input/Output Operations Per Second (IOPS) that your applications will demand? What about throughput?
* **Scalability Needs:** Are you able to add storage seamlessly after the initial creation?
* **Data Availability and Durability:** What are your requirements for replicas, recovery time, etc.?
* **Team Expertise and Comfort Level:** What is your team's existing knowledge and experience with the specific storage solutions you are considering?

### What are our weights for making a choice?

While all criteria are important, choosing one persistence tool over another can place vastly different demands on the team's expertise.

Therefore, the primary weight is this: if you already have a storage solution that supports Kubernetes with a CSI driver, evaluate that one before anything else.

Secondarily, we focus on the complexity a solution introduces.


## Considered Options

* **Longhorn:** A lightweight, reliable, and easy-to-use distributed block storage system for Kubernetes. It's built by Rancher (now SUSE). Key features include built-in snapshots, backups, replication, and a user-friendly GUI. It's designed specifically *for* Kubernetes and integrates deeply.

* **Rook Ceph:** Rook is a storage *operator* for Kubernetes, and Ceph is a highly scalable, distributed storage system offering object, block, and file storage. Rook automates deployment, management, and scaling of Ceph within Kubernetes. This combination is powerful but complex.

* **OpenEBS:** A containerized storage solution that provides persistent block storage for Kubernetes applications. It offers several different "storage engines" (cStor, Jiva, Local PV, Mayastor), each with different performance and feature characteristics. Offers flexibility but can require careful selection of the right engine.

* **Portworx:** A commercial (paid) storage platform designed for Kubernetes. It offers high performance, high availability, and advanced features like data encryption, storage-level snapshots, and automated scaling. It's a mature and feature-rich solution, but comes with licensing costs.


## Decision Outcome


Chosen option: **Longhorn**, because it provides a good balance of features, ease of use, and integration with Kubernetes, while minimizing the complexity overhead for our team.

It's a strong, open-source option that aligns well with our focus on simplicity.
It meets our needs for persistent storage within the cluster without introducing the operational overhead of a more complex solution like Rook/Ceph.
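Once installed, Longhorn registers a `longhorn` StorageClass, so consuming it is an ordinary PersistentVolumeClaim. A minimal sketch (claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data              # illustrative name
spec:
  accessModes:
    - ReadWriteOnce           # Longhorn block volumes are single-node read-write
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi           # illustrative size
```

Workloads reference the claim as a volume; Longhorn handles replication and snapshots behind the StorageClass.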


Based on different scenarios, the following general recommendations can be made:

- For organizations that require massive scalability, support for block, object, and file storage within a unified system, and have a team with the necessary expertise to manage a complex distributed storage platform, Ceph/Rook is a powerful option.
- For users who are looking for a straightforward and reliable distributed block storage solution that is easy to deploy and manage within Kubernetes, especially for smaller to medium-sized environments, Longhorn is an excellent choice.
- If you altogether lack the required skillset, need a high-performance, feature-rich, and commercially supported solution, and the licensing costs are justified, then Portworx is our recommendation.


### Consequences

* Good, because it's relatively easy to deploy and manage, leading to lower operational overhead and faster time to value. It has good community support and active development.
* Good, because Longhorn's performance is generally good for typical workloads, meeting our initial performance requirements.

* Bad, because Longhorn is primarily focused on block storage. If we need robust support for shared filesystems (ReadWriteMany access mode with full POSIX compliance) *within* the cluster, we may need to wait until newer versions of the tool support this (see their [roadmap](https://github.com/longhorn/longhorn/wiki/Roadmap#longhorn-v111-january-2026) for more information).
* Bad, because, although Longhorn has a growing community, it's not as mature as Ceph. While this is less of a direct "consequence" and more of a relative comparison, it's worth keeping in mind for long-term planning.
2 changes: 1 addition & 1 deletion docs/hardware_ready/_index.md
@@ -14,6 +14,6 @@ In case virtualisation is chosen, the below recommendations are what you would require:
| Problem domain | Description | Reason for importance | Tool recommendation |
|:---:|:---:|:---:|:---:|
| Kubernetes Node Operating System | The Operating System running on each of the hosts that will be part of your Kubernetes cluster | Choosing the right OS will be the foundation for building a production-grade Kubernetes cluster | [Talos OS](hardware_ready/ADRs/talos_as_os.md) |
- | Storage solution | The underlying storage capabilities which Kubernetes will leverage to provide persistence for stateful workloads | Choosing the right storage solution for your cluster's needs is important as there are a lot of balance tradeoffs associated with it, e.g. redundancy vs. complexity | |
+ | Storage solution | The underlying storage capabilities which Kubernetes will leverage to provide persistence for stateful workloads | Choosing the right storage solution for your cluster's needs is important as there are a lot of balance tradeoffs associated with it, e.g. redundancy vs. complexity | [Longhorn](Longhorn_as_storage_solution.md) |
| Container Runtime (CRI) | The software that is responsible for running containers | You need a working container runtime on each node in your cluster, so that the kubelet can launch pods and their containers | |
| Network plugin (CNI) | Plugin used for cluster networking | A CNI plugin is required to implement the Kubernetes network model | [Cilium](Cilium_as_network_plugin.md) |