LVM support on top of another PV #227

Open
stevefan1999-personal opened this issue May 5, 2023 · 15 comments
@stevefan1999-personal

Describe the problem/challenge you have

There is currently no way to deploy LVM on top of another persistent volume.

Describe the solution you'd like
Let us compose LVM on top of another PV.

Anything else you would like to add:

I want to use Oracle Cloud's block storage driver to create a 200 GB persistent volume backed by a Block Storage resource in Oracle Cloud, which can be reattached to other nodes but accessed by only one node at a time (in other words, ReadWriteOnce), so I can do node migration if things go wrong.

I've considered using local-pv before, as this is one of the supported features, but I need thin provisioning and quota support, with snapshots being a nice add-on.
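
For context, a minimal sketch of the kind of backing volume I have in mind: a Block-mode, ReadWriteOnce PVC carved out of OCI block storage. The storage class name `oci-bv` is an assumption and depends on how the OCI block volume CSI driver is installed in the cluster.

```yaml
# Rough sketch only: a Block-mode PVC from OCI block storage, intended to
# serve as the backing device for LVM.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: oci-block-backstore
spec:
  accessModes:
    - ReadWriteOnce        # the block volume is attached to one node at a time
  volumeMode: Block        # raw block device, no filesystem on top
  storageClassName: oci-bv # assumption: whatever class the OCI CSI driver exposes
  resources:
    requests:
      storage: 200Gi
```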

abhilashshetty04 self-assigned this Aug 23, 2023
@abhilashshetty04 (Contributor) commented Aug 23, 2023

@stevefan1999-personal, thanks for raising the issue. Could you provide some more context? Are you not able to use an Oracle block volume as a PV for LVM?

@stevefan1999-personal (Author) commented Aug 23, 2023

@abhilashshetty04 Yes and no. I can create a block volume on OCI, but with a minimum of 50 GB per block volume. As I have a 200 GB free quota, this means I can only have 4 block volumes, and I clearly need more than that. Since I would be running on a single node only, having LVM on top of a PVC attached from OCI is the most suitable option, but I just didn't see a way to do that here.

I remember that the OCI PVC can be freely migrated to different VMs using iSCSI, so basically we don't need NFS for that. This technique could be ported to other cloud platforms as a competitor to Rook/Ceph if we can support LVM on top of another block-based PVC. I do understand the pros and cons of Ceph (I could be using RBD as an alternative here).

@abhilashshetty04 (Contributor) commented:

@stevefan1999-personal, the LV that gets created is tied to a particular LVMNode. Did I understand correctly that you are trying to move the PV to a different LVMNode, since Oracle allows it? If yes, did it work? Even so, I believe the pod using the LV would lose access to it, right?

@stevefan1999-personal (Author) commented:

@abhilashshetty04 Yes. iSCSI lets you move the volume to other nodes in case one goes down. This is done behind the scenes by Oracle's block volume provisioner. I want to preserve this behavior so that I don't need to intervene when one of the nodes suddenly goes down, for example due to overloading.

@abhilashshetty04 (Contributor) commented Aug 30, 2023

@stevefan1999-personal, with this functionality: suppose the PV is attached to lvmnode1. If lvmnode1 goes down somehow, the iSCSI volume backing the PV gets mounted on lvmnode2, for example. Won't you have to recreate the PV with that volume manually? Or is it mounted read-only by the other nodes all the time (was this your ask when you said ReadWriteOnce)?

Also, Kubernetes would need to be aware of which node in the cluster has acquired access to the volume, since the pod needs to be scheduled to the correct node.

@stevefan1999-personal (Author) commented Aug 30, 2023

> @stevefan1999-personal, with this functionality: suppose the PV is attached to lvmnode1. If lvmnode1 goes down somehow, the iSCSI volume backing the PV gets mounted on lvmnode2, for example. Won't you have to recreate the PV with that volume manually? Or is it mounted read-only by the other nodes all the time (was this your ask when you said ReadWriteOnce)?
>
> Also, Kubernetes would need to be aware of which node in the cluster has acquired access to the volume, since the pod needs to be scheduled to the correct node.

Let's call the LVM PV lvm-backstore, and the LocalPV created on top of lvm-backstore virtual-lvm.

Consider that when lvmnode1 goes down, the pods are supposed to be migrated by the Kubernetes scheduler too. The lock on lvm-backstore would then be released, and another node can take it.

Of course, this comes with the downside that all the pods using those volumes would have to be migrated to the specific node hosting the LVM PV.

So, if lvmnode1 goes down and lvmnode2 acquires lvm-backstore, all the pods that reference virtual-lvm would have to run on lvmnode2 from then on, since virtual-lvm references lvm-backstore, which is now bound to lvmnode2. There may also be some issues regarding metadata flushing, but those would be handled by the users themselves.

All the volumes under this special setup should be ReadWriteOnce. It's like local-path-provisioner, but migratable. lvm-backstore must have a volumeMode of Block, since you wouldn't want to build LVM on top of files.
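
To make the layering concrete, here is a purely hypothetical sketch of what a virtual-lvm StorageClass could look like. The provisioner and the `storage`/`volgroup` parameters follow the usual lvm-localpv shape as I understand it; the `backingPVC` parameter does not exist today and is only there to illustrate the requested feature.

```yaml
# Hypothetical sketch only. "backingPVC" is NOT an existing lvm-localpv
# parameter; it illustrates the requested "LVM on top of another PV" layering.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: virtual-lvm
provisioner: local.csi.openebs.io   # lvm-localpv CSI provisioner, as I understand it
allowVolumeExpansion: true
parameters:
  storage: "lvm"
  volgroup: "lvmvg"                 # VG that would live on the backing device
  backingPVC: "lvm-backstore"       # hypothetical: Block-mode PVC used as the LVM PV
```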

@abhilashshetty04 (Contributor) commented:

@stevefan1999-personal Thanks for explaining. This seems like it would require a shared VG; let me know if I am wrong. I still have some questions:

  1. Is lvm-backstore going to be a PV object, from the LVM perspective, that is hosted externally and mounted by all LVM nodes?
  2. You have not mentioned a VG in the use case. If my previous assumption is correct, is it (a shared VG created on top of lvm-backstore) going to be accessible by all nodes?

FYI, we had tried a shared VG some time back, but due to some hurdles we had to shelve the PR.

@stevefan1999-personal (Author) commented:

> @stevefan1999-personal Thanks for explaining. This seems like it would require a shared VG; let me know if I am wrong. I still have some questions:
>
> 1. Is lvm-backstore going to be a PV object, from the LVM perspective, that is hosted externally and mounted by all LVM nodes?
> 2. You have not mentioned a VG in the use case. If my previous assumption is correct, is it (a shared VG created on top of lvm-backstore) going to be accessible by all nodes?
>
> FYI, we had tried a shared VG some time back, but due to some hurdles we had to shelve the PR.

lvm-backstore is going to be a PersistentVolume of any kind. It would most likely be provisioned by a storage controller that provisions block-type volumes (for example, local-static-provisioner), although it is not strictly required that the PersistentVolume be provisioned by a storage controller. You can, in fact, make a block volume that references a local disk path yourself without any trouble; I did this two years ago.

We should concern ourselves with the LVM Physical Volume first. Essentially, the end goal of this feature request is to treat a Kubernetes PV as an LVM PV. A Kubernetes PV can be distributed or marked for access by certain nodes, so while it technically requires all valid nodes to be able to reach it, only one node can have exclusive access to that specific PV at a time. Because LVM does not allow concurrent access, we need some exclusivity lock here.

Logical Volumes and Volume Groups are actually out of scope here, but I think they will have to be tackled eventually.
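
For reference, a rough illustration of how node exclusivity shows up on a localpv-lvm PV today: the PV carries a hard node affinity. The topology key `openebs.io/nodename` is my assumption of what the driver uses; the point is only that this pinning would have to follow whichever node currently holds the backing block device.

```yaml
# Rough illustration of the node pinning a localpv-lvm PV carries today.
# Names and the topology key are assumptions; the PV is hard-bound to one node,
# which is exactly what would need to move along with the backing device.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-virtual-lvm-example             # placeholder name
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: local.csi.openebs.io
    volumeHandle: pvc-virtual-lvm-example   # placeholder handle
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: openebs.io/nodename      # assumed topology key
              operator: In
              values:
                - lvmnode1                  # would have to become lvmnode2 after a failover
```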

@abhilashshetty04 (Contributor) commented:

Hi @stevefan1999-personal, apologies for the delayed response. We have done some product restructuring: you'll notice that the LVM and ZFS local-pv engines are now bundled with the Mayastor platform, although each still has its own provisioner and components.

Coming back to your requirement: I still don't get your point about having the LVM PV reference a local disk path for your use case. How can that be accessible from some other node in the cluster?

In case the device backing the PV is a remote storage device with respect to all cluster members, LVM has a shared-VG feature that uses a lock manager such as sanlock or dlm for coordinating access to LVs on the shared VG. Does this make sense?

@stevefan1999-personal (Author) commented May 23, 2024

> Hi @stevefan1999-personal, apologies for the delayed response. We have done some product restructuring: you'll notice that the LVM and ZFS local-pv engines are now bundled with the Mayastor platform, although each still has its own provisioner and components.
>
> Coming back to your requirement: I still don't get your point about having the LVM PV reference a local disk path for your use case. How can that be accessible from some other node in the cluster?
>
> In case the device backing the PV is a remote storage device with respect to all cluster members, LVM has a shared-VG feature that uses a lock manager such as sanlock or dlm for coordinating access to LVs on the shared VG. Does this make sense?

My use case is remote attachment of the LVM backing device: some K8s storage provisioners use iSCSI as the remote mounting source, which can be attached to other nodes at any time, and that is currently how Oracle Cloud handles block storage. It means that although the block device is local and exclusive to one specific node at a time, it can still be remounted on other nodes at any given time for quick recovery, provided the exclusive lock has been released or has expired for whatever reason.

This would be a very useful feature, since we could bypass a network storage layer such as GlusterFS/NFS/Ceph because the underlying block storage is already virtualized through the host-provided network. LVM + iSCSI is a validated solution for storage virtualization, and I think we can do this on K8s too.

That said, I think the idea can be more general and apply to persistent volumes as a whole. Other cloud providers such as Azure, AWS, and GKE would also benefit from this, especially with regard to their block storage options. Otherwise, the best choice for me right now is just to use Rook/Ceph, which does support this kind of PVC layering use case.

@abhilashshetty04 (Contributor) commented:

The solution you want still has a single point of failure, right? What if the node hosting the remotely accessible PV goes down? HCI storage engines should replicate volumes for high availability. LVM LocalPV was designed to use native LVM capabilities; keeping the storage object and its consumer local was a driving force of the development. We have not planned inter-node storage access as of yet.

If you want a storage solution where storage objects hosted on a local device can be accessed by other cluster members, you can give Mayastor a try. Mayastor is based on NVMe and replicates volumes as replicas for redundancy. It supports thin provisioning, snapshots, volume resize, performance monitoring, etc.

Please find more information about Mayastor here: https://openebs.io/docs#replicated-volumes

Let me know if you have more questions.

@dsharma-dc (Contributor) commented Jun 5, 2024

@stevefan1999-personal This project is specifically for the LocalPV use case, hence there is no support for remote mounting of LVM LVs. The Mayastor offering under OpenEBS supports that over NVMe; the default backend there is not LVM but SPDK-based. However, we have very recently introduced support for an LVM-based backend as well, which you may want to check out and provide feedback on, though it doesn't support all the features yet.

dsharma-dc added the "wontfix" label Jun 5, 2024
@avishnu (Member) commented Sep 19, 2024

@stevefan1999-personal just for clarification: what you'd like is for the localpv-lvm driver to be capable of detecting that the underlying block device (LVM PV) has moved to another node, and to make the localpv volumes accessible once again?

avishnu added the "Backlog" label and removed the "wontfix" label Sep 19, 2024
avishnu added this to the v4.3 milestone Sep 19, 2024
avishnu added the "question" label Sep 19, 2024
@mhkarimi1383 (Contributor) commented:

I think by accepting Block volumes and adding a job to prepare such a volume (creating PVs/LVs; rough sketch below), we would be able to do that, but we would also have to make the provisioner able to run as a Deployment to cover more needs.
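
Something along these lines, as a rough sketch of such a preparation Job: it attaches the Block-mode PVC as a raw device and runs pvcreate/vgcreate on it. The image name, VG name, and device path are placeholders, and it would have to run privileged on the node that currently holds the attachment.

```yaml
# Rough sketch of a one-shot "prepare the backing volume" Job.
# Assumptions: the container image ships the lvm2 tools, the PVC
# "lvm-backstore" exists with volumeMode: Block, and the Job lands on
# the node that currently has the block volume attached.
apiVersion: batch/v1
kind: Job
metadata:
  name: prepare-lvm-backstore
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: prepare
          image: example.com/lvm2-tools:latest   # placeholder image with pvcreate/vgcreate
          securityContext:
            privileged: true                     # needed to run LVM against the host device
          command: ["/bin/sh", "-c"]
          args:
            - pvcreate /dev/lvm-backstore && vgcreate lvmvg /dev/lvm-backstore
          volumeDevices:
            - name: backstore
              devicePath: /dev/lvm-backstore     # raw device path inside the container
      volumes:
        - name: backstore
          persistentVolumeClaim:
            claimName: lvm-backstore             # the Block-mode PVC discussed above
```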

@stevefan1999-personal (Author) commented:

> @stevefan1999-personal just for clarification: what you'd like is for the localpv-lvm driver to be capable of detecting that the underlying block device (LVM PV) has moved to another node, and to make the localpv volumes accessible once again?

This is one possible scenario for addressing storage migration in a distributed system. For example, if your underlying storage is based on iSCSI (Internet Small Computer Systems Interface) or Ceph RBD (RADOS Block Device), you can migrate it to another node without significant downtime or data loss. I want a more general approach because I abstract those distributed storage systems into the form of a PersistentVolume; that is why I want LVM on top of another PV.
