Volume Snapshot in Kubernetes - Snapshot API Object discussion #44172

jingxu97 · 2017-04-06T18:40:30Z

Inspired by @MikaelCluseau's suggestion, The following is a list of discussions related to Snapshot API object design.

Use new API objects for snapshot or not
I think having new API objects to represent snapshots will be very convenient to support different operations on snapshots. The possible operations could be create snapshot from volume, create new volume or restore volume from snapshot, list and delete snapshots, show the status (available, pending etc) and detailed information (creation time, size) of a snapshot. So it would gives us more flexibility compared to just tagging the snapshot information into a PV.
How snapshot is created
Although it is possible to use just PVC to initiate a snapshot creation, I think using a new API object (SnapshotRequest) to represent snapshot creation gives much more flexibility and support more options.
a). It could support both one-time snapshot creation or scheduled snapshot with a lot more snapshot policy options. For example, how many snapshots could be taken periodically for a volume. A snapshot should be deleted after a certain amount of time to save space. How much storage space could be used by snapshots etc.

b). It could support to create snapshots for a list of volumes. For system admins, they might want to create snapshots for all volumes in the system overnight for backup purpose. Instead of specifying only one snapshot per PVC, they could use wild card to list all volumes in the snapshot request.

c). It could support to create snapshot for pod, statefulset, deployments etc. Users might not have direct information about the PVs for their pods. In snapshot request, the sources to create snapshots could be PVs, or pod, statefulset and kubernetes can retrieve the volumes information from the pod objects and take snapshots for the user.
How to create a new volume and restore a volume from a snapshot.
Currently kubernetes support dynamic provisioning to create a new volume for a PVC. To create a new volume from a snapshot is similar to this process. I am thinking again to use SnapshotRequest to perform this task by giving the snapshot identifier and some keyword to indicate to create a new volume or restore the existing volume. If a new volume is created, a new PV object will be created to represent this volume. In case of restore, the existing PV still represents the reverted volume. I think this way is cleaner than mixing this function into PVC which requires modifying the PVC object.
Namespace or not.
Similar to PVC and PV, we could make SnapshotRequest as namespaced, and snapshots object as non-namespaced so that user/developer can use request.

@MikaelCluseau's Example

kind: SnapshotRequest
metadata:
  namespace: production
  name: snap_app1_20170406_010000
spec:
  persistentVolumeClaims:  #Could be a list of PV, pods, or snapshot id (create volume from a snapshot)
  - data.app-1-db-0
  - data.app-1-db-1
  - data.app-1-mq-0
  - data.app-1-mq-1
  - data.app-1-filestore-0

kind: Snapshot
metadata:
  name: production-pv-for-claim-data.app-1-db-0@snap_app1_20170406_010000
spec:
  volume: production-pv-for-claim-data.app-1-db-0
  requestRef: #the ref could be other type too
    namespace: production
    name: snap_app1_20170406_010000

Any comments and suggestions for this would be appreciated!

The text was updated successfully, but these errors were encountered:

jingxu97 · 2017-04-06T18:45:37Z

cc @kubernetes/sig-storage-api-reviews @kubernetes/sig-storage-feature-requests @kubernetes/sig-storage-misc

wattsteve · 2017-04-06T18:54:58Z

cc @tsmetana

childsb · 2017-04-06T19:11:41Z

Just to be clear, this is a two object split.. correct? There's the "SnapShot" object which has no namespace.. then a "SnapShotRequest" which is a namespaced object..

As we add more imperative actions to PVs, should we have a 'generic' request object that would allow Snapshot, Replication, Resize, Restore, etc...

rootfs · 2017-04-06T19:17:07Z

And would SnapshotRequest also point to the Snapshot once snapshot succeeds?

mcluseau · 2017-04-06T19:38:16Z

@childsb I don't think having a generic request object is something we want to introduce now (and I suspect we won't need it because nearly everything will be specific to each case).

@rootfs I think I even made a mistake by adding the requestRef field in Snapshot; we should do exactly like the ReplicaSet/Pod relationship to have a consistent UX (and developer experience too ;)). We already have ObjectMeta.ownerReferences that even states something interesting:

If ALL objects in the list have been deleted, this object will be garbage collected. [...]

So owners will be the SnapshotRequest as the controller, and any volume created from the snapshot. The only thing we miss in this is the source of the Snapshot in the request (which pod's volume / PVC). I suspect this is somehow indexed by Kubernetes for already existing needs (Deployment creating ReplicaSets creating Pods).

mcluseau · 2017-04-06T19:41:06Z

And maybe rename SnapshotRequest to SnapshotSet ?

childsb · 2017-04-06T20:28:37Z

@MikaelCluseau a generic PVR (persistent volume request)... could be similar to PV where there's a set of common fields shared and specific fields that are specific to the actionable command. We could still keep the strong typing of fields for each request type.

Examples:

apiVersion: v1
kind: PersistentVolumeRequest
metadata:
  name: snapshot1
spec:
  snapshot:
    pvc:pvc1

apiVersion: v1
kind: PersistentVolumeRequest
metadata:
  name: action-create
spec:
  createVolumeFromSnapshot:
    snapshot:snapshot1

apiVersion: v1
kind: PersistentVolumeRequest
metadata:
  name: action5
spec:
  resizePVC:
    pvc:pvc1
    newSize:25Gi

apiVersion: v1
kind: PersistentVolumeRequest
metadata:
  name: action6
spec:
  restore:
    snapshot:snapshot1

jingxu97 · 2017-04-06T22:04:00Z

@childsb your proposal of a generic request is very interesting. But I have a concern about the complexity of the controller. Now with generic request, we will have many different operations from the request, we need to think through how such controller works.

mcluseau · 2017-04-06T23:39:35Z

@childsb to me it's not similar; a PV is an actual thing in the system (a mount point) that the user wants to manipulate. It just happens that this thing has a relatively complex setup because of the diversity of source and the lack of standardization in the storage industry. I think we can all agree that the PV object would be much better if we could simply have expressed the source as a path in the storage system instead of a different thing for each storage tech out there.

Here, the actual thing we manipulate is a snapshot (hence the mistake in the name SnapshotRequest, I should definitely have said SnapshotSet). In terms of consistent UX, we have StatefulSets and Deployments, not a generic object even though they share a common generic logic that's materialized as a ReplicaSet.

So, IMHO, if we need something generic at some point, I really think it shouldn't be a first class object.

jsafrane · 2017-04-07T10:19:48Z

@childsb, this is not how Kubernetes works. It's intent based. https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#types-kinds

Objects represent a persistent entity in the system.
Creating an API object is a record of intent - once created, the system will work to ensure that resource exists.

PersistentVolumeRequest is not an persistent entity in the system, it's command. To create a snapshot user should create an object representing the snapshot and Kubernetes will create it. To resize a volume, user modifies the volume and Kubernetes will try to resize it. It's complicated by security aspects, we don't show PVs to users, but that's why there is a claim. Users should resize claims and Kubernetes would try to make it happen in real objects outside of Kubernetes. Similarly with snapshots.

jsafrane · 2017-04-07T10:34:28Z

@jingxu97, IMO there needs to be an object that represents single snapshot and is in user namespace so the user can see what's its status and we can have quota around it. I don't have a good name for it, I'm going to use SnapshotClaim in this comment, but I know it's ugly (user is not claiming a snapshot, he's creating it...)

Any object that creates multiple snapshots, say SnapshotSet, should be IMO a separate Kind. It will spawn multiple SnapshotClaims in user namespace so the user can see what snapshots are available, which of them succeeded/failed and then the user can restore or delete them individually.

If there should be a Snapshot object in the global namespace is another discussion, I have nothing against it, however these should not be visible to users, similarly as PVs are not visible to users because of security. All regular operations with snapshots must be done by a user creating/modifying/deleting objects in user's namespace.

mcluseau · 2017-04-07T10:55:58Z

SnapshotClaim in this comment, but I know it's ugly (user is not claiming a snapshot, he's creating it...)

It's not that ugly, with dynamic provisionners, PVClaims are actually creating PVs.

jsafrane · 2017-04-07T11:37:06Z

SnapshotClaim in this comment, but I know it's ugly (user is not claiming a snapshot, he's creating it...)

It's not that ugly, with dynamic provisionners, PVClaims are actually creating PVs.

Important is the intent - with PVC, user wants to claim an existing PV. A new one is created only when there is nothing to claim. With SnapshotClaim, user probably does not want to claim an existing snapshot, he really wants to create a new one.

tsmetana · 2017-04-07T12:53:16Z

I actually like the idea of SnapshotRequest to create a Snapshot (object). The one thing I dislike in this proposal is the snapshotting of multiple volumes at once: the result would be multiple objects outside the cluster anyway so having them represented with separated object looks much more intuitive to me. It is also more flexible. I'd rather use some simple mechanism for marking snapshots belonging to one group.

mcluseau · 2017-04-07T13:02:10Z

The important part of "at once" is "consistent" ;-)

jsafrane · 2017-04-07T13:18:53Z

The important part of "at once" is "consistent" ;-)

We discussed this use case and we're trying to avoid it in the first version as it needs support in the application. How else do you ensure consistent snapshots of 10 different EBS volumes running say Cassandra?

mcluseau · 2017-04-07T13:47:01Z

I have no problem with pushing the "multiple snapshots at once" outside the scope of a first version.
To answer your question, without app-support, I think pretty good quality can be achieved just with fsfreeze, as any quorum-based CP system will stop acknowledging writes when the majority is frozen (and for AP systems you have already chosen to accept some loss). I'm not going to have an implementation here, but something along kubelets synchronizing themselves with leases could be a workable path (for instance).

tsmetana · 2017-04-07T14:13:20Z

fsfreeze is supported only on ext3/4, xfs, reiser and jfs... The proposal should be filesystem-agnostic and as such I don't think we should take responsibility of the the data consistency at all.

jsafrane · 2017-04-07T14:17:02Z

In addition, fsfreeze ensures just filesystem consistency, not consistency of the database stored on the filesystem. It may be corrupted.

Anyway, it's fine to have a high-level object like SnapshotSet and we can argue how to make it consistent later, but we need also an object that represents single snapshot in user namespace. The user needs to see what snapshots are available, their states and he/she needs a way how to restore / delete individual snapshots and not whole set. Something like StatefulSet and Pods - both are in user namespace and user can monitor them individually.

mcluseau · 2017-04-07T14:42:01Z

In addition, fsfreeze ensures just filesystem consistency, not consistency of the database stored on the filesystem. It may be corrupted.

Let's forget SnapshotSet; this applies to a single snapshot too. We won't fix users wishing filesystems without fsfreeze or equivalent support, but not using fsfreeze when possible (99% of the cases?) will cause much more damage. We won't fix in-app cache flushing generically either. Obviously, we'll have to document clearly what we do, and also do the best we can.

jingxu97 · 2017-04-07T17:10:44Z

Thank you everyone for the insightful feedback and discussion. I think it would be good to discuss this in F2F meeting. This snapshot API design is critical to our snapshot feature at the very first stage. That is why I want to initiate this discussion first here and gather feedback so that we could make a conclusion faster hopefully at the meeting. I will try to summarize the discussion here and update the proposal. Please continue to send your thoughts and comments about the design. Thank you!

…

On Fri, Apr 7, 2017 at 7:42 AM, Mikaël Cluseau ***@***.***> wrote: In addition, fsfreeze ensures just filesystem consistency, not consistency of the database stored on the filesystem. It may be corrupted. Let's forget SnapshotSet; this applies to a single snapshot too. We won't fix users wishing filesystems without fsfreeze or equivalent support, but not using fsfreeze when possible (99% of the cases?) will cause much more damage. We won't fix in-app cache flushing generically either. Obviously, we'll have to document clearly what we do, and also do the best we can. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#44172 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ASSNxYhD3E4SIGt94iNNcNKjQRrD1vaiks5rtktdgaJpZM4M2Ae9> .

-- - Jing

jingxu97 · 2017-04-25T07:25:05Z

There are a few questions have been discussed related to snapshot API design including whether the API object should be in user namespace or not, whether there should be two objects such as Request and snapshot (also snapshotSet). The following is a summary of the points.

Namespace discussion

Unlike PVs that are created by admins only, snapshots could be initiated by users or admins. Clearly a snapshot object is used to show the detailed information and status about the snapshot. If user creates the snapshots, the objects should be in user namespace so that user can query/list them. Admins might also create snapshots for the PVs available in the system for data backup purpose. Those snapshots are not belong to a specific user namespace and might be used by any user.
How snapshots are used is quite unique. User might want to create or restore volumes from the snapshots they have taken. It is also possible that they want to use the snapshots taken by the admins. Also considering data replication and some other scenarios, snapshots taken from a namespace might be needed in another namespace.

Now we propose a two-object API design. Similar to PVC/PV, snapshot has two related objects, VolumeSnapshotRequest and VolumeSnapshot.

VolumeSnapshotRequest (VSR) is a request for snapshot by a user. Similar to PVC, VSR is a usernamespaced object, which provides a way to request or consume snapshot resources.
VolumeSnapshot (VS) is a representation of the snapshot taken from a volume. This API object captures the details of the implementation of the snapshot. Similar to PV, VS is a non-namespaced object. There are two ways to create VS: statically or dynamically.
Static:
If there are existing snapshots created outside of kubernetes, an administrator can create VSes to represent details of the snapshots which are available for use by cluster users. Users create VSRs in their namespaces and point them to the corresponding VSes. VSR becomes a handler to access those snapshots by a user.
Dynamic
User could also user VSR to create snapshots from a volume on-demand. VSR specifies which volume the snapshot should be taken from. In such case, a corresponding VS is created. VSR and VS will have pointers to each other.

Many-to-one mapping between VSR and VS
There could be use cases that snapshots are needed across different namespaces, e.g. for data replication. To achieve this, here we propose to support many-to-one mapping relationship between VSR and VS. There is a field “allowedNamespaces” in VS/VSR object to specify in which namespaces the VSRs could be created for the snapshot. For static creation, system admins can add the list of namespaces. In dynamic creation, user specifies the namespace list which will be propagated to the VS.
If a user want to access VS from a different namespace, he/she could create a VSR to reference the VS from the namespace as long as it is in the “allowedNamespaces” list. In this way, VSR and VS is many-to-one mapping.

Differences between PVC/PV and VSR/VS
Binding between PVC/PV requires atomic transaction to avoid race condition causing multiple PVCs bind to the same PV. PVC and PV is one-to-one mapping so that we have to make sure only one PVC could bind to the PV. However, snapshots does not have this requirement so that the binding between VSR and VS does not need to be atomic.
Since VSR/VS is many-to-one mapping, delete a VSR is not necessary trigger a deletion for VS unless this VSR is the last reference to it.

smarterclayton · 2017-04-25T18:16:15Z

Can you discuss why we need more than a single object? Argument 1 (that admins want to do their own backups) isn't enough of a reason to have two objects.

…

On Tue, Apr 25, 2017 at 3:25 AM, Jing Xu ***@***.***> wrote: There are a few questions have been discussed related to snapshot API design including whether the API object should be in user namespace or not, whether there should be two objects such as Request and snapshot (also snapshotSet). The following is a summary of the points. *Namespace discussion* 1. Unlike PVs that are created by admins only, snapshots could be initiated by users or admins. Clearly a snapshot object is used to show the detailed information and status about the snapshot. If user creates the snapshots, the objects should be in user namespace so that user can query/list them. Admins might also create snapshots for the PVs available in the system for data backup purpose. Those snapshots are not belong to a specific user namespace and might be used by any user. 2. How snapshots are used is quite unique. User might want to create or restore volumes from the snapshots they have taken. It is also possible that they want to use the snapshots taken by the admins. Also considering data replication and some other scenarios, snapshots taken from a namespace might be needed in another namespace. [image: volume snapshot in kubernetes] <https://cloud.githubusercontent.com/assets/19172805/25373099/198846f4-294d-11e7-9185-577d102a4a72.jpg> Now we propose a two-object API design. Similar to PVC/PV, snapshot has two related objects, VolumeSnapshotRequest and VolumeSnapshot. 1. VolumeSnapshotRequest (VSR) is a request for snapshot by a user. Similar to PVC, VSR is a usernamespaced object, which provides a way to request or consume snapshot resources. 2. VolumeSnapshot (VS) is a representation of the snapshot taken from a volume. This API object captures the details of the implementation of the snapshot. Similar to PV, VS is a non-namespaced object. There are two ways to create VS: statically or dynamically. *Static:* If there are existing snapshots created outside of kubernetes, an administrator can create VSes to represent details of the snapshots which are available for use by cluster users. Users create VSRs in their namespaces and point them to the corresponding VSes. VSR becomes a handler to access those snapshots by a user. *Dynamic* User could also user VSR to create snapshots from a volume on-demand. VSR specifies which volume the snapshot should be taken from. In such case, a corresponding VS is created. VSR and VS will have pointers to each other. *Many-to-one mapping between VSR and VS* There could be use cases that snapshots are needed across different namespaces, e.g. for data replication. To achieve this, here we propose to support many-to-one mapping relationship between VSR and VS. There is a field “allowedNamespaces” in VS/VSR object to specify in which namespaces the VSRs could be created for the snapshot. For static creation, system admins can add the list of namespaces. In dynamic creation, user specifies the namespace list which will be propagated to the VS. If a user want to access VS from a different namespace, he/she could create a VSR to reference the VS from the namespace as long as it is in the “allowedNamespaces” list. In this way, VSR and VS is many-to-one mapping. *Differences between PVC/PV and VSR/VS* Binding between PVC/PV requires atomic transaction to avoid race condition causing multiple PVCs bind to the same PV. PVC and PV is one-to-one mapping so that we have to make sure only one PVC could bind to the PV. However, snapshots does not have this requirement so that the binding between VSR and VS does not need to be atomic. Since VSR/VS is many-to-one mapping, delete a VSR is not necessary trigger a deletion for VS unless this VSR is the last reference to it. — You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub <#44172 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABG_p2j28fQaB1whn2r30VyA67IYLuc_ks5rzZ_hgaJpZM4M2Ae9> .

smarterclayton · 2017-04-25T18:17:54Z

For instance - every user requesting a snapshot is requesting that independently. Unless the user sets preconditions, the system can very well coalesce multiple snapshot requests at the same time. The snapshot should be suitable for creating new PVCs from via dynamic provisioning. But I still don't see the requirement for two objects. On Tue, Apr 25, 2017 at 2:16 PM, Clayton Coleman <ccoleman@redhat.com> wrote:

…

Can you discuss why we need more than a single object? Argument 1 (that admins want to do their own backups) isn't enough of a reason to have two objects. On Tue, Apr 25, 2017 at 3:25 AM, Jing Xu ***@***.***> wrote: > There are a few questions have been discussed related to snapshot API > design including whether the API object should be in user namespace or not, > whether there should be two objects such as Request and snapshot (also > snapshotSet). The following is a summary of the points. > > *Namespace discussion* > > 1. Unlike PVs that are created by admins only, snapshots could be > initiated by users or admins. Clearly a snapshot object is used to show the > detailed information and status about the snapshot. If user creates the > snapshots, the objects should be in user namespace so that user can > query/list them. Admins might also create snapshots for the PVs available > in the system for data backup purpose. Those snapshots are not belong to a > specific user namespace and might be used by any user. > 2. How snapshots are used is quite unique. User might want to create > or restore volumes from the snapshots they have taken. It is also possible > that they want to use the snapshots taken by the admins. Also considering > data replication and some other scenarios, snapshots taken from a namespace > might be needed in another namespace. > [image: volume snapshot in kubernetes] > <https://cloud.githubusercontent.com/assets/19172805/25373099/198846f4-294d-11e7-9185-577d102a4a72.jpg> > > Now we propose a two-object API design. Similar to PVC/PV, snapshot has > two related objects, VolumeSnapshotRequest and VolumeSnapshot. > > 1. VolumeSnapshotRequest (VSR) is a request for snapshot by a user. > Similar to PVC, VSR is a usernamespaced object, which provides a way to > request or consume snapshot resources. > 2. VolumeSnapshot (VS) is a representation of the snapshot taken from > a volume. This API object captures the details of the implementation of the > snapshot. Similar to PV, VS is a non-namespaced object. There are two ways > to create VS: statically or dynamically. > *Static:* > If there are existing snapshots created outside of kubernetes, an > administrator can create VSes to represent details of the snapshots which > are available for use by cluster users. Users create VSRs in their > namespaces and point them to the corresponding VSes. VSR becomes a handler > to access those snapshots by a user. > *Dynamic* > User could also user VSR to create snapshots from a volume on-demand. > VSR specifies which volume the snapshot should be taken from. In such case, > a corresponding VS is created. VSR and VS will have pointers to each other. > > *Many-to-one mapping between VSR and VS* > There could be use cases that snapshots are needed across different > namespaces, e.g. for data replication. To achieve this, here we propose to > support many-to-one mapping relationship between VSR and VS. There is a > field “allowedNamespaces” in VS/VSR object to specify in which namespaces > the VSRs could be created for the snapshot. For static creation, system > admins can add the list of namespaces. In dynamic creation, user specifies > the namespace list which will be propagated to the VS. > If a user want to access VS from a different namespace, he/she could > create a VSR to reference the VS from the namespace as long as it is in the > “allowedNamespaces” list. In this way, VSR and VS is many-to-one mapping. > > *Differences between PVC/PV and VSR/VS* > Binding between PVC/PV requires atomic transaction to avoid race > condition causing multiple PVCs bind to the same PV. PVC and PV is > one-to-one mapping so that we have to make sure only one PVC could bind to > the PV. However, snapshots does not have this requirement so that the > binding between VSR and VS does not need to be atomic. > Since VSR/VS is many-to-one mapping, delete a VSR is not necessary > trigger a deletion for VS unless this VSR is the last reference to it. > > — > You are receiving this because you are on a team that was mentioned. > Reply to this email directly, view it on GitHub > <#44172 (comment)>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/ABG_p2j28fQaB1whn2r30VyA67IYLuc_ks5rzZ_hgaJpZM4M2Ae9> > . >

0xmichalis · 2017-04-26T08:45:42Z

I think snapshots for data replication will be very useful for StatefulSets and that means that data stays in the same namespace. I am not sure of the security implications of cross-namespace snapshots and the proposed mechanism needs to be vetted by @kubernetes/sig-auth-api-reviews folks.

I share Clayton's thoughts, why can't the flow simply include a single object and as a user I create such an object (volumeSnapshotRequest or snapshotRequest) and get back a PVC bound to the data I asked a snapshot for? Admin backups can still be PVs, no?

jingxu97 · 2017-04-26T19:29:30Z

Hi Michall, Could you please elaborate about "get back a PVC bound to the data I asked a snapshot for"? When user requests to create a snapshot (I assume the snapshot request specifies a PVC to indicate which volume the snapshot should be taken), does it mean it also create a volume and PVC bound this this volume, or PVC bound to this snapshot?

…

On Wed, Apr 26, 2017 at 1:46 AM, Michail Kargakis ***@***.***> wrote: I think snapshots for data replication will be very useful for StatefulSets and that means that data stays in the same namespace. I am not sure of the security implications of cross-namespace snapshots and the proposed mechanism needs to be vetted by @kubernetes/sig-auth-api-reviews <https://github.com/orgs/kubernetes/teams/sig-auth-api-reviews> folks. I share Clayton's thoughts, why can't the flow simply include a single object and as a user I create such an object (volumeSnapshotRequest or snapshotRequest) and get back a PVC bound to the data I asked a snapshot for? Admin backups can still be PVs, no? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#44172 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ASSNxeu2ExFE7omC7l06dFdC7K7L48Aiks5rzwRcgaJpZM4M2Ae9> .

-- - Jing

0xmichalis · 2017-04-26T19:42:08Z

The outcome of the snapshotRequest is a new PV ("the data") bound to a new PVC. The new PVC is what I get back in the namespace I did the snapshoting.

eparis · 2017-04-27T18:27:24Z

SnapShots are NOT volumes. When dealing with a storage array once can create a snapshot of a volume. In many storage arrays a 'snapshot' could in fact just be a diff from some previous state. One can later create a volume from that snapshot. We plan to represent that similarly in kube.

Let me walk through a flow at a high level:

User creates a PVC which creates a PV
User uses this PVC like always, nothing interesting yet.
User creates a SnapShotRequest which points to a PVC
System triggers the snapshot creation action on the storage array. System also creates a SnapShot API object with data from the storage array which is bound to the SnapShotRequest.
At some later point a user might create a new PVC which points to the SnapShotRequest
System triggers the snapshot->volume action on the storage array. System also creates a new PV API object from the storage array result bound to the new PVC
User uses this new PVC like always, nothing interesting anymore.

I had a discussion with clayton and tend to agree. It is bikeshedding but important bikeshedding. Lets change the object names. Users should create something called a Snapshot. The system should create an object with a less human friendly name, maybe SnapshotRepresentation or something. We don't call it a PodRequest. We don't call it an IngressRequest. The user visible name should be Snapshot. I think we messed that up for PVs and think we should get it right here. I don't really care what we call the other half. But I do feel pretty strongly about the user facing name.

(In kube -v2 I think we should make it PV and VolumeRepresentation or something like that)

smarterclayton · 2017-04-27T22:49:39Z

@thockin and I had previously agreed that the "PersistentVolumeClaim" and "PersistentVolume" naming was incorrect - what an end user saw should be "PersistentVolume" (because that's how the user thinks of it), and the lower level object (what an admin sees) can be more technical, like "PersistentVolumeDefinition" or similar. So I'm ok with the two object model - but as Eric describes I really want to ensure we focus on the human focused name first (for the thing everyday users deal with), and the admin focused object can be more precise and technical and less human-friendly. If a user wants to take a snapshot, they create a Snapshot.

…

On Thu, Apr 27, 2017 at 2:27 PM, Eric Paris ***@***.***> wrote: SnapShots are NOT volumes. When dealing with a storage array once can create a snapshot of a volume. In many storage arrays a 'snapshot' could in fact just be a diff from some previous state. One can later create a volume from that snapshot. We plan to represent that similarly in kube. Let me walk through a flow at a high level: 1. User creates a PVC which creates a PV 2. User uses this PVC like always, nothing interesting yet. 3. User creates a SnapShotRequest which points to a PVC 4. System triggers the snapshot creation action on the storage array. System also creates a SnapShot API object with data from the storage array which is bound to the SnapShotRequest. 5. At some later point a user might create a new PVC which points to the SnapShotRequest 6. System triggers the snapshot->volume action on the storage array. System also creates a new PV API object from the storage array result bound to the new PVC 7. User uses this new PVC like always, nothing interesting anymore. ------------------------------ I had a discussion with clayton and tend to agree. It is bikeshedding but important bikeshedding. Lets change the object names. Users should create something called a Snapshot. The system should create an object with a less human friendly name, maybe SnapshotRepresentation or something. We don't call it a PodRequest. We don't call it an IngressRequest. The user visible name should be Snapshot. I think we messed that up for PVs and think we should get it right here. I don't really care what we call the other half. But I do feel pretty strongly about the user facing name. (In kube -v2 I think we should make it PV and VolumeRepresentation or something like that) — You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub <#44172 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABG_p_nOUG46gTFuNrAOv96OO2ePv_2mks5r0N4dgaJpZM4M2Ae9> .

jeffvance · 2017-04-28T01:54:35Z

@eparis Re 5. At some later point a user might create a new PVC which points to the SnapShotRequest:
Why would the new PVC point to the snapshot request object rather than the snapshot object? In other words, the user/PVC is not requesting another snapshot, he/she is requesting a PV based on a snapshot, right?

smarterclayton · 2017-04-28T02:27:59Z

The user is making a request to link a high level object under his control to another high level object. The lower level details don't really matter to her (ie it doesn't matter there is a PV or some tar file on S3). The user should be able to request new PVCs based on existing snapshots that have been successfully captured (or request a new pvc pointing to a new snapshot at the same time and expect Kube to fulfill that).

tsmetana · 2017-04-28T06:59:12Z

Thanks for the discussion. We're moving forward.

We will use two objects and it also looks like we have agreed to rename SnapshotRequest to Snapshot. Now we need to rename the second object. I have no preferences whatsoever (SnapshotRepresentation sounds OK to me).

jingxu97 · 2017-04-28T07:48:26Z

I have been thinking more about this two-object proposal and PV/PVC relationship and I feel that that two-object model are necessary for pre-existing volume case, but not for dynamic provisioning case.

The need for two objects: Just like node, pre-existing and configured disk storage is a type of resources that could be shared by different users. Based on the user's request, scheduler/controller finds an appropriate resources to be consumed by user. A PVC or pod spec is the place to list the requirements. Once a pod is assigned to a node, or a PVC is bound to a PV, the API object pod or PVC becomes a handler for user to access the chosen resources. Once finishing using the resources, the disk resource and its representation PV could becomes available to be consumed by others again. It supports binding and release operations between these two objects PV and PVC.
Considering only dynamic provisioning case, a new volume is created for PVC creation and will be deleted for PVC deletion. I feel two objects representation is not necessary. Just like a pod, since this volume will not be reused by anyone else, one object representation is enough.
For security reason, two objects with one object in non-namespace could prevent some user to forge a PVC/PV to access other people's data. But one object with subresource or hidden field could also reach the same goal, I think.

Go back to snapshot, similarly, depending on the use cases, two objects might be needed or not.

If only consider the case that users create a snapshot and use a snapshot to create new volume or roll back a volume, a single object of snapshot is good enough.
If we consider snapshot is a resource could be shared across different users/namespaces, two objects model might better support it. In such case, the snapshot request from a user does not really create a new snapshot, instead it binds to an existing snapshot entity (e.g., based on PVC information). But I do feel this binding is not as natural as PVC/PV case. I have a many-to-one mapping for two snapshot objects proposal to allow cross-namespace sharing, but during snapshot meeting, we got some security concerns about this feature.

I think we need to be clear about what use cases we plan to support, and whether the use cases are reasonable to make the decision. If case 2 is not something we want or should support reasonably, I do feel two objects are not very necessary. Although we are saying it is similar to PV/PVC case, but snapshot is closer to dynamic provisioning case of PVC/PV which does not really require two objects by itself.

tsmetana · 2017-05-02T06:42:29Z

I don't have that strong preference for the two-object model as it may seem. I originally proposed one object model too. The two object model emerged from the upstream discussion for the reasons I have mentioned earlier: mostly it looks to be less limiting for future development.

If case 2 is not something we want or should support reasonably

This I believe is the important point: The two-object model is unnecessary in most cases. However the data replication use case comes popping up in every discussion. The two-object model actually tells users we're not writing the use case off and we want to try to resolve it. I'm also not sure how feasible or reasonable it is...

cantbewong · 2017-05-02T16:50:39Z

I question whether scheduled create or delete should be in the API. Once you enable snapshots to be created by API, you can implement scheduled snapshots, including periodically scheduled snapshots. SO I don't think this level of detail needs to be, or should be called out unless you propose that the scheduled snap mechanism is part of Kubernetes itself, which I do not think is a requirement, now, and perhaps forever. Warning: if you declare yourself to be the “one true standard” for implementing this, be advised than with complexities such as timezones for one, this is a huge undertaking.

Also need to API expose snapshot delete API BTW in order to make this useful for common applications. And obviously and API to ascertain the count and identity of snapshots is also required. In order to support applications engaing in recurring snapshot create/delete, it is desirable to allow such an application to appy a label or other unique searchable/querable identifier to allow determination of whether your application created a snapshot. At a practical level you don't want to have a backup service deleting an admin created snapshot simply because it can't discriminate based on who creted the snapshot.

In order to support application initiated (as presumable application consistent) snapshots it would be desirable to have a mechanism to enable/disable this (by either pod or container), with an API authentication mechanism that simply identifies the request as coming from within the pod/container without requiring further credentials - the ask here is to enable making snapshot create
/delete calls on “self” without keeping credentials inside the pod/container. (Database admin runs script inside container that quieisces app, flushes to file file system and then triggers snapshot)

I am worried that support of a list of volumes in the snapshot create api will be erroneously interpreted as an expectation that these snaps would be captured at a “singular and common” point in time. Few (and perhaps zero) underlying storage implementations are capable of delivering this in a Kubernetes context.

I question whether bundling “create new volume from snapshot” into the SnapshotRequest is desirable. Common use cases exist for creating a volume from a snapshot long after the original snapshot was created. If as a developer you decide you want to do this, I think finding the functionality in something labeled SnapshotRequest is unexpected.

jeffvance · 2017-05-02T17:07:11Z

@cantbewong wrote: Common use cases exist for creating a volume from a snapshot long after the original snapshot was created. If as a developer you decide you want to do this, I think finding the functionality in something labeled SnapshotRequest is unexpected
I agree re. naming.

mcluseau · 2017-05-03T05:30:20Z

@jingxu97 even in case 1, how do you hide the storage access keys from the user without 2 objects? ie, accessing a Ceph snapshot will require a monitor list, a key, a pool, etc. I don't want my users to have access to that.

jingxu97 · 2017-05-03T05:56:43Z

Could subresource be used here to hide information from users? As Jan mentioned before "I checked Kubernetes and it is possible to have a namespaced Snapshot object with a subresource (SnapshotStorage?). Access to the subresource is subject of RBAC, so we can restrict users in creating, editing or even getting the subresource and let only a controller or admin access it. I checked Kubernetes sources and subresources are widely used, e.g. for Status. However, both e.g. namespaces/myns and namespaces/myns/status lead to the *same* object in etcd, it's a full Namespace object with both Namespace.Spec and Namespace.Status. To be honest, I don't understand this approach. So, it is possible to have something like api/v1/namespaces/myns/snasphots/mysnapshot with Snapshot object, accessible by user and api/v1/namespaces/myns/snasphots/mysnapshot/storage with SnapshotStorage object with coordinates of the snapshot in AWS/GCE; accessible by admin/controllers. We would be probably the first to do that and I'd like to ask @bgrant, @etune or @lavalamp if it's the right approach."

…

On Tue, May 2, 2017 at 10:30 PM, Mikaël Cluseau ***@***.***> wrote: @jingxu97 <https://github.com/jingxu97> even in case 1, how do you hide the storage access keys from the user without 2 objects? ie, accessing a Ceph snapshot will require a monitor list, a key, a pool, etc. I don't want my users to have access to that. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#44172 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ASSNxTn1NYjvc6thTVchqU7X5VmyyORBks5r2BEHgaJpZM4M2Ae9> .

-- - Jing

mcluseau · 2017-05-03T06:37:01Z

You have the whole subject really well in mind :-) So this is a technical solution implying RBAC on subresources. It's 1 object in etcd, but 2 ones in the API anyway. Pure opinion here: I find the subresource approach much more complex to understand for the admin/user; I mean, why should I do kubectl get snapshotstorage --all-namespaces (and deduplicate the output if a snapshot is shared between namespaces) to get a list of actual "real world" objects that are not namespaced by nature? I completely fail to see any gain and still see a potential danger in forcing such an artificial property. In particular, we're kind of locking the snapshots to their namespace while they are not in reality.

jingxu97 · 2017-05-03T08:05:58Z

I think so far the workflow we want to support the most is that users create snapshots and use them. In this way, snapshot represent user's data which should be in user's namespace. About Ceph, from document I think only Ceph block device support snapshots and kubernetes currently does not support block device. For snapshot command, it mentioned that "you must specify a user name or ID and a path to the keyring containing the corresponding key for the user". Isn't it something that user actually have to provide to taking a snapshot? I might be wrong, but I guess you are more focusing on system admin taking snapshot case. I wondered if system admin creates a Ceph snapshot, is that possible for user to access it?

…

On Tue, May 2, 2017 at 11:37 PM, Mikaël Cluseau ***@***.***> wrote: You have the whole subject really well in mind :-) So this is a technical solution implying RBAC on subresources. It's 1 object in etcd, but 2 ones in the API anyway. Pure opinion here: I find the subresource approach much more complex to understand for the admin/user; I mean, why should I do kubectl get snapshotstorage --all-namespaces (and deduplicate the output if a snapshot is shared between namespaces) to get a list of actual "real world" objects that are not namespaced by nature? I completely fail to see any gain and still see a potential danger in forcing such an artificial property. In particular, we're kind of locking the snapshots to their namespace while they are not in reality. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#44172 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ASSNxfy4WMWBOM0DbOzupwGuTQFkztF5ks5r2CCpgaJpZM4M2Ae9> .

-- - Jing

mcluseau · 2017-05-03T08:33:06Z

Kubernetes does support Ceph RBD (Rados Block Device), it's what I use :) https://github.com/kubernetes/kubernetes/tree/master/examples/volumes/rbd.

The ceph workflow is around this:

# Create a volume (ie the provisionner does that)
rbd create [key, mons, pool args] my-image --size 1024

# map and use the image...
rbd map [key, mons, pool args] my-image
=> /dev/rbd0
mount /dev/rbd0 /var/lib/kubelet/plugins/...rbd/...

# Take a snapshot
fsfreeze -f /dev/rbd0
rbd snap [key, mons, pool args] create my-image@snap-1
fsfreeze -u /dev/rbd0

# List snapshots of an image
rbd snap [key, mons, pool args] ls my-image

# Create a volume from the snapshot
rbd clone [key, mons, pool args] my-image@snap-1 my-image-from-snap-1

The key/user are usually those given to the "volume manager" (ie cinder in openstack). So, if "the user" is Kubernetes, then yes, the user has to provide its key to make operations on the images. But if "the user" is the Kubernetes' user, then she doesn't need to provide this information (there's no link between the Ceph's user and the Kubernetes' one). So, Kubernetes will have access to the snapshots it does, whatever the namespace or any Kubernetes' concept is involved.

In the PV/PVC split, the key, mons and pool args are in the PV, provided for instance by the provisionner.

rootfs · 2017-05-03T17:42:58Z

@MikaelCluseau good point. So in rbd PV/PVC, there are two keys: admin and user. Admin key is used to create and delete volume while user key is used to map/unmap rbd image. We can use the same idea here: a admin key in its own namespace and let controller use it to snap/clone.

timothysc · 2017-05-10T21:22:25Z

/cc @skriss

erictune · 2017-05-17T00:10:27Z

I had a long chat with @jingxu97 about this today. She asked me to summarize here. I haven't read this whole issue, so apologies in advance if I misunderstand or state what has already been said.

In Kubernetes RBAC, it is desirable, and I think common, to give a user within a namespace permission to access all resources of all objects within that namespace, including subresources.

(Details: this happens by giving that user the cluster-admin role via a RoleBinding in that namespace, which effectively makes it a namespace-admin role, although that is not what it is called. See: https://kubernetes.io/docs/admin/authorization/rbac/#default-roles-and-role-bindings)

So, there is no way to prevent at this namespace-admin user from having access to all subresources of a namespaced Snapshot Kind.

Therefore, there is no way from preventing someone in namespace N1 from editing a Snapshot object to point to a path with data not belonging to that namespace.

(You might ask why not change the role definition. The problem is that then the role definition would either have to have an explicit deny in it (which RBAC intentionally does not support because it has other problems) or the role would have to enumerate every kind and subresource except the one sensitive one, which is not possible, since the set of kinds is unbounded.)

The best approach, from our conversation, seemed to be either the two Kinds approach or storing some private-to-the-controller information in an external data store (effectively two objects).

jeffvance · 2017-05-17T22:35:48Z

Is it an option, at least initially if not permanently, to disallow all editing of a snapshot? In a way, this is consistent with the basic concept of snap-shotting...

tsmetana · 2017-05-18T15:24:06Z

Thanks for the updates. I think the two objects would be the safest path forward then: it looks to me it might be really more extensible in the future...

erictune · 2017-05-20T00:16:27Z

On further thought, my comment above may not be accurate. I've asked the RBAC experts in sig-auth for clarification.

erictune · 2017-05-30T21:47:56Z

I spoke with Sig Auth. The leads have a different view of how the namespace admin role is going to be extended than I had. Their view is consistent with the subresources approach. So, I retract my previous objection. Apologies if this slowed things down.

fejta-bot · 2017-12-25T14:52:39Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

fejta-bot · 2018-01-24T16:41:27Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

fejta-bot · 2018-02-23T16:47:51Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

jingxu97 added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Apr 6, 2017

0xmichalis added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 6, 2017

davidopp mentioned this issue Apr 16, 2017

Snapshot / Restore Volume Support for Kubernetes (CRD + External Controller) kubernetes/enhancements#177

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 25, 2017

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 24, 2018

k8s-ci-robot closed this as completed Feb 23, 2018

jbw976 mentioned this issue Mar 6, 2018

Design for volume snapshot and backup policy rook/rook#1552

Closed

billimek mentioned this issue May 31, 2018

Figure out how to access and back-up data persisted in rook billimek/k8s-templates#1

Closed

Volume Snapshot in Kubernetes - Snapshot API Object discussion #44172

Volume Snapshot in Kubernetes - Snapshot API Object discussion #44172

Comments

jingxu97 commented Apr 6, 2017 • edited Loading

jingxu97 commented Apr 6, 2017

wattsteve commented Apr 6, 2017

childsb commented Apr 6, 2017

rootfs commented Apr 6, 2017

mcluseau commented Apr 6, 2017

mcluseau commented Apr 6, 2017

childsb commented Apr 6, 2017

jingxu97 commented Apr 6, 2017

mcluseau commented Apr 6, 2017

jsafrane commented Apr 7, 2017

jsafrane commented Apr 7, 2017

mcluseau commented Apr 7, 2017

jsafrane commented Apr 7, 2017

tsmetana commented Apr 7, 2017

mcluseau commented Apr 7, 2017

jsafrane commented Apr 7, 2017

mcluseau commented Apr 7, 2017 • edited Loading

tsmetana commented Apr 7, 2017

jsafrane commented Apr 7, 2017

mcluseau commented Apr 7, 2017

jingxu97 commented Apr 7, 2017 via email

jingxu97 commented Apr 25, 2017

smarterclayton commented Apr 25, 2017 via email

smarterclayton commented Apr 25, 2017 via email

0xmichalis commented Apr 26, 2017

jingxu97 commented Apr 26, 2017 via email

0xmichalis commented Apr 26, 2017

eparis commented Apr 27, 2017

smarterclayton commented Apr 27, 2017 via email

jeffvance commented Apr 28, 2017 • edited Loading

smarterclayton commented Apr 28, 2017 via email

tsmetana commented Apr 28, 2017

jingxu97 commented Apr 28, 2017 • edited Loading

tsmetana commented May 2, 2017

cantbewong commented May 2, 2017

jeffvance commented May 2, 2017

mcluseau commented May 3, 2017

jingxu97 commented May 3, 2017 via email

mcluseau commented May 3, 2017

jingxu97 commented May 3, 2017 via email

mcluseau commented May 3, 2017

rootfs commented May 3, 2017

timothysc commented May 10, 2017

erictune commented May 17, 2017

jeffvance commented May 17, 2017

tsmetana commented May 18, 2017

erictune commented May 20, 2017

erictune commented May 30, 2017

fejta-bot commented Dec 25, 2017

fejta-bot commented Jan 24, 2018

fejta-bot commented Feb 23, 2018

jingxu97 commented Apr 6, 2017 •

edited

Loading

mcluseau commented Apr 7, 2017 •

edited

Loading

jeffvance commented Apr 28, 2017 •

edited

Loading

jingxu97 commented Apr 28, 2017 •

edited

Loading