
object bucket provisioning #1383

Merged (3 commits) on Oct 20, 2020

Conversation

jeffvance
Contributor

@jeffvance jeffvance commented Nov 26, 2019

KEP for bucket provisioning API.

cc @copejon @erinboyd @saad-ali @alarge @guymguym @travisn

@k8s-ci-robot added the cncf-cla: yes, size/XL, kind/kep, and sig/storage labels on Nov 26, 2019

@copejon copejon left a comment


Partial review, more to come

keps/sig-storage/20191125-bucket-provisioning.md (outdated, resolved)
Without a formal API to describe and manage bucket provisioning, storage providers are left to write their own provisioners including all of the necessary controller code.
This "wild west" landscape for bucket provisioning is potentially confusing, overly complex, inconsistent, may slow down Kubernetes adoption by object store users, and may hinder application portability.

This KEP is a proposal for a consistent bucket provisioning API which is object store agnostic and provides a familiar, intuitive management plane.

I think we should call out more specifically that it's agnostic to protocols/APIs (s3, gcs, etc).

e.g.

"We propose a common Kubernetes control-plane API for the management of object store bucket lifecycles which has no dependencies on underlying provider interfaces."

### Non-Goals

+ Update the native Kubernetes PVC-PV API to support object buckets.
+ Support and manage object store data planes.

Maybe add

"Provide a POSIX-like interface for object storage protocols"

Or something to call out that we're not trying to write a kube native object store interface


I don't like the POSIX comparison. POSIX has ACLs, write/read verbs, etc.

read/write verbs are not in scope, per above.


Agree with @copejon. It's not a comparison, it's making very clear this is a non-goal. From experience, way too many people conflate object storage with file-like access. Making clear this is not a project to provide any kind of POSIX/file-like access to object storage would definitely be a clarification for some.

keps/sig-storage/20191125-bucket-provisioning.md (outdated, resolved)

### Moving Parts

Before deploying an object bucket consuming app, the OB and OBC CRDs need to be created with RBAC rules granting _get_, _list_, and _watch_ verbs.

append "delete" verb for secrets, cms, and obs.
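For readers skimming this thread, a minimal sketch of the RBAC being discussed (illustrative only: the `objectbucket.io` API group and resource names are assumptions, and the suggested delete verb is noted in a comment):

```yaml
# Illustrative RBAC sketch for the OB/OBC custom resources (group name assumed).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: bucket-claim-access
rules:
- apiGroups: ["objectbucket.io"]
  resources: ["objectbucketclaims", "objectbuckets"]
  verbs: ["get", "list", "watch"]
# Per the comment above, the provisioner additionally needs delete on the
# secrets, configmaps, and OBs it creates.
- apiGroups: [""]
  resources: ["secrets", "configmaps"]
  verbs: ["get", "list", "watch", "create", "delete"]
```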

Depending on the object store, a secret in the object-store's namespace, containing new bucket owner credentials, may need to be created.
The OBCs needed by the app should exist in the app's namespace.

As is true for dynamic PV provisioning, a bucket provisioner pod needs to be running for each object-store created in the Kubernetes cluster.

"a bucket provisioner pod" --> "bucket provisioning operator"

Contributor Author

"operator" is somewhat of a red hat technique. Pod is accurate isn't it? It's true that this pod could do more than bucket provisioning -- is that your concern?


pod -> service?

Depending on the object store, a secret in the object-store's namespace, containing new bucket owner credentials, may need to be created.
The OBCs needed by the app should exist in the app's namespace.

As is true for dynamic PV provisioning, a bucket provisioner pod needs to be running for each object-store created in the Kubernetes cluster.

"needs to be running for each object-store created in the Kubernetes cluster."

It's technically possible that a single operator serves more than one object store of the same provider. I think you mean

"Bucket provisioning operators are specialized for the storage provider for which they are written"

OBCs reference a storage class. The storage class references the external provisioner, defines a reclaim policy, and specifies object-store specific parameters, such as region, owner secret, and bucket lifecycle policies.
For brownfield provisioning, the storage class also specifies the name of the existing bucket so that app developers are not burdened with typically long, random bucket names.
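For illustration only, such a storage class might look like the sketch below; the provisioner name and parameter keys are hypothetical and provisioner-specific, not defined by the KEP text quoted here.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: object-bucket-class
provisioner: example.com/bucket        # external bucket provisioner (placeholder)
reclaimPolicy: Delete                  # or Retain
parameters:
  region: us-west-1
  secretName: bucket-owner-creds       # owner credentials, in the object store's namespace
  secretNamespace: object-store
  bucketName: existing-legacy-bucket   # brownfield only: the pre-existing bucket
```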

### Moving Parts

It might be good from a readability point to create 2 sub sections: Greenfield and Brownfield. This would de-tangle the logical paths of each.

Contributor Author
@jeffvance jeffvance Nov 26, 2019

I have added sub-sections under Moving Parts for greenfield and brownfield.

A deleted _brownfield_ OBC is similar except that provisioners typically will not delete the physical bucket. Instead, bucket access is revoked and related artifacts are expected to be cleaned up.
Again, the library will delete the secret, configmap and OB.

### API

"API" is used interchangeably up to this point to reference both the k8s API (CRDs) and the library's interface. How about defining terms for each at the start ("API" == CRDs, "Interface" == library methods)


- **`Run`** is a required controller method called by provisioners to start the OBC controller.

- **`SetLabels`** is an optional controller method called by provisioners to define the labels applied to the Kubernetes resources created by the library.

This could be shortened to "allows for setting custom labels on library objects"

@jeffvance jeffvance force-pushed the buckets branch 2 times, most recently from ad52b53 to 0794c74 Compare November 27, 2019 01:34
@jeffvance
Contributor Author

The S3 provisioner pod watches for OBCs whose storage classes point to the AWS S3 provisioner, while ignoring all other OBCs.
Likewise, the same cluster can also run the Rook-Ceph RGW provisioner, which also watches OBCs, only handling OBCs that reference storage classes which define ceph-rgw.

**Note:** it is possible for one provisioner to handle OBCs for different instances of the same type of object store.

Wouldn't it be expected that a provisioner would handle OBCs for a given type of object store? Or this is just saying it's possible to have multiple provisioners for each type of object store?

Contributor Author

Example: a rook-ceph object store runs in namespace A and another rook-ceph object store runs in namespace B. A ceph-RGW provisioner runs in namespace C, where C could == A or B or any. Then that single provisioner could handle OBCs referencing storage classes which point to stores in A or B.


Ok, is this decision an implementation detail of the provisioner? In rook-ceph's case, I'm thinking for minimal privs the provisioner should only provision from the object store(s) in a given cluster. If you have multiple object stores in the same rook-ceph cluster (same namespace), you would have a single provisioner. But if you have multiple rook clusters (not as common), a separate provisioner may be used. If a single provisioner is needed, that provisioner needs RBAC access to all the rook-ceph clusters. Seems like it could be a deployment decision by the admin.

Contributor Author

It's really good to think about minimal privileges and, in fact, that's why this KEP proposes to make OBs namespaced rather than cluster scoped. However, provisioners need to be able to create secrets and configMaps in many namespaces (the ns of the OBC), and now need to create the OB in the ns of the store. If the admin wants a 1:1 mapping of object stores and provisioners s/he can limit write access to a single store. In that case, if an OBC references a storage class which points to a store in a different ns (w/o RBAC rules) then the library's OB creation will fail and the lib's retry code will run.

This is a long way of saying that I should delete the note above since this is an admin detail.


Wouldn't it be expected that a provisioner would handle OBCs for a given type of object store?

Yes. It's just saying that a provisioner can handle different 'flavors' of OBCs that all reference the provisioner. This would be represented by having more than 1 storage class for 1 object store where each SC encapsulates different feature sets.

The secret and configmap reside in the same namespace as the OBC, and the OB lives in the provisioner's namespace.
The app pod consumes the secret and configmap and thus can reference the bucket.

When a _greenfield_ OBC is deleted the associated provisioner is expected to delete the newly provisioned bucket and the related object store-specific artifacts.

There is a retain policy for the bucket, right? This makes it sound like the bucket is always deleted when the OBC is deleted.

Contributor Author
@jeffvance jeffvance Dec 8, 2019

The retain policy is defined in the storage class and also stored in the OB. The Delete method passes the OB to the provisioner. The lib decides to call Delete rather than Revoke because the OBC is for a greenfield bucket. Provisioners have full control over whether or not to delete the bucket. The expectation is that a provisioner's Delete method will delete the bucket but it might not and still return a nil error. Likewise, the expectation is that a provisioner's Revoke method will not delete the bucket but it might and still return a nil error. The lib will always remove the k8s resources it created when a nil error is returned.

**Note:** it was mentioned at the 2019 NA Kubecon storage face-to-face that to improve scaling we could decide to move the configmap info into the secret, thus reducing k8s resources required.
The OB is an abstraction of the bucket and can save some state data needed by provisioners.
The secret and configmap reside in the same namespace as the OBC, and the OB lives in the provisioner's namespace.
The app pod consumes the secret and configmap and thus can reference the bucket.

Multiple client pods can consume the same OBC to read/write to the same bucket, right?

Contributor Author
@jeffvance jeffvance Dec 8, 2019

yes, subject to object store access rules. k8s has no opinion on this since there is no accessMode defined for buckets.


Is there any reason why we don't want accessMode input(opinion) from K8s side, presumably in claim? Read-only and Read-write access to buckets seem to be reasonable use cases. I'm imagining in a brownfield scenario, user can have a bucket setup for both consumer and producer application, accessMode would be handy to limit consumer to read-only access and allow producer to have read write access. Also read-only vs read-write seem to be well-supported across major object storage vendors.

Contributor Author

Agree that there are good use cases for accessMode. My (admittedly dated) understanding is that accessMode is unenforceable by k8s. Is this still true @jsafrane?
For bucket provisioning, the underlying object store would need to support various access modes.

Member

Yes, AccessMode is used only during binding / provisioning, but it's never used when running a pod - you can have multiple pods using the same RWO volume in parallel.

The secret contains the credentials needed to access the bucket and the configmap contains bucket endpoint information.
**Note:** it was mentioned at the 2019 NA Kubecon storage face-to-face that to improve scaling we could decide to move the configmap info into the secret, thus reducing k8s resources required.
The OB is an abstraction of the bucket and can save some state data needed by provisioners.
The secret and configmap reside in the same namespace as the OBC, and the OB lives in the provisioner's namespace.
Contributor Author
@jeffvance jeffvance Dec 8, 2019

If a single provisioner for the same type of object store should handle this object store living in multiple namespaces, then the OB needs to reside in the object store's ns not the provisioner's ns.

## Summary

Kubernetes natively supports dynamic provisioning for file and block storage but lacks support for object bucket provisioning.
Without a formal API to describe and manage bucket provisioning, storage providers are left to write their own provisioners including all of the necessary controller code.

I'm uncomfortable with the implied idea that object bucket provisioning is just an extension of the existing block and file support.

Block and file support are accessed from within containers using local standard APIs. Object storage is logically just a remote service - there's nothing in particular that makes it inherently similar to block/file. Imagine that you wanted to be able to support provisioning against various "relational database as a service" products (e.g., RDS, CloudSQL, etc.). You could imagine an api for things like "create a database" and "create credentials for accessing this database". And you'd interact with the database via some sort of SQL client. I see such a system as a much closer analog for object storage than block/file.

Contributor Author
@jeffvance jeffvance Dec 11, 2019

Fair point. When I think of storage I think of file, block and object. I tend to lump object with file and block more than with databases because of its simplicity and narrower scope.
Of course the fact that there is no mount for object makes it similar, from a pod's perspective, to a database. And, from a provisioner's perspective I suppose there's no real difference between a bucket API and a DB API.

Contributor Author

I'll leave some time for others to chime in and then will push changes to this doc to reflect your comments.


Of course the fact that there is no mount for object makes it similar, from a pod's perspective, to a database.

Agreed. Given that the data path is so different between file/block and object, it's better to describe it as such. We've already seen that by lumping object in with file/block, there's been some assumptions about whether we're also proposing a POSIX analog for bucket interfaces, which we're not.

### Goals

+ Define a _control-plane_ object bucket management API thus relieving object store providers from Kubernetes controller details.
+ Minimize technical ramp-up for storage vendors who have already contributed external storage provisioners (file and block).

I don't understand why "who have already contributed..." is relevant. Minimizing ramp-up is good, but again (per previous comment), I don't think there is any inherent association between object and volumes (block and file).

Contributor Author
@jeffvance jeffvance Dec 11, 2019

Agree there is not necessarily an inherent association, per your comment above. Some storage vendors provide file, block and object store. My thinking was that there could be some leverage if a file/block vendor decided to write their own bucket provisioner.


if a file/block vendor decided to write their own bucket provisioner.

The feedback we've gotten on the bucket provisioning library ("why not make it CSI-like?") makes even more sense when considering adoption by vendors into existing provisioners. For one, CSI being the standard model for file/block means that it may actually be more cumbersome to incorporate the library. Being written in Go immediately creates more work for provisioner authors who don't use the language. And the library pattern may in fact be unfamiliar to provisioner authors who jumped straight to CSI.


+ Define a _control-plane_ object bucket management API thus relieving object store providers from Kubernetes controller details.
+ Minimize technical ramp-up for storage vendors who have already contributed external storage provisioners (file and block).
+ Make bucket consumption similar to file or block consumption using familiar concepts and commands.

I don't think it is bucket consumption that you are trying to make similar - I think it is bucket provisioning.


Right, it's the elements of the control plane that are meant to be familiar. It should not propose an abstraction of object store APIs specifically.

+ Define a _control-plane_ object bucket management API thus relieving object store providers from Kubernetes controller details.
+ Minimize technical ramp-up for storage vendors who have already contributed external storage provisioners (file and block).
+ Make bucket consumption similar to file or block consumption using familiar concepts and commands.
+ Use native Kubernetes resources where possible, again, to keep the bucket experience for users and admins similar to existing storage provisioning.

I'm assuming this refers to the decision to overload StorageClass for parameters. I do not see the value in this - I see an anti-value. Volume provisioning is fundamentally different than provisioning some external entity. Artificial association is more likely to confuse IMO than reduce confusion.

Contributor Author
@jeffvance jeffvance Dec 11, 2019

This concern should be discussed.
My team feels it is important to use native storage classes for the reason cited. We know others who use a custom resource, similar to a storage class, but specific to their object store needs.
An advantage of a CR is you get api validation vs the parameters map of storage classes, and a CR could handle brownfield buckets more intuitively than this proposal.


There are several benefits to a separate CR:

  • Avoiding cognitive dissonance - StorageClass is used for volume operations. Other classes (e.g., VolumeSnapshotClass) have been introduced for other class-style parameterization. Adding ObjectBucketClass would seem to be the most logical approach.

  • You can evolve it independently of StorageClass, which responds to requirements from block/file (you don't have to bring them along, and you don't have to be brought along).

  • StorageClass is non-namespaced. You've made progress towards making the object support install-able and usable without requiring write access to non-namespace objects. If you had a single well-known namespace for object operations, ObjectBucketClass resources could go into that namespace (along with provisioners and OBs).


These are all pretty compelling points for a bespoke BucketClass object. We leaned on SCs initially because of assumptions we made about the system being familiar to k8s users. However, like you point out, we're beholden to the API which is exclusively intended for file/block. Namespacing the classes with their provisioners also lends itself to scalability.
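To make the suggestion concrete, a bespoke class might look roughly like the following; this is purely illustrative of the reviewer's idea and is not part of the KEP as written.

```yaml
apiVersion: objectbucket.io/v1alpha1   # hypothetical group/version
kind: ObjectBucketClass
metadata:
  name: s3-buckets
  namespace: object-system             # a well-known namespace for object operations
provisioner: example.com/bucket
reclaimPolicy: Delete
parameters:
  region: us-west-1
```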


## Proposal

Create a bucket provisioning library, similar to the Kubernetes [sig-storage-lib-external-provisioner](https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/master/controller/controller.go) library, which handles all of the controller and Kubernetes resources related to provisioning.

After years of dealing with the issues of having multiple vendors support provisioning of volumes, the community decided on an approach that abstracts the control plane behind a simple gRPC API, allowing for provisioning drivers to be effectively independent of Kubernetes. I believe that this is the approach we should also use for object. Instead of providing a library and asking object vendors to implement a controller using the library, I think we should provide top-level and sidecar controllers and define the interface via gRPC. This massively simplifies the process of writing and deploying a driver. Using a central top-level controller (e.g., see the 1.17 volume snapshot architecture) allows for a clean approach to CRD versioning and lifecycle management and standardizing observability metrics.

It also could theoretically allow for situations in the future where the gRPC functions could be optionally added to both CSI (for cases where volume storage providers also provide object support) and separately (e.g., via a new "OSI").

Contributor Author
@jeffvance jeffvance Dec 11, 2019

I'd like to see discussion on this suggestion.
I am not experienced enough in CSI to argue or agree with this comment. There are a couple of provisioners recently using this lib and my understanding is that the effort was relatively low.


Having someone deliver the whole solution in a single controller creates a lot of problems. For example:

  • If there is a bug in the library that is discovered, how do you ensure that all the controllers using that library are updated?

  • If you have multiple controllers installed, who "owns" the CRDs? How do they get installed/upgraded?

  • If storage vendors are providing their own whole controllers, how do you ensure a degree of uniformity (and correctness) around observability?

The volume snapshot folks directly addressed this set of issues with the beta volume snapshot work and came up with a dual-controller + driver architecture:

  • A single-instance global controller that handles most of the complicated creation, provisioner lookup, and binding logic

  • A very simple sidecar controller that serves primarily to convert k8s resources into gRPC calls

  • A plugin driver supplied by storage vendors that is kubernetes and language independent.

I'm advocating a very similar model here.
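A rough packaging sketch of that model, under the assumption of one central COSI-style controller plus a per-vendor pod running a generic sidecar and the vendor's gRPC driver over a shared UNIX socket (all names and images are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-bucket-driver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-bucket-driver
  template:
    metadata:
      labels:
        app: example-bucket-driver
    spec:
      containers:
      - name: provisioner-sidecar          # generic: converts custom resources into gRPC calls
        image: example.com/bucket-sidecar:latest
        volumeMounts:
        - name: socket
          mountPath: /var/lib/bucket-driver
      - name: driver                       # vendor-supplied, Kubernetes-unaware gRPC server
        image: example.com/vendor-bucket-driver:latest
        volumeMounts:
        - name: socket
          mountPath: /var/lib/bucket-driver
      volumes:
      - name: socket
        emptyDir: {}
```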


+1 And since many storage vendors have already switched over to CSI drivers, it's going to be a hard sell to convince them they should adopt a model that k8s has moved on from.

Contributor Author
@jeffvance jeffvance Dec 20, 2019

I am starting to see value in a CSI-like approach (maybe COSI, or "cozy"?).
@alarge can you elaborate on this point:

If there is a bug in the library that is discovered, how do you ensure that all the controllers using that library are updated?

I agree it's a potential problem with the library. But CRDs can support multiple versions and so can the lib (though it doesn't today). Can you explain how CSI mitigates this issue?


@jeffvance A bug in this context could mean 2 kinds of bug: 1) a Kubernetes-related bug (i.e. CRD or controller common logic in the library), or 2) an object storage vendor-related bug (i.e. a Provisioner bug). The library approach IMO unnecessarily couples these 2 types of bugs. An issue on either side (k8s controller/CRD vs. object storage Provisioner) IMO shouldn't require attention from the other side. For example, a common library bug fix shouldn't require the vendor to release a new version of their controller because there's no change in the "Provisioner" they've implemented. Having a CSI-like interface would allow us to decouple these 2 sides of the problem. For instance, a bug fix in the central controller would just require the end user to bump the version number in the deployment yaml, and the object storage vendor is not involved at all.

Contributor Author

I do like the separation of provisioner bugs vs a csi bug. I'll add this point to the CSI Pros section of the KEP.

Contributor Author

Follow up: it appears that accessMode is used by CSI. I don't yet know how it's enforced or controlled, but it is passed along in the request.

## Proposal

Create a bucket provisioning library, similar to the Kubernetes [sig-storage-lib-external-provisioner](https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/master/controller/controller.go) library, which handles all of the controller and Kubernetes resources related to provisioning.
This library will implement a simple and familiar API via Custom Resources, and define a _contract_ between app developers and provisioners regarding the Kubernetes resources related to bucket provisioning.

Per above, I think a gRPC API is a better form of contract than a golang-specific library interface.

Contributor Author
@jeffvance jeffvance Dec 11, 2019

Is part of your point, in mentioning Go, that gRPC has multi-language support?


Yes, that was part of my point. But also the fact that the gRPC API is its own standalone thing, so documentation and version control tend to be handled better than internal library interfaces.


As is true for PVCs-PVs, there is a 1:1 relationship between an OBC and an OB, and as will be seen below, there is a [_binding_](#binding) between an OBC and OB.

OBCs reference a storage class. The storage class references the external provisioner, defines a reclaim policy, and specifies object-store specific parameters, such as region, owner secret, bucket lifecycle policies, etc.

Per previous comment, I think it would be better to have an ObjectBucketClass to handle these issues rather than overloading StorageClass with functionality not related to volumes.


As is true for dynamic PV provisioning, a bucket provisioner pod needs to be running for each type of object-store existing in the Kubernetes cluster.
For example, for AWS S3, the developer creates an OBC referencing a storage class which references the S3 store.
This storage class needs to be created by the admin.

Note that using ObjectBucketClass instead of StorageClass may allow this class to be stored in a namespaced area along with the OBs

name: MY-BUCKET-1 [1]
namespace: USER-NAMESPACE [2]
spec:
bucketName: [3]

Why should this be allowed? (claim specifying exact bucket name)


Do you mean as opposed to assigning a randomly generated name?

To clarify, this field is ignored when specifying a brownfield storageClass (which I agree is a messy way of handling it, better suited to a BucketClass).

bucketName: [3]
generateBucketName: "photo-booth" [4]
storageClassName: AN-OBJECT-STORE-STORAGE-CLASS [5]
additionalConfig: [6]

This feels like non-portable information, akin to passing through annotations.

Contributor Author
@jeffvance jeffvance Dec 13, 2019

True. We want a way to pass provisioner specific information. The storage class (or bucketClass) allows this, but we were thinking of the case where, say region, was defaulted in the SC but could be overridden in the OBC. Of course each provisioner decides which of these fields to process or ignore.
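As a hedged illustration of that override pattern (field names follow the OBC excerpt above; the region key is provisioner-specific):

```yaml
apiVersion: objectbucket.io/v1alpha1   # group/version assumed
kind: ObjectBucketClaim
metadata:
  name: my-photos
  namespace: photo-booth
spec:
  generateBucketName: photo-booth
  storageClassName: object-bucket-class
  additionalConfig:
    region: eu-central-1               # overrides the region defaulted in the storage class
```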

bucketName: my-photos-1xj4a
region: # provisioner dependent
subRegion: # provisioner dependent
additionalConfigData: [] #string:string

This list of endpoint parameters feels specific to a particular provider. I feel like the goal of this should be to produce the fields that are needed by client SDKs and to be useful across at least the three main object APIs (S3, GCS, and Azure). Given that, I'd argue that a "data path protocol" should be supplied with the claim (and declared by drivers). The endpoint and credential information supplied by the driver (and recorded in the OB and Secret) should be appropriate for the given data path protocol (e.g., nearly directly usable by the standard API for that data protocol).

@copejon copejon Dec 12, 2019

+1 It sounds like you're describing a pattern similar to the *VolumeSource subfields of PVs, which I agree would be a better fit. The overuse of string:string maps is something we need to address generally.

Contributor Author
@jeffvance jeffvance Dec 20, 2019

@alarge:

"data path protocol" should be supplied with the claim (and declared by drivers).

A goal was to not embed specific object store fields into the lib (though admittedly there is an s3 flavor). The reason for this is so object store vendors would not have to wait for their fields to be added. IOW, I think to do this the lib would need some switch-on-dataPathProtocol code which has to know which fields go with which protocol -- at least that's how I imagine you'd implement this. Can you elaborate more?

BUCKET_PORT: 80 [8]
BUCKET_NAME: MY-BUCKET-1 [9]
BUCKET_REGION: us-west-1
... [10]
@alarge alarge Dec 9, 2019

Per previous comment - these should be whatever is appropriate for the given data path protocol.

Contributor Author
@jeffvance jeffvance Dec 20, 2019

If we do this and have a "DataPathProtocol" type field in the OBC then app pods have little chance of being portable across different object stores, right? Some degree (obviously not 100%) of portability is/was desired in the original design.


The app would be tethered to the protocol regardless as it's expected they've been written using some SDK. This doesn't restrict apps to a specific object provider, but they do need to be aware of what protocol is being used at the endpoint. That is, S3 is pretty common, but not ubiquitous, and this must be communicated.

containers:
- name: mycontainer
image: redis
envFrom: [1]

This is just an example, right? Mounted secrets might be preferred (e.g., in the case of multiple bucket claims)


Yes. It would probably be helpful to add that case as well.
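A sketch of that mounted-secret variant (illustrative; the secret and key names follow the examples elsewhere in this doc):

```yaml
spec:
  containers:
  - name: mycontainer
    image: redis
    volumeMounts:
    - name: bucket-creds
      mountPath: /etc/bucket-creds     # keys appear as files, e.g. ACCESS_KEY_ID
      readOnly: true
  volumes:
  - name: bucket-creds
    secret:
      secretName: MY-BUCKET-1          # the secret generated for the claim
```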

1. use `env:` if mapping of the defined key names to the env var names used by the app is needed.
1. makes available to the pod as env variables: BUCKET_HOST, BUCKET_PORT, BUCKET_NAME
1. makes available to the pod as env variables: ACCESS_KEY_ID, SECRET_ACCESS_KEY

Again, specific to S3. Different variables for different data path protocols.

+ an _**ObjectBucket**_ (OB), similar to a PV, is the k8s representation of the provisioned bucket and may contain object store-specific data needed to allow the bucket to be de-provisioned. The OB resides in the same namespace as the object store, and is typically not seen by bucket consumers.
**Note:** namespaced OBs is a change from the original design and was suggested at the 2019 NA Kubecon storage face-to-face with the premise being that if there is no _technical_ reason for a cluster scoped resource then it should be namespaced.

As is true for PVCs-PVs, there is a 1:1 relationship between an OBC and an OB, and as will be seen below, there is a [_binding_](#binding) between an OBC and OB.
Member

What is the reason to have two separate objects, OBC and OB? For PV/PVC it's security. If volume ID was in PVC, user could craft their own PVCs and guess volume ID of some other volume and steal its data. In this proposal, you give full bucket ID and secrets to access it to a pod. So user can guess ID of some other bucket and steal it in the pod.

Therefore you don't need OB object at all. All can be in OBC (and it could be called ObjectBucket instead of claim).

Contributor Author
@jeffvance jeffvance Dec 13, 2019

A reason for the OB is to contain provisioner-specific data, eg. a newly created user name. The OB is passed to Delete and Revoke providing provisioners this static information so they can delete buckets and clean up policies w/o maintaining their state separately.
Another reason was to mimic the PVC - PV model even though now there are some who believe that design is overly complex.

Member

Are these provisioner-specific data secret? In other words, can a malicious user misuse it to revoke / delete data of someone else?

Contributor Author

@jsafrane The secret is created by the lib and lives in the same namespace as the obc. The secret contains s3-like access and secret key values, and a provider-specific map[string]string.
The secret cannot be used maliciously AFAICT, because the provisioner is expected to return creds that are specific to a single bucket and are not the owner's creds.


@jsafrane This is dependent on the provisioner implementation. The library only passes data returned from Provision()/Grant() calls directly into the secret. Rook, for instance, creates a new Ceph user with limited rights and returns those credentials. That user may only access the generated/attached bucket, may not create buckets, and (as a design of Ceph), may not view or mutate other user's buckets.


## ProvisionerCreateBucket

This call is made to create the bucket in the backend. If a bucket that matches both name and parameters already exists, then OK (success) must be returned. If a bucket with the same name but different parameters already exists, then the appropriate error code `ALREADY_EXISTS` must be returned by the provisioner.
Contributor

Please add "ProvisionerCreateBucket MUST be idempotent".

Contributor

Fixed


## ProvisionerDeleteBucket

This call is made to delete the bucket in the backend. If the bucket has already been deleted, then no error should be returned.
Contributor

Please add "ProvisionerDeleteBucket MUST be idempotent".

Contributor

Fixed

- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [X] (R) Graduation criteria is in place
Contributor

This one probably should not be checked yet as I see you only have "Alpha" there.

Contributor

Fixed

@wlan0 wlan0 force-pushed the buckets branch 2 times, most recently from e7dce75 to e18d289 Compare October 1, 2020 19:22
@xing-yang
Contributor

My comments are addressed. Looks good to me. Please squash your commits.

@wlan0
Member

wlan0 commented Oct 2, 2020

@xing-yang I've squashed the commits into one.

@xing-yang
Contributor

/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Oct 2, 2020
@wlan0
Member

wlan0 commented Oct 2, 2020

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jeffvance, wlan0
To complete the pull request process, please assign xing-yang after the PR has been reviewed.
You can assign the PR to them by writing /assign @xing-yang in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

finalizers:
- cosi.io/finalizer [2]
spec:
protocol:
Member

We should change this to a list of protocols: to support asking for multiple protocols. Effectively this is a way for an app to say I understand this set of protocols.

Why? Imagine you have an app that is deployed across multiple cloud providers, and bundles multiple SDKs to speak the native object protocol on each environment. The app dev should be able to define a single BucketRequest that will work across all those environments.

Contributor Author
@jeffvance jeffvance Oct 7, 2020

Earlier drafts had a list of protocols in the BC and a single protocol in the BR. During reviews we concluded that if a vendor supports multiple protocols the easiest solution is to create 1 BC per protocol. We believe this approach is more intuitive and less complex. BC.protocol is passed to the driver which can reject the request if the protocol is not supported.

Along the lines of intuitive and less complex, we are considering eliminating protocol from the BR. The reason is that the BC reference in the BR is sufficient to know the protocol. The only value today in BR.protocol is that COSI can validate that the BC's protocol matches the BR's. If validation fails then the BR is retried, as is typical in k8s. An issue with this is that the BC is immutable, so the admin would need to recreate the BC with the correct protocol. Or, the BR needs to be deleted and recreated to use the right protocol. Both of these scenarios are unnecessary if BR.protocol is removed.

Thoughts?

Member

I think the use case I described above is an important one:

Imagine you have an app that is deployed across multiple cloud providers, and bundles multiple SDKs to speak the native object protocol on each environment. The app dev should be able to define a single BucketRequest that will work across all those environments.

Having to create a new BR for each platform breaks the portability we are trying to achieve.

Contributor Author
@jeffvance jeffvance Oct 15, 2020

@saad-ali Are you describing a single app that speaks multiple protocols, i.e. accessing >1 bucket and using a different protocol among buckets?

But I think regardless of the answer above, the app can reference 2+ BARs in the csi-driver portion of the pod spec. Each BAR references a BR. Each BR references a BC and the BC defines a single protocol (according to the current kep). I don't see a portability issue since we assume that the admin sets up the correct BCs and provisioners on the target cluster.

The app dev should be able to define a single BucketRequest that will work across all those environments.

There is some detachment between the BR and the app pod in that the pod does not reference BRs. If the app needs multiple buckets it doesn't seem unreasonable to require multiple BRs/BARs.


1. `provisioner`: (required) the name of the vendor-specific driver supporting the `protocol`.
2. `isDefaultBucketClass`: (optional) boolean, default is false. If set to true then a `BucketRequest` may not need to specify a `BucketClass`. If the greenfield `BucketRequest` omits the `BucketClass` and a default `BucketClass`'s protocol matches the `BucketRequest`'s protocol then the default bucket class is used; otherwise an error is logged.
Member

Define what happens if for a given protocol, more than one BucketClass is marked as default (on StorageClass we fail provisioning of PVC with no StorageClass in that case).

Contributor Author
@jeffvance jeffvance Oct 7, 2020

Storageclasses have (or had) this same issue. We have 2 choices: 1) create a BC admission controller that rejects a new default BC if it matches an existing default BC for the given protocol, 2) let the result be undefined, meaning the first BC retrieved that matches the BR's protocol wins.

If we remove BR.protocol, and a greenfield BR omits the BC, then we have to figure out which BC we use as the default (since we could have a default BC per protocol). My current thinking is if we remove BR.protocol then we will support only a single default BC, implying a single default protocol.
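For illustration, the ambiguity being discussed looks like this (group/version and field layout are a sketch based on the excerpt above, not normative):

```yaml
apiVersion: objectstorage.k8s.io/v1alpha1   # hypothetical group/version
kind: BucketClass
metadata:
  name: s3-default
provisioner: s3-driver.example.com
isDefaultBucketClass: true
protocol: s3
---
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketClass
metadata:
  name: gcs-default
provisioner: gcs-driver.example.com
isDefaultBucketClass: true
protocol: gs                                # a second default, but for a different protocol
```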

- "publicReadWrite": Read/Write, same as `ro` with the addition of PutObject being allowed.
7. `bucketClassName`: (optional for greenfield, omitted for brownfield) name of the associated bucket class.
8. `bucketRequest`: an `objectReference` containing the name, namespace and UID of the associated `BucketRequest`.
9. `allowedNamespaces`: a list of namespaces that are permitted to either create new buckets or to access existing buckets.
Member

Networking folks are also looking for ways to address this problem in GatewayClass. In the future, we may want to consider aligning with whatever they land on for consistency across k8s.

- _Delete_: the bucket and its contents are destroyed.
> Note: the `Bucket`'s retention policy is set to "Retain" as a default. Exercise caution when using the "Delete" retention policy as the bucket content will be deleted.
> Note: a `Bucket` is not deleted if it is bound to a `BucketRequest` or `BucketAccess`.
6. `anonymousAccessMode`: a string specifying *uncredentialed* access to the Bucket. This is applicable to cases where the bucket objects are intended to be publicly readable and/or writable. One of:
Member

I don't really follow this. But if Andrew Large is happy with it, I'm fine with it.

Contributor Author

anonymousAccessMode is defined in the BC and copied to the Bucket. Its purpose is to define the level of public access to a new bucket in the absence of other access policies.

name:
provisioner: [1]
isDefaultBucketClass: [2]
protocol: {"azureblob", "gs", "s3", ... } [3]
Member

Should this also contain version information like BucketRequest?

Contributor Author
@jeffvance jeffvance Oct 7, 2020

The BC should contain the protocol version as an optional field.
If we remove BR.protocol then version should also be removed from the BR.

Contributor Author

protocol version added.

1. `provisioner`: (optional) The provisioner field as defined in the `BucketAccessClass`. If empty then there is no driver/sidecar, thus the admin had to create the `Bucket` and `BucketAccess` instances.
1. `bucketInstanceName`: name of the `Bucket` instance bound to this BA.
1. `bucketAccessRequest`: an `objectReference` containing the name, namespace and UID of the associated `BucketAccessRequest`.
1. `serviceAccount`: an `objectReference` containing the name, namespace and UID of the associated `ServiceAccount`. Empty when the `BucketAccessRequest.mintedSecretName` is specified.
Member

How is this different from the ServiceAccount in BucketAccessRequest? Do we need it in both places? What happens if they skew for some reason?

Contributor Author
@jeffvance jeffvance Oct 7, 2020

BA.serviceAccount is copied from BAR.serviceAccount and converted to an ObjectReference. We need both since the sidecar may need the SA and won't have access to the BAR (due to minimal privileges). There is no check for skew.

In later COSI version we can handle update events on the BAR and BA and deal with changes to SAs.

1. `bucketInstanceName`: name of the `Bucket` instance bound to this BA.
1. `bucketAccessRequest`: an `objectReference` containing the name, namespace and UID of the associated `BucketAccessRequest`.
1. `serviceAccount`: an `objectReference` containing the name, namespace and UID of the associated `ServiceAccount`. Empty when the `BucketAccessRequest.mintedSecretName` is specified.
1. `mintedSecretName`: name of the admin created or provisioner-generated Secret containing access credentials. This Secret exists in the provisioner’s namespace and must be copied to the app namespace by the COSI controller. **Note:** the provisioner's namespace is contained in the registered driver name.
Member

Can you describe this process? Who creates mintedSecretName in provisioner's namespace? How do you prevent name collisions? Why does it need to be copied to the app namespace? Why can't secret just be provisioned in app namespace?

Contributor Author

A provisioner does not have access to the user's namespace, and the minted secret lives in provisioner namespace. This secret is created by either the admin (static) or by COSI when it is the result of backend bucket access creation. It contains the credential returned by the driver. Name collisions are prevented by using UUID(secret-uuid).
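A sketch of such a minted Secret (names and keys are illustrative; only its placement in the provisioner's namespace and the UUID-based name follow the explanation above):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ba-3f1c9a7e-0d2b-4b8e-9a55-6c1f2d3e4a5b   # UUID-based to avoid collisions
  namespace: cosi-provisioner                     # provisioner's namespace (placeholder)
type: Opaque
stringData:
  ACCESS_KEY_ID: AKIAEXAMPLE
  SECRET_ACCESS_KEY: example-secret-key
  ENDPOINT: https://objects.example.com
```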

1. `serviceAccount`: an `objectReference` containing the name, namespace and UID of the associated `ServiceAccount`. Empty when the `BucketAccessRequest.mintedSecretName` is specified.
1. `mintedSecretName`: name of the admin created or provisioner-generated Secret containing access credentials. This Secret exists in the provisioner’s namespace and must be copied to the app namespace by the COSI controller. **Note:** the provisioner's namespace is contained in the registered driver name.
1. `policyActionsConfigMapData`: encoded data that contains a set of provisioner/platform defined policy actions to a given user identity. Contents of the ConfigMap that the *policyActionsConfigMap* field in the `BucketAccessClass` refers to.
1. `principal`: username/access-key for the object storage provider to uniquely identify the user who has access to this bucket
Member

Is this always required?

Member

This is filled in by COSI. This is not a required field

Contributor Author
@jeffvance jeffvance Oct 9, 2020

To add to @wlan0 's reply: we have several use cases involving BA.principal:

  1. static/driverless brownfield where the admin creates the BA and fills in principal, so here principal is minted by the admin.
  2. brownfield where COSI creates the BA and fills in principal based on the driver's minted principal.
  3. the BAR specifies a serviceAccount which is mapped by the backend to a user id, and principal is empty.

1. `bucketAccessRequest`: an `objectReference` containing the name, namespace and UID of the associated `BucketAccessRequest`.
1. `serviceAccount`: an `objectReference` containing the name, namespace and UID of the associated `ServiceAccount`. Empty when the `BucketAccessRequest.mintedSecretName` is specified.
1. `mintedSecretName`: name of the admin created or provisioner-generated Secret containing access credentials. This Secret exists in the provisioner’s namespace and must be copied to the app namespace by the COSI controller. **Note:** the provisioner's namespace is contained in the registered driver name.
1. `policyActionsConfigMapData`: encoded data that contains a set of provisioner/platform defined policy actions to a given user identity. Contents of the ConfigMap that the *policyActionsConfigMap* field in the `BucketAccessClass` refers to.
Member

What does the content of the policyActionsConfigMap look like? Is it defined by the driver? If so, can you provide an example of the type of information it would be used to track? What happens if the object doesn't exist? What happens if it exists, but is then deleted?

Contributor Author

@jeffvance jeffvance Oct 7, 2020

BA.policyActionsConfigMapData could look like this:

{"Effect":"Allow","Action":"s3:PutObject","Resource":"arn:aws:s3:::profilepics/*"},
{"Effect":"Allow","Action":"s3:GetObject","Resource":"arn:aws:s3:::profilepics/*"},
{"Effect":"Deny","Action":"s3:*","NotResource":"arn:aws:s3:::profilepics/*"}

Its content is defined by the driver but is created by the admin as part of creating a new BAC. The CM content is copied into the BA as the source of truth, to handle cases where the CM is replaced and to allow the sidecar to remain minimally privileged so that it does not need access to the user's ns. If the CM does not exist when creating the BA, we would retry (wait and requeue).
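
For illustration, the admin-created ConfigMap referenced by `BucketAccessClass.policyActionsConfigMap` could look roughly like the sketch below. The namespace, ConfigMap name, and data key are assumptions; the policy body is the example quoted above.

```yaml
# Hypothetical policy ConfigMap referenced from a BucketAccessClass.
# Namespace, name, and key are illustrative assumptions, not defined by this KEP.
apiVersion: v1
kind: ConfigMap
metadata:
  name: s3-profilepics-policy
  namespace: cosi-system            # assumed provisioner namespace (see the comment above)
data:
  policy.json: |
    {"Effect":"Allow","Action":"s3:PutObject","Resource":"arn:aws:s3:::profilepics/*"},
    {"Effect":"Allow","Action":"s3:GetObject","Resource":"arn:aws:s3:::profilepics/*"},
    {"Effect":"Deny","Action":"s3:*","NotResource":"arn:aws:s3:::profilepics/*"}
```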

Just to clarify, is a principal allowed here, or can the admin only specify resources?

Member

In the workflow of how we provision the BA, the principal is not known up front. The policy should only be about the resources themselves; when the access is minted, the policy is applied to the minted principal.

1. `bucketAccessRequest`: an `objectReference` containing the name, namespace and UID of the associated `BucketAccessRequest`.
1. `serviceAccount`: an `objectReference` containing the name, namespace and UID of the associated `ServiceAccount`. Empty when the `BucketAccessRequest.mintedSecretName` is specified.
1. `mintedSecretName`: name of the admin created or provisioner-generated Secret containing access credentials. This Secret exists in the provisioner’s namespace and must be copied to the app namespace by the COSI controller. **Note:** the provisioner's namespace is contained in the registered driver name.
1. `policyActionsConfigMapData`: encoded data that contains a set of provisioner/platform defined policy actions to a given user identity. Contents of the ConfigMap that the *policyActionsConfigMap* field in the `BucketAccessClass` refers to.
Member

BucketAccessClass allows you to specify a namespace for policyActionsConfigMap. Is the namespace here assumed to be the same as the corresponding BucketAccessRequest? If so, what happens if BucketAccessRequest points to a BucketAccessClass where the config map is pointing to a different namespace?

Contributor Author

I would think more typically the BAC.policyActionsConfigMap's ns would be the same as the provisioner's. It could be a security issue if this CM existed in the user ns, right?


#### BucketAccessRequest

A user-namespaced custom resource representing an object store user and an access policy defining the user’s relation to that storage. A user creates a `BucketAccessRequest` (BAR) in the app's namespace (which is the same namespace as the `BucketRequest`). A `BucketAccessRequest` specifies *either* a ServiceAccount or a secret name which contains the access policy. Specifying a ServiceAccount enables provisioners to support cloud provider identity integration with their respective Kubernetes offerings.
Contributor Author

FIX: secret name is no longer part of the BAR.

Just to clarify, does that mean serviceAccountName becomes required now? If possible, can you please explain how a SA can be connected to a user in the storage provider? Also, can a single service account be attached to multiple BARs?

I have the same concern regarding sharing service account across multiple BARs. I don't see a good mechanism to disallow that if we decide to do so.
If we share a service account there might be edge cases related to granting/revoking access. For example, if we share a service account across BARs with intersecting permissions, how can we properly revoke those permissions when a user deletes one of the BARs?

Member

@oleksiys This is something we went over. Conflicts can arise and we decided that if two different access policies are requested for the same SA, then we reject the second request.

Member

We'll update the KEP to reflect this. This was discussed in the review session: https://docs.google.com/presentation/d/180c1-fcRvGUU_AqMbe5swsDAbNg2yvLdQYdxnXVP9Xs/edit#slide=id.g907055d5e0_0_87

Contributor Author

@jeffvance jeffvance Oct 9, 2020

note to FIX this --
@thotz wrote:

Just to clarify, does that mean serviceAccountName becomes required now? If possible, can you please explain how a SA can be connected to a user in the storage provider?

BAR.serviceAccountName is still optional. If the name is provided then it will be used by the backend object store to map the SA to a user identity. If SA is omitted then the backend will mint a principal who is the user for this bucket.

Also, can a single service account be attached to multiple BARs?

Within the same namespace, multiple BARs could reference the same SA. BARs cannot reference a SA in a different ns.
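
To make the ServiceAccount mapping concrete, a minimal hypothetical BAR might look like the sketch below. The apiVersion and exact field names are assumptions, not the final API; only the idea that the BAR lives in the app namespace and may name a ServiceAccount comes from the KEP text above.

```yaml
# Sketch of a BucketAccessRequest that maps a ServiceAccount to an
# object-store identity. apiVersion and field names are assumptions.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketAccessRequest
metadata:
  name: photo-app-access
  namespace: photo-app                   # same namespace as the BucketRequest
spec:
  serviceAccountName: photo-app-sa       # optional; omit to have the backend mint a principal
  bucketRequestName: photo-app-bucket    # assumed reference to the BR in this namespace
  bucketAccessClassName: s3-read-write   # assumed reference to a BucketAccessClass
```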

4. `provisioner`: (optional) The provisioner field as defined in the `BucketClass`. If empty then there is no driver/sidecar, thus the admin had to create the `Bucket` and `BucketAccess` instances.
5. `retentionPolicy`: Prescribes the outcome when the `BucketRequest` is deleted.
- _Retain_: (default) the `Bucket` and its data are preserved. The `Bucket` can potentially be reused.
- _Delete_: the bucket and its contents are destroyed.

If I understand correctly, the "retentionPolicy Delete" is applicable to both greenfield and brownfield cases,
so does that mean existing buckets can be removed by COSI, or will it just remove the Bucket CR and keep the actual backend bucket?
A similar issue can also be seen in the case of existing OBC/OB.

Member

Yeah, this follows the exact same pattern as PVs.

Can you explain what an OBC is and what the "issue" is?

Contributor Author

OB/OBC are resources defined in the object bucket library.

Contributor Author

@jeffvance jeffvance Oct 9, 2020

Yeah, this follows the exact same pattern as PVs.

So, a retentionPolicy = "Delete" will delete the backend bucket [1] and, of course, the k8s resources associated with that bucket. It works this way for greenfield and brownfield.

1- it is ultimately up to each driver whether or not the backend bucket is actually deleted. For example, instead of permanently deleting the bucket, the driver could archive/rename it etc. However, whatever the driver implements has to be idempotent and reentrant.
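
For reference, a minimal sketch of an admin-created brownfield Bucket that keeps the safe default is shown below. The apiVersion and field layout are assumptions pieced together from the fields discussed in this KEP, not a definitive spec.

```yaml
# Sketch of an admin-created brownfield Bucket with the default
# retentionPolicy. apiVersion and layout are illustrative assumptions.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: Bucket
metadata:
  name: legacy-profilepics              # admin-chosen name for brownfield
spec:
  provisioner: s3.example.com           # assumed driver name
  retentionPolicy: Retain               # deleting the BR preserves the backend bucket and its data
  allowedNamespaces:
    - name: photo-app
  protocol:
    name: s3
    version: v4
```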


Yeah, this follows the exact same pattern as PVs.

can you explain what is OBC and what the "issue" is?

It is not an "issue" issue, I meant for brownfield cases it may have objects prior to bucket provisioning will be removed as part of Delete policy.

- _Retain_: (default) the `Bucket` and its data are preserved. The `Bucket` can potentially be reused.
- _Delete_: the bucket and its contents are destroyed.
> Note: the `Bucket`'s retention policy is set to "Retain" as a default. Exercise caution when using the "Delete" retention policy as the bucket content will be deleted.
> Note: a `Bucket` is not deleted if it is bound to a `BucketRequest` or `BucketAccess`.

This may contradict the Delete Bucket section: "Delete bucket ... this means that even if a BucketAccess references a Bucket, the Bucket is still deleted and the BA is orphaned."

Member

Yeah, your understanding is correct. I'm not sure if you're asking a question or just explaining your understanding.

Contributor Author

@jeffvance jeffvance Oct 9, 2020

...to a BucketRequest or BucketAccess.

the "or BucketAccess" is wrong and was missed in earlier edits


Yeah, it was not a question; I just pointed out two different statements.

1. `protocol.version`: (optional) specifies the desired version of the `protocol`. For "s3", a value of "v2" or "v4" could be used.
1. `bucketPrefix`: (optional) prefix prepended to a generated new bucket name, e.g. "yosemite-photos-". If `bucketInstanceName` is supplied then `bucketPrefix` is ignored because the request is for access to an existing bucket. If `bucketPrefix` is specified then it is added to the `Bucket` instance as a label.
1. `bucketClassName`: (optional for greenfield, omitted for brownfield) name of the `BucketClass` used to provision this request. If omitted for a greenfield bucket request, a default bucket class matching the protocol, if available, is used. If the greenfield bucket class is missing or does not support the requested protocol, an error is logged and the request is retried (with exponential backoff). A `BucketClass` is necessary for greenfield requests since BCs support a list of allowed namespaces. A `BucketClass` is not needed for brownfield requests since the `Bucket` instance, created by the admin, also contains `allowedNamespaces`.
1. `bucketInstanceName`: (optional) name of the cluster-wide `Bucket` instance. If blank, then COSI assumes this is a greenfield request and will fill in the name during the binding step. If defined by the user, then this names the `Bucket` instance created by the admin. There is no automation available to make this name known to the user.

If both bucketPrefix and bucketInstanceName in the BucketRequest CR are empty, will COSI create a bucket with a random name?

Member

@thotz This is already explained in the KEP. The bucketInstanceName is filled in by COSI, and the bucketPrefix is optional. The generated name follows the format br-${uuid}.
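
A minimal greenfield BucketRequest relying on that generated name might look like the sketch below; the apiVersion and field layout are assumptions for illustration only.

```yaml
# Sketch of a greenfield BucketRequest; COSI fills in bucketInstanceName
# during binding with a generated br-<uuid> name. Layout is an assumption.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketRequest
metadata:
  name: yosemite-photos
  namespace: photo-app
spec:
  bucketPrefix: yosemite-photos-   # optional; added to the Bucket instance as a label
  bucketClassName: s3-standard     # optional; if omitted, the default BucketClass for the protocol is used
  # bucketInstanceName omitted: treated as greenfield and filled in by COSI
```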

Contributor Author

FIX: delete "There is no automation available to make this name known to the user."


> Note: the `Bucket`'s retention policy is set to "Retain" as a default. Exercise caution when using the "Delete" retention policy as the bucket content will be deleted.
> Note: a `Bucket` is not deleted if it is bound to a `BucketRequest` or `BucketAccess`.
6. `allowedNamespaces`: a list of namespaces that are permitted to either create new buckets or to access existing buckets. This list is static for the life of the `BucketClass`, but the `Bucket` instance's list of allowed namespaces can be mutated by the admin.
7. `parameters`: (optional) a map of string:string key values. Allows admins to set provisioner key-values.

Can this field be utilised for backend-specific features like bucket notifications, object quota, bucket policies, etc.?

Member

Yeah, the field is completely opaque to COSI. You can utilize it for those functions.
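
As an example of that, a BucketClass might pass backend-specific settings through `parameters` as sketched below. The keys and values are made up, only the driver would interpret them, and the field layout (top-level fields, StorageClass-style) is an assumption.

```yaml
# Sketch of a BucketClass whose opaque parameters carry driver-specific
# settings (notifications, quota, policy). Keys and values are hypothetical.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketClass
metadata:
  name: s3-standard
isDefaultBucketClass: false
provisioner: s3.example.com          # assumed driver name
protocol:
  name: s3
  version: v4
allowedNamespaces:
  - name: photo-app
parameters:
  bucketNotifications: "enabled"     # hypothetical driver-defined key
  objectQuota: "500Gi"               # hypothetical driver-defined key
```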

bucketAvailable: [12]
```

1. `name`: When created by COSI, the `Bucket` name is generated in this format: _"<BR.namespace*>-<BR.name*>-<uuid>"_. If an admin creates a `Bucket`, as is necessary for brownfield access, they can use any name. The uuid is unique within a cluster. `*` the first 100 characters of the namespace and the name are used.
Contributor Author

FIX this and BA: name: "br-"+uuid

5. `retentionPolicy`: Prescribes the outcome when the `BucketRequest` is deleted.
- _Retain_: (default) the `Bucket` and its data are preserved. The `Bucket` can potentially be reused.
- _Delete_: the bucket and its contents are destroyed.
> Note: the `Bucket`'s retention policy is set to "Retain" as a default. Exercise caution when using the "Delete" retention policy as the bucket content will be deleted.
Contributor Author

FIX: consider deleting all notes

Contributor Author

FIX: Bucket can be deleted even if the BAR/BA remain.

@k8s-ci-robot k8s-ci-robot removed the lgtm ("Looks good to me", indicates that a PR is ready to be merged) label Oct 12, 2020
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

Member

@saad-ali saad-ali left a comment

GitHub Unicorn is making this very difficult to review. Should we consider closing this PR and reopening a new one?

finalizers:
- cosi.io/finalizer [2]
spec:
protocol:
Member

I think the use case I described above is an important one:

Imagine you have an app that is deployed across multiple cloud providers, and bundles multiple SDKs to speak the native object protocol on each environment. The app dev should be able to define a single BucketRequest that will work across all those environments.

Having to create a new BR for each platform breaks the portability we are trying to achieve.
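
A rough sketch of what that could look like, purely to illustrate this reviewer's proposal: the `protocols` list below is a hypothetical field, not part of the KEP's current API, and the apiVersion is an assumption.

```yaml
# Hypothetical portable BucketRequest carrying a list of acceptable
# protocols, as proposed in this comment. Not the KEP's current API.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketRequest
metadata:
  name: portable-photos
  namespace: photo-app
spec:
  protocols:                 # hypothetical: any one of these is acceptable
    - name: s3
      version: v4
    - name: gcs
    - name: azureBlob
```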

bucketRequest: [8]
allowedNamespaces: [9]
- name:
protocol: [10]
Member

Along the lines of intuitive and less complex, we are considering eliminating protocol from the BR. The reason is that the BC reference in the BR is sufficient to know the protocol. The only value today in BR.protocol is that COSI can validate that the BC's protocol matches the BR's. If validation fails then the BR is retried, as is typical in k8s. An issue with this is that the BC is immutable, so the admin would need to recreate the BC with the correct protocol. Or, the BR needs to be deleted and recreated to use the right protocol. Both of these scenarios are unnecessary if BR.protocol is removed.

Remember that BC may be deleted or changed (deleted and recreated with different values but same name), therefore Bucket should not depend on a reference to BC -- it has to be self-sufficient even if the BC used to create it is deleted.

```

1. `provisioner`: (required) the name of the vendor-specific driver supporting the `protocol`.
1. `isDefaultBucketClass`: (optional) boolean, default is false. If set to true then a `BucketRequest` may omit the `BucketClass` reference. If the greenfield `BucketRequest` skips the `BucketClass` and a default `BucketClass`'s protocol matches the `BucketRequest`'s protocol then the default bucket class is used; otherwise an error is logged. It is possible that more than one `BucketClass` of the same protocol is set as the default class. In this case, it is non-deterministic which `BucketClass` is used as the default.
Member

It is possible that more than one BucketClass of the same protocol is set as the default class. In this case, it is non-deterministic which BucketClass is used as the default.

We should align with the current predictable StorageClass behavior here.

StorageClasses have (or had) this same issue. We have 2 choices: 1) create a BC admission controller that rejects a new default BC if it matches an existing default BC for the given protocol, 2) let the result be undefined, meaning the first BC retrieved that matches the BR's protocol wins.

There is a third option, which is what StorageClass does: runtime validation (at volume provision time rather than object creation time) -- fail CreateVolume (or in this case fail CreateBucket) with an error indicating there are multiple defaults.

Contributor Author

@jeffvance jeffvance Oct 15, 2020

BC will have an admission webhook to enforce no more than 1 default, so we'll be able to catch this problem at BC creation time. KEP will be updated.

@jeffvance Is this possible without a race condition?


1. `provisioner`: (required) the name of the vendor-specific driver supporting the `protocol`.
1. `isDefaultBucketClass`: (optional) boolean, default is false. If set to true then a `BucketRequest` may omit the `BucketClass` reference. If the greenfield `BucketRequest` skips the `BucketClass` and a default `BucketClass`'s protocol matches the `BucketRequest`'s protocol then the default bucket class is used; otherwise an error is logged. It is possible that more than one `BucketClass` of the same protocol is set as the default class. In this case, it is non-deterministic which `BucketClass` is used as the default.
1. `protocol`: (required) protocol supported by the associated object store. This field validates that the `BucketRequest`'s desired protocol is supported.
Member

Should this also contain version information like BucketRequest?

@jeffvance jeffvance 7 days ago
Author
The BC should contain the protocol version as an optional field.
If we remove BR.protocol then version should also be removed from the BR.

Remember the rule that BC can be deleted and Bucket object must be self-sufficient. Therefore version info seems crucial here. Depending on BR is also not sufficient (if we end up, as I am proposing above, to have a "list of protocols/versions" on BR).

Contributor Author

@jeffvance jeffvance Oct 15, 2020

Thanks for catching this omission. BC's protocol is the same struct as used in the BR. Will update.

1. `bucketInstanceName`: name of the `Bucket` instance bound to this BA.
1. `bucketAccessRequest`: an `objectReference` containing the name, namespace and UID of the associated `BucketAccessRequest`.
1. `serviceAccount`: an `ObjectReference` containing the name, namespace and UID of the associated `BAR.serviceAccountName`. If empty then integrated Kubernetes -> cloud identity is not being used, in which case, `BucketAccess.principal` contains the user identity, which is minted by the provisioner.
1. `mintedSecretName`: name of the provisioner-generated Secret containing access credentials. This Secret exists in the provisioner’s namespace and is copied to the app namespace by the COSI controller.
Member

Can you describe this process? Who creates mintedSecretName in provisioner's namespace? How do you prevent name collisions? Why does it need to be copied to the app namespace? Why can't secret just be provisioned in app namespace?

@jeffvance jeffvance 7 days ago Author
A provisioner does not have access to the user's namespace, and the minted secret lives in the provisioner's namespace. This secret is created either by the admin (static) or by COSI when it is the result of backend bucket access creation. It contains the credential returned by the driver. Name collisions are prevented by using UUID(secret-uuid).

Why does it need to be copied to the app namespace? Why can't the secret just be provisioned in app namespace to begin with?

Contributor Author

Copying the secret is a carryover from an earlier design. Will update the KEP to reflect the current design, which adheres to minimal privileges such that the provisioner does not need permissions to access user namespaces. Instead, the cosi-node-adapter will write the secret contents (provisioner's ns) to the app pod's secret mount.
