
object bucket provisioning #1383

Merged (3 commits) on Oct 20, 2020

Conversation

jeffvance
Contributor

@jeffvance jeffvance commented Nov 26, 2019

KEP for bucket provisioning API.

cc @copejon @erinboyd @saad-ali @alarge @guymguym @travisn

@k8s-ci-robot added the cncf-cla: yes, size/XL, kind/kep, and sig/storage labels on Nov 26, 2019

@copejon copejon left a comment


Partial review, more to come

keps/sig-storage/20191125-bucket-provisioning.md (outdated, resolved)
Without a formal API to describe and manage bucket provisioning, storage providers are left to write their own provisioners including all of the necessary controller code.
This "wild west" landscape for bucket provisioning is potentially confusing, overly complex, inconsistent, may slow down Kubernetes adoption by object store users, and may hinder application portability.

This KEP is a proposal for a consistent bucket provisioning API which is object store agnostic and provides a familiar, intuitive management plane.

I think we should call out more specifically that it's agnostic to protocols/APIs (s3, gcs, etc).

e.g.

"We propose a common Kubernetes control-plane API for the management of object store bucket lifecycles which has no dependencies on underlying provider interfaces."

### Non-Goals

+ Update the native Kubernetes PVC-PV API to support object buckets.
+ Support and manage object store data planes.

Maybe add

"Provide a POSIX-like interface for object storage protocols"

Or something to call out that we're not trying to write a kube native object store interface


I don't like the POSIX comparison. POSIX has ACLs, write/read verbs, etc.

read/write verbs are not in scope, per above.


Agree with @copejon. It's not a comparison, it's making very clear this is a non-goal. From experience, way too many people conflate object storage with file-like access. Making clear this is not a project to provide any kind of POSIX/file-like access to object storage would definitely be a clarification for some.

keps/sig-storage/20191125-bucket-provisioning.md (outdated, resolved)

### Moving Parts

Before deploying an object bucket consuming app, the OB and OBC CRDs need to be created with RBAC rules granting _get_, _list_, and _watch_ verbs.

append "delete" verb for secrets, cms, and obs.
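For readers skimming this thread, a minimal sketch of the RBAC being discussed (illustrative only: the `objectbucket.io` API group and resource names are assumptions, and the suggested delete verb is noted in a comment):

```yaml
# Illustrative RBAC sketch for the OB/OBC custom resources (group name assumed).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: bucket-claim-access
rules:
- apiGroups: ["objectbucket.io"]
  resources: ["objectbucketclaims", "objectbuckets"]
  verbs: ["get", "list", "watch"]
# Per the comment above, the provisioner additionally needs delete on the
# secrets, configmaps, and OBs it creates.
- apiGroups: [""]
  resources: ["secrets", "configmaps"]
  verbs: ["get", "list", "watch", "create", "delete"]
```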

Depending on the object store, a secret in the object-store's namespace, containing new bucket owner credentials, may need to be created.
The OBCs needed by the app should exist in the app's namespace.

As is true for dynamic PV provisioning, a bucket provisioner pod needs to be running for each object-store created in the Kubernetes cluster.

"a bucket provisioner pod" --> "bucket provisioning operator"

Contributor Author

"operator" is somewhat of a red hat technique. Pod is accurate isn't it? It's true that this pod could do more than bucket provisioning -- is that your concern?


pod -> service?

Depending on the object store, a secret in the object-store's namespace, containing new bucket owner credentials, may need to be created.
The OBCs needed by the app should exist in the app's namespace.

As is true for dynamic PV provisioning, a bucket provisioner pod needs to be running for each object-store created in the Kubernetes cluster.

"needs to be running for each object-store created in the Kubernetes cluster."

It's technically possible that a single operator serves more than one object store of the same provider. I think you mean

"Bucket provisioning operators are specialized for the storage provider for which they are written"

OBCs reference a storage class. The storage class references the external provisioner, defines a reclaim policy, and specifies object-store specific parameters, such as region, owner secret, and bucket lifecycle policies.
For brownfield provisioning, the storage class also specifies the name of the existing bucket so that app developers are not burdened with typically long, random bucket names.
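For illustration only, such a storage class might look like the sketch below; the provisioner name and parameter keys are hypothetical and provisioner-specific, not defined by the KEP text quoted here.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: object-bucket-class
provisioner: example.com/bucket        # external bucket provisioner (placeholder)
reclaimPolicy: Delete                  # or Retain
parameters:
  region: us-west-1
  secretName: bucket-owner-creds       # owner credentials, in the object store's namespace
  secretNamespace: object-store
  bucketName: existing-legacy-bucket   # brownfield only: the pre-existing bucket
```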

### Moving Parts

It might be good from a readability point to create 2 sub sections: Greenfield and Brownfield. This would de-tangle the logical paths of each.

Contributor Author
@jeffvance jeffvance Nov 26, 2019

I have added sub-sections under Moving Parts for greenfield and brownfield.

A deleted _brownfield_ OBC is similar except that provisioners typically will not delete the physical bucket. Instead, bucket access is revoked and related artifacts are expected to be cleaned up.
Again, the library will delete the secret, configmap and OB.

### API

"API" is used interchangeably up to this point to reference both the k8s API (CRDs) and the library's interface. How about defining terms for each at the start ("API" == CRDs, "Interface" == library methods)


- **`Run`** is a required controller method called by provisioners to start the OBC controller.

- **`SetLabels`** is an optional controller method called by provisioners to define the labels applied to the Kubernetes resources created by the library.

This could be shortened to "allows for setting custom labels on library objects"

@jeffvance jeffvance force-pushed the buckets branch 2 times, most recently from ad52b53 to 0794c74 Compare November 27, 2019 01:34
@jeffvance
Contributor Author

The S3 provisioner pod watches for OBCs whose storage classes point to the AWS S3 provisioner, while ignoring all other OBCs.
Likewise, the same cluster can also run the Rook-Ceph RGW provisioner, which also watches OBCs, only handling OBCs that reference storage classes which define ceph-rgw.

**Note:** it is possible for one provisioner to handle OBCs for different instances of the same type of object store.

Wouldn't it be expected that a provisioner would handle OBCs for a given type of object store? Or this is just saying it's possible to have multiple provisioners for each type of object store?

Contributor Author

Example: a rook-ceph object store runs in namespace A and another rook-ceph object store runs in namespace B. A ceph-RGW provisioner runs in namespace C, where C could == A or B or any. Then that single provisioner could handle OBCs referencing storage classes which point to stores in A or B.


Ok, is this decision an implementation detail of the provisioner? In rook-ceph's case, I'm thinking for minimal privs the provisioner should only provision from the object store(s) in a given cluster. If you have multiple object stores in the same rook-ceph cluster (same namespace), you would have a single provisioner. But if you have multiple rook clusters (not as common), a separate provisioner may be used. If a single provisioner is needed, that provisioner needs RBAC access to all the rook-ceph clusters. Seems like it could be a deployment decision by the admin.

Contributor Author

It's really good to think about minimal privileges and, in fact, that's why this KEP proposes to make OBs namespaced rather than cluster scoped. However, provisioners need to be able to create secrets and configMaps in many namespaces (the ns of the OBC), and now need to create the OB in the ns of the store. If the admin wants a 1:1 mapping of object stores and provisioners s/he can limit write access to a single store. In that case, if an OBC references a storage class which points to a store in a different ns (w/o RBAC rules) then the library's OB creation will fail and the lib's retry code will run.

This is a long way of saying that I should delete the note above since this is an admin detail.


Wouldn't it be expected that a provisioner would handle OBCs for a given type of object store?

Yes. It's just saying that a provisioner can handle different 'flavors' of OBCs that all reference the provisioner. This would be represented by having more than 1 storage class for 1 object store where each SC encapsulates different feature sets.

The secret and configmap reside in the same namespace as the OBC, and the OB lives in the provisioner's namespace.
The app pod consumes the secret and configmap and thus can reference the bucket.

When a _greenfield_ OBC is deleted the associated provisioner is expected to delete the newly provisioned bucket and the related object store-specific artifacts.

There is a retain policy for the bucket, right? This makes it sound like the bucket is always deleted when the OBC is deleted.

Contributor Author
@jeffvance jeffvance Dec 8, 2019

The retain policy is defined in the storage class and also stored in the OB. The Delete method passes the OB to the provisioner. The lib decides to call Delete rather than Revoke because the OBC is for a greenfield bucket. Provisioners have full control over whether or not to delete the bucket. The expectation is that a provisioner's Delete method will delete the bucket but it might not and still return a nil error. Likewise, the expectation is that a provisioner's Revoke method will not delete the bucket but it might and still return a nil error. The lib will always remove the k8s resources it created when a nil error is returned.

**Note:** it was mentioned at the 2019 NA Kubecon storage face-to-face that to improve scaling we could decide to move the configmap info into the secret, thus reducing k8s resources required.
The OB is an abstraction of the bucket and can save some state data needed by provisioners.
The secret and configmap reside in the same namespace as the OBC, and the OB lives in the provisioner's namespace.
The app pod consumes the secret and configmap and thus can reference the bucket.

Multiple client pods can consume the same OBC to read/write to the same bucket, right?

Contributor Author
@jeffvance jeffvance Dec 8, 2019

yes, subject to object store access rules. k8s has no opinion on this since there is no accessMode defined for buckets.


Is there any reason why we don't want accessMode input(opinion) from K8s side, presumably in claim? Read-only and Read-write access to buckets seem to be reasonable use cases. I'm imagining in a brownfield scenario, user can have a bucket setup for both consumer and producer application, accessMode would be handy to limit consumer to read-only access and allow producer to have read write access. Also read-only vs read-write seem to be well-supported across major object storage vendors.

Contributor Author

Agree that there are good use cases for accessMode. My (admittedly dated) understanding is that accessMode is unenforceable by k8s. Is this still true @jsafrane?
For bucket provisioning, the underlying object store would need to support various access modes.

Member

Yes, AccessMode is used only during binding / provisioning, but it's never used when running a pod - you can have multiple pods using the same RWO volume in parallel.

The secret contains the credentials needed to access the bucket and the configmap contains bucket endpoint information.
**Note:** it was mentioned at the 2019 NA Kubecon storage face-to-face that to improve scaling we could decide to move the configmap info into the secret, thus reducing k8s resources required.
The OB is an abstraction of the bucket and can save some state data needed by provisioners.
The secret and configmap reside in the same namespace as the OBC, and the OB lives in the provisioner's namespace.
Contributor Author
@jeffvance jeffvance Dec 8, 2019

If a single provisioner for the same type of object store should handle this object store living in multiple namespaces, then the OB needs to reside in the object store's ns not the provisioner's ns.

## Summary

Kubernetes natively supports dynamic provisioning for file and block storage but lacks support for object bucket provisioning.
Without a formal API to describe and manage bucket provisioning, storage providers are left to write their own provisioners including all of the necessary controller code.

I'm uncomfortable with the implied idea that object bucket provisioning is just an extension of the existing block and file support.

Block and file support are accessed from within containers using local standard APIs. Object storage is logically just a remote service - there's nothing in particular that makes it inherently similar to block/file. Imagine that you wanted to be able to support provisioning against various "relational database as a service" products (e.g., RDS, CloudSQL, etc.). You could imagine an api for things like "create a database" and "create credentials for accessing this database". And you'd interact with the database via some sort of SQL client. I see such a system as a much closer analog for object storage than block/file.

Contributor Author
@jeffvance jeffvance Dec 11, 2019

Fair point. When I think of storage I think of file, block and object. I tend to lump object with file and block more than with databases because of its simplicity and narrower scope.
Of course the fact that there is no mount for object makes it similar, from a pod's perspective, to a database. And, from a provisioner's perspective I suppose there's no real difference between a bucket API and a DB API.

Contributor Author

I'll leave some time for others to chime in and then will push changes to this doc to reflect your comments.


Of course the fact that there is no mount for object makes it similar, from a pod's perspective, to a database.

Agreed. Given that the data path is so different between file/block and object, it's better to describe it as such. We've already seen that by lumping object in with file/block, there's been some assumptions about whether we're also proposing a POSIX analog for bucket interfaces, which we're not.

### Goals

+ Define a _control-plane_ object bucket management API thus relieving object store providers from Kubernetes controller details.
+ Minimize technical ramp-up for storage vendors who have already contributed external storage provisioners (file and block).

I don't understand why "who have already contributed..." is relevant. Minimizing ramp-up is good, but again (per previous comment), I don't think there is any inherent association between object and volumes (block and file).

Contributor Author
@jeffvance jeffvance Dec 11, 2019

Agree there is not necessarily an inherent association, per your comment above. Some storage vendors provide file, block and object store. My thinking was that there could be some leverage if a file/block vendor decided to write their own bucket provisioner.


if a file/block vendor decided to write their own bucket provisioner.

The feedback we've gotten on the bucket provisioning library ("why not make it CSI-like?") makes even more sense when considering adoption by vendors into existing provisioners. For one, CSI being the standard model for file/block means that it may actually be more cumbersome to incorporate the library. Being written in Go immediately creates more work for provisioner authors who don't use the language. And the library pattern may in fact be unfamiliar to provisioner authors who jumped straight to CSI.


+ Define a _control-plane_ object bucket management API thus relieving object store providers from Kubernetes controller details.
+ Minimize technical ramp-up for storage vendors who have already contributed external storage provisioners (file and block).
+ Make bucket consumption similar to file or block consumption using familiar concepts and commands.

I don't think it is bucket consumption that you are trying to make similar - I think it is bucket provisioning.


Right, it's the elements of the control plane that are meant to be familiar. It should not propose an abstraction of object store APIs specifically.

+ Define a _control-plane_ object bucket management API thus relieving object store providers from Kubernetes controller details.
+ Minimize technical ramp-up for storage vendors who have already contributed external storage provisioners (file and block).
+ Make bucket consumption similar to file or block consumption using familiar concepts and commands.
+ Use native Kubernetes resources where possible, again, to keep the bucket experience for users and admins similar to existing storage provisioning.

I'm assuming this refers to the decision to overload StorageClass for parameters. I do not see the value in this - I see an anti-value. Volume provisioning is fundamentally different than provisioning some external entity. Artificial association is more likely to confuse IMO than reduce confusion.

Contributor Author
@jeffvance jeffvance Dec 11, 2019

This concern should be discussed.
My team feels it is important to use native storage classes for the reason cited. We know others who use a custom resource, similar to a storage class, but specific to their object store needs.
An advantage of a CR is you get api validation vs the parameters map of storage classes, and a CR could handle brownfield buckets more intuitively than this proposal.


There are several benefits to a separate CR:

  • Avoiding cognitive dissonance - StorageClass is used for volume operations. Other classes (e.g., VolumeSnapshotClass) have been introduced for other class-style parameterization. Adding ObjectBucketClass would seem to be the most logical approach.

  • You can evolve it independently of StorageClass, which responds to requirements from block/file (you don't have to bring them along, and you don't have to be brought along).

  • StorageClass is non-namespaced. You've made progress towards making the object support install-able and usable without requiring write access to non-namespace objects. If you had a single well-known namespace for object operations, ObjectBucketClass resources could go into that namespace (along with provisioners and OBs).


These are all pretty compelling points for a bespoke BucketClass object. We leaned on SCs initially because of assumptions we made about the system being familiar to k8s users. However, like you point out, we're beholden to the API which is exclusively intended for file/block. Namespacing the classes with their provisioners also lends itself to scalability.
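To make the suggestion concrete, a bespoke class might look roughly like the following; this is purely illustrative of the reviewer's idea and is not part of the KEP as written.

```yaml
apiVersion: objectbucket.io/v1alpha1   # hypothetical group/version
kind: ObjectBucketClass
metadata:
  name: s3-buckets
  namespace: object-system             # a well-known namespace for object operations
provisioner: example.com/bucket
reclaimPolicy: Delete
parameters:
  region: us-west-1
```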


## Proposal

Create a bucket provisioning library, similar to the Kubernetes [sig-storage-lib-external-provisioner](https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/master/controller/controller.go) library, which handles all of the controller and Kubernetes resources related to provisioning.

After years of dealing with the issues of having multiple vendors support provisioning of volumes, the community decided on an approach that abstracts the control plane behind a simple gRPC API, allowing for provisioning drivers to be effectively independent of Kubernetes. I believe that this is the approach we should also use for object. Instead of providing a library and asking object vendors to implement a controller using the library, I think we should provide top-level and sidecar controllers and define the interface via gRPC. This massively simplifies the process of writing and deploying a driver. Using a central top-level controller (e.g., see the 1.17 volume snapshot architecture) allows for a clean approach to CRD versioning and lifecycle management and standardizing observability metrics.

It also could theoretically allow for situations in the future where the gRPC functions could be optionally added to both CSI (for cases where volume storage providers also provide object support) and separately (e.g., via a new "OSI").

Contributor Author
@jeffvance jeffvance Dec 11, 2019

I'd like to see discussion on this suggestion.
I am not experienced enough in CSI to argue or agree with this comment. There are a couple of provisioners recently using this lib and my understanding is that the effort was relatively low.


Having someone deliver the whole solution in a single controller creates a lot of problems. For example:

  • If there is a bug in the library that is discovered, how do you ensure that all the controllers using that library are updated?

  • If you have multiple controllers installed, who "owns" the CRDs? How do they get installed/upgraded?

  • If storage vendors are providing their own whole controllers, how do you ensure a degree of uniformity (and correctness) around observability?

The volume snapshot folks directly addressed this set of issues with the beta volume snapshot work and came up with a dual-controller + driver architecture:

  • A single-instance global controller that handles most of the complicated creation, provisioner lookup, and binding logic

  • A very simple sidecar controller that serves primarily to convert k8s resources into gRPC calls

  • A plugin driver supplied by storage vendors that is kubernetes and language independent.

I'm advocating a very similar model here.
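A rough packaging sketch of that model, under the assumption of one central COSI-style controller plus a per-vendor pod running a generic sidecar and the vendor's gRPC driver over a shared UNIX socket (all names and images are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-bucket-driver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-bucket-driver
  template:
    metadata:
      labels:
        app: example-bucket-driver
    spec:
      containers:
      - name: provisioner-sidecar          # generic: converts custom resources into gRPC calls
        image: example.com/bucket-sidecar:latest
        volumeMounts:
        - name: socket
          mountPath: /var/lib/bucket-driver
      - name: driver                       # vendor-supplied, Kubernetes-unaware gRPC server
        image: example.com/vendor-bucket-driver:latest
        volumeMounts:
        - name: socket
          mountPath: /var/lib/bucket-driver
      volumes:
      - name: socket
        emptyDir: {}
```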


+1 And since many storage vendors have already switched over to CSI drivers, it's going to be a hard sell to convince them they should adopt a model that k8s has moved on from.

Contributor Author
@jeffvance jeffvance Dec 20, 2019

I am starting to see value in a CSI-like approach (maybe COSI, or "cozy"?).
@alarge can you elaborate on this point:

If there is a bug in the library that is discovered, how do you ensure that all the controllers using that library are updated?

I agree it's a potential problem with the library. But CRDs can support multiple versions and so can the lib (though it doesn't today). Can you explain how CSI mitigates this issue?


@jeffvance A bug in this context could mean 2 kinds of bug: 1) a Kubernetes-related bug (i.e. CRD or controller common logic in the library), or 2) an object storage vendor-related bug (i.e. a Provisioner bug). The library approach IMO unnecessarily couples these 2 types of bugs. An issue on either side (k8s controller/CRD vs. object storage Provisioner) IMO shouldn't require attention from the other side. For example, a common library bug fix shouldn't require the vendor to release a new version of their controller because there's no change in the "Provisioner" they've implemented. Having a CSI-like interface would allow us to decouple these 2 sides of the problem. For instance, a bug fix in the central controller would just require the end user to bump the version number in the deployment yaml, and the object storage vendor is not involved at all.

Contributor Author

I do like the separation of provisioner bugs vs a csi bug. I'll add this point to the CSI Pros section of the KEP.

Contributor Author

Follow up: it appears that accessMode is used by CSI. I don't yet know how it's enforced or controlled, but it is passed along in the request.

## Proposal

Create a bucket provisioning library, similar to the Kubernetes [sig-storage-lib-external-provisioner](https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/master/controller/controller.go) library, which handles all of the controller and Kubernetes resources related to provisioning.
This library will implement a simple and familiar API via Custom Resources, and define a _contract_ between app developers and provisioners regarding the Kubernetes resources related to bucket provisioning.

Per above, I think a gRPC API is a better form of contract than a golang-specific library interface.

Contributor Author
@jeffvance jeffvance Dec 11, 2019

Is part of your point, in mentioning Go, that gRPC has multi-language support?


Yes, that was part of my point. But also the fact that the gRPC API is its own standalone thing, so documentation and version control tend to be handled better than internal library interfaces.


As is true for PVCs-PVs, there is a 1:1 relationship between an OBC and an OB, and as will be seen below, there is a [_binding_](#binding) between an OBC and OB.

OBCs reference a storage class. The storage class references the external provisioner, defines a reclaim policy, and specifies object-store specific parameters, such as region, owner secret, bucket lifecycle policies, etc.

Per previous comment, I think it would be better to have an ObjectBucketClass to handle these issues rather than overloading StorageClass with functionality not related to volumes.


As is true for dynamic PV provisioning, a bucket provisioner pod needs to be running for each type of object-store existing in the Kubernetes cluster.
For example, for AWS S3, the developer creates an OBC referencing a storage class which references the S3 store.
This storage class needs to be created by the admin.

Note that using ObjectBucketClass instead of StorageClass may allow this class to be stored in a namespaced area along with the OBs

name: MY-BUCKET-1 [1]
namespace: USER-NAMESPACE [2]
spec:
bucketName: [3]

Why should this be allowed? (claim specifying exact bucket name)


Do you mean as opposed to assigning a randomly generated name?

To clarify, this field is ignored when specifying a brownfield storageClass (which I agree is a messy way of handling it, better suited to a BucketClass).

bucketName: [3]
generateBucketName: "photo-booth" [4]
storageClassName: AN-OBJECT-STORE-STORAGE-CLASS [5]
additionalConfig: [6]

This feels like non-portable information, akin to passing through annotations.

Contributor Author
@jeffvance jeffvance Dec 13, 2019

True. We want a way to pass provisioner specific information. The storage class (or bucketClass) allows this, but we were thinking of the case where, say region, was defaulted in the SC but could be overridden in the OBC. Of course each provisioner decides which of these fields to process or ignore.
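As a hedged illustration of that override pattern (field names follow the OBC excerpt above; the region key is provisioner-specific):

```yaml
apiVersion: objectbucket.io/v1alpha1   # group/version assumed
kind: ObjectBucketClaim
metadata:
  name: my-photos
  namespace: photo-booth
spec:
  generateBucketName: photo-booth
  storageClassName: object-bucket-class
  additionalConfig:
    region: eu-central-1               # overrides the region defaulted in the storage class
```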

bucketName: my-photos-1xj4a
region: # provisioner dependent
subRegion: # provisioner dependent
additionalConfigData: [] #string:string

This list of endpoint parameters feels specific to a particular provider. I feel like the goal of this should be to produce the fields that are needed by client SDKs and to be useful across at least the three main object APIs (S3, GCS, and Azure). Given that, I'd argue that a "data path protocol" should be supplied with the claim (and declared by drivers). The endpoint and credential information supplied by the driver (and recorded in the OB and Secret) should be appropriate for the given data path protocol (e.g., nearly directly usable by the standard API for that data protocol).

@copejon copejon Dec 12, 2019

+1 It sounds like you're describing a pattern similar to the *VolumeSource subfields of PVs, which I agree would be a better fit. The overuse of string:string maps is something we need to address generally.

Contributor Author
@jeffvance jeffvance Dec 20, 2019

@alarge:

"data path protocol" should be supplied with the claim (and declared by drivers).

A goal was to not embed specific object store fields into the lib (though admittedly there is an s3 flavor). The reason for this is so object store vendors would not have to wait for their fields to be added. IOW, I think to do this the lib would need some switch-on-dataPathProtocol code which has to know which fields go with which protocol -- at least that's how I imagine you'd implement this. Can you elaborate more?

BUCKET_PORT: 80 [8]
BUCKET_NAME: MY-BUCKET-1 [9]
BUCKET_REGION: us-west-1
... [10]
@alarge alarge Dec 9, 2019

Per previous comment - these should be whatever is appropriate for the given data path protocol.

Contributor Author
@jeffvance jeffvance Dec 20, 2019

If we do this and have a "DataPathProtocol" type field in the OBC then app pods have little chance of being portable across different object stores, right? Some degree (obviously not 100%) of portability is/was desired in the original design.


The app would be tethered to the protocol regardless as it's expected they've been written using some SDK. This doesn't restrict apps to a specific object provider, but they do need to be aware of what protocol is being used at the endpoint. That is, S3 is pretty common, but not ubiquitous, and this must be communicated.

containers:
- name: mycontainer
image: redis
envFrom: [1]

This is just an example, right? Mounted secrets might be preferred (e.g., in the case of multiple bucket claims)


Yes. It would probably be helpful to add that case as well.
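A sketch of that mounted-secret variant (illustrative; the secret and key names follow the examples elsewhere in this doc):

```yaml
spec:
  containers:
  - name: mycontainer
    image: redis
    volumeMounts:
    - name: bucket-creds
      mountPath: /etc/bucket-creds     # keys appear as files, e.g. ACCESS_KEY_ID
      readOnly: true
  volumes:
  - name: bucket-creds
    secret:
      secretName: MY-BUCKET-1          # the secret generated for the claim
```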

1. use `env:` if mapping of the defined key names to the env var names used by the app is needed.
1. makes available to the pod as env variables: BUCKET_HOST, BUCKET_PORT, BUCKET_NAME
1. makes available to the pod as env variables: ACCESS_KEY_ID, SECRET_ACCESS_KEY

Again, specific to S3. Different variables for different data path protocols.

+ an _**ObjectBucket**_ (OB), similar to a PV, is the k8s representation of the provisioned bucket and may contain object store-specific data needed to allow the bucket to be de-provisioned. The OB resides in the same namespace as the object store, and is typically not seen by bucket consumers.
**Note:** namespaced OBs is a change from the original design and was suggested at the 2019 NA Kubecon storage face-to-face with the premise being that if there is no _technical_ reason for a cluster scoped resource then it should be namespaced.

As is true for PVCs-PVs, there is a 1:1 relationship between an OBC and an OB, and as will be seen below, there is a [_binding_](#binding) between an OBC and OB.
Member

What is the reason to have two separate objects, OBC and OB? For PV/PVC it's security. If volume ID was in PVC, user could craft their own PVCs and guess volume ID of some other volume and steal its data. In this proposal, you give full bucket ID and secrets to access it to a pod. So user can guess ID of some other bucket and steal it in the pod.

Therefore you don't need OB object at all. All can be in OBC (and it could be called ObjectBucket instead of claim).

Contributor Author
@jeffvance jeffvance Dec 13, 2019

A reason for the OB is to contain provisioner-specific data, eg. a newly created user name. The OB is passed to Delete and Revoke providing provisioners this static information so they can delete buckets and clean up policies w/o maintaining their state separately.
Another reason was to mimic the PVC - PV model even though now there are some who believe that design is overly complex.

Member

Are these provisioner-specific data secret? In other words, can a malicious user misuse it to revoke / delete data of someone else?

Contributor Author

@jsafrane The secret is created by the lib and lives in the same namespace as the obc. The secret contains s3-like access and secret key values, and a provider-specific map[string]string.
The secret cannot be used maliciously AFAICT, because the provisioner is expected to return creds that are specific to a single bucket and are not the owner's creds.


@jsafrane This is dependent on the provisioner implementation. The library only passes data returned from Provision()/Grant() calls directly into the secret. Rook, for instance, creates a new Ceph user with limited rights and returns those credentials. That user may only access the generated/attached bucket, may not create buckets, and (as a design of Ceph), may not view or mutate other user's buckets.


## ProvisionerCreateBucket

This call is made to create the bucket in the backend. If a bucket that matches both name and parameters already exists, then OK (success) must be returned. If a bucket with the same name but different parameters already exists, then the appropriate error code `ALREADY_EXISTS` must be returned by the provisioner.
Contributor

Please add "ProvisionerCreateBucket MUST be idempotent".

Contributor

Fixed


## ProvisionerDeleteBucket

This call is made to delete the bucket in the backend. If the bucket has already been deleted, then no error should be returned.
Contributor

Please add "ProvisionerDeleteBucket MUST be idempotent".

Contributor

Fixed

- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [X] (R) Graduation criteria is in place
Contributor

This one probably should not be checked yet as I see you only have "Alpha" there.

Contributor

Fixed

@wlan0 wlan0 force-pushed the buckets branch 2 times, most recently from e7dce75 to e18d289 Compare October 1, 2020 19:22
@xing-yang
Contributor

My comments are addressed. Looks good to me. Please squash your commits.

@wlan0
Member

wlan0 commented Oct 2, 2020

@xing-yang I've squashed the commits into one.

@xing-yang
Contributor

/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Oct 2, 2020
@wlan0
Member

wlan0 commented Oct 2, 2020

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jeffvance, wlan0
To complete the pull request process, please assign xing-yang after the PR has been reviewed.
You can assign the PR to them by writing /assign @xing-yang in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

finalizers:
- cosi.io/finalizer [2]
spec:
protocol:
Member

We should change this to a list of protocols: to support asking for multiple protocols. Effectively this is a way for an app to say I understand this set of protocols.

Why? Imagine you have an app that is deployed across multiple cloud providers, and bundles multiple SDKs to speak the native object protocol on each environment. The app dev should be able to define a single BucketRequest that will work across all those environments.

Contributor Author
@jeffvance jeffvance Oct 7, 2020

Earlier drafts had a list of protocols in the BC and a single protocol in the BR. During reviews we concluded that if a vendor supports multiple protocols the easiest solution is to create 1 BC per protocol. We believe this approach is more intuitive and less complex. BC.protocol is passed to the driver which can reject the request if the protocol is not supported.

Along the lines of intuitive and less complex, we are considering eliminating protocol from the BR. The reason is that the BC reference in the BR is sufficient to know the protocol. The only value today in BR.protocol is that COSI can validate that the BC's protocol matches the BR's. If validation fails then the BR is retried, as is typical in k8s. An issue with this is that the BC is immutable, so the admin would need to recreate the BC with the correct protocol. Or, the BR needs to be deleted and recreated to use the right protocol. Both of these scenarios are unnecessary if BR.protocol is removed.

Thoughts?

Member

I think the use case I described above is an important one:

Imagine you have an app that is deployed across multiple cloud providers, and bundles multiple SDKs to speak the native object protocol on each environment. The app dev should be able to define a single BucketRequest that will work across all those environments.

Having to create a new BR for each platform breaks the portability we are trying to achieve.

Contributor Author
@jeffvance jeffvance Oct 15, 2020

@saad-ali Are you describing a single app that speaks multiple protocols, i.e. accessing >1 bucket and using a different protocol among buckets?

But I think regardless of the answer above, the app can reference 2+ BARs in the csi-driver portion of the pod spec. Each BAR references a BR. Each BR references a BC and the BC defines a single protocol (according to the current kep). I don't see a portability issue since we assume that the admin sets up the correct BCs and provisioners on the target cluster.

The app dev should be able to define a single BucketRequest that will work across all those environments.

There is some detachment between the BR and the app pod in that the pod does not reference BRs. If the app needs multiple buckets it doesn't seem unreasonable to require multiple BRs/BARs.


1. `provisioner`: (required) the name of the vendor-specific driver supporting the `protocol`.
2. `isDefaultBucketClass`: (optional) boolean, default is false. If set to true then a `BucketRequest` may not need to specify a `BucketClass`. If the greenfield `BucketRequest` omits the `BucketClass` and a default `BucketClass`'s protocol matches the `BucketRequest`'s protocol then the default bucket class is used; otherwise an error is logged.
Member

Define what happens if for a given protocol, more than one BucketClass is marked as default (on StorageClass we fail provisioning of PVC with no StorageClass in that case).

Contributor Author
@jeffvance jeffvance Oct 7, 2020

Storageclasses have (or had) this same issue. We have 2 choices: 1) create a BC admission controller that rejects a new default BC if it matches an existing default BC for the given protocol, 2) let the result be undefined, meaning the first BC retrieved that matches the BR's protocol wins.

If we remove BR.protocol, and a greenfield BR omits the BC, then we have to figure out which BC we use as the default (since we could have a default BC per protocol). My current thinking is if we remove BR.protocol then we will support only a single default BC, implying a single default protocol.
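For illustration, the ambiguity being discussed looks like this (group/version and field layout are a sketch based on the excerpt above, not normative):

```yaml
apiVersion: objectstorage.k8s.io/v1alpha1   # hypothetical group/version
kind: BucketClass
metadata:
  name: s3-default
provisioner: s3-driver.example.com
isDefaultBucketClass: true
protocol: s3
---
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketClass
metadata:
  name: gcs-default
provisioner: gcs-driver.example.com
isDefaultBucketClass: true
protocol: gs                                # a second default, but for a different protocol
```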

- "publicReadWrite": Read/Write, same as `ro` with the addition of PutObject being allowed.
7. `bucketClassName`: (optional for greenfield, omitted for brownfield) name of the associated bucket class.
8. `bucketRequest`: an `objectReference` containing the name, namespace and UID of the associated `BucketRequest`.
9. `allowedNamespaces`: a list of namespaces that are permitted to either create new buckets or to access existing buckets.
Member

Networking folks are also looking for ways to address this problem in GatewayClass. In the future, we may want to consider aligning with whatever they land on for consistency across k8s.

- _Delete_: the bucket and its contents are destroyed.
> Note: the `Bucket`'s retention policy is set to "Retain" as a default. Exercise caution when using the "Delete" retention policy as the bucket content will be deleted.
> Note: a `Bucket` is not deleted if it is bound to a `BucketRequest` or `BucketAccess`.
6. `anonymousAccessMode`: a string specifying *uncredentialed* access to the Bucket. This is applicable to cases where the bucket objects are intended to be publicly readable and/or writable. One of:
Member

I don't really follow this. But if Andrew Large is happy with it, I'm fine with it.

Contributor Author

anonymousAccessMode is defined in the BC and copied to the Bucket. Its purpose is to define the level of public access to a new bucket in the absence of other access policies.

name:
provisioner: [1]
isDefaultBucketClass: [2]
protocol: {"azureblob", "gs", "s3", ... } [3]
Member

Should this also contain version information like BucketRequest?

Contributor Author
@jeffvance jeffvance Oct 7, 2020

The BC should contain the protocol version as an optional field.
If we remove BR.protocol then version should also be removed from the BR.

Contributor Author

protocol version added.

1. `provisioner`: (optional) The provisioner field as defined in the `BucketAccessClass`. If empty then there is no driver/sidecar, thus the admin had to create the `Bucket` and `BucketAccess` instances.
1. `bucketInstanceName`: name of the `Bucket` instance bound to this BA.
1. `bucketAccessRequest`: an `objectReference` containing the name, namespace and UID of the associated `BucketAccessRequest`.
1. `serviceAccount`: an `objectReference` containing the name, namespace and UID of the associated `ServiceAccount`. Empty when the `BucketAccessRequest.mintedSecretName` is specified.
Member

How is this different from the ServiceAccount in BucketAccessRequest? Do we need it in both places? What happens if they skew for some reason?

Contributor Author
@jeffvance jeffvance Oct 7, 2020

BA.serviceAccount is copied from BAR.serviceAccount and converted to an ObjectReference. We need both since the sidecar may need the SA and won't have access to the BAR (due to minimal privileges). There is no check for skew.

In later COSI version we can handle update events on the BAR and BA and deal with changes to SAs.

1. `bucketInstanceName`: name of the `Bucket` instance bound to this BA.
1. `bucketAccessRequest`: an `objectReference` containing the name, namespace and UID of the associated `BucketAccessRequest`.
1. `serviceAccount`: an `objectReference` containing the name, namespace and UID of the associated `ServiceAccount`. Empty when the `BucketAccessRequest.mintedSecretName` is specified.
1. `mintedSecretName`: name of the admin created or provisioner-generated Secret containing access credentials. This Secret exists in the provisioner’s namespace and must be copied to the app namespace by the COSI controller. **Note:** the provisioner's namespace is contained in the registered driver name.
Member

Can you describe this process? Who creates mintedSecretName in provisioner's namespace? How do you prevent name collisions? Why does it need to be copied to the app namespace? Why can't secret just be provisioned in app namespace?

Contributor Author

A provisioner does not have access to the user's namespace, and the minted secret lives in provisioner namespace. This secret is created by either the admin (static) or by COSI when it is the result of backend bucket access creation. It contains the credential returned by the driver. Name collisions are prevented by using UUID(secret-uuid).
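A sketch of such a minted Secret (names and keys are illustrative; only its placement in the provisioner's namespace and the UUID-based name follow the explanation above):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ba-3f1c9a7e-0d2b-4b8e-9a55-6c1f2d3e4a5b   # UUID-based to avoid collisions
  namespace: cosi-provisioner                     # provisioner's namespace (placeholder)
type: Opaque
stringData:
  ACCESS_KEY_ID: AKIAEXAMPLE
  SECRET_ACCESS_KEY: example-secret-key
  ENDPOINT: https://objects.example.com
```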

1. `serviceAccount`: an `objectReference` containing the name, namespace and UID of the associated `ServiceAccount`. Empty when the `BucketAccessRequest.mintedSecretName` is specified.
1. `mintedSecretName`: name of the admin created or provisioner-generated Secret containing access credentials. This Secret exists in the provisioner’s namespace and must be copied to the app namespace by the COSI controller. **Note:** the provisioner's namespace is contained in the registered driver name.
1. `policyActionsConfigMapData`: encoded data that contains a set of provisioner/platform defined policy actions to a given user identity. Contents of the ConfigMap that the *policyActionsConfigMap* field in the `BucketAccessClass` refers to.
1. `principal`: username/access-key for the object storage provider to uniquely identify the user who has access to this bucket
Member

Is this always required?

Member

This is filled in by COSI. This is not a required field

Contributor Author
@jeffvance jeffvance Oct 9, 2020

To add to @wlan0 's reply: we have several use cases involving BA.principal:

  1. static/driverless brownfield where the admin creates the BA and fills in principal, so here principal is minted by the admin.
  2. brownfield where COSI creates the BA and fills in principal based on the driver's minted principal.
  3. the BAR specifies a serviceAccount which is mapped by the backend to a user id, and principal is empty.

1. `bucketAccessRequest`: an `objectReference` containing the name, namespace and UID of the associated `BucketAccessRequest`.
1. `serviceAccount`: an `objectReference` containing the name, namespace and UID of the associated `ServiceAccount`. Empty when the `BucketAccessRequest.mintedSecretName` is specified.
1. `mintedSecretName`: name of the admin created or provisioner-generated Secret containing access credentials. This Secret exists in the provisioner’s namespace and must be copied to the app namespace by the COSI controller. **Note:** the provisioner's namespace is contained in the registered driver name.
1. `policyActionsConfigMapData`: encoded data that contains a set of provisioner/platform defined policy actions to a given user identity. Contents of the ConfigMap that the *policyActionsConfigMap* field in the `BucketAccessClass` refers to.
Member

What does the content of the policyActionsConfigMap look like? Is it defined by the driver? If so, can you provide an example of the type of information it would be used to track? What happens if the object doesn't exist? What happens if it exists, but is then deleted?

Contributor Author

@jeffvance jeffvance Oct 7, 2020

BA.policyActionsConfigMapData could look like this:

{"Effect":"Allow","Action":"s3:PutObject","Resource":"arn:aws:s3:::profilepics/*"},
{"Effect":"Allow","Action":"s3:GetObject","Resource":"arn:aws:s3:::profilepics/*"},
{"Effect":"Deny","Action":"s3:*","NotResource":"arn:aws:s3:::profilepics/*"}

Its content is defined by the driver but is created by the admin as part of creating a new BAC. The CM content is copied into the BA as the source of truth, to handle cases where the CM is replaced and to allow the sidecar to remain minimally privileged so that it does not need access to the user's ns. If the CM does not exist when creating the BA, we would retry (wait and requeue).
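
For illustration, the admin-created ConfigMap referenced by `BucketAccessClass.policyActionsConfigMap` could look roughly like the sketch below. The namespace, ConfigMap name, and data key are assumptions; the policy body is the example quoted above.

```yaml
# Hypothetical policy ConfigMap referenced from a BucketAccessClass.
# Namespace, name, and key are illustrative assumptions, not defined by this KEP.
apiVersion: v1
kind: ConfigMap
metadata:
  name: s3-profilepics-policy
  namespace: cosi-system            # assumed provisioner namespace (see the comment above)
data:
  policy.json: |
    {"Effect":"Allow","Action":"s3:PutObject","Resource":"arn:aws:s3:::profilepics/*"},
    {"Effect":"Allow","Action":"s3:GetObject","Resource":"arn:aws:s3:::profilepics/*"},
    {"Effect":"Deny","Action":"s3:*","NotResource":"arn:aws:s3:::profilepics/*"}
```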

Just to clarify, is a principal allowed here, or can the admin only specify resources?

Member

In the workflow of how we provision the BA, the principal is not known up front. The policy should only be about the resources themselves; when the access is minted, the policy is applied to the minted principal.

1. `bucketAccessRequest`: an `objectReference` containing the name, namespace and UID of the associated `BucketAccessRequest`.
1. `serviceAccount`: an `objectReference` containing the name, namespace and UID of the associated `ServiceAccount`. Empty when the `BucketAccessRequest.mintedSecretName` is specified.
1. `mintedSecretName`: name of the admin created or provisioner-generated Secret containing access credentials. This Secret exists in the provisioner’s namespace and must be copied to the app namespace by the COSI controller. **Note:** the provisioner's namespace is contained in the registered driver name.
1. `policyActionsConfigMapData`: encoded data that contains a set of provisioner/platform defined policy actions to a given user identity. Contents of the ConfigMap that the *policyActionsConfigMap* field in the `BucketAccessClass` refers to.
Member

BucketAccessClass allows you to specify a namespace for policyActionsConfigMap. Is the namespace here assumed to be the same as the corresponding BucketAccessRequest? If so, what happens if BucketAccessRequest points to a BucketAccessClass where the config map is pointing to a different namespace?

Contributor Author

I would think more typically the BAC.policyActionsConfigMap's ns would be the same as the provisioner's. It could be a security issue if this CM existed in the user ns, right?


#### BucketAccessRequest

A user-namespaced custom resource representing an object store user and an access policy defining the user’s relation to that storage. A user creates a `BucketAccessRequest` (BAR) in the app's namespace (which is the same namespace as the `BucketRequest`). A `BucketAccessRequest` specifies *either* a ServiceAccount or a secret name which contains the access policy. Specifying a ServiceAccount enables provisioners to support cloud provider identity integration with their respective Kubernetes offerings.
Contributor Author

FIX: secret name is no longer part of the BAR.

Just to clarify, does that mean serviceAccountName becomes required now? If possible, can you please explain how a SA can be connected to a user in the storage provider? Also, can a single service account be attached to multiple BARs?

I have the same concern regarding sharing service account across multiple BARs. I don't see a good mechanism to disallow that if we decide to do so.
If we share a service account there might be edge cases related to granting/revoking access. For example, if we share a service account across BARs with intersecting permissions, how can we properly revoke those permissions when a user deletes one of the BARs?

Member

@oleksiys This is something we went over. Conflicts can arise and we decided that if two different access policies are requested for the same SA, then we reject the second request.

Member

We'll update the KEP to reflect this. This was discussed in the review session: https://docs.google.com/presentation/d/180c1-fcRvGUU_AqMbe5swsDAbNg2yvLdQYdxnXVP9Xs/edit#slide=id.g907055d5e0_0_87

Contributor Author

@jeffvance jeffvance Oct 9, 2020

note to FIX this --
@thotz wrote:

Just to clarify, does that mean serviceAccountName becomes required now? If possible, can you please explain how a SA can be connected to a user in the storage provider?

BAR.serviceAccountName is still optional. If the name is provided then it will be used by the backend object store to map the SA to a user identity. If SA is omitted then the backend will mint a principal who is the user for this bucket.

Also, can a single service account be attached to multiple BARs?

Within the same namespace, multiple BARs could reference the same SA. BARs cannot reference a SA in a different ns.
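
To make the ServiceAccount mapping concrete, a minimal hypothetical BAR might look like the sketch below. The apiVersion and exact field names are assumptions, not the final API; only the idea that the BAR lives in the app namespace and may name a ServiceAccount comes from the KEP text above.

```yaml
# Sketch of a BucketAccessRequest that maps a ServiceAccount to an
# object-store identity. apiVersion and field names are assumptions.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketAccessRequest
metadata:
  name: photo-app-access
  namespace: photo-app                   # same namespace as the BucketRequest
spec:
  serviceAccountName: photo-app-sa       # optional; omit to have the backend mint a principal
  bucketRequestName: photo-app-bucket    # assumed reference to the BR in this namespace
  bucketAccessClassName: s3-read-write   # assumed reference to a BucketAccessClass
```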

4. `provisioner`: (optional) The provisioner field as defined in the `BucketClass`. If empty then there is no driver/sidecar, thus the admin had to create the `Bucket` and `BucketAccess` instances.
5. `retentionPolicy`: Prescribes the outcome when the `BucketRequest` is deleted.
- _Retain_: (default) the `Bucket` and its data are preserved. The `Bucket` can potentially be reused.
- _Delete_: the bucket and its contents are destroyed.

If I understand correctly, the "retentionPolicy Delete" is applicable to both greenfield and brownfield cases,
so does that mean existing buckets can be removed by COSI, or will it just remove the Bucket CR and keep the actual backend bucket?
A similar issue can also be seen in the case of existing OBC/OB.

Member

Yeah, this follows the exact same pattern as PVs.

Can you explain what an OBC is and what the "issue" is?

Contributor Author

OB/OBC are resources defined in the object bucket library.

Contributor Author

@jeffvance jeffvance Oct 9, 2020

Yeah, this follows the exact same pattern as PVs.

So, a retentionPolicy = "Delete" will delete the backend bucket [1] and, of course, the k8s resources associated with that bucket. It works this way for greenfield and brownfield.

1- it is ultimately up to each driver whether or not the backend bucket is actually deleted. For example, instead of permanently deleting the bucket, the driver could archive/rename it etc. However, whatever the driver implements has to be idempotent and reentrant.
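
For reference, a minimal sketch of an admin-created brownfield Bucket that keeps the safe default is shown below. The apiVersion and field layout are assumptions pieced together from the fields discussed in this KEP, not a definitive spec.

```yaml
# Sketch of an admin-created brownfield Bucket with the default
# retentionPolicy. apiVersion and layout are illustrative assumptions.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: Bucket
metadata:
  name: legacy-profilepics              # admin-chosen name for brownfield
spec:
  provisioner: s3.example.com           # assumed driver name
  retentionPolicy: Retain               # deleting the BR preserves the backend bucket and its data
  allowedNamespaces:
    - name: photo-app
  protocol:
    name: s3
    version: v4
```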


Yeah, this follows the exact same pattern as PVs.

can you explain what is OBC and what the "issue" is?

It is not an "issue" issue, I meant for brownfield cases it may have objects prior to bucket provisioning will be removed as part of Delete policy.

- _Retain_: (default) the `Bucket` and its data are preserved. The `Bucket` can potentially be reused.
- _Delete_: the bucket and its contents are destroyed.
> Note: the `Bucket`'s retention policy is set to "Retain" as a default. Exercise caution when using the "Delete" retention policy as the bucket content will be deleted.
> Note: a `Bucket` is not deleted if it is bound to a `BucketRequest` or `BucketAccess`.

This may contradict the Delete Bucket section: "Delete bucket ... this means that even if a BucketAccess references a Bucket, the Bucket is still deleted and the BA is orphaned."

Member

Yeah, your understanding is correct. I'm not sure if you're asking a question or just explaining your understanding.

Contributor Author

@jeffvance jeffvance Oct 9, 2020

...to a BucketRequest or BucketAccess.

the "or BucketAccess" is wrong and was missed in earlier edits


Yeah, it was not a question; I just pointed out two different statements.

1. `protocol.version`: (optional) specifies the desired version of the `protocol`. For "s3", a value of "v2" or "v4" could be used.
1. `bucketPrefix`: (optional) prefix prepended to a generated new bucket name, e.g. "yosemite-photos-". If `bucketInstanceName` is supplied then `bucketPrefix` is ignored because the request is for access to an existing bucket. If `bucketPrefix` is specified then it is added to the `Bucket` instance as a label.
1. `bucketClassName`: (optional for greenfield, omitted for brownfield) name of the `BucketClass` used to provision this request. If omitted for a greenfield bucket request, a default bucket class matching the protocol, if available, is used. If the greenfield bucket class is missing or does not support the requested protocol, an error is logged and the request is retried (with exponential backoff). A `BucketClass` is necessary for greenfield requests since BCs support a list of allowed namespaces. A `BucketClass` is not needed for brownfield requests since the `Bucket` instance, created by the admin, also contains `allowedNamespaces`.
1. `bucketInstanceName`: (optional) name of the cluster-wide `Bucket` instance. If blank, then COSI assumes this is a greenfield request and will fill in the name during the binding step. If defined by the user, then this names the `Bucket` instance created by the admin. There is no automation available to make this name known to the user.

If both bucketPrefix and bucketInstanceName in the BucketRequest CR are empty, will COSI create a bucket with a random name?

Member

@thotz This is already explained in the KEP. The bucketInstanceName is filled in by COSI, and the bucketPrefix is optional. The generated name follows the format br-${uuid}.
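
A minimal greenfield BucketRequest relying on that generated name might look like the sketch below; the apiVersion and field layout are assumptions for illustration only.

```yaml
# Sketch of a greenfield BucketRequest; COSI fills in bucketInstanceName
# during binding with a generated br-<uuid> name. Layout is an assumption.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketRequest
metadata:
  name: yosemite-photos
  namespace: photo-app
spec:
  bucketPrefix: yosemite-photos-   # optional; added to the Bucket instance as a label
  bucketClassName: s3-standard     # optional; if omitted, the default BucketClass for the protocol is used
  # bucketInstanceName omitted: treated as greenfield and filled in by COSI
```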

Contributor Author

FIX: delete "There is no automation available to make this name known to the user."


> Note: the `Bucket`'s retention policy is set to "Retain" as a default. Exercise caution when using the "Delete" retention policy as the bucket content will be deleted.
> Note: a `Bucket` is not deleted if it is bound to a `BucketRequest` or `BucketAccess`.
6. `allowedNamespaces`: a list of namespaces that are permitted to either create new buckets or to access existing buckets. This list is static for the life of the `BucketClass`, but the `Bucket` instance's list of allowed namespaces can be mutated by the admin.
7. `parameters`: (optional) a map of string:string key values. Allows admins to set provisioner key-values.

Can this field be utilised for backend-specific features like bucket notifications, object quota, bucket policies, etc.?

Member

Yeah, the field is completely opaque to COSI. You can utilize it for those functions.
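
As an example of that, a BucketClass might pass backend-specific settings through `parameters` as sketched below. The keys and values are made up, only the driver would interpret them, and the field layout (top-level fields, StorageClass-style) is an assumption.

```yaml
# Sketch of a BucketClass whose opaque parameters carry driver-specific
# settings (notifications, quota, policy). Keys and values are hypothetical.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketClass
metadata:
  name: s3-standard
isDefaultBucketClass: false
provisioner: s3.example.com          # assumed driver name
protocol:
  name: s3
  version: v4
allowedNamespaces:
  - name: photo-app
parameters:
  bucketNotifications: "enabled"     # hypothetical driver-defined key
  objectQuota: "500Gi"               # hypothetical driver-defined key
```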

bucketAvailable: [12]
```

1. `name`: When created by COSI, the `Bucket` name is generated in this format: _"<BR.namespace*>-<BR.name*>-<uuid>"_. If an admin creates a `Bucket`, as is necessary for brownfield access, they can use any name. The uuid is unique within a cluster. `*` the first 100 characters of the namespace and the name are used.
Contributor Author

FIX this and BA: name: "br-"+uuid

5. `retentionPolicy`: Prescribes the outcome when the `BucketRequest` is deleted.
- _Retain_: (default) the `Bucket` and its data are preserved. The `Bucket` can potentially be reused.
- _Delete_: the bucket and its contents are destroyed.
> Note: the `Bucket`'s retention policy is set to "Retain" as a default. Exercise caution when using the "Delete" retention policy as the bucket content will be deleted.
Contributor Author

FIX: consider deleting all notes

Contributor Author

FIX: Bucket can be deleted even if the BAR/BA remain.

@k8s-ci-robot k8s-ci-robot removed the lgtm ("Looks good to me", indicates that a PR is ready to be merged) label Oct 12, 2020
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

Member

@saad-ali saad-ali left a comment

GitHub Unicorn is making this very difficult to review. Should we consider closing this PR and reopening a new one?

finalizers:
- cosi.io/finalizer [2]
spec:
protocol:
Member

I think the use case I described above is an important one:

Imagine you have an app that is deployed across multiple cloud providers, and bundles multiple SDKs to speak the native object protocol on each environment. The app dev should be able to define a single BucketRequest that will work across all those environments.

Having to create a new BR for each platform breaks the portability we are trying to achieve.
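
A rough sketch of what that could look like, purely to illustrate this reviewer's proposal: the `protocols` list below is a hypothetical field, not part of the KEP's current API, and the apiVersion is an assumption.

```yaml
# Hypothetical portable BucketRequest carrying a list of acceptable
# protocols, as proposed in this comment. Not the KEP's current API.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketRequest
metadata:
  name: portable-photos
  namespace: photo-app
spec:
  protocols:                 # hypothetical: any one of these is acceptable
    - name: s3
      version: v4
    - name: gcs
    - name: azureBlob
```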

bucketRequest: [8]
allowedNamespaces: [9]
- name:
protocol: [10]
Member

Along the lines of intuitive and less complex, we are considering eliminating protocol from the BR. The reason is that the BC reference in the BR is sufficient to know the protocol. The only value today in BR.protocol is that COSI can validate that the BC's protocol matches the BR's. If validation fails then the BR is retried, as is typical in k8s. An issue with this is that the BC is immutable, so the admin would need to recreate the BC with the correct protocol. Or, the BR needs to be deleted and recreated to use the right protocol. Both of these scenarios are unnecessary if BR.protocol is removed.

Remember that BC may be deleted or changed (deleted and recreated with different values but same name), therefore Bucket should not depend on a reference to BC -- it has to be self-sufficient even if the BC used to create it is deleted.

```

1. `provisioner`: (required) the name of the vendor-specific driver supporting the `protocol`.
1. `isDefaultBucketClass`: (optional) boolean, default is false. If set to true then a `BucketRequest` may omit the `BucketClass` reference. If the greenfield `BucketRequest` skips the `BucketClass` and a default `BucketClass`'s protocol matches the `BucketRequest`'s protocol then the default bucket class is used; otherwise an error is logged. It is possible that more than one `BucketClass` of the same protocol is set as the default class. In this case, it is non-deterministic which `BucketClass` is used as the default.
Member

It is possible that more than one BucketClass of the same protocol is set as the default class. In this case, it is non-deterministic which BucketClass is used as the default.

We should align with the current predictable StorageClass behavior here.

StorageClasses have (or had) this same issue. We have 2 choices: 1) create a BC admission controller that rejects a new default BC if it matches an existing default BC for the given protocol, 2) let the result be undefined, meaning the first BC retrieved that matches the BR's protocol wins.

There is a third option, which is what StorageClass does: runtime validation (at volume provision time rather than object creation time) -- fail CreateVolume (or in this case fail CreateBucket) with an error indicating there are multiple defaults.

Contributor Author

@jeffvance jeffvance Oct 15, 2020

BC will have an admission webhook to enforce no more than 1 default, so we'll be able to catch this problem at BC creation time. KEP will be updated.

@jeffvance Is this possible without a race condition?


1. `provisioner`: (required) the name of the vendor-specific driver supporting the `protocol`.
1. `isDefaultBucketClass`: (optional) boolean, default is false. If set to true then a `BucketRequest` may omit the `BucketClass` reference. If the greenfield `BucketRequest` skips the `BucketClass` and a default `BucketClass`'s protocol matches the `BucketRequest`'s protocol then the default bucket class is used; otherwise an error is logged. It is possible that more than one `BucketClass` of the same protocol is set as the default class. In this case, it is non-deterministic which `BucketClass` is used as the default.
1. `protocol`: (required) protocol supported by the associated object store. This field validates that the `BucketRequest`'s desired protocol is supported.
Member

Should this also contain version information like BucketRequest?

@jeffvance jeffvance 7 days ago
Author
The BC should contain the protocol version as an optional field.
If we remove BR.protocol then version should also be removed from the BR.

Remember the rule that BC can be deleted and Bucket object must be self-sufficient. Therefore version info seems crucial here. Depending on BR is also not sufficient (if we end up, as I am proposing above, to have a "list of protocols/versions" on BR).

Contributor Author

@jeffvance jeffvance Oct 15, 2020

Thanks for catching this omission. BC's protocol is the same struct as used in the BR. Will update.

1. `bucketInstanceName`: name of the `Bucket` instance bound to this BA.
1. `bucketAccessRequest`: an `objectReference` containing the name, namespace and UID of the associated `BucketAccessRequest`.
1. `serviceAccount`: an `ObjectReference` containing the name, namespace and UID of the associated `BAR.serviceAccountName`. If empty then integrated Kubernetes -> cloud identity is not being used, in which case, `BucketAccess.principal` contains the user identity, which is minted by the provisioner.
1. `mintedSecretName`: name of the provisioner-generated Secret containing access credentials. This Secret exists in the provisioner’s namespace and is copied to the app namespace by the COSI controller.
Member

Can you describe this process? Who creates mintedSecretName in provisioner's namespace? How do you prevent name collisions? Why does it need to be copied to the app namespace? Why can't secret just be provisioned in app namespace?

@jeffvance jeffvance 7 days ago Author
A provisioner does not have access to the user's namespace, and the minted secret lives in the provisioner's namespace. This secret is created either by the admin (static) or by COSI when it is the result of backend bucket access creation. It contains the credential returned by the driver. Name collisions are prevented by using UUID(secret-uuid).

Why does it need to be copied to the app namespace? Why can't the secret just be provisioned in app namespace to begin with?

Contributor Author

Copying the secret is a carryover from an earlier design. Will update the KEP to reflect the current design, which adheres to minimal privileges such that the provisioner does not need permissions to access user namespaces. Instead, the cosi-node-adapter will write the secret contents (provisioner's ns) to the app pod's secret mount.
