[Epic] Self-serve Minio object storage #9

Closed
blairdrummond opened this issue Apr 23, 2020 · 14 comments
Assignees
Labels
area/engineering Requires attention from engineering: focus on foundational components or platform DevOps component/kubeflow Kubeflow Related component/storage Persistence related (e.g. Minio, cloud, or user storage) priority/soon size/L 4-5 days

Comments

@blairdrummond
Contributor

Give every Kubeflow namespace a small self-serve bucket.

We already have per-user namespaces for Kubeflow. It would be great to give each user access to a bucket with the same name as their namespace, ideally accessible from outside the cluster but behind auth.

I really like this https://www.arrikto.com/kubeflow/

https://www.arrikto.com/wp-content/uploads/2018/10/20181002-2-DataScience-Whitepaper.pdf

We want to implement the same thing.

@blairdrummond blairdrummond added component/kubeflow Kubeflow Related component/storage Persistence related (e.g. Minio, cloud, or user storage) size/L 4-5 days labels Apr 23, 2020
@sylus
Member

sylus commented Apr 24, 2020

So good news is that we got this roughly solved in a pretty awesome manner.

https://github.com/statcan/minio

Basically, using Kustomize we can add a new tenant or type of workload via PR against the Minio operator, which ensures that each defined instance gets created. An instance can comprise 4 servers for quorum and erasure-code support.

Then, using OPA integrated with Azure AD / OpenID Connect, we can restrict who can create buckets, and where, by evaluating OPA rules against the JWT tokens. This works really well. In practice, on each deployed instance:

a) You can create your own bucket as long as its name matches your user.name (only you can see it)
b) You can create a folder named after your user.name under the shared bucket, which everyone can see
c) Really, the options are endless once we add organizations to the JWT token and write OPA rules against them

The current policy can be found here: https://github.com/StatCan/minio/blob/master/opa/minio.rego
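To make rules (a) and (b) concrete, here is a minimal Python sketch of the decision logic. It is not the Rego policy linked above. The `Request` type, the `SHARED_BUCKET` name, and the choice to allow reads everywhere in the shared bucket while limiting writes to the user's own folder are all assumptions made for illustration.

```python
from dataclasses import dataclass

SHARED_BUCKET = "shared"  # hypothetical name for the shared bucket
READ_ACTIONS = ("s3:GetObject", "s3:ListBucket")


@dataclass(frozen=True)
class Request:
    username: str  # assumed to come from a claim in the verified JWT
    action: str    # S3 action, e.g. "s3:CreateBucket" or "s3:PutObject"
    bucket: str
    key: str = ""  # object key; empty for bucket-level actions


def is_allowed(req: Request) -> bool:
    # Rule (a): a user may do anything in the bucket named after them.
    if req.bucket == req.username:
        return True
    # Rule (b): everyone can read the shared bucket, but writes are
    # limited to the folder named after the user (assumed interpretation).
    if req.bucket == SHARED_BUCKET:
        if req.action in READ_ACTIONS:
            return True
        return req.key.startswith(req.username + "/")
    # Anything else is denied by default.
    return False
```

For example, `is_allowed(Request("alice", "s3:PutObject", "shared", "bob/data.csv"))` is denied, while the same write under `alice/` is allowed.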

Every deployed instance will get this bare minimum and for different tiers we can add in group organization and different levels of minio instances:

(Screenshots: Screen Shot 2020-04-23 at 1 36 16 PM, Screen Shot 2020-04-23 at 9 25 23 PM)

a) Those backed via managed premium disks for higher IOPS
b) Those backed via big storage measured in Terabytes
c) Simple minimal ones for testing and sandboxing
d) Specific ones with object storage backing from a Cloud (Azure Storage Account, Amazon S3, etc)

References

https://www.openpolicyagent.org/docs/latest/
https://docs.min.io/docs/minio-sts-quickstart-guide
https://github.com/minio/minio/blob/master/docs/sts/opa.md
https://github.com/minio/minio-operator

@sylus sylus changed the title from "Self-serve S3 by kubeflow namespace" to "Self-serve S3 by OPA + Minio" Apr 24, 2020
@sylus sylus added priority/soon area/engineering Requires attention from engineering: focus on foundational components or platform DevOps labels Apr 24, 2020
@brendangadd
Contributor

@sylus Any issues with name conflicts across email domains (e.g. johnsmith@gmail.com and johnsmith@outlook.com being two distinct users)?

@sylus
Member

sylus commented Apr 24, 2020

@brendangadd According to the current logic, that would indeed break :P So it will have to be looked at.

If we had org or another field to also check, that would fix it.

@ca-scribner
Contributor

We could use the email address (or a derivative of the email if @ is a problem) for the name, but then making a shared space publishes your email address.
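A small Python sketch of the collision and of one possible "derivative of email" follows. Both function names and the `user-` prefix are hypothetical, and hashing is just one option: it keeps the address out of plain sight, but anyone who already knows an email address can recompute its hash.

```python
import hashlib


def bucket_from_local_part(email: str) -> str:
    # Naive scheme: keep only the part before the "@".
    return email.strip().lower().split("@", 1)[0]


def bucket_from_full_email(email: str) -> str:
    # Hypothetical alternative: derive the name from the whole address,
    # so the domain contributes to the result.
    normalized = email.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    # "user-" plus 16 hex characters is lowercase alphanumeric with a
    # hyphen and 21 characters long, which fits S3 bucket naming rules.
    return "user-" + digest[:16]


a = "johnsmith@gmail.com"
b = "johnsmith@outlook.com"
print(bucket_from_local_part(a) == bucket_from_local_part(b))  # True: collision
print(bucket_from_full_email(a) == bucket_from_full_email(b))  # False
```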

@ca-scribner
Contributor

This sounds really useful, btw. It facilitates better collaboration and personalization at the same time.

@sylus
Member

sylus commented Apr 26, 2020

Now the tenants are integrated via PR and the infra is up.

@zachomedia is going to use the Vault injector so that whenever a Kubeflow pipeline or Jupyter notebook is brought up, it receives the correct credentials based on the user's AD identity. There's a bit of work to do here, but we have a plan :D
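As a rough illustration of what the injector needs on each pod, the Python sketch below builds the pod annotations that the Vault Agent Injector reads. The annotation keys are the injector's standard ones; the helper name, the role, and the secret path are hypothetical and do not reflect the actual setup.

```python
from typing import Dict


def vault_injector_annotations(role: str, secret_path: str) -> Dict[str, str]:
    # Annotations to place on a notebook or pipeline pod so the Vault
    # Agent Injector adds a sidecar that fetches the secret.
    return {
        "vault.hashicorp.com/agent-inject": "true",
        # Vault role the pod authenticates as (hypothetical value below).
        "vault.hashicorp.com/role": role,
        # The suffix after "agent-inject-secret-" is the file name; by
        # default the secret is rendered to /vault/secrets/minio.
        "vault.hashicorp.com/agent-inject-secret-minio": secret_path,
    }


# Hypothetical role and path for a user namespace called "alice".
annotations = vault_injector_annotations("profile-alice", "minio/keys/alice")
```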

@zachomedia

For Vault integration, we're going to look at: https://github.com/kula/vault-plugin-secrets-minio

@justbert

We can build some access policies with identity templates.

@blairdrummond
Contributor Author

This is long-since done, right?

@justbert

justbert commented Jun 9, 2020

I'd say yes!

@sylus
Member

sylus commented Jun 9, 2020

Yeah, I think we can close this even though it still needs improvement. ^_^

@sylus sylus closed this as completed Jun 9, 2020
@brendangadd
Contributor

This issue is kept open because the current implementation allows for naming collisions which could be abused to gain access to other people's storage.

> @sylus Any issues with name conflicts across email domains (e.g. johnsmith@gmail.com and johnsmith@outlook.com being two distinct users)?

> @brendangadd according to current logic that would indeed break :P So will have to be looked at.

@brendangadd brendangadd reopened this Jun 15, 2020
@brendangadd
Contributor

Making this an Epic. Will make associated smaller issues.

@brendangadd brendangadd changed the title from "Self-serve S3 by OPA + Minio" to "[Epic] Self-serve Minio object storage" Jun 17, 2020
@blairdrummond
Contributor Author

@brendangadd Do we still need this epic?

@wg102 wg102 mentioned this issue Jul 12, 2022
14 tasks