Naming conventions for shared folders #4

Open
aculich opened this issue Feb 22, 2021 · 9 comments

@aculich

aculich commented Feb 22, 2021

This is a feature request for future hubs.

I suggest the naming convention for shared folders should follow this pattern:

  • shared: the read-write folder that all hub users can access. This is currently called shared-readwrite, but since most users will reference it in their scripts when writing out files, it would be helpful if it had the shortest name.
  • shared-readonly: the place where instructors put datasets and other files. Only they (as admins) have write access, so they can use it to distribute files to students, and students know it is read-only.
  • shared-private (or private, or admin-private): the place where instructors (admins) can share files with each other. The folder itself is not visible to non-admins.
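To make the proposal concrete, here is a minimal sketch of the three folders as a lookup table. It assumes only two roles (admin and regular user); the `PROPOSED_FOLDERS` name and the role strings are hypothetical and not part of any existing hub configuration.

```python
# Hypothetical lookup table for the naming proposal above.
# Access values: "rw" = read-write, "ro" = read-only, None = not visible.
PROPOSED_FOLDERS = {
    "shared":          {"admin": "rw", "user": "rw"},
    "shared-readonly": {"admin": "rw", "user": "ro"},
    "shared-private":  {"admin": "rw", "user": None},
}


def visible_folders(role):
    """Return {folder: access} for the folders the given role can see."""
    return {
        folder: access[role]
        for folder, access in PROPOSED_FOLDERS.items()
        if access[role] is not None
    }
```

For example, `visible_folders("user")` leaves out shared-private entirely, which matches the requirement that the folder itself is hidden from non-admins.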

Hope that helps!

@choldgraf
Member

This makes sense to me, though with one exception, which is that (I think?) it is not currently possible for non-admins to write to any shared folder. If I recall, that was an intentional decision from @yuvipanda but maybe it was instead just a temporary decision? We should get clarity there.

@aculich
Author

aculich commented Feb 22, 2021

As currently configured on the Mills hub, it seems I can write to shared-readwrite but not to shared. However, I'm an admin, so maybe that's why I can write to at least one of them? In which case, who has write access to the shared folder, if not the admins?

And weirder still... I just noticed that even though it told me the shared directory is a Read-only file system, my test file actually showed up in it! That seems like a bug?!

jovyan@jupyter-aculich-40berkeley-2eedu:~$ pwd
/home/jovyan
jovyan@jupyter-aculich-40berkeley-2eedu:~$ ls -la shared*
total 4
drwxr-xr-x 2 jovyan jovyan    6 Jan 21 17:21 shared
drwxr-xr-x 2 jovyan jovyan    6 Jan 21 17:21 shared-readwrite
jovyan@jupyter-aculich-40berkeley-2eedu:~$ ls -la shared
total 0
drwxr-xr-x 2 jovyan jovyan  6 Jan 21 17:21 .
drwxr-xr-x 6 jovyan jovyan 72 Jan 21 18:38 ..
jovyan@jupyter-aculich-40berkeley-2eedu:~$ touch shared/test
touch: cannot touch 'shared/test': Read-only file system
jovyan@jupyter-aculich-40berkeley-2eedu:~$ touch shared-readwrite/test
jovyan@jupyter-aculich-40berkeley-2eedu:~$ find shared*
shared
shared/test
shared-readwrite
shared-readwrite/test
jovyan@jupyter-aculich-40berkeley-2eedu:~$ ls -lah shared-readwrite/test
-rw-r--r-- 1 jovyan jovyan 0 Feb 22 18:45 shared-readwrite/test
jovyan@jupyter-aculich-40berkeley-2eedu:~$ ls -lah shared/test
-rw-r--r-- 1 jovyan jovyan 0 Feb 22 18:45 shared/test

@yuvipanda
Member

So currently, admins can write to 'shared-readwrite', and it'll show up as 'shared' to everyone else. In other systems, where the paths did not differ, users often stepped on each other's toes by accidentally deleting everything in shared. Hence the different naming conventions. I don't think this has actually been communicated to anyone, nor is anyone actively using it right now, so we can definitely re-engineer this as we wish.
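One way to picture this arrangement is as two mounts of the same underlying directory, one read-only and one read-write. The sketch below is illustrative only: the field names follow the Kubernetes VolumeMount schema (`name`, `mountPath`, `subPath`, `readOnly`), but the volume name `home` and the subPath `_shared` are assumptions, and this is not necessarily how the hub is actually configured.

```python
# Sketch only: "home" and "_shared" are assumed names for illustration.
HOME = "/home/jovyan"


def shared_mounts(is_admin):
    """Return the list of volume mounts a user would get for shared data."""
    # Everyone sees the shared directory read-only at ~/shared.
    mounts = [{
        "name": "home",
        "mountPath": f"{HOME}/shared",
        "subPath": "_shared",
        "readOnly": True,
    }]
    if is_admin:
        # Admins additionally get the same directory, writable,
        # under a different path.
        mounts.append({
            "name": "home",
            "mountPath": f"{HOME}/shared-readwrite",
            "subPath": "_shared",
            "readOnly": False,
        })
    return mounts
```

Because both mounts point at the same subPath, a file written through shared-readwrite is immediately visible under shared, even though writes through shared itself are rejected.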

@aculich
Author

aculich commented Feb 22, 2021

Aha! So they really are the same folder, just presented to admins as two different folders. That makes sense. So end users all see only shared.
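For anyone who wants to confirm this on their own hub, the experiment from the terminal session above can be sketched as a small helper: write a probe file through the writable path and see whether it appears through the other path. The function name `is_same_folder` and the probe file naming are hypothetical.

```python
import os
import uuid


def is_same_folder(writable_dir, other_dir):
    """Write a uniquely named probe file via writable_dir and report
    whether it is also visible through other_dir."""
    probe = f".probe-{uuid.uuid4().hex}"
    probe_path = os.path.join(writable_dir, probe)
    # Create an empty file through the writable path.
    with open(probe_path, "w"):
        pass
    try:
        return os.path.exists(os.path.join(other_dir, probe))
    finally:
        # Always clean up the probe so the shared folder is left untouched.
        os.remove(probe_path)
```

As an admin, calling `is_same_folder("shared-readwrite", "shared")` from the home directory would be expected to return `True` if the two paths are backed by the same directory.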

To finalize setting up the Mills hub in 2i2c-org/infrastructure#178, I've requested that we also set up a private folder where admins can share files only with other admins.

@yuvipanda
Member

How about this sequence:

  1. shared/mine will be read/write where you can put in whatever you want, and it'll be visible to everyone else
  2. shared/others/<username> will be readonly, and show the shared folders of every other user!
  3. shared/public will be readonly for everyone, and read-write for admins.
  4. (much later) we can do shared/group/<group-name> for read-write access to members of a group

The particular names are up for change, but what do you think of this?
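A rough sketch of what one user would see under this sequence, covering items 1 to 3 only (groups are left out since they are marked "much later"). The function name `shared_view` and the way users and admins are passed in are assumptions for illustration.

```python
def shared_view(viewer, all_users, admins):
    """Return {path: access} as seen by `viewer` under shared/.

    "rw" = read-write, "ro" = read-only.
    """
    # 1. Each user's own folder is writable by them.
    view = {"shared/mine": "rw"}
    # 2. Every other user's folder appears read-only.
    for other in sorted(all_users):
        if other != viewer:
            view[f"shared/others/{other}"] = "ro"
    # 3. The public folder is writable only by admins.
    view["shared/public"] = "rw" if viewer in admins else "ro"
    return view
```

Note that `shared/mine` for one user and `shared/others/<username>` for everyone else would refer to the same underlying folder, similar to how shared and shared-readwrite work today.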

@aculich
Author

aculich commented Feb 23, 2021

I think we should understand this on a per-use case basis:

Education hub for many official courses

  1. shared/<course-name> will be readonly for everyone, and read-write for admins (instructors and GSIs).
  2. private/<course-name> only visible to admins (instructors and GSIs) with write-access

Mills is currently this kind of hub. Mills has a single hub that serves multiple courses. In this case, all instructors would share a common private area that is visible, readable, and writable only to those instructors. All instructors could write in any course folder or even create a new course folder themselves. There are no technical safeguards, so this is easy to configure, and instructors police themselves and resolve issues through some social (not technical) mechanism.

This works fine for a small institution like Mills with 10s of instructors and GSIs, but might need to be adapted for larger institutions with 100s of instructors and 1000s of GSIs.

This use case is also the generic datahub use case where multiple courses and multiple instructors/GSIs all share the same datahub and filename spaces? Isolation

Education hub for a single official course

  1. shared/<course-name> will be readonly for everyone, and read-write for admins (instructors).
  2. private/<course-name> only visible to admins (instructors).

As I understand it, this is how some official UCB courses such as cs194 operate? They have their own hub (is the underlying cluster shared or is it separate?) with their own image and their own set of instructors/GSIs that is isolated from the rest of the datahub.

Training hub for workshops

  1. shared/<workshop-name> will be readonly for everyone, and read-write for admins (instructors).
  2. private/<workshop-name> only visible to admins (instructors).
  3. datasets/<dataset-name> will be readonly for everyone, and read-write for admins (instructors). Data set names are independent from the name of workshops which may reference the data sets.

This is the D-Lab use case. It is similar to the generic datahub/Mills use case, in which many D-Lab workshops run on a single hub and all users have access to the shared spaces of any workshop (whether or not they are taking the workshop). We add a special datasets directory separate from the individual workshops, as we may have some (especially large) datasets that live in a global namespace and may be referenced by multiple workshops.
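The three top-level folders in this use case can be sketched as a rule keyed on the first path component. The `RULES` table and `folder_access` function are hypothetical names, and the workshop and dataset names in the usage note are made up.

```python
# Top-level folder -> (access for admins, access for everyone else).
# "rw" = read-write, "ro" = read-only, None = not visible.
RULES = {
    "shared":   ("rw", "ro"),
    "private":  ("rw", None),
    "datasets": ("rw", "ro"),
}


def folder_access(path, is_admin):
    """Return the access level for a path such as 'shared/<workshop-name>'."""
    top = path.strip("/").split("/", 1)[0]
    if top not in RULES:
        raise ValueError(f"unrecognized top-level folder: {top!r}")
    admin_access, user_access = RULES[top]
    return admin_access if is_admin else user_access
```

For a participant, `folder_access("datasets/census", is_admin=False)` gives `"ro"` regardless of which workshop they are taking, which reflects the point that dataset names live in a global namespace independent of workshop names.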

Research hub for multiple group-based research projects

  1. (much later) we can do shared/group/<group-name>/<project-name> for read-write access to members of a group
  2. (much later) we can do shared/group/<group-name>/<dataset-name> for read-only access to specific datasets for members of a group
  3. private/datasets/<dataset-name> will only be visible and read-write for admins (data curators). Data set names are independent from the name of projects which may reference the data sets. These are private datasets that get mounted to one or more groups such as
  4. datasets/<dataset-name> will be readonly for everyone, and read-write for admins (project maintainers). Data set names are independent from the name of projects which may reference the data sets. These are global data sets, whereas individual projects may have their own private datasets in their group-project directory from 1 above.

This is another D-Lab use case. We have a single hub (possibly combined together with the Training hub use case, as well).
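A sketch of how the four rules above might combine with group membership. Several things here are assumptions that the list does not settle: project and dataset folders under a group share the same path shape, so the sketch uses a hypothetical `group_datasets` registry to tell them apart; non-members are assumed not to see a group's folders at all; and admins are given no special access to group folders.

```python
def research_access(path, user, admins, groups, group_datasets=frozenset()):
    """Return "rw", "ro", or None (not visible) for `user` on `path`.

    groups: {group_name: set of member usernames}
    group_datasets: set of (group_name, dataset_name) pairs (assumed registry)
    """
    parts = path.strip("/").split("/")
    is_admin = user in admins

    # Rules 1 and 2: shared/group/<group-name>/<project-or-dataset-name>
    if parts[:2] == ["shared", "group"] and len(parts) >= 4:
        group, name = parts[2], parts[3]
        if user not in groups.get(group, ()):
            return None  # assumption: hidden from non-members
        return "ro" if (group, name) in group_datasets else "rw"

    # Rule 3: private/datasets/<dataset-name>, admins (data curators) only
    if parts[:2] == ["private", "datasets"]:
        return "rw" if is_admin else None

    # Rule 4: datasets/<dataset-name>, global and read-only for non-admins
    if parts[0] == "datasets":
        return "rw" if is_admin else "ro"

    raise ValueError(f"unrecognized path: {path!r}")
```

The need for a separate registry in this sketch suggests that rules 1 and 2 might be easier to implement if projects and datasets lived under distinct subfolders, though that is a design question for the thread rather than something decided here.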

Research hub for single group with multiple projects

  1. shared/<project-name> will be readonly for everyone, and read-write for admins (group member).
  2. private/<project-name> only visible to admins (group admins).
  3. datasets/<dataset-name> will be readonly for everyone, and read-write for admins (group admins). Data set names are independent from the name of projects which may reference the data sets.

This is another D-Lab use case. This would also work for a Discovery or URAP use case, or faculty research project/group. This is also much simpler than the previous use case from a per-hub perspective. There is just more overhead in setting up a new hub for each new group who needs one. The one group in this case has multiple projects and the assumption is that all people in a group have 100% access to all the shared/global spaces. If a single group needs to have differential access for members, then we would create a separate hub for a different group to keep this a very simple model.

@yuvipanda
Member

@aculich
Author

aculich commented Mar 7, 2024

@yuvipanda when I try that link I get:

This board can't be found
It was either deleted or the link
you have might be broken.

So maybe the productboard is internal to 2i2c only?

@yuvipanda
Member

@aculich yeah, we're working on figuring out how to make sure it's publicly visible! Hold on :)

@yuvipanda yuvipanda transferred this issue from 2i2c-org/infrastructure Apr 1, 2024