How to turn an existing clone into a shared clone? #15

Open
2 of 4 tasks
bpoldrack opened this issue Apr 12, 2023 · 5 comments
Labels
support-tracker Track a support event that occurred elsewhere via-datalad-channel report origin is a datalad-specific channel (chat/email/office hour)

Comments

@bpoldrack

bpoldrack commented Apr 12, 2023

Origin: Datalad office hour

There have been several occasions in the office hour where users realized that they need a clone that is initialized with --shared group, but wanted/needed to do this after the fact and in place, because of the size of the dataset. The question usually shows that users expect that something like datalad create --force, with the shared option passed on to git-init, would work. That is, however, insufficient, and no concise answer was provided during the office hour.

  1. re-init via git init --shared only sets the respective config, but does not change existing permissions. This needs to be done separately. See https://stackoverflow.com/questions/3242282/how-to-configure-an-existing-git-repo-to-be-shared-by-a-unix-group
  2. .git/annex and its permissions need to be taken into account
  3. A knowledge base item on the topic should probably also point to git config --local receive.denyNonFastForwards true and its purpose.
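The three points above could be combined roughly as follows. This is only a minimal sketch in the spirit of the linked Stack Overflow answer, not a vetted recipe: the throwaway repository, the use of the caller's primary group as the target group, and the decision to skip .git/annex entirely (point 2 would need separate treatment) are all assumptions made for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for an existing, non-shared clone (assumption).
repo=$(mktemp -d)
git init --quiet "$repo"

# Hypothetical target group; the caller's primary group is used so the demo runs anywhere.
group=$(id -gn)

# 1. Record the sharing mode. On its own this does not change existing permissions.
git -C "$repo" config --local core.sharedRepository group

# 2. Hand the Git-managed files to the group and allow group read/write.
#    .git/annex is pruned on purpose: it is not handled by this sketch.
find "$repo/.git" -path "$repo/.git/annex" -prune -o -exec chgrp "$group" {} +
find "$repo/.git" -path "$repo/.git/annex" -prune -o -exec chmod g+rwX {} +

# 3. Set the setgid bit on directories so that new files inherit the group.
find "$repo/.git" -path "$repo/.git/annex" -prune -o -type d -exec chmod g+s {} +

# 4. Reject non-fast-forward pushes, as git init --shared would have configured.
git -C "$repo" config --local receive.denyNonFastForwards true

shared=$(git -C "$repo" config --get core.sharedRepository)
printf 'core.sharedRepository=%s\n' "$shared"
```

For a real dataset, `repo` would point at the existing clone and `group` at the Unix group that is supposed to share it.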

TODO (not necessarily to be performed in this order)

  • Inform OP/Add reference to this issue at origin
  • Clarifying Qs asked or not needed
  • Nature of the issue is understood
  • Inform OP about resolution
@bpoldrack bpoldrack added the support-tracker Track a support event that occurred elsewhere label Apr 12, 2023
@mih
Contributor

mih commented Apr 25, 2023

In order to turn this bit into a KBI, first some explorations. For each scenario the following commands ran

datalad create . <with some shared setting, see below>
echo 123 > dummy
datalad save

below is a listing of the resulting permissions for

  • the worktree
  • .git (selected items)
  • a key directory in .git/annex (recursively)
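A listing like this can be produced with a small helper along the following lines. It is a sketch, not the command that was actually used for the tables in this thread; the function name `show_perms` is made up for illustration.

```shell
#!/bin/sh
set -eu

# Print the symbolic mode (first 10 characters of `ls -ld`) followed by the path.
show_perms() {
    for p in "$@"; do
        mode=$(ls -ld -- "$p" | cut -c1-10)
        printf '%s %s\n' "$mode" "$p"
    done
}

# Demo on a throwaway file with a known mode.
demo=$(mktemp)
chmod 640 "$demo"
demo_line=$(show_perms "$demo")
printf '%s\n' "$demo_line"
```

Inside a dataset it could be called as `show_perms .datalad .gitattributes dummy .git .git/config .git/HEAD .git/objects .git/annex`.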

The distinguished conditions are

  • --shared=0600 (user-only)
  • --shared=0640 (group-read)
  • --shared=0660 (group-write) -- git has alias group, also default for a plain --shared
  • --shared=0664 (world-read) -- git has alias world
  • --shared=0666 (world-write)

Effective umask setting here is 002.

git version 2.39.1
git-annex version: 10.20230126

| path | 0600 | 0640 | 0660 | group | 0664 | world | 0666 |
|---|---|---|---|---|---|---|---|
| .datalad | drwxrwxr-x | drwxrwxr-x | drwxrwxr-x | drwxrwxr-x | drwxrwxr-x | drwxrwxr-x | drwxrwxr-x |
| .gitattributes | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- |
| dummy | lrwxrwxrwx | lrwxrwxrwx | lrwxrwxrwx | lrwxrwxrwx | lrwxrwxrwx | lrwxrwxrwx | lrwxrwxrwx |
| .git | drwx------ | drwxr-s--- | drwxrws--- | drwxrwsr-x | drwxrwsr-x | drwxrwsr-x | drwxrwsrwx |
| .git/branches | drwx------ | drwxr-s--- | drwxrws--- | drwxrwsr-x | drwxrwsr-x | drwxrwsr-x | drwxrwsrwx |
| .git/config | .rw------- | .rw-r----- | .rw-rw---- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-rw- |
| .git/config.dataladlock | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- |
| .git/HEAD | .rw------- | .rw-r----- | .rw-rw---- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-rw- |
| .git/objects | drwx------ | drwxr-s--- | drwxrws--- | drwxrwsr-x | drwxrwsr-x | drwxrwsr-x | drwxrwsrwx |
| .git/annex | drwxrwxr-x | drwxrwsr-x | drwxrwsr-x | drwxrwsr-x | drwxrwsr-x | drwxrwsr-x | drwxrwsr-x |
| .git/annex/...keydir | dr-xr-xr-x | dr-xr-sr-x | dr-xr-sr-x | drwxrwsr-x | dr-xr-sr-x | drwxrwsr-x | dr-xr-sr-x |
| .git/annex/...key | .r--r--r-- | .r--r--r-- | .r--r--r-- | .r--r--r-- | .r--r--r-- | .r--r--r-- | .r--r--r-- |

TL;DR:

  • no impact of --shared on the worktree
  • no impact on .git/annex and below, apart from user-only vs. any shared setting; octal permissions make no further difference there, only the literal label (group) works
  • no impact on datalad-artifact
  • consistent/expected impact on git-managed pieces under .git

@yarikoptic
Contributor

git-annex does not support octal modes for --shared apparently. Some old TODO: https://git-annex.branchable.com/todo/sharedRepository_mode_not_supported_by_git-annex/

@mih
Contributor

mih commented Apr 25, 2023

Thx @yarikoptic for the link. I have extended the table above to document the contrast between shared=group/660 and for shared=world/664.

I believe this should now result in at least two KBIs:

  1. How to set up (create) a shared dataset
  2. How to retroactively change the sharedness of an existing clone

The test above should be repeated with datalad clone <src> <dest> --shared=.... to see if the outcome pattern is the same as the one above.

Once we know that, it is worth pinging the git-annex issue linked above.

Ultimately, there should be a technical issue/proposal for dealing with permissions for file system items that DataLad is managing directly (not through git or git-annex). The situation currently presents itself to me as:

  • follow umask (i.e., no specific permission management) for files in the worktree
  • follow core.sharedRepository for anything under .git -- git does that consistently, git-annex also, but limited to literal labels, and (largely) ignoring the octal permission declarations.
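To illustrate the second point with plain Git only, the sketch below initializes a throwaway repository with an octal sharing mode and reads back what Git recorded. The temporary directory and the choice of 0640 are assumptions for the demo; neither DataLad nor git-annex is involved here.

```shell
#!/bin/sh
set -eu

# Throwaway location for a fresh repository (assumption for the demo).
fresh=$(mktemp -d)

# Initialize with an octal sharing mode, as in the 0640 column of the table.
git init --quiet --shared=0640 "$fresh"

# Git records the mode in the local config and also enables
# receive.denyNonFastForwards for shared repositories.
mode=$(git -C "$fresh" config --get core.sharedRepository)
deny=$(git -C "$fresh" config --get receive.denyNonFastForwards)

printf 'core.sharedRepository=%s\n' "$mode"
printf 'receive.denyNonFastForwards=%s\n' "$deny"
```

Git applies this setting to the files it writes under .git from then on, which is the consistent behavior referred to above.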

@mih
Contributor

mih commented Apr 26, 2023

The test above should be repeated with datalad clone --shared=.... to see if the outcome pattern is the same as the one above.

This turns out to be difficult:

1. datalad clone http://127.0.0.1:8000 ds --config core.sharedRepository=0600

The config flag is set in the repo, but after init runs, hence it does not impact the setup in .git/ of the clone.

2. (1) but followed by another git init

This brings the anticipated permissions for the .git directory itself, but not for its content (i.e. .git/config)

3. datalad -c core.sharedRepository=0600 clone ...

Does not work, because datalad is not passing on configuration to subprocesses.

4. Use git-config ENV vars.

This has the desired effect:

GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0=core.sharedRepository GIT_CONFIG_VALUE_0=0600 datalad clone http://127.0.0.1:8000

It is using a standard git-config mechanism to declare additional config for Git without having to alter config files. Given that this is done externally, there is nothing that datalad itself needs to do in order to pass this setup on to Git -- and thankfully, it also does not intercept it.

@mih
Contributor

mih commented Apr 26, 2023

Here is the comparison table for datalad clone. The setup is identical to that of the comparison above. The commands are

GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0=core.sharedRepository GIT_CONFIG_VALUE_0=<setting> datalad clone http://127.0.0.1:8000 ds --config annex.security.allowed-ip-addresses=127.0.0.1

followed by a datalad get . in the resulting clone. origin is the same dataset with the same key, as the one used for the create comparison above -- served via a local http server.

| path | 0600 | 0640 | 0660 | group | 0664 | world | 0666 |
|---|---|---|---|---|---|---|---|
| .datalad | drwxrwxr-x | drwxrwxr-x | drwxrwxr-x | drwxrwxr-x | drwxrwxr-x | drwxrwxr-x | drwxrwxr-x |
| .gitattributes | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- |
| dummy | lrwxrwxrwx | lrwxrwxrwx | lrwxrwxrwx | lrwxrwxrwx | lrwxrwxrwx | lrwxrwxrwx | lrwxrwxrwx |
| .git | drwx------ | drwxr-s--- | drwxrws--- | drwxrwsr-x | drwxrwsr-x | drwxrwsr-x | drwxrwsrwx |
| .git/branches | drwx------ | drwxr-s--- | drwxrws--- | drwxrwsr-x | drwxrwsr-x | drwxrwsr-x | drwxrwsrwx |
| .git/config | .rw------- | .rw-r----- | .rw-rw---- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-rw- |
| .git/HEAD | .rw------- | .rw-r----- | .rw-rw---- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-r-- | .rw-rw-rw- |
| .git/objects | drwx------ | drwxr-s--- | drwxrws--- | drwxrwsr-x | drwxrwsr-x | drwxrwsr-x | drwxrwsrwx |
| .git/annex | drwxrwxr-x | drwxrwsr-x | drwxrwsr-x | drwxrwsr-x | drwxrwsr-x | drwxrwsr-x | drwxrwsr-x |
| .git/annex/...keydir | dr-xr-xr-x | dr-xr-sr-x | dr-xr-sr-x | drwxrwsr-x | dr-xr-sr-x | drwxrwsr-x | dr-xr-sr-x |
| .git/annex/...key | .r--r--r-- | .r--r--r-- | .r--r--r-- | .r--r--r-- | .r--r--r-- | .r--r--r-- | .r--r--r-- |

Interestingly, only git-annex distinguishes between group, 0664, and world.

This clone approach is also a likely candidate for a dedicated KBI.

@mih mih added the via-datalad-channel report origin is a datalad-specific channel (chat/email/office hour) label May 8, 2023
adswa added a commit that referenced this issue Jun 30, 2023
adswa added a commit that referenced this issue Jul 25, 2023
Add a KBI with insights from #15 on config overrides