Docker-based tasks output ownership #922

Closed
gaow opened this issue Mar 12, 2018 · 19 comments

Comments

@gaow
Member

gaow commented Mar 12, 2018

Currently, when a task runs in Docker, its output files are owned by root:root. I am trying to figure out whether there is a way to make them owned by the current user instead.

@gaow
Member Author

gaow commented Mar 12, 2018

It looks like adding f"-u {os.getuid()}" by default might help. Currently the user has to be passed explicitly as an option.

@BoPeng
Contributor

BoPeng commented Mar 15, 2018

Can this be image dependent? I mean, the user on the host system does not necessarily map to a user in the Docker image, and the image might be set up to run as a non-root user (?), in which case it would write files as that user.

All these need to be clarified before we do anything.

@gaow
Member Author

gaow commented Mar 15, 2018

> Can this be image dependent?

I'm not sure, but the couple of Docker images I used recently both write files as root.

> the docker image might be run by a non-root user (?) so it will write file as that user?

Even so, f"-u {os.getuid()}" by default does not seem harmful. In fact, I'm not sure why we make user a configuration option at all rather than setting it to f"-u {os.getuid()}" by default, since that would make file ownership consistent with what you get from running any other command-line program.

@BoPeng
Contributor

BoPeng commented Mar 16, 2018

> Even so, f"-u {os.getuid()}" by default would not seem harmful.

The problem is that I am not sure the Docker image would work with --user {os.getuid()}. For example, if the image is written to run as root and everything in its file system is owned by root, the specified user might not be able to find the program (a path problem?), or the program might not be able to write inside the container (cannot create files, etc.). The same problem exists if the image is designed to run as another user.

So without -u we run the image as its designed user, which has no problem running the image but might not be able to write to the mounted directory.

With -u we might not be able to run the image, but should be able to write to the mounted directory.

Is this the correct summary of the situation?

@BoPeng
Contributor

BoPeng commented Mar 16, 2018

On my mac, running

run: docker_image='ubuntu', user=0
  echo `pwd` > /Users/bpeng1/a.txt

will result in

[bpeng1@BCBMC07MX084DY3:~]$ ls -l a.txt
-rw-r--r--  1 bpeng1  895809667  18 Mar 15 19:46 a.txt

regardless of user settings. What is the case on your end?

@gaow
Member Author

gaow commented Mar 16, 2018

Hmm, this is interesting ... here is mine:

-rw-r--r--  1 root root   29 Mar 15 20:32 a.txt

Note that I removed user=0. So the behavior is platform dependent?

I think I get your point and it is completely valid. It seems I had a wrong understanding of what -u does. But otherwise, how can I make sure the output files written to the host system are not owned by root?

@BoPeng
Contributor

BoPeng commented Mar 16, 2018

I do not think there is a perfect solution here, so a good default is important. I believe that on macOS Docker runs inside a VM that itself runs in user space, so the --user option does not matter outside the container but does matter inside. The current behavior is good as it works with images with both root and non-root users. On Linux, letting the image's default user write to mounted drives is problematic because Docker will either write root-owned files or not be able to write at all, so a default non-root user makes more sense.

The decision is then whether we should use --user {os.getuid()} only on Linux or on both systems.
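
To make the Linux-only choice concrete, here is a minimal sketch, not SoS's actual code. The helper name default_user_option and its system parameter are hypothetical; --user is the real docker run option, and platform.system() returns "Linux" or "Darwin" on the two systems discussed here.

```python
import os
import platform


def default_user_option(system=None):
    """Return extra `docker run` arguments selecting the default user (hypothetical helper)."""
    system = system or platform.system()
    if system == "Linux":
        # On Linux, files written to mounted directories keep the container
        # user's uid, so pass the host uid to avoid root-owned output.
        # os.getuid() is only available on Unix-like systems.
        return ["--user", str(os.getuid())]
    # Elsewhere (e.g. "Darwin"), leave the image's own default user in place.
    return []


print(default_user_option())
```

Applying the default on both systems would simply drop the platform check.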

@gaow
Member Author

gaow commented Mar 16, 2018

I'm actually wondering what Nextflow does about this. If the current behavior works on Mac, I guess it is good for most desktop uses. On Linux, setting up Docker requires sudo permission anyway, so users are likely to be able to solve the problem on their own. More importantly, the more typical use of SoS on Linux is on HPC systems, many of which do not support Docker anyway, and we may want to look at Singularity instead.

@BoPeng
Contributor

BoPeng commented Mar 16, 2018

According to the Nextflow documentation, it provides engineOptions and fixOwnership, and the latter appears to pertain to your problem. Because we can only fix the ownership of known output files, and SoS allows scripts without output to be executed inside Docker, it is not easy for us to fix ownership after Docker execution.
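
A rough sketch of that limitation, assuming the fix would be a chown over the declared outputs. This is not how Nextflow implements fixOwnership, and chown_command is a hypothetical helper. The resulting command would also need enough privilege to change ownership, for example by running as root inside the container.

```python
import os


def chown_command(outputs, uid=None, gid=None):
    """Return a `chown` argument list for known outputs, or None if there are none."""
    if not outputs:
        # No declared outputs: we cannot tell which files the container created.
        return None
    # Default to the host user's ids (Unix-like systems only).
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return ["chown", f"{uid}:{gid}", *outputs]


print(chown_command(["a.txt"]))
print(chown_command([]))
```

The second call returns None, which is exactly the case of a script executed without output.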

@gaow
Member Author

gaow commented Mar 16, 2018

Agreed. I do not think I can come up with a good suggestion that works safely for all the scenarios we considered. I'm fine with closing the ticket and leaving it as is for now.

@BoPeng
Contributor

BoPeng commented Mar 16, 2018

One argument in favor of user={os.getuid()} is that it is more portable. I mean, if we set os.getuid() as the default, users can override it with user=0 or user='image_user', which is image dependent. If we do not set this as the default, users might have to use user='username' etc., which is user dependent, and the notebook is therefore less portable.
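
The override logic could look roughly like this. It is a sketch under the assumption that the user option arrives as None when not given; resolve_user is a hypothetical name, not an SoS function.

```python
import os


def resolve_user(user=None):
    """Pick the value to pass to `docker run -u` (hypothetical helper)."""
    # Compare against None explicitly: user=0 (root) is falsy but is a
    # valid override and must not fall back to the default.
    if user is None:
        return str(os.getuid())  # Unix-like systems only
    return str(user)


print(resolve_user())              # host uid, the portable default
print(resolve_user(0))             # image-dependent override: root
print(resolve_user("image_user"))  # image-dependent override: named user
```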

@gaow
Member Author

gaow commented Mar 16, 2018

But the difficulty is that we do not know beforehand whether the user ID on the host system also exists in the Docker image, though in my case they are both 1001. That was a bad assumption I made when I proposed using os.getuid. So if we are to implement this, we may need to create such a user and user group on the fly if the uid does not exist, before running anything, and then run as that user to ensure the output matches our system. Is that possible?

@BoPeng
Contributor

BoPeng commented Mar 16, 2018

Docker allows the use of an arbitrary user ID and will simply treat it as a new normal user. The advantage is that this user ID will be used to create files in mounted drives, which is what we need here.

@BoPeng
Contributor

BoPeng commented Mar 16, 2018

According to the Docker documentation:

> root (id = 0) is the default user within a container. The image developer can create additional users. Those users are accessible by name. When passing a numeric ID, the user does not have to exist in the container.
>
> The developer can set a default user to run the first process with the Dockerfile USER instruction. When starting a container, the operator can override the USER instruction by passing the -u option.

@gaow
Member Author

gaow commented Mar 16, 2018

> When passing a numeric ID, the user does not have to exist in the container.

Okay, then I was still accidentally correct about the -u behavior :) So maybe this now seems like a good thing to do on all platforms?

BoPeng pushed a commit that referenced this issue Mar 17, 2018
@BoPeng
Contributor

BoPeng commented Mar 17, 2018

Let us see if this works reasonably well.

@gaow
Member Author

gaow commented Mar 26, 2018

This new default -u works great on my end, though I still get root as the group. Maybe there is a default gid option to set it to the current gid?

@BoPeng
Contributor

BoPeng commented Mar 26, 2018

Could you test if -u {os.getuid()}:{os.getgid()} works?

gaow added a commit that referenced this issue Mar 26, 2018
@gaow
Member Author

gaow commented Mar 26, 2018

It does -- see patch above!
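
For readers without the patch at hand, the uid:gid form can be sketched as below. This is an illustration, not the actual commit; docker_user_args is a hypothetical helper, and the ubuntu image and id command are placeholders.

```python
import os


def docker_user_args():
    """Return `-u uid:gid` arguments for `docker run` (hypothetical helper)."""
    # os.getuid() and os.getgid() exist only on Unix-like systems.
    return ["-u", f"{os.getuid()}:{os.getgid()}"]


# Example command line; it is only printed here, not executed.
cmd = ["docker", "run", "--rm", *docker_user_args(), "ubuntu", "id"]
print(" ".join(cmd))
```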

@BoPeng BoPeng closed this as completed Mar 26, 2018