Filesystem isolation layer #4794

@abitrolly

Description

/kind feature

podman + SELinux = leaky abstraction. In my user story, containers provide an isolated environment that lets users concentrate on getting their application logic right, without thinking about the low-level details and permissions required to keep their systems stable and secure.

That worked well with Docker + Ubuntu, except that the Docker daemon itself runs as root, which made users like me uneasy. The arrival of podman solved exactly this problem for me.

What makes me think that podman on Fedora is a leaky abstraction is that, after a year of trying to adopt podman into my workflow, I have learned a lot of things I shouldn't need to know, and yet I am still not there.

Yesterday I finally solved the problem I started with a year ago, thanks to what I learned about SELinux labelling, the :z and :Z volume options, USER, and uid/gid mappings (which are not directly related to this issue, but learning them was necessary to clear pieces of a different puzzle out of my head). The problem is that the Python function shutil.copystat copies extended attributes, and because the container uses the same filesystem as the host, SELinux denies this operation when you, for example, copystat files from /bin to keep them executable or read-only as before. There was a mistake in my volume mount command that made it copy files from /bin instead of from the mounted /src/bin, but I could only figure that out a year later, when I made the same mistake again.
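To see what copystat is actually trying to copy, here is a minimal sketch that lists a file's extended attributes: the same set that shutil.copystat replicates with os.setxattr, which is the call that fails in the traceback below. `list_xattrs` is a hypothetical helper, and os.listxattr/os.getxattr are Linux-only. On an SELinux-enabled host, files typically carry a security.selinux label, and that is the attribute whose copy gets denied.

```python
import os
import sys
from typing import Dict


def list_xattrs(path: str) -> Dict[str, bytes]:
    """Return the extended attributes (name -> raw value) that shutil.copystat
    would try to replicate onto a copy of ``path``."""
    attrs: Dict[str, bytes] = {}
    try:
        names = os.listxattr(path)
    except OSError:
        # Missing file or no xattr support on this filesystem: nothing to copy.
        return attrs
    for name in names:
        try:
            attrs[name] = os.getxattr(path, name)
        except OSError:
            # Attribute vanished or is unreadable; skip it.
            continue
    return attrs


if __name__ == "__main__":
    # With no arguments, inspect /bin/sh as an example of a system binary.
    for path in sys.argv[1:] or ["/bin/sh"]:
        for name, value in list_xattrs(path).items():
            print(f"{path}: {name} = {value!r}")
```

Run it with one or more paths as arguments, once on the host and once inside the container, to compare which labels each side sees.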

I thought the problem was solved: inside the container I copy files for my build only from /src/bin. But today I realized that it is not, because the build system copies system-installed libraries into the build subtree, and the SELinux vs. copystat problem popped up again. I don't see how I can fix this, and I don't think going down this rabbit hole is the right way anyway.
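For scripts you control, one possible workaround is a copy2-like function that copies contents, timestamps and permission bits but skips the security.* xattr namespace. This is not a fix for snapcraft, which calls shutil.copy2 itself (as the traceback shows), and it is a sketch under my own assumptions: `copy_without_security_xattrs`, `SKIPPED_XATTR_PREFIXES` and `IGNORED_XATTR_ERRORS` are hypothetical names, and dropping security.* simply means the copy gets the default label for a newly created file in that location.

```python
import errno
import os
import shutil

# Hypothetical: xattr namespaces we choose NOT to replicate.
# security.selinux lives in the "security." namespace.
SKIPPED_XATTR_PREFIXES = ("security.",)

# errno values treated as "this xattr cannot be copied here, move on".
# EACCES is included deliberately; it is the error from the traceback.
IGNORED_XATTR_ERRORS = (errno.EPERM, errno.EACCES, errno.ENOTSUP, errno.ENODATA)


def copy_without_security_xattrs(src: str, dst: str) -> str:
    """Copy src to dst like shutil.copy2, but skip security.* extended attributes."""
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    shutil.copyfile(src, dst)

    st = os.stat(src)
    # Preserve access/modification times, as shutil.copystat does.
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))

    # Copy the remaining xattrs (e.g. user.*) before chmod, because a
    # read-only destination would make setxattr fail.
    try:
        names = os.listxattr(src)
    except OSError as e:
        if e.errno not in (errno.ENOTSUP, errno.ENODATA):
            raise
        names = []
    for name in names:
        if name.startswith(SKIPPED_XATTR_PREFIXES):
            continue
        try:
            os.setxattr(dst, name, os.getxattr(src, name))
        except OSError as e:
            if e.errno not in IGNORED_XATTR_ERRORS:
                raise

    # Finally copy the permission bits, e.g. to keep binaries executable.
    shutil.copymode(src, dst)
    return dst
```

The same function can be passed as `shutil.copytree(src, dst, copy_function=copy_without_security_xattrs)` to copy a whole tree this way; note that copytree still applies shutil.copystat to the directories themselves.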

What I really want from podman is a filesystem isolation level, where the filesystem in the container is completely isolated from the host and volumes are no different from any other isolated directory in the container. No filesystem operation inside the container should trigger SELinux or any other additional filesystem or kernel drivers on the host, and hence no SELinux properties (or properties of any other kernel driver) should be visible in the container. If a volume on the host carries such labels, modifications to these files and labels should be done as if the files were created and modified by an ordinary user-level program, such as vim.

Steps to reproduce the issue:

  1. Build any snap with stage-packages:
     podman run --rm -it -v /home/user/linux:/src:Z -w /src/snapcrafting/amend yakshaveinc/snapcraft:core18 snapcraft

Describe the results you received:

...
  File "/snap/snapcraft/current/usr/lib/python3.5/shutil.py", line 252, in copy2
    copystat(src, dst, follow_symlinks=follow_symlinks)
  File "/snap/snapcraft/current/usr/lib/python3.5/shutil.py", line 219, in copystat
    _copyxattr(src, dst, follow_symlinks=follow)
  File "/snap/snapcraft/current/usr/lib/python3.5/shutil.py", line 159, in _copyxattr
    os.setxattr(dst, name, value, follow_symlinks=follow_symlinks)
PermissionError: [Errno 13] Permission denied: '/src/snapcrafting/amend/parts/amend/ubuntu/download/libgssapi3-heimdal_7.5.0+dfsg-1_amd64.deb'
...

Describe the results you expected:

...
Cleaning later steps and re-staging amend ('stage' property changed)
Priming amend 
'confinement' property not specified: defaulting to 'strict'
'grade' property not specified: defaulting to 'stable'
Snapping 'amend' |
Snapped amend_0.1.0_amd64.snap

Additional information you deem important (e.g. issue happens only occasionally):

To keep the user story short: as a container user, I don't want to think about SELinux attributes on my host when my unprivileged container, with a volume deep inside my home directory, works with some files.

The solution I tested for LXD on Ubuntu is to mount the filesystem as 9p over the network through FUSE (yakshaveinc/linux#32). I don't have the funding to stay focused on it and turn it into a proper solution (add encryption and integrate it with LXD), but as a proof of concept it works.

Output of podman version:

Version:            1.6.2
RemoteAPI Version:  1
Go Version:         go1.13.1
OS/Arch:            linux/amd64
