Support AppArmor profiles? #35

hugelgupf · 2018-05-05T04:01:03Z

Twitter conversation context:

I understand. I'm looking at protecting the container from external attacks, not container escapes (I can use gVisor to protect against this). I can do this on Ubuntu by running Docker with --security-opt apparmor=<my_custom_profile>. Will gVisor support this? -- @securityfoo

Supporting AppArmor profiles doesn't actually seem that hard:

- runc does it at this step in the process: https://github.com/opencontainers/runc/blob/69663f0bd4b60df09991c08812a60108003fa340/libcontainer/standard_init_linux.go#L95
- Applying an AppArmor profile is just a matter of writing the name of the profile you want applied to a proc file: https://github.com/opencontainers/runc/blob/69663f0bd4b60df09991c08812a60108003fa340/libcontainer/apparmor/apparmor.go#L48
- I'm not 100% sure how the name gets to runc (where in the OCI spec?), but you could easily do some code archaeology starting from one of these two links.
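On the "where in the OCI spec?" question: the OCI runtime spec carries the profile name in `process.apparmorProfile`, which is where Docker puts the value of `--security-opt apparmor=...`. The proc-file write is the same request libapparmor's `aa_change_onexec` makes: the string `exec <profile>`, which takes effect at the task's next `execve`. Below is a minimal Go sketch under those assumptions. `readProfile`, `onExecValue` and `applyProfileOnExec` are hypothetical helpers, not runc's or gVisor's API, and the sketch writes to `/proc/thread-self/attr/exec` rather than reproducing runc's exact code path.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// ociConfig models only the part of an OCI runtime config.json that carries
// the AppArmor profile name ("process.apparmorProfile" in the runtime spec).
type ociConfig struct {
	Process struct {
		ApparmorProfile string `json:"apparmorProfile"`
	} `json:"process"`
}

// readProfile extracts the requested AppArmor profile from config.json bytes.
// An empty string means the container did not request a profile.
func readProfile(configJSON []byte) (string, error) {
	var cfg ociConfig
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return "", fmt.Errorf("parsing OCI config: %w", err)
	}
	return cfg.Process.ApparmorProfile, nil
}

// onExecValue builds the request that libapparmor's aa_change_onexec writes:
// the kernel switches to the named profile at the task's next execve.
func onExecValue(profile string) string {
	return "exec " + profile
}

// applyProfileOnExec asks the kernel to confine the calling thread with the
// given profile once it execs. It only succeeds on an AppArmor-enabled host
// with the profile loaded. The attribute is per-thread, so the caller must
// hold runtime.LockOSThread and call execve from this same thread.
func applyProfileOnExec(profile string) error {
	if profile == "" {
		return nil // nothing requested
	}
	f, err := os.OpenFile("/proc/thread-self/attr/exec", os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(onExecValue(profile))
	return err
}

func main() {
	// The relevant fragment of config.json for --security-opt apparmor=my_custom_profile.
	cfg := []byte(`{"process": {"apparmorProfile": "my_custom_profile"}}`)
	profile, err := readProfile(cfg)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(onExecValue(profile)) // prints "exec my_custom_profile"
	// A runtime would call applyProfileOnExec(profile) right before execve.
}
```

Splitting the pure parts (parsing, building the request) from the proc write keeps the host-dependent step isolated; it is the only call that needs AppArmor to be enabled.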

I don't know if this is something we want to support, but I'm going to leave all the bits here and let someone else decide.

hugelgupf · 2018-05-05T04:03:01Z

To be completely clear, this would be a profile applied to the sandbox, not to the application inside the sandbox.

hugelgupf · 2018-05-05T04:04:50Z

This leaves open a question: when you specify an AppArmor profile to Docker on the command line, do you expect it to be applied at the container/sandbox <-> process boundary or at the container/sandbox <-> kernel boundary? I don't know whether the spec even considers or specifies this.

iangudger · 2018-05-05T06:07:58Z

What do you mean by external attacks? Protecting the sandbox from other processes on the host? Hardening the sandboxed app?

fvoznika · 2018-05-05T07:22:24Z

I think there are a few separate questions here:

1. Can I use AppArmor to protect the container from external attacks?
   AppArmor is not meant for that. Instead, it limits what the container is allowed to access, similar to SELinux and seccomp-bpf. Other technologies are better suited to protecting the container from external attacks.
2. Should I use AppArmor inside gVisor?
   One of the main advantages of gVisor is that it isolates the container from the host without needing complex filters and policies. Thus, using AppArmor would be redundant and is not necessary.
3. Should I use AppArmor outside of gVisor?
   The sandbox process runs as a low-privileged user inside isolated namespaces, which is more restrictive than common AppArmor profiles. In addition, an AppArmor profile should apply only to a single container, while a gVisor sandbox runs multiple containers that can have conflicting configurations.
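The last point can be made concrete. A profile applied outside gVisor would confine the single sandbox process that serves every container in it, so it can only match the containers' requests if they all ask for the same thing (including "no profile"). A hypothetical Go sketch of that constraint; `sandboxProfile` is an illustrative helper, not part of runsc:

```go
package main

import (
	"fmt"
	"sort"
)

// sandboxProfile picks the single AppArmor profile that could be applied to a
// sandbox process hosting several containers. profiles maps container ID to
// the profile that container requested; "" means it requested none. Because
// one host process has one confinement, any disagreement is an error.
func sandboxProfile(profiles map[string]string) (string, error) {
	ids := make([]string, 0, len(profiles))
	for id := range profiles {
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic reference container and error message
	if len(ids) == 0 {
		return "", nil
	}
	want := profiles[ids[0]]
	for _, id := range ids[1:] {
		if got := profiles[id]; got != want {
			return "", fmt.Errorf("containers %q and %q request different AppArmor profiles (%q vs %q)",
				ids[0], id, want, got)
		}
	}
	return want, nil
}

func main() {
	// Two containers in one sandbox (e.g. a pod) with different requests.
	pod := map[string]string{"web": "docker-default", "sidecar": "my_custom_profile"}
	if _, err := sandboxProfile(pod); err != nil {
		fmt.Println("cannot confine sandbox:", err)
	}
}
```

Treating "no profile" as a distinct request is deliberate: confining the shared sandbox would also confine a container that asked to run unconfined.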

hugelgupf · 2018-05-05T21:31:35Z

Quoting fvoznika: "One of the main advantages of gVisor is that it isolates the container from the host without needing complex filters and policies. Thus, using AppArmor would be redundant and is not necessary."

Right, we discussed that much on Twitter, and obviously it doesn't make sense for gVisor to apply an AppArmor profile inside the sandbox. I'm asking the question from comment 2 from an API perspective.

When a user specifies --security-opt apparmor=foo, what do they expect? They probably expect that applications that don't fit the profile (e.g., by accessing files or using capabilities the profile disallows) get blocked or killed (or whatever policy AppArmor applies). If you apply the profile to runsc rather than to the application in the sandbox, that would be unexpected behavior for some users.

So even if it were easy to enforce an AppArmor profile on the sandbox, I think doing so could lead to unexpected behavior. We should keep not supporting it and return an error message that explicitly says we don't support this use case.

Distributed training isn't working with PyTorch on certain A100 nodes. This adds the missing `UVM_UNMAP_EXTERNAL` ioctl, which allows certain NCCL operations to succeed when using [`torch.distributed`](https://pytorch.org/docs/stable/distributed.html) and fixes distributed training.

## Reproduction

This affects numerous A100 40GB and 80GB instances in our fleet. This reproduction requires 4 A100 GPUs, either 40GB or 80GB.

- **NVIDIA Driver Version**: 550.54.15
- **CUDA Version**: 12.4
- **NVIDIA device**: NVIDIA A100 80GB PCIe

### Steps

1. **Install gVisor**

```bash
# ARCH must be set before URL is built (e.g. x86_64 or aarch64).
ARCH=$(uname -m)
URL="https://storage.googleapis.com/gvisor/releases/master/latest/${ARCH}"
wget -nc "${URL}/runsc" "${URL}/runsc.sha512"
# Verify the download against the published checksum.
sha512sum -c runsc.sha512
chmod +x runsc
sudo cp runsc /usr/local/bin/runsc
# Register runsc as a Docker runtime, then reload the daemon.
sudo /usr/local/bin/runsc install
sudo systemctl reload docker
```

2. **Add GPU-enabling gVisor options**

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    },
    "runsc": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": ["--nvproxy", "--nvproxy-docker", "-debug-log=/tmp/runsc/", "-debug", "-strace"]
    }
  }
}
```

Reload configs with `sudo systemctl reload docker`.

3. **Run reproduction NCCL test**

This test creates one main process and N peer processes. Each peer process sends a torch `Tensor` to the main process using NCCL.
```Dockerfile
# Dockerfile
FROM python:3.9.15-slim-bullseye
RUN pip install torch numpy
COPY <<EOF repro.py
import argparse
import datetime
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def setup(rank, world_size):
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"
    dist.init_process_group("nccl", rank=rank, world_size=world_size, timeout=datetime.timedelta(seconds=600))
    torch.cuda.set_device(rank)


def cleanup():
    dist.destroy_process_group()


def send_tensor(rank, world_size):
    try:
        setup(rank, world_size)
        # rank receiving all tensors
        target_rank = world_size - 1
        dist.barrier()
        tensor = torch.ones(5).cuda(rank)
        if rank < target_rank:
            print(f"[RANK {rank}] sending tensor: {tensor}")
            dist.send(tensor=tensor, dst=target_rank)
        elif rank == target_rank:
            for other_rank in range(target_rank):
                tensor = torch.zeros(5).cuda(target_rank)
                dist.recv(tensor=tensor, src=other_rank)
                print(f"[RANK {target_rank}] received tensor from rank={other_rank}: {tensor}")
            print("PASS: NCCL working.")
    except Exception as e:
        print(f"[RANK {rank}] error in send_tensor: {e}")
        raise
    finally:
        cleanup()


def main(world_size: int = 2):
    mp.spawn(send_tensor, args=(world_size,), nprocs=world_size, join=True)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run torch-based NCCL tests")
    parser.add_argument("world_size", type=int, help="number of GPUs to run test on")
    args = parser.parse_args()
    if args.world_size < 2:
        raise RuntimeError(f"world_size needs to be larger than 1 {args.world_size}")
    main(args.world_size)
EOF
ENTRYPOINT ["python", "repro.py", "4"]
```

Build image with:

```
docker build -f Dockerfile .
```

Then run it with:

```
sudo docker run -it --shm-size=2.00gb --runtime=runsc --gpus='"device=GPU-742ea7fc-dd4f-612c-e860-499bf200a815,GPU-94a801d8-7713-acf6-337d-338b7cfdf19e,GPU-0d19cef2-10ce-e445-a0be-3d330e36c1fd,GPU-ac5046fb-020c-93e8-2784-f44aedbc5bbd"' 040a44863fb1
```

#### Failure (truncated)

```
...
Exception raised from recvBytes at ../torch/csrc/distributed/c10d/Utils.hpp:672 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7edda14cf897 in /usr/local/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x5b3a23e (0x7edd8d73a23e in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::doWait(c10::ArrayRef<std::string>, std::chrono::duration<long, std::ratio<1l, 1000l> >) + 0x2c7 (0x7edd8d734c87 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::TCPStore::doGet(std::string const&) + 0x32 (0x7edd8d734f82 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #4: c10d::TCPStore::get(std::string const&) + 0xa1 (0x7edd8d735fd1 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #5: c10d::PrefixStore::get(std::string const&) + 0x31 (0x7edd8d6ea371 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #6: c10d::PrefixStore::get(std::string const&) + 0x31 (0x7edd8d6ea371 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #7: c10d::PrefixStore::get(std::string const&) + 0x31 (0x7edd8d6ea371 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #8: c10d::ProcessGroupNCCL::broadcastUniqueNCCLID(ncclUniqueId*, bool, std::string const&, int) + 0xa9 (0x7edd54da9189 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #9: c10d::ProcessGroupNCCL::getNCCLComm(std::string const&, c10::Device&, c10d::OpType, int, bool) + 0xc50 (0x7edd54db0610 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #10: c10d::ProcessGroupNCCL::recv(std::vector<at::Tensor, std::allocator<at::Tensor> >&, int, int) + 0x5f8 (0x7edd54dcf978 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #11: <unknown function> + 0x5adc309 (0x7edd8d6dc309 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x5ae6f10 (0x7edd8d6e6f10 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #13: <unknown function> + 0x5ae6fa5 (0x7edd8d6e6fa5 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #14: <unknown function> + 0x5124446 (0x7edd8cd24446 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #15: <unknown function> + 0x1acf4b8 (0x7edd896cf4b8 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #16: <unknown function> + 0x5aee004 (0x7edd8d6ee004 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #17: <unknown function> + 0x5af36b5 (0x7edd8d6f36b5 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #18: <unknown function> + 0xd2fe8e (0x7edda032fe8e in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_python.so)
frame #19: <unknown function> + 0x47f074 (0x7edd9fa7f074 in /usr/local/lib/python3.11/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #35: <unknown function> + 0x29d90 (0x7edda2029d90 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #36: __libc_start_main + 0x80 (0x7edda2029e40 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #37: <unknown function> + 0x108e (0x55f950b0c08e in /usr/local/bin/python)
. This may indicate a possible application crash on rank 0 or a network set up issue.
...
```

### Fix

gVisor debug logs show:

```
W0702 20:36:17.577055 445833 uvm.go:148] [ 22: 84] nvproxy: unknown uvm ioctl 66 = 0x42
```

I've implemented that ioctl in this PR.
This is the output after the fix.

```
[RANK 2] sending tensor: tensor([1., 1., 1., 1., 1.], device='cuda:2')
[RANK 0] sending tensor: tensor([1., 1., 1., 1., 1.], device='cuda:0')
[RANK 1] sending tensor: tensor([1., 1., 1., 1., 1.], device='cuda:1')
[RANK 3] received tensor from rank=0: tensor([1., 1., 1., 1., 1.], device='cuda:3')
[RANK 3] received tensor from rank=1: tensor([1., 1., 1., 1., 1.], device='cuda:3')
[RANK 3] received tensor from rank=2: tensor([1., 1., 1., 1., 1.], device='cuda:3')
PASS: NCCL working.
```

FUTURE_COPYBARA_INTEGRATE_REVIEW=#10610 from luiscape:master ee88734
PiperOrigin-RevId: 649146570

shentubot closed this as completed in 94b0ab0 May 8, 2018

chanwit pushed a commit to chanwit/gvisor that referenced this issue May 10, 2018

Error if container requires AppArmor, SELinux or seccomp

e1b412d

Closes google#35
PiperOrigin-RevId: 195840128
Change-Id: I31c1ad9b51ec53abb6f0b485d35622d4e9764b29

tonistiigi pushed a commit to tonistiigi/gvisor that referenced this issue Jan 29, 2019

Error if container requires AppArmor, SELinux or seccomp

9bfd34c

Closes google#35
PiperOrigin-RevId: 195840128
Change-Id: I31c1ad9b51ec53abb6f0b485d35622d4e9764b29
Upstream-commit: e1b412d

tonistiigi pushed a commit to tonistiigi/gvisor that referenced this issue Jan 30, 2019

Error if container requires AppArmor, SELinux or seccomp

78c42bc

Closes google#35
PiperOrigin-RevId: 195840128
Change-Id: I31c1ad9b51ec53abb6f0b485d35622d4e9764b29
Upstream-commit: e1b412d

amscanne pushed a commit to amscanne/gvisor that referenced this issue May 6, 2020

Add variable controlling the go binary path. (google#35)

462bafb

ekzhang mentioned this issue Mar 13, 2023

Hard links in user namespace cause gVisor to ENOENT #8688

Closed
