Enable user namespaces and seccomp #1172

Closed
CTassisF opened this issue Oct 19, 2015 · 22 comments
Labels
Waiting for internal comment Waiting for comment from a member of the Raspberry Pi engineering team

Comments

@CTassisF

Hello.

I am trying to use Firejail[0] on my RaspberryPi running Raspbian but it shows two warnings:

Warning: user namespaces not available in the current kernel.
Warning: seccomp disabled, it requires a Linux kernel version 3.5 or newer.

Can you enable these features in the next kernel release?

My current uname -a is:
Linux RaspberryPi 4.1.7+ #817 PREEMPT Sat Sep 19 15:25:36 BST 2015 armv6l GNU/Linux

Thanks in advance,
César

[0] https://l3net.wordpress.com/projects/firejail/

@Ferroin
Contributor

Ferroin commented Oct 20, 2015

It is also worth noting that both of these are required for Chromium and Google Chrome to properly implement sandboxing of their various sub-components.

@Ruffio

Ruffio commented Aug 17, 2016

@CTassisF has your issue been resolved? If so, please close this issue. Thanks.

@mpx

mpx commented Aug 28, 2016

User namespaces are disabled in the current kernel:

root@pi4:~# uname -r
4.4.13-v7+
root@pi4:~# modprobe configs
root@pi4:~# zgrep -E 'CONFIG_(USER_NS|SECCOMP)' /proc/config.gz
# CONFIG_USER_NS is not set
CONFIG_SECCOMP=y
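
The same check can be scripted. The following is a minimal Python sketch, assuming the text has the format shown above (enabled options as `CONFIG_FOO=y`, disabled ones as `# CONFIG_FOO is not set`); the function name `config_status` is hypothetical and not part of any existing tool.

```python
import re

def config_status(config_text, options):
    """Map each requested CONFIG_ option to its value, or None if it is not set."""
    status = {name: None for name in options}
    for line in config_text.splitlines():
        # Enabled options look like CONFIG_FOO=y (or =m, or =<value>).
        match = re.fullmatch(r"(CONFIG_\w+)=(.*)", line.strip())
        if match and match.group(1) in status:
            status[match.group(1)] = match.group(2)
        # "# CONFIG_FOO is not set" lines do not match and stay None.
    return status

# Sample text copied from the zgrep output above.
sample = """# CONFIG_USER_NS is not set
CONFIG_SECCOMP=y
"""
print(config_status(sample, ["CONFIG_USER_NS", "CONFIG_SECCOMP"]))
```

On a running system the text could be read with `gzip.open("/proc/config.gz", "rt").read()` once the `configs` module is loaded.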

@pelwell
Contributor

pelwell commented Aug 28, 2016

With all kernel options that aren't simply loadable modules we are concerned about increased kernel size and reduced performance. If you were to present "before and after" comparisons of free memory and performance (using some standard benchmarks) it may strengthen your case.

@jcberthon

@pelwell what kind of benchmark are you after? Disk block IO, file system IO, CPU cycles, etc.?

I want seccomp for Docker on the Raspberry Pi; it is an important protection. I want user namespaces as well: they allow a user ID inside a container to be mapped to a different user ID on the host, which matters most for user ID 0 (root). This is not limited to Docker: LXC and LXD also require user namespaces to run unprivileged containers.
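
To illustrate the ID mapping described above, here is a minimal Python sketch. It assumes mapping entries in the three-field form used by /proc/<pid>/uid_map (first ID inside the namespace, first ID outside, range length); the helper `map_uid` and the example range are hypothetical.

```python
def map_uid(container_uid, uid_map):
    """Translate a UID inside a user namespace to the host UID, or None if unmapped."""
    for inside_start, outside_start, length in uid_map:
        if inside_start <= container_uid < inside_start + length:
            return outside_start + (container_uid - inside_start)
    return None

# Hypothetical mapping: container UIDs 0..65535 map to host UIDs 100000..165535.
uid_map = [(0, 100000, 65536)]

print(map_uid(0, uid_map))      # root in the container is an unprivileged host UID
print(map_uid(1000, uid_map))
print(map_uid(70000, uid_map))  # outside the mapped range
```

With such a mapping, a process that is root inside the container holds no root privileges on the host.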

So I'm ready to measure memory before and after, but what do you expect for the benchmark?

@pelwell
Contributor

pelwell commented Oct 7, 2016

How about running dd from /dev/zero to /dev/null with three block sizes - 1, 4k and 1024k - and a count set to make each one take at least 10 seconds? I'm concerned about performance for non-sandboxed processes, but I would be curious to compare with sandboxed as well, so a 3x3 grid of results (MB/s for each of three sizes in a kernel without seccomp, with seccomp, and sandboxed) would be great.
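
One possible way to prepare these runs is sketched below in Python. It only builds the dd command lines and scales the block count from a short calibration run; the helper names and the calibration figures are hypothetical, and nothing is executed.

```python
import math

# Block sizes proposed above, written as dd "bs=" arguments.
BLOCK_SIZES = ["1", "4k", "1024k"]

def count_for(target_seconds, calib_count, calib_seconds):
    """Scale a short calibration run so the real run lasts about target_seconds."""
    return math.ceil(calib_count * target_seconds / calib_seconds)

def dd_command(bs, count):
    """Build the argument list for one dd run from /dev/zero to /dev/null."""
    return ["dd", "if=/dev/zero", "of=/dev/null", f"bs={bs}", f"count={count}"]

# Hypothetical calibration: 1,000,000 one-byte blocks took 2.0 seconds.
count = count_for(10, 1_000_000, 2.0)
print(" ".join(dd_command("1", count)))
```

The resulting list can be passed to `subprocess.run`; repeating this for each entry in `BLOCK_SIZES` on each kernel fills the proposed grid.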

@jcberthon

Ok @pelwell I can do this. Give me a week in order to build the kernel and do the benchmarks. I'm a young dad so I need time! ;-)
If time allows I will try to activate the AppArmor LSM as another possible benchmark.

@jcberthon

Hi @pelwell

I haven't finished testing, but I can already give a status update:

  • Recompiling the kernel with seccomp and user namespaces was a success;
  • There was no noticeable increase in memory use (I compared the output of dmesg | grep Memory: the kernel code, rwdata, etc. all differ by less than 1%, on the order of a few KB);
  • Early tests with dd give identical results for 1 and 1024k, but a slight decrease in performance with 4k (from 1.1GB/s to 0.9GB/s). However, using apache bench (from my desktop) against an nginx static site (on the Raspberry Pi), there was either no difference or improved performance (up to 22% in requests/s);
  • BUT (and it is a big one), seccomp support for Docker requires the user space tool at version 2.2.1 or above, which excludes Debian Jessie. Docker provides a static binary for Debian Jessie with seccomp support (the seccomp lib is probably statically linked into this binary), but only for x86 (32 and 64 bit), not for ARMv7. So I would need to get the source and compile the seccomp user space binaries and then the Docker binaries myself. Too much work for me right now :-(
  • I will not drop this issue. Before this weekend, I will try to build a new kernel with only user namespaces and AppArmor compiled in. That should already be a big improvement in terms of container security. With this new kernel I can run the proposed benchmark again.

PS: I've added a small benchmark that serves a static web page (a simple HTML page) with nginx. The benchmark is driven from my desktop using apache bench; my desktop is powerful enough to overwhelm the rpi if necessary ;-) I will also try to test with a dynamically generated website, maybe something like Ghost or WordPress, whichever is easier to install on the rpi and in a container. I will give more details when I'm done testing.

@pelwell
Contributor

pelwell commented Oct 10, 2016

That's looking good so far. The dd test is almost a worst case, with very large numbers of userspace-to-kernel round-trips. As such, a 10-20% performance drop doesn't sound too bad, but others may disagree - that isn't a final decision.

@jcberthon

jcberthon commented Oct 13, 2016

It seems I cannot test further for now. I successfully compiled and booted a new kernel with AppArmor support. It works well, but not with Docker (moby/moby#27351); I'm investigating with some support from Docker.

Anyway, enabling AppArmor has decreased performance a bit further. With AppArmor, seccomp filtering and user namespaces activated, I am getting the following results:

  • The dd '1 byte' test:
    • Default kernel: 521 kB/s
    • My Kernel: 411 kB/s (-21%)
  • The dd '4kB' test:
    • Default kernel: 991 MB/s
    • My kernel: 854 MB/s (-16%)
  • The dd '1MB' test:
    • Default: 1.1GB/s
    • Mine: 1.1GB/s

My changes to the default config are in this branch on a fork I made: https://github.com/jcberthon/linux/tree/rpi-sec-apparmor-seccomp-userns

I will write later to describe the other tests I have conducted and to report any update on Docker with AppArmor on ARM.

@jcberthon

Just a quick update. The problem with Docker when AppArmor is active on ARM has been solved and the fix merged. It will be available in Docker 1.12.3. I've installed the patch and can now run Docker successfully on the Raspberry Pi with my improved kernel.

In the coming days I'll provide a pull request with the changes. So if it is decided that this issue should be fixed, it will simply be a matter of reviewing and possibly merging my changes.

I now have to find the time to do some benchmarking inside the container with the official kernel and mine. Although it is trivial, I'm short on time, so do not expect much feedback from me in the next 10-20 days.

@pelwell
Contributor

pelwell commented Oct 20, 2016

Thanks for the update - take your time, we'll still be here.

@jcberthon

Just for information: CONFIG_USER_NS=y has been set in the rpi-4.6.y and newer branches since Jul 28.

See these commits on 4.6 and 4.7 by @popcornmix: 4.6 39f02dd#diff-d578de903015b334ab3f9f22d7055058 and 4.7 c2b66ab#diff-d578de903015b334ab3f9f22d7055058

And it is in the baseline config from 4.8 on.

My other proposed changes are not included in newer branches (4.5 to 4.9)

@jcberthon

Hi

I have concluded the benchmarking using dd as suggested by @pelwell.

I tested dd in 3 settings, with 1 byte (test1), 4kB (test2) and 1MB (test3) blocks, and configured it so that each test runs for 20-30s. Each test was run 3 times and I computed the average. My platform was a headless Raspberry Pi 2 (using SSH, no monitor, keyboard or X11).

The tests were run in 5 different environments. There were 3 different kernels: the vanilla Raspbian kernel (4.4.27-v7+); the Raspbian kernel config with User Namespace and SECCOMP filters active; and the same with AppArmor added. On the vanilla kernel and on the kernel with UserNS+SECCOMP-filters+AppArmor, I also ran the benchmark in a Docker container. So in total that's 5 environments.

For Docker, I used 1.12.3-rc1, which contains a patch allowing it to run on ARM with AppArmor, and the Debian:jessie image from armhf (https://hub.docker.com/r/armhf/debian/). Note that the final 1.12.3 was published yesterday, but I did not retest with it.

TL;DR: The performance impact is within ±7% when using UserNS+SECCOMP-filters compared to the vanilla kernel, but it is up to -23% when using UserNS+SECCOMP-filters+AppArmor compared to the vanilla kernel. Within Docker, the benchmark is always about 2% faster than on the host itself.

Detailed results:

| Bench | Vanilla | Vanilla Docker | Impact | +UserNS +SECCOMP | Impact | +AppArmor | Impact | +AA Docker | Impact wrt host | Impact wrt Vanilla Docker |
|---|---|---|---|---|---|---|---|---|---|---|
| Test1 (kB/s) | 507.33 | 516.33 | 1.77 % | 508.00 | 0.13 % | 395.00 | -22.14 % | 399.33 | 1.10 % | -22.66 % |
| Test2 (MB/s) | 1014.13 | 1026.13 | 1.18 % | 935.67 | -7.74 % | 771.00 | -23.97 % | 860.67 | 11.63 % | -16.13 % |
| Test3 (MB/s) | 1058.13 | 1092.27 | 3.23 % | 1126.40 | 6.45 % | 1126.40 | 6.45 % | 1126.40 | 0.00 % | 3.13 % |

The columns named "Impact" give the percentage change of the preceding column relative to the baseline, which is the vanilla Raspbian kernel unless specified otherwise.
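
As a worked example, the "Impact" figures for the +AppArmor column can be recomputed from the reported throughputs. This is a small Python sketch; the helper name `pct_change` is hypothetical, and the values are taken from the results in this comment.

```python
def pct_change(value, baseline):
    """Percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0

# Throughputs for the vanilla kernel and the +AppArmor kernel
# (kB/s for Test1, MB/s for Test2 and Test3).
vanilla = {"Test1": 507.33, "Test2": 1014.13, "Test3": 1058.13}
apparmor = {"Test1": 395.00, "Test2": 771.00, "Test3": 1126.40}

for name in vanilla:
    print(f"{name}: {pct_change(apparmor[name], vanilla[name]):+.2f} %")
```

The "Impact wrt host" and "Impact wrt Vanilla Docker" columns follow the same formula with a different baseline.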

Conclusion: including UserNS and SECCOMP filters does not seem to have much impact. Activating AppArmor can cost up to 22% in performance in worst-case scenarios, but in normal use the impact should not be felt. Using these flags has not affected kernel stability: my Raspberry Pi has been up and running for the last few weeks with the self-built kernels, and I have not had a single application or system crash or unexpected stop.

@JamesH65
Contributor

@popcornmix @pelwell Not a huge impact to performance, do we want to include this?

JamesH65 added the "Waiting for internal comment" label May 18, 2017

@xmycroftx

xmycroftx commented Oct 16, 2017

Yes, please? Why the heck not?

@iam-TJ

iam-TJ commented Oct 18, 2017

It seems this issue can be closed since the requested changes are present in the latest kernel:

$ cat /etc/issue; uname -r; apt-cache policy raspberrypi-kernel; zgrep -E 'SECCOMP|_NS=' /proc/config.gz
Raspbian GNU/Linux 8 \n \l

4.9.35-v7+
raspberrypi-kernel:
  Installed: 1.20170703-1
  Candidate: 1.20170703-1
  Version table:
 *** 1.20170703-1 0
        500 http://archive.raspberrypi.org/debian/ jessie/main armhf Packages
        100 /var/lib/dpkg/status
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_NF_CONNTRACK_NETBIOS_NS=m

@jcberthon

Hi @iam-TJ

Very strange as it is not visible in the config file: https://github.com/raspberrypi/linux/blob/rpi-4.9.y/arch/arm/configs/bcm2835_defconfig

Perhaps it is now automatically included by other flags.

I have now been maintaining and running my own kernel build for almost a year, so I can't really check.

@JamesH65
Contributor

I believe that this stuff is now included in our standard kernel. Closing.

@jcberthon

Hi @JamesH65

I just checked again: I now have another Raspberry Pi, on which I installed a clean Raspbian. Not all the flags in my Pull Request (PR) are set, but it is true that the SECCOMP ones are now activated.

Here is the output:

$ cat /etc/issue; uname -r; sudo modprobe configs; zegrep "SECCOMP|_NS=|CG_|CGROUP|APPARMOR" /proc/config.gz
Raspbian GNU/Linux 9 \n \l
4.14.34-v7+
CONFIG_CGROUPS=y
# CONFIG_MEMCG_SWAP is not set
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
# CONFIG_SLUB_MEMCG_SYSFS_ON is not set
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
# CONFIG_NETFILTER_XT_MATCH_CGROUP is not set
CONFIG_NET_CLS_CGROUP=m
# CONFIG_CGROUP_NET_PRIO is not set
CONFIG_CGROUP_NET_CLASSID=y
# CONFIG_TCG_TPM is not set

Compared to my PR, we can see that the SECCOMP and NS (namespace) options are now set. However, most control group options (CGROUP or MEMCG) are still not set.
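
To pull only the missing options out of such output, a filter along the following lines could be used. This is a Python sketch under the assumption that disabled options appear as `# CONFIG_FOO is not set`; the function name `unset_options` is hypothetical, and the sample is a shortened excerpt of the output above.

```python
import re

NOT_SET = re.compile(r"# (CONFIG_\w+) is not set")

def unset_options(config_text, keyword):
    """List the options containing keyword that are marked as not set."""
    found = []
    for line in config_text.splitlines():
        match = NOT_SET.fullmatch(line.strip())
        if match and keyword in match.group(1):
            found.append(match.group(1))
    return found

# Shortened excerpt of the output above.
sample = """CONFIG_CGROUPS=y
# CONFIG_MEMCG_SWAP is not set
# CONFIG_CGROUP_PIDS is not set
CONFIG_CGROUP_FREEZER=y
# CONFIG_CGROUP_PERF is not set
"""
print(unset_options(sample, "CGROUP"))
print(unset_options(sample, "MEMCG"))
```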

For instance, when running Docker, many resource controls cannot be supported. Running docker info ends with these warnings:

WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No oom kill disable support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support

With my PR, Docker is happy. But this is not limited to Docker: other container technologies (rkt, cri-o, Kubernetes, etc.) make use of these options, and even things like systemd can use them (and potentially, but I could be mistaken, snap and flatpak as well).

I could use issue #1605 to update my PR and get it merged. Or you could re-open this issue and I would update my PR. But before I put effort into this PR, will it be considered? (I'm asking because I have 4 very young kids and my free time is often very limited.)

@JamesH65
Contributor

Probably best in another PR/issue; this specific issue (NS and SECCOMP for Firejail) appears to be solved.

@jcberthon

Alright, and #1605 is also solved w.r.t. systemd 231. So I will create a new issue and PR. Thank you for the feedback and advice.


9 participants