-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hvt_drop_privileges() behaviour #282
Comments
i think for FreeBSD a call to cap_enter(2) would be the way to go. |
turns out, that cap_enter(2) will not work because because ppoll(2) is not permitted in capability mode. |
I've been thinking about what to do here for the Linux case. There are two cases where the behaviour should be quite different:
Therefore, I think that for the first case (classic system) we should require running as non-root and make it the user's responsibility to ensure that access to In the second case, this requires defining what "running in a container" means and then being able to detect it. I'd prefer to avoid various heuristics (see e.g. here: https://github.com/genuinetools/bpfd/blob/master/proc/proc.go) and instead explicitly support only the case where the tender is running as the sole PID 1, which implies it's inside a PID namespace, which to me seems a "good enough" way to detect if it is running in a "container". So, if and only if:
returns true, we allow running as Will try to pass this by some Linux container experts for opinions... |
On the FreeBSD side, it looks like we might have to revert this entirely (see #312). Update on the Linux side -- I'm no longer convinced we should do anything along the lines of classic privilege dropping there. Rather, with the introduction of "spt" (#310) we should look into applying a seccomp sandbox to the hvt tender also. |
Ok, so, in the light of:
I'm going to make an executive decision here, which is: The FreeBSD privilege dropping introduced in #286 will be reverted on Also, I'm going to close all issues/PRs related to this except #282, since the multiple discussions are confusing. We can discuss how to proceed forward there. Note to self: TODO: The existing VM cleanup code for FreeBSD and OpenBSD should be audited and at |
This is a large change, and the majority of it is effectively a full re-write of the build system. The actual removal of hvt compile-time specialization is fairly straightforward. User-visible changes: - 'configure.sh' now needs to be run manually before running 'make', this is consistent with POLA. - conversely, 'make clean' no longer cleans Makeconf. Use 'distclean' or 'clobber' for that. - 'configure.sh' will now print the targets that can (will) be built on this system. The strategy is still "build everything we can", however I have disabled Genode on all systems except Linux due to toolchain issues. - You can now build a subset of targets from the top-level 'make', by specifying 'CONFIG_XXX=' (disable) or 'CONFIG_XXX=1' (enable) either on the command line, or editing the generated Makeconf. - Makefiles use silent rules by default. To get the old verbose ones back, use 'make V=1'. - The 'solo5-hvt' tender is no longer "specialized" to the unikernel. We build two tenders, 'solo5-hvt' with all non-debug modules configured and 'solo5-hvt-debug' with additional debug modules (gdb, dumpcore where available). - 'solo5-hvt-configure' is kept around for now for backward compatibility with OPAM/MirageOS but is essentially a NOP. Developer-visible changes: - The build system now has proper support for auto-generation of dependencies. This means you can safely edit source files, run make and be sure you will get a complete incremental build. - Makefiles have been refactored to use common best practices, remove repetition, consistent variable names and clear interfaces between configure.sh/Makeconf/Makefiles, all the while keeping them simple enough to understand for me on a Monday morning before coffee. I.e. limit use of macros, eval, etc. - hvt tender modules are no longer defined by compile-time flags, instead a dynamic array is placed into a special ELF section (.modules). This means that a hvt tender binary can be combined from an arbitrary set of hvt_module_XXX object files, which is the right way to do things going forward and also simplifies the build system (not needing to build multiple targets from the same set of sources). Shortcomings / TODOs: - Dependency files (*.d) are stored in-tree. I spent several days on trying to figure out how to get them to work out of tree, but in combination with the non-recursive use of subdirectories in 'bindings' I could not figure out the required Makefile magic. - HVT_DROP_PRIVILEGES=0 is non-functional with the new modules arrangement, but needs a re-design anyway. Other changes included as part of this PR: - Revert privilege dropping on FreeBSD (see discussion in Solo5#282). - The build system changes effectively implement option 1 in Solo5#292, i.e. on x86_64 -m no-red-zone is only used for bindings, not for application code. - tests/tests.bats has been refactored for DRY as it was getting totally unmaintainable.
#366 implements a capsicum(4) sandbox for Solo5/hvt on FreeBSD 12+. Removing the FreeBSD labels from this, as I consider this done there. |
#276 adds scaffolding for
hvt_drop_privileges()
and an OpenBSDpledge
implementation.hvt_drop_privileges()
is called by the tender just before entering the VCUP loop. i.e.We should decide what the behaviour should be on Linux and FreeBSD, to get "secure by default" behaviour. At this time I'm not prepared to add actual
seccomp()
filtering or Capsicum here, but I think that the behaviour should include at least:root
.chroot()
to a suitable directory.@hannesm: Presumably the FreeBSD case is easier, since there will be a suitable non-privileged UID always available and
/var/empty
could be used for the chroot?For the Linux case, I'm not sure what the best default behaviour that is guaranteed to work cross-distribution is -- could
nobody
be used as the unprivileged user? I don't know of any equivalent of/var/empty
, but there might be something in the LSB/FHS.A second case is when this is run out of a container in Linux -- can we guarantee that a
nobody
will always be available there?/cc @adamsteen
The text was updated successfully, but these errors were encountered: