WALDEMAR KOZACZUK edited this page Nov 27, 2022 · 20 revisions

Introduction

By design, OSv has always been a "fat" unikernel: by default it provides a large subset of glibc functionality, includes the full standard C++ library (libstdc++), the ZFS implementation, and drivers for many devices, and supports many hypervisors. On the one hand, this makes it easy to run arbitrary applications on any hypervisor using a single universal kernel. On the other hand, such universality comes at the price of a bloated kernel carrying many symbols, drivers, and possibly ZFS that go unused, which causes inefficient memory usage, longer boot time, and potential security vulnerabilities. In addition, C++ applications linked against a version of libstdc++ different from the one the kernel was linked against may simply not work.

To address these issues, release 0.57 introduced a new experimental build mode that hides the non-glibc symbols and libstdc++, and extracted the ZFS code out of the kernel in the form of a dynamically linked library. It also added a build option to tailor the kernel to a specific set of device drivers ("driver profiles") and, finally, made it possible to build a kernel that exports only the subset of glibc symbols needed by a specific application. The following sections explain these changes and features in more detail.

Hiding most symbols and standard C++ library

As issue 97 explains, OSv by default exports all symbols, including the full standard C++ library. This has several negative implications: an overly large kernel ELF and therefore slower boot time, and the potential for kernel symbols to collide with application ones. On top of this, C++ applications may not work if linked with a version of libstdc++ different from the one the OSv kernel uses.

To address these, release 0.57 added a new build option, conf_hide_symbols, that hides most non-glibc symbols and the libstdc++ symbols when enabled. In essence, when conf_hide_symbols is enabled, most source files except the ones under the musl/ and libc/ directories are compiled with the special compiler flags -fvisibility=hidden and -fvisibility-inlines-hidden (C++ only). The symbols to be exposed as public (like the glibc ones), on the other hand, are annotated with OSV_***_API macros that translate to __attribute__ ((visibility ("default"))). Finally, in order to remove all unneeded code ("garbage"), all files are compiled with the flags -ffunction-sections -fdata-sections and linked with the flag --gc-sections. For more details about symbol visibility please read this article.

./scripts/build image=native-example fs=rofs conf_hide_symbols=1 -j$(nproc)

The kernel ELF file built with most symbols hidden as above is roughly 3.6M in size, compared to 6M otherwise, a reduction of about 40%. This large reduction stems from the fact that libstdc++ is no longer linked with --whole-archive, the symbol table is much smaller, and unused code is garbage collected. Please note that the resulting kernel is still universal, as it exports all glibc symbols and includes all device drivers.

The conf_hide_symbols option is disabled by default because there are still some OSv modules, like httpserver-api, and more importantly some unit tests that depend on some internal C++ ABI of OSv. For more details look at this issue.

The list of public symbols exported by the kernel is enforced during the build process based on the symbol list files for each advertised library (like libc.so.6), located and maintained under the directory /exported_symbols (for more details look at this readme). These files are concatenated by scripts/generate_version_script.sh into $(out)/version_script, which is fed to the linker as the argument to the --version-script flag.
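The concatenation step can be pictured with a minimal sketch. This is not the actual scripts/generate_version_script.sh; the function name make_version_script and the one-symbol-per-line layout of the list files are assumptions made for illustration. The output uses the standard GNU ld version script syntax, where everything not listed under global is made local.

```shell
# Hypothetical helper (not part of OSv): emit a GNU ld version script that
# exports only the symbols named in the given list files.
# Assumption: each list file holds one symbol name per line.
make_version_script() {
    printf '{\n  global:\n'
    # Merge all lists, drop blank lines and duplicates, and turn each
    # symbol into an indented "name;" entry.
    cat "$@" | grep -v '^[[:space:]]*$' | sort -u | sed 's/^/    /; s/$/;/'
    # Anything not listed above stays hidden.
    printf '  local:\n    *;\n};\n'
}
```

Under these assumptions, something like `make_version_script some_dir/*.symbols > version_script` would produce a file suitable for the linker's --version-script flag; the directory and file extension here are placeholders.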

For more insight into the technical details, please look at this commit and that one.

ZFS as a module

ZFS is one of the three filesystems that OSv implements. The original code comes from FreeBSD circa 2014 and has since been integrated and adapted to work within OSv. ZFS is one of the best file systems for persistent mutable data and is what one needs to run, for example, mysql or elasticsearch on OSv. Although a lot of effort has gone into integrating ZFS into OSv, the sheer volume of its code base made the OSv kernel bigger than it needs to be, and there is a plethora of stateless use cases (redis, memcached, microservices, etc.) where ZFS is not needed and ROFS can be used instead.

To make ZFS optional, we extracted the ZFS code out of the kernel in the form of a dynamically linked shared library (aka module) as part of the 0.57 release. The ZFS-less kernel is now roughly 800K smaller, which makes OSv images smaller, use less memory, and boot slightly faster. The ZFS library, libsolaris.so, can be loaded from the ROFS filesystem and the ZFS filesystem mounted from the same or a different disk, as explained in this wiki and illustrated by this example:

# include zfs module that puts /usr/lib/fs/libsolaris.so on RoFS filesystem
./scripts/build image=native-example,zfs fs=rofs_with_zfs

Please look at this commit to better understand how ZFS code got extracted from the kernel and this one to get insight into the logic behind loading the libsolaris.so and mounting the ZFS filesystem.

Driver Profiles

Release 0.57 introduced a new build mechanism that allows the creation of a custom kernel with a specific list of device drivers intended to target a given hypervisor. Such a kernel benefits from smaller size and better security, as all unneeded code is removed.

In essence, we introduce a new build script and makefile parameter: drivers_profile. This parameter specifies a drivers profile, which is simply a list of device drivers to be linked into the kernel along with some extra functionality, like PCI or ACPI, that these drivers depend on. Each profile is specified in a tiny make include file (*.mk) under the conf/profiles/$(arch) directory and included by the main makefile as requested by the drivers_profile parameter. The main makefile has a number of ifeq expressions that conditionally add a given driver object file to the list of linked objects, depending on the value (0 or 1) of the corresponding conf_drivers_* variable specified in the relevant profile file.
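Since each profile lives in its own *.mk file, the profile names accepted by drivers_profile can be discovered by listing that directory. The sketch below is a hypothetical helper, not an OSv script; it assumes every *.mk file in the directory is a profile and that the architecture directory is named something like x64.

```shell
# Hypothetical helper (not part of OSv): print the profile names found
# under <root>/<arch>, i.e. the *.mk file names without their extension.
# $1: architecture directory name (for example x64, an assumption)
# $2: optional profiles root, defaults to conf/profiles as described above
list_profiles() {
    dir=${2:-conf/profiles}/$1
    for f in "$dir"/*.mk; do
        # Skip the unexpanded pattern when the directory has no *.mk files.
        [ -e "$f" ] || continue
        basename "$f" .mk
    done
}
```

Run from the root of the source tree, `list_profiles x64` would then print one candidate value for drivers_profile per line.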

The benefits of using driver profiles are most pronounced when building a kernel with most symbols hidden. Below are examples of some build commands along with the size of the kernel each produces:

./scripts/build fs=rofs conf_hide_symbols=1 image=native-example #all drivers
3632K	build/release/kernel-stripped.elf

./scripts/build fs=rofs conf_hide_symbols=1 image=native-example drivers_profile=virtio-pci
3380K	build/release/kernel-stripped.elf

./scripts/build fs=rofs conf_hide_symbols=1 image=native-example drivers_profile=vmware
3308K	build/release/kernel-stripped.elf

./scripts/build fs=rofs conf_hide_symbols=1 image=native-example drivers_profile=virtio-mmio
3120K	build/release/kernel-stripped.elf

./scripts/build fs=rofs conf_hide_symbols=1 image=native-example drivers_profile=base #most drivers out
3036K	build/release/kernel-stripped.elf
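To put the sizes above in perspective, the relative savings of each profile against the all-drivers kernel (3632K) can be computed with a few lines of shell. The helper name pct_saved is made up for this illustration; the sizes are the ones listed above.

```shell
# Hypothetical helper: percentage saved when going from size $1 to size $2
# (both in KB), printed with one decimal place.
pct_saved() {
    awk -v full="$1" -v small="$2" \
        'BEGIN { printf "%.1f\n", (full - small) * 100 / full }'
}

# Sizes in KB taken from the build outputs listed above.
for entry in virtio-pci:3380 vmware:3308 virtio-mmio:3120 base:3036; do
    profile=${entry%%:*}
    size=${entry##*:}
    echo "$profile: $(pct_saved 3632 "$size")% smaller than the all-drivers kernel"
done
```

This prints 6.9% for virtio-pci, 8.9% for vmware, 14.1% for virtio-mmio, and 16.4% for base.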

It is also possible to enable or disable individual drivers on top of what a given profile defines, like so:

./scripts/build fs=rofs conf_hide_symbols=1 image=native-example drivers_profile=base \
  conf_drivers_acpi=1 conf_drivers_virtio_fs=1 conf_drivers_virtio_net=1 conf_drivers_pvpanic=1

To get a better understanding of how the driver profiles are implemented please look at this commit.

App-specific kernel

Release 0.57 introduced another new build mechanism that allows the creation of a custom kernel that exports only the symbols required by a specific application. Such a kernel benefits from smaller size and better security, as all unneeded code is removed. This new build mechanism is part of the modularization/librarization functionality explained by issue 1110 and this part of the roadmap. The idea was also mentioned in the P99 OSv presentation (see slide 12).

In essence, this new mechanism relies on two new scripts that analyze the build manifest, detect application ELF files, identify symbols required from the OSv kernel, and finally, produce an application-specific version script under build/last/app_version_script:

  • scripts/list_manifest_files.py - reads build/last/usr.manifest and produces a list of file paths on the host filesystem
  • scripts/generate_app_version_script.sh - iterates over the manifest files produced by list_manifest_files.py, uses objdump to identify the undefined symbols in the ELF files that are also exported by the OSv kernel, and finally generates build/last/app_version_script
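The core of this analysis is a set intersection: the symbols an application leaves undefined, restricted to those the kernel exports. The following is a rough sketch of that idea only, not the code of generate_app_version_script.sh. Both function names are hypothetical, and the parsing of objdump -T output (taking the last field of lines marked *UND*) is an assumption about its usual layout.

```shell
# Hypothetical helper: list the undefined dynamic symbols of an ELF file.
# Assumption: in "objdump -T" output, undefined symbols are marked *UND*
# and the symbol name is the last field on the line.
undefined_symbols() {
    objdump -T "$1" | awk '/\*UND\*/ { print $NF }'
}

# Hypothetical helper: keep only the undefined symbols that the kernel
# exports.
# $1: file with the application's undefined symbols, one per line
# $2: file with the symbols exported by the kernel, one per line
needed_exports() {
    # -F: fixed strings, -x: whole-line match, -f: read patterns from file
    sort -u "$1" | grep -Fx -f "$2"
}
```

Symbols that the application expects from elsewhere, such as its own shared libraries, drop out of the result because they do not appear in the kernel's export list.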

Please note that this new functionality only works when building a kernel with most symbols hidden (conf_hide_symbols=1).

To take advantage of this new feature one would follow these steps:

  1. Build an image for a given application.
  2. Run scripts/generate_app_version_script.sh to produce app_version_script.
  3. Re-build the image with kernel exporting only symbols needed by an app like so:
./scripts/build fs=rofs conf_hide_symbols=1 image=golang-pie-example \
 conf_version_script=build/last/app_version_script

In the example above, the version script generated for the golang ELF lists only 30 symbols.

Some experiments show that for many apps this can reduce kernel size by close to 0.5MB. For example, the kernel tailored to the golang app above is 3196K in size vs 3632K for the generic one. This feature can also be combined with driver profiles to reduce kernel size further. The kernel produced with the build command below to run the same golang app on firecracker is only 2688K in size:

./scripts/build fs=rofs conf_hide_symbols=1 image=golang-pie-example \
 drivers_profile=virtio-mmio conf_version_script=build/last/app_version_script

Please note that some applications use dlsym() to resolve symbols dynamically, and such symbols would be missed by this technique. In such scenarios, these symbols have to be added manually to app_version_script.
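One way to do the manual addition is sketched below. It assumes that app_version_script follows the usual GNU ld layout with a global: line opening the list of exported symbols; the helper name add_exports and the symbol names in the usage note are invented for this example.

```shell
# Hypothetical helper: add extra symbols to the global section of a GNU ld
# version script, in place.
# Assumption: the script contains a line starting with "global:".
# $1: path to the version script; remaining arguments: symbols to add
add_exports() {
    script=$1
    shift
    tmp=$(mktemp)
    awk -v syms="$*" '
        { print }
        # Right after the first "global:" line, emit one entry per symbol.
        /^[ \t]*global:/ && !inserted {
            n = split(syms, s, " ")
            for (i = 1; i <= n; i++) print "    " s[i] ";"
            inserted = 1
        }' "$script" > "$tmp" && cat "$tmp" > "$script"
    rm -f "$tmp"
}
```

For instance, `add_exports build/last/app_version_script some_symbol another_symbol` would add two placeholder names; the real ones depend on what the application passes to dlsym().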

For more details please see this commit.

Future enhancements

In the future, we may componentize other functional elements of the kernel. For example, the DHCP lookup could be either loaded from a separate library or compiled in/out depending on the build option. There may be other examples of such functionality.
