Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

vendor: Update libcontainer vendoring #490

Merged
merged 3 commits into from
Mar 25, 2019

Conversation

sboeuf
Copy link

@sboeuf sboeuf commented Mar 18, 2019

The agent vendoring needs to be updated regarding the libcontainer
dependency so that it does not run into the following issue:

failed to write to cgroups.proc

when running kata-runtime exec commands.

Fixes #476

Signed-off-by: Sebastien Boeuf sebastien.boeuf@intel.com

@sboeuf
Copy link
Author

sboeuf commented Mar 18, 2019

Supersedes #477

@sboeuf
Copy link
Author

sboeuf commented Mar 18, 2019

/test

@sboeuf
Copy link
Author

sboeuf commented Mar 18, 2019

/test

@amshinde
Copy link
Member

lgtm @sboeuf
Nit: I prefer to keep the vendoring and local changes in separate commits.

@sboeuf
Copy link
Author

sboeuf commented Mar 18, 2019

@amshinde ok let me split that :)

@sboeuf
Copy link
Author

sboeuf commented Mar 18, 2019

/test

@sboeuf
Copy link
Author

sboeuf commented Mar 18, 2019

@amshinde done! PTAL

@amshinde
Copy link
Member

thanks @sboeuf !

@jodh-intel
Copy link
Contributor

Thanks @sboeuf. Please can you add in the git shortlog showing the changes this re-vendoring introduces by inserting the output of the scriptlet here into the commit:

Sebastien Boeuf added 3 commits March 19, 2019 06:00
The agent vendoring needs to be updated regarding the libcontainer
dependency so that it does not run into the following issue:

  "failed to write to cgroups.proc"

when running "kata-runtime exec" commands.

Shortlog since last vendoring of github.com/opencontainers/runc:

    9fe7c939 Add a Travis-CI job for systemd cgroup driver
    5369f9ad Skip CRIU tests when $RUNC_USE_SYSTEMD for now
    d4586090 Update tests that depend on cgroupfs paths to consider systemd cgroups
    a9056a34 Add $RUNC_USE_SYSTEMD to use systemd cgroup driver in tests
    4b2b9782 Add cgroup name to error message
    6f714aa9 Use getenv not secure_getenv
    dbf6e48d README: link to /org/security/
    2d4a37b4 nsenter: cloned_binary: userspace copy fallback if sendfile fails
    16612d74 nsenter: cloned_binary: try to ro-bind /proc/self/exe before copying
    af9da0a4 nsenter: cloned_binary: use the runc statedir for O_TMPFILE
    2429d593 nsenter: cloned_binary: expand and add pre-3.11 fallbacks
    7cb3cde1 fix preserve-fds flag may cause runc hang
    5b775bf2 nsenter: cloned_binary: detect and handle short copies
    52f4e0fa exec: expose --preserve-fds
    f1da0d30 switched travis to xenial
    9edb5494 Use vendored in CRIU Go bindings
    bfca1e62 Vendor in go-criu
    bb7d8b1f nsexec (CVE-2019-5736): avoid parsing environ
    cd41feb4 Remove detection for scope properties, which have always been broken
    7354546c Create mountpoints also on restore
    f661e023 factor out bind mount mountpoint creation
    0a8e4117 nsenter: clone /proc/self/exe to avoid exposing host binary to container
    ec069fe3 Vendor opencontainers/runtime-spec 29686dbc
    4a600c04 Update vendored golang.org/x/sys to latest
    565325fc integration: fix mis-use of libcontainer.Factory
    dd50c7e3 Add 'org.criu.config' annotation documentation
    5f32bb94 Update runc-checkpoint man-page
    28a697cc rootfs: umount all procfs and sysfs with --no-pivot
    f0192337 systemd: fix setting kernel memory limit
    acb75d0e libcontainer: intelrdt: fix null intelrdt path issue in Destroy()
    403986c5 Add CRIU patch to fix checkpoint test
    6f3e13cc Added test for container specific CRIU configuration files
    e1579630 Enable CRIU configuration files
    360ba8a2 Update criurpc definition for latest features
    0855bce4 Fix .Fatalf() error message
    bdf3524b Retry adding pids to cgroups when EINVAL occurs
    769d6c4a Fix some typos
    dce70cdf cr: get pid from criu notify when restore
    8a4629f7 cgroups: nokmem: error out on explicitly-set kmemcg limits
    07d1ad44 kill: allow to signal paused containers
    30817421 Modify check-config.sh in accordance with Moby Project updates
    a0200001 MAINTAINERS: remove @vmarmol
    2efedb02 MAINTAINERS: remove @rjnagal
    87a18899 may kill other process when container has been stopped
    061dfe95 VERSION: back to development
    ccb5efd3 VERSION: release v1.0.0~rc6
    bc0b0471 Small fixes for CRIU based test cases
    37634277 Bump CRIU to 3.11
    48189715 add missing intelRdt parameters in 'runc update' manpage
    e2386860 libcontainer: Set 'status' in hook stdin
    95af9eff libcontainer: intelrdt: add support for Intel RDT/MBA Software Controller in runc
    714a4d46 rootless: fix potential panic in shouldUseRootlessCgroupManager
    16d55f17 libcontainer: fix potential panic if spec.Process is nil
    95d1aa18 test: fix TestDupNamespaces
    f1b1407e readme: add nokmem build tag
    1e0d04c6 Makefile: rm cgo tag
    6a2c1559 libcontainer: ability to compile without kmem
    df3fa115 Add support for cgroup namespace
    869add33 rootless: fix running with /proc/self/setgroups set to deny
    5c6b9c3c libcontainer: map PidsLimit to systemd's TasksMax property
    9a3a8a5e libcontainer: implement CLONE_NEWCGROUP
    630fb5b8 Bump Travis versions
    6c307f8f libcontainer: intelrdt: add user-friendly diagnostics for Intel RDT operation errors
    d59b17d6 libcontainer: intelrdt: Add more check if sub-features are enabled
    f0973392 libcontainer: intelrdt: add test cases for Intel RDT/MBA
    1ed597bf libcontainer: intelrdt: add update command support for Intel RDT/MBA
    27560ace libcontainer: intelrdt: add support for Intel RDT/MBA in runc
    c1cece7e libcontainer: intelrdt: add Intel RDT/MBA docs in SPEC.md
    bd905416 vendor: bump runtime-spec to 5684b8af48c1
    0b412e94 various cleanups to address linter issues
    0d011647 Fix travis Go: tip
    36f84720 fix build break
    1499c746 Move spec.Linux.IntelRdt check to spec.Linux != nil block
    26bdc0dc clarify license information
    a1d5398a Respect container's cgroup path
    5de99cd3 tty: clean up epollConsole closing
    ec0d23a9 tty: close epollConsole on errors
    40f14684 keyring: handle ENOSYS with keyctl(KEYCTL_JOIN_SESSION_KEYRING)
    5963cf2a test: add more test case for CleanPath
    06f789cf Disable rootless mode except RootlessCgMgr when executed as the root in userns
    feb90346 doc: fix typo
    4eb30fcd code optimization: use securejoin.SecureJoin and CleanPath
    4fae8fcc code optimization after review
    d2d226e8 fix unexpected delete bug when container id is ..
    3ce8fac7 libcontainer: add /proc/loadavg to the white list of bind mount
    636b6640 linux: drop check for /proc as invalid dest
    b34d6d8a libcontainer: CurrentGroupSubGIDs -> CurrentUserSubGIDs
    fe3d5c4c Remove unused veth setup code
    832ac8a5 tests: add external network namespace tests
    fa43a72a criu: restore into existing namespace when specified
    b399167f Add docker proxy settings for make test in a proxy environment
    62a4763a When doing a copyup, /tmp can not be a shared mount point
    4803faf0 cr: don't restore net namespace by default
    cb3e35b5 Add missing data to man page
    26ec8a97 Revert "libcontainer/rootfs_linux: minor cleanup"
    e389f575 Dockerfile: update criu to v3.10 + checkpoint-restore/criu@27034e7c
    34ed6269 Update outdated nsenter README content
    a2faaa13 Fix duplicate entries and missing entries in getCgroupMountsHelper
    0880503b Add an explanation for TESTPATH
    3321aa1a Fix regression with mounts with non-absolute source path
    b681b58e Fix the problem TESTFLAGS is not to be used in Makefile correctly
    8187fb74 cr: don't dump network devices and their configuration
    46221e39 criu tests: rename criu feature check
    7fb79f31 Add osusergo flag to static build
    53fddb54 Pass GOMAXPROCS to init processes
    472fcb30 docs: add information about terminals
    e5a7c61f Add test for testing cgroup mounts on bedrock linux
    5ee0648b Stop relying on number of subsystems for cgroups
    823c06ea libcontainer: improve "kernel.{domainname,hostname}" sysctl handling
    d18a45f6 Stop using unix.SIGUNUSED which has been removed from golang.org/x/sys
    a0e99e7a libcontainer: devices: fix mips builds
    39f679c4 travis: test cross compilation
    c205e9fb libcontainer: fix compilation on GOARCH=arm GOARM=6 (32 bits)
    cbcc85d3 runc: not require uid/gid mappings if euid()==0
    aa3fee6c SELinux labels are tied to the thread
    bd3c4f84 Fix race in runc exec
    63bb0fe9 Fix merge conflict
    939d5a37 cgroup: clean up isIgnorableError for skippable EROFS
    c9381573 libcontainer: remove extra CAP_SETGID check for SetgroupAttr
    b515963c systemd cpu quota ignores -1
    fd0febd3 Wrap error messages during init
    cdb7f23d main: add condition to isRootless()
    f103de57 main: support rootless mode in userns
    9c7d8bc1 libcontainer: add parser for /etc/sub{u,g}id and /proc/PID/{u,g}id_map
    40680b2d Make the setupSeccomp function public.
    1b27db67 libcontainer/rootfs_linux: minor cleanup
    165ee453 Make channel for StartTransientUnit buffered
    1a506462 nsexec.c: fix GCC 8 warning
    4521d4b1 Only configure networking when creating a net ns
    0e16bd9b Detect whether Delegate is available on both slices and scopes
    8ab251f2 Fix systemd.Apply() to check for DBus error before waiting on a channel.
    73f3dc63 libcontainer: allow setgroup in rootless mode
    ed58366c libcontainer: fix Boolmsg alignment
    fd3a6e6c libcontainer: handle unset oomScoreAdj corectly
    03e58598 rootless: cgroup: treat EROFS as a skippable error
    0aa6e4e5 libcontainer/specconv/spec_linux: Support empty 'type' for bind mounts
    5a46c2ba nsenter: move namespace creation after userns creation

Fixes kata-containers#476

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Commit bd3c4f844abed063a0d0a8575eb596159f33732c is included through
the new libcontainer vendoring:
    Fix race in runc exec

    There is a race in runc exec when the init process stops just before
    the check for the container status. It is then wrongly assumed that
    we are trying to start an init process instead of an exec process.

    This commit add an Init field to libcontainer Process to distinguish
    between init and exec processes to prevent this race.

In order to prevent from breaking Kata Containers with this commit,
we have to provide explicit information if the process is the init
process or not, depending if we're creating a new container or exec'ing
a process on an existing container.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Latest libcontainer vendoring update introduced a new function as part
of the Container Go interface, OCIState().
The mockContainer interface needs to implement this too, otherwise the
tests won't compile.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
@sboeuf
Copy link
Author

sboeuf commented Mar 19, 2019

/test

@sboeuf
Copy link
Author

sboeuf commented Mar 19, 2019

@jodh-intel commit message updated. Thanks for reminding me, I had totally forgotten about that...

devimc
devimc previously approved these changes Mar 19, 2019
@sboeuf
Copy link
Author

sboeuf commented Mar 19, 2019

@devimc
Even with this patch, the CI seems to be flaky... Could you take a quick look at what's happening here and if you find some other fixes that we need to land in order to make our CI more stable?

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sboeuf.

lgtm

Copy link

@devimc devimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sboeuf did you measure the memory footprint impact of this change? see opencontainers/runc@0a8e411#commitcomment-32280015

cc @grahamwhaley

@devimc devimc dismissed their stale review March 19, 2019 14:41

I have a question

@devimc
Copy link

devimc commented Mar 19, 2019

changing to dnm, I have a question

@grahamwhaley
Copy link
Contributor

good catch @devimc - that had escaped my brain - yes, we should check with the metrics report generator at least @sboeuf

@sboeuf
Copy link
Author

sboeuf commented Mar 19, 2019

@devimc @grahamwhaley good point, no I haven't done this check since I thought you had done it when you advise to move to a new version of libcontainer.

@chavafg
Copy link
Contributor

chavafg commented Mar 21, 2019

taking a look at the CI failures here:
I see a failure in the fedora job on docker commit test, which has been fixed by kata-containers/tests#1333.

The other issue I see is a failure in the shimv2 tests, which I haven't seen before:

STEP: test stop container
STEP: Stop container for containerID: 01a6cf6987d076eddc83f191dc663d75597c60b06efd7f28c030f3c6f1cd55e6
E0320 21:42:44.157256   17975 remote_runtime.go:229] StopContainer "01a6cf6987d076eddc83f191dc663d75597c60b06efd7f28c030f3c6f1cd55e6" from runtime service failed: rpc error: code = Unknown desc = failed to stop container "01a6cf6987d076eddc83f191dc663d75597c60b06efd7f28c030f3c6f1cd55e6": container not running: unknown
Mar 20 21:42:44.157: INFO: Unexpected error occurred: rpc error: code = Unknown desc = failed to stop container "01a6cf6987d076eddc83f191dc663d75597c60b06efd7f28c030f3c6f1cd55e6": container not running: unknown
Mar 20 21:43:44.138: INFO: stop container "01a6cf6987d076eddc83f191dc663d75597c60b06efd7f28c030f3c6f1cd55e6" timeout.

restarting jobs...

@sboeuf
Copy link
Author

sboeuf commented Mar 21, 2019

Thanks @chavafg for the quick feedback.
@grahamwhaley should we retrigger the metrics several times on this PR?

@grahamwhaley
Copy link
Contributor

@sboeuf - I think the best way to look at impact is to use the report tool, as that way you get to do a side-by-side comparison of before/after on your machine, and the report shows a lot more useful detail than the metrics CI will.
If you only want to look at boot and memory impact, then use -t -d to the grabdata.sh in the tool, and that will save you a lot of time (won't run the fio tests for instance, that take ages).

@sboeuf
Copy link
Author

sboeuf commented Mar 21, 2019

Thanks @grahamwhaley
Any chance you can run the report tool @chavafg?

@chavafg
Copy link
Contributor

chavafg commented Mar 21, 2019

I still see failures on shimv2 on fedora and initrd jobs.
From http://jenkins.katacontainers.io/job/kata-containers-agent-fedora-PR/428/console:

container_stats_test.go:318: Verify container stats for sandbox "bb773f8ec63ebb273a1d2a67957f1dfe8b3d84dc33fdcf150cb3fc5de62738c4" and container "139dd4b689a504a1859418084ec7fe40eb0ec64801428e38269ae1a3ed6a1886" filter
    assertions.go:237: 
                        
	Error Trace:	container_stats_test.go:274
        
			container_stats_test.go:323
        
	Error:		Received unexpected error rpc error: code = Unknown desc = failed to stop container "139dd4b689a504a1859418084ec7fe40eb0ec64801428e38269ae1a3ed6a1886": container not running: unknown
        
    assertions.go:237: 
                        
	Error Trace:	container_stats_test.go:270
        
			container_stats_test.go:323
        
	Error:		Received unexpected error rpc error: code = Unknown desc = failed to set removing state for container "139dd4b689a504a1859418084ec7fe40eb0ec64801428e38269ae1a3ed6a1886": container is still running, to stop first
        
FAIL
./hack/test-utils.sh: line 53: 105027 Terminated              keepalive "sudo PATH=${PATH} ${ROOT}/_output/containerd ${CONTAINERD_FLAGS}" 

from http://jenkins.katacontainers.io/job/kata-containers-agent-ubuntu-18-04-PR-initrd/362/console

    container_stats_test.go:318: Verify container stats for sandbox "db9e5827c204ca1310261749c40d1b2afcafb5653f4d8a2e7fe01e634c477a40" and container "39afc5e17c7bcc39cc638e71c66b8c2d2ea0ec19a683f420b4d18ff734fba67b" filter
    assertions.go:237: 
                        
	Error Trace:	container_stats_test.go:274
        
			container_stats_test.go:323
        
	Error:		Received unexpected error rpc error: code = Unknown desc = failed to stop container "39afc5e17c7bcc39cc638e71c66b8c2d2ea0ec19a683f420b4d18ff734fba67b": container not running: unknown
        
    assertions.go:237: 
                        
	Error Trace:	container_stats_test.go:270
        
			container_stats_test.go:323
        
	Error:		Received unexpected error rpc error: code = Unknown desc = failed to set removing state for container "39afc5e17c7bcc39cc638e71c66b8c2d2ea0ec19a683f420b4d18ff734fba67b": container is still running, to stop first

@sboeuf I'll run the tool and get back with results when got them.

@chavafg
Copy link
Contributor

chavafg commented Mar 22, 2019

@sboeuf @grahamwhaley @devimc
I don't see much difference in footprint and boot time. But here is the report, can you ptal?
metrics_report.pdf

@grahamwhaley
Copy link
Contributor

thx @chavafg - if anything, agent_pr490 looks a touch better - 5% drop in KSM footprint and 1.86% drop in time to workload. so, lgtm!

@sboeuf
Copy link
Author

sboeuf commented Mar 22, 2019

Thanks @chavafg for running this, very useful input!
So now it's just a matter of identifying what is going wrong with initrd and fedora jobs.

@chavafg is the shimv2 being tested in other jobs? Or are those jobs the only ones testing it?

@chavafg
Copy link
Contributor

chavafg commented Mar 22, 2019

@sboeuf shimv2 is also being tested on the other jobs, but on these 2 jobs seem to fail constantly. Not sure if we are introducing some instabilities here... Will try to reproduce locally.

@sboeuf
Copy link
Author

sboeuf commented Mar 24, 2019

Removing do-not-merge label as the metrics results look good!

@sboeuf
Copy link
Author

sboeuf commented Mar 24, 2019

@lifupan could you please take a look at the issue related to the shim-v2 and reported by the CI as highlighted by @chavafg here

@bergwolf
Copy link
Member

Removing do-not-merge label as the metrics results look good!

Hi @sboeuf, I don't think it can be measured with a simple boot metrics as we did before. With the cve-2019-5736 fix, each exec will copy one agent binary into memory. So if we launch a container and do 100 exec there, it can consume 100*(binary size of agent) MB of memory space (and it's not even counted as container memory, cc @egernst as it can also affect sandbox overhead even for runc). I don't think we are that kind of luxury in terms of extra memory in the guest. Runc guys also shared the concern of memory consumption but host memory is more available than the guest so it affects kata more than runc.

Adding back dnm until the concern of memory footprint is addressed.

@gnawux
Copy link
Member

gnawux commented Mar 25, 2019

@bergwolf we need an in-depth discussion on this topic and have a decision on it.

@bergwolf
Copy link
Member

@gnawux @sboeuf As I was trying to summarize the current situation, I found that the runc(libcontainer) has modified its fix to cve-2019-5736 from opencontainers/runc@0a8e411 to opencontainers/runc#1984.

The current fix of protecting the host runc binary works as follows:

  1. Create a readonly bindmount (on container rootfs) to /proc/self/exe, open it readonly and umount it
  2. If above doesn't work, fallback to copying to a memfd
  3. If above doesn't work, fallback to copying to a O_TMPFILE file
  4. If above doesn't work, fallback to copying to a local file with open-unlinking

Since the very first method does not involve any data copy, it consumes almost no additional memory overhead. Unless it falls for some reason, we won't see any visible memory impact for kata containers. For that reason, I'm giving the PR LGTM.

OTOH, if some day we implement readonly container rootfs support, the fix can still have memory footprint impact. A simple workaround is to create the bindmount to a file under /tmp directory if failing to create it under _LIBCONTAINER_STATEDIR. We might want to push such a change to libcontainer before adding readonly container rootfs support in kata containers.

@teawater teawater merged commit 0718096 into kata-containers:master Mar 25, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants