docker with kata-containers fails to start #1585

Closed
felixg3 opened this issue Dec 22, 2019 · 26 comments


felixg3 commented Dec 22, 2019

No docker container starts with kata.
As an example, here is the output of docker run hello-world:

docker: Error response from daemon: OCI runtime create failed: Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing: unknown.

Using kata-collect-data, I gathered a significant amount of error information.
An issue for the same error was raised at kata-containers/kata-containers#28; however, the troubleshooting steps there do not help.
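
For reference, the report below was gathered roughly like this (a sketch; redirecting into kata-collect.log is my own convention, and kata-check is only a quick host sanity check, not part of the report):

$ sudo kata-collect-data.sh > kata-collect.log    # full diagnostics, shown below
$ sudo kata-runtime kata-check                    # quick check that the host can run Kata at all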

Show installed bundles

$ swupd bundle-list Babel NetworkManager NetworkManager-extras R-basic R-datasets R-extras R-rstudio R-stan Remmina Solaar Solaar-gui Sphinx acpica-unix2 akonadi alsa-utils aria2 ark atom baobab bc bcc binutils bison boot-encrypted bootloader bootloader-extras bpftrace c-basic c-basic-legacy calc cheese cloc cloud-api clr-network-troubleshooter cockpit columbiad containers-basic containers-virt cpio cryptography cryptoprocessor-management curl darktable desktop desktop-apps desktop-apps-extras desktop-assets desktop-gnomelibs desktop-kde desktop-kde-apps desktop-kde-libs desktop-locales dev-utils devpkg-LVM2 devpkg-R devpkg-acl devpkg-at-spi2-atk devpkg-at-spi2-core devpkg-atk devpkg-attr devpkg-audit devpkg-base devpkg-bzip2 devpkg-cairo devpkg-cryptsetup devpkg-curl devpkg-dbus devpkg-e2fsprogs devpkg-elfutils devpkg-expat devpkg-fontconfig devpkg-freetype devpkg-fribidi devpkg-fuse devpkg-gcr devpkg-gdk-pixbuf devpkg-glib devpkg-gnu-efi devpkg-gnutls devpkg-gobject-introspection devpkg-graphite devpkg-gtk-doc devpkg-gtk3 devpkg-harfbuzz devpkg-icu4c devpkg-iptables devpkg-json-c devpkg-json-glib devpkg-kmod devpkg-libX11 devpkg-libXau devpkg-libXcursor devpkg-libXdmcp devpkg-libXft devpkg-libXtst devpkg-libcap devpkg-libcap-ng devpkg-libcgroup devpkg-libdrm devpkg-libepoxy devpkg-libevent devpkg-libffi devpkg-libgcrypt devpkg-libgpg-error devpkg-libidn devpkg-libidn2 devpkg-libjpeg-turbo devpkg-libmicrohttpd devpkg-libmnl devpkg-libnetfilter_conntrack devpkg-libnfnetlink devpkg-libnftnl devpkg-libpng devpkg-libpsl devpkg-libpthread-stubs devpkg-libseccomp devpkg-libsoup devpkg-libtasn1 devpkg-libtirpc devpkg-libunwind devpkg-libusb devpkg-libxcb devpkg-libxkbcommon devpkg-libxml2 devpkg-llvm devpkg-lz4 devpkg-mesa devpkg-ncurses devpkg-nettle devpkg-openssl devpkg-p11-kit devpkg-pango devpkg-pciutils devpkg-pcre devpkg-pcre2 devpkg-pixman devpkg-popt devpkg-readline devpkg-shared-mime-info devpkg-sqlite-autoconf devpkg-systemd devpkg-talloc devpkg-util-linux devpkg-util-macros devpkg-wayland devpkg-wayland-protocols devpkg-webkitgtk devpkg-xapian-core devpkg-xcb-proto devpkg-xorgproto devpkg-xtrans devpkg-xz devpkg-zlib diffutils digikam dnf docbook-utils docker-compose docutils dolphin dosfstools doxygen dpdk editors emacs emacs-x11 eog ethtool evince evolution extremetuxracer feh file file-roller findutils firefox firmware-update flatpak flex fonts-basic fuse fwupdate games gdb geany geary gedit ghostscript gimp git gjs glibc-locale gnome-base-libs gnome-boxes gnome-calculator gnome-characters gnome-clocks gnome-color-manager gnome-disk-utility gnome-font-viewer gnome-logs gnome-music gnome-photos gnome-screenshot gnome-system-monitor gnome-todo gnome-weather go-basic gpgme gphoto2 graphviz gstreamer gtk-vnc gvim gwenview gzip hardware-bluetooth hardware-printing hardware-uefi hardware-wifi hexchat htop icdiff inkscape inotify-tools intltool iotop iperf iproute2 iptables irssi iwd java-runtime joe jq jupyter kamera kate kbd kcalc kde-frameworks5 kdiff3 keepassxc kernel-install kernel-native kleopatra konqueror konsole kontact konversation krita ksysguard kvm-host less lib-imageformat lib-opengl lib-openssl lib-qt5webengine lib-samba libX11client libglib libstdcpp libva-utils libxslt linux-dev linux-firmware linux-firmware-extras linux-firmware-wifi linux-tools llvm lm-sensors mail-utils make maker-basic man-pages mariadb minetest minetestserver minicom mpv mutt nasm nautilus neomutt neovim net-tools network-basic nfs-utils nim nodejs-basic notmuch okular openblas openldap 
openssh-client openssh-server openssl openvswitch os-core os-core-dev os-core-legacy os-core-plus os-core-search os-core-update os-core-webproxy p11-kit package-utils pandoc parallel parted patch performance-tools perl-basic perl-basic-dev perl-extras pidgin pmdk polkit powertop procps-ng productivity pulseaudio pygobject python-data-science python-extras python2-basic python3-basic python3-tcl qbittorrent qemu-guest-additions qt-basic qt-core quassel redshift rsync rust-basic rxvt-unicode samba sddm seahorse shells smartmontools spectacle spice-gtk storage-utils strace sudo supertuxkart suricata syndication sysadmin-basic sysadmin-remote syslinux sysstat tcl-basic telemetrics texinfo thermal_daemon thunderbird tigervnc tmux totem tzdata unzip user-basic valgrind vim vinagre virt-manager virt-manager-gui virt-viewer vlc vnc-server webkitgtk weechat wget which wine wpa_supplicant x11-server x11-tools x11vnc xfsprogs xterm xz yakuake yasm zenity znc zsh zstd

Show kata-collect-data.sh details

Meta details

Running kata-collect-data.sh version 1.9.1 (commit ) at 2019-12-22.13:16:07.150966092+0100.


Runtime is /usr/bin/kata-runtime.

kata-env

Output of "/usr/bin/kata-runtime kata-env":

[Meta]
  Version = "1.0.23"

[Runtime]
  Debug = false
  Trace = false
  DisableGuestSeccomp = true
  DisableNewNetNs = false
  SandboxCgroupOnly = false
  Path = "/usr/bin/kata-runtime"
  [Runtime.Version]
    Semver = "1.9.1"
    Commit = ""
    OCI = "1.0.1-dev"
  [Runtime.Config]
    Path = "/usr/share/defaults/kata-containers/configuration-qemu.toml"

[Hypervisor]
  MachineType = "pc"
  Version = "QEMU emulator version 2.11.0\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers"
  Path = "/usr/bin/kata-qemu-lite-system-x86_64"
  BlockDeviceDriver = "virtio-scsi"
  EntropySource = "/dev/urandom"
  Msize9p = 8192
  MemorySlots = 10
  Debug = false
  UseVSock = false
  SharedFS = "virtio-9p"

[Image]
  Path = "/usr/share/kata-containers/kata-containers-image_clearlinux_1.9.1_agent_d4bbd8007f.img"

[Kernel]
  Path = "/usr/share/kata-containers/vmlinuz-4.19.87-88.container"
  Parameters = "systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket"

[Initrd]
  Path = ""

[Proxy]
  Type = "kataProxy"
  Version = "kata-proxy version 1.9.1"
  Path = "/usr/libexec/kata-containers/kata-proxy"
  Debug = false

[Shim]
  Type = "kataShim"
  Version = "kata-shim version 1.9.1"
  Path = "/usr/libexec/kata-containers/kata-shim"
  Debug = false

[Agent]
  Type = "kata"
  Debug = false
  Trace = false
  TraceMode = ""
  TraceType = ""

[Host]
  Kernel = "5.4.5-882.native"
  Architecture = "amd64"
  VMContainerCapable = true
  SupportVSocks = true
  [Host.Distro]
    Name = "Clear Linux OS"
    Version = "31960"
  [Host.CPU]
    Vendor = "GenuineIntel"
    Model = "Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz"

[Netmon]
  Version = "kata-netmon version 1.9.1"
  Path = "/usr/libexec/kata-containers/kata-netmon"
  Debug = false
  Enable = false

Runtime config files

Runtime default config files

/etc/kata-containers/configuration.toml
/usr/share/defaults/kata-containers/configuration.toml

Runtime config file contents

Config file /etc/kata-containers/configuration.toml not found
Output of "cat "/usr/share/defaults/kata-containers/configuration.toml"":

# Copyright (c) 2017-2019 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#

# XXX: WARNING: this file is auto-generated.
# XXX:
# XXX: Source file: "cli/config/configuration-qemu.toml.in"
# XXX: Project:
# XXX:   Name: Kata Containers
# XXX:   Type: kata

[hypervisor.qemu]
path = "/usr/bin/kata-qemu-lite-system-x86_64"
kernel = "/usr/share/kata-containers/vmlinuz.container"
image = "/usr/share/kata-containers/kata-containers.img"
machine_type = "pc"

# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc.
#
# WARNING: - any parameter specified here will take priority over the default
# parameter value of the same name used to start the virtual machine.
# Do not set values here unless you understand the impact of doing so as you
# may stop the virtual machine from booting.
# To see the list of default parameters, enable hypervisor debug, create a
# container and look for 'default-kernel-parameters' log entries.
kernel_params = ""

# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""

# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""

# Default number of vCPUs per SB/VM:
# unspecified or 0                --> will be set to 1
# < 0                             --> will be set to the actual number of physical cores
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores
default_vcpus = 1

# Default maximum number of vCPUs per SB/VM:
# unspecified or == 0             --> will be set to the actual number of physical cores or to the maximum number
#                                     of vCPUs supported by KVM if that number is exceeded
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores or to the maximum number
#                                     of vCPUs supported by KVM if that number is exceeded
# WARNING: Depending on the architecture, the maximum number of vCPUs supported by KVM is used when
# the actual number of physical cores is greater than it.
# WARNING: Be aware that this value impacts the virtual machine's memory footprint and CPU
# hotplug functionality. For example, `default_maxvcpus = 240` specifies that up to 240 vCPUs
# can be added to a SB/VM, but the memory footprint will be big. Another example, with
# `default_maxvcpus = 8` the memory footprint will be small, but 8 will be the maximum number of
# vCPUs supported by the SB/VM. In general, we recommend that you do not edit this variable,
# unless you know what you are doing.
default_maxvcpus = 0

# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Up to 30 devices per bridge can be hot plugged.
# * Up to 5 PCI bridges can be cold plugged per VM.
#   This limitation could be a bug in qemu or in the kernel
# Default number of bridges per SB/VM:
# unspecified or 0   --> will be set to 1
# > 1 <= 5           --> will be set to the specified number
# > 5                --> will be set to 5
default_bridges = 1

# Default memory size in MiB for SB/VM.
# If unspecified then it will be set to 2048 MiB.
default_memory = 2048
#
# Default memory slots per SB/VM.
# If unspecified then it will be set to 10.
# This determines how many times memory can be hot-added to the sandbox/VM.
#memory_slots = 10

# This size in MiB will be added to the hypervisor's maximum memory.
# It is the memory address space for the NVDIMM device.
# If the block storage driver (block_device_driver) is set to "nvdimm",
# memory_offset should be set to the size of the block device.
# Default 0
#memory_offset = 0

# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's 
# root file system is backed by a block device, the block device is passed
# directly to the hypervisor for performance reasons. 
# This flag prevents the block device from being passed to the hypervisor;
# 9pfs is used instead to pass the rootfs.
disable_block_device_use = false

# Shared file system type:
#   - virtio-9p (default)
#   - virtio-fs
shared_fs = "virtio-9p"

# Path to vhost-user-fs daemon.
virtio_fs_daemon = "/usr/bin/virtiofsd"

# Default size of DAX cache in MiB
virtio_fs_cache_size = 1024

# Extra args for virtiofsd daemon
#
# Format example:
#   ["-o", "arg1=xxx,arg2", "-o", "hello world", "--arg3=yyy"]
#
# see `virtiofsd -h` for possible options.
virtio_fs_extra_args = []

# Cache mode:
#
#  - none
#    Metadata, data, and pathname lookup are not cached in guest. They are
#    always fetched from host and any changes are immediately pushed to host.
#
#  - auto
#    Metadata and pathname lookup cache expires after a configured amount of
#    time (default is 1 second). Data is cached while the file is open (close
#    to open consistency).
#
#  - always
#    Metadata, data, and pathname lookup are cached in guest and never expire.
virtio_fs_cache = "always"

# Block storage driver to be used for the hypervisor in case the container
# rootfs is backed by a block device. This is virtio-scsi, virtio-blk
# or nvdimm.
block_device_driver = "virtio-scsi"

# Specifies whether cache-related options will be set on block devices.
# Default false
#block_device_cache_set = true

# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true

# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true

# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
# for SCSI.
#
enable_iothreads = false

# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true

# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically 
# result in memory pre allocation
#enable_hugepages = true

# Enable file based guest memory support. The default is an empty string which
# will disable this feature. In the case of virtio-fs, this is enabled
# automatically and '/dev/shm' is used as the backing folder.
# This option will be ignored if VM templating is enabled.
#file_mem_backend = ""

# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true

# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. This extra output is added
# to the proxy logs, but only when proxy debug is also enabled.
# 
# Default false
#enable_debug = true

# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
# 
#disable_nesting_checks = true

# This is the msize used for 9p shares. It is the number of bytes 
# used for 9p packet payload.
#msize_9p = 8192

# If true and vsocks are supported, use vsocks to communicate directly
# with the agent and no proxy is started, otherwise use unix
# sockets and start a proxy to communicate with the agent.
# Default false
#use_vsock = true

# VFIO devices are hotplugged on a bridge by default. 
# Enable hotplugging on root bus. This may be required for devices with
# a large PCI bar, as this is a current limitation with hotplugging on 
# a bridge. This value is valid for "pc" machine type.
# Default false
#hotplug_vfio_on_root_bus = true

# If the host doesn't support vhost_net, set this to true; vhost fds will then not be created for NICs.
# Default false
#disable_vhost_net = true
#
# Default entropy source.
# The path to a host source of entropy (including a real hardware RNG)
# /dev/urandom and /dev/random are two main options.
# Be aware that /dev/random is a blocking source of entropy.  If the host
# runs out of entropy, the VM's boot time will increase, leading to startup
# timeouts.
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "/dev/urandom"

# Path to OCI hook binaries in the *guest rootfs*.
# This does not affect host-side hooks which must instead be added to
# the OCI spec passed to the runtime.
#
# You can create a rootfs with hooks by customizing the osbuilder scripts:
# https://github.com/kata-containers/osbuilder
#
# Hooks must be stored in a subdirectory of guest_hook_path according to their
# hook type, i.e. "guest_hook_path/{prestart,poststart,poststop}".
# The agent will scan these directories for executable files and add them, in
# lexicographical order, to the lifecycle of the guest container.
# Hooks are executed in the runtime namespace of the guest. See the official documentation:
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but this will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"

[factory]
# VM templating support. Once enabled, new VMs are created from template
# using vm cloning. They will share the same initial kernel, initramfs and
# agent memory by mapping it readonly. It helps speeding up new container
# creation and saves a lot of memory if there are many kata containers running
# on the same host.
#
# When disabled, new VMs are created from scratch.
#
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true

# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"

# The number of caches of VMCache:
# unspecified or == 0   --> VMCache is disabled
# > 0                   --> will be set to the specified number
#
# VMCache is a function that creates VMs as caches before they are used.
# It helps speed up new container creation.
# The function consists of a server and some clients communicating
# through Unix socket.  The protocol is gRPC in protocols/cache/cache.proto.
# The VMCache server will create some VMs and cache them by factory cache.
# It will convert a VM to gRPC format and transport it when it gets
# a request from a client.
# Factory grpccache is the VMCache client.  It will request gRPC format
# VM and convert it back to a VM.  If VMCache function is enabled,
# kata-runtime will request VM from factory grpccache when it creates
# a new sandbox.
#
# Default 0
#vm_cache_number = 0

# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"

[proxy.kata]
path = "/usr/libexec/kata-containers/kata-proxy"

# If enabled, proxy messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[shim.kata]
path = "/usr/libexec/kata-containers/kata-shim"

# If enabled, shim messages will be sent to the system log
# (default: disabled)
#enable_debug = true

# If enabled, the shim will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
#
# Note: By default, the shim runs in a separate network namespace. Therefore,
# to allow it to send trace details to the Jaeger agent running on the host,
# it is necessary to set 'disable_new_netns=true' so that it runs in the host
# network namespace.
#
# (default: disabled)
#enable_tracing = true

[agent.kata]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true

# Enable agent tracing.
#
# If enabled, the default trace mode is "dynamic" and the
# default trace type is "isolated". The trace mode and type are set
# explicitly with the `trace_type=` and `trace_mode=` options.
#
# Notes:
#
# - Tracing is ONLY enabled when `enable_tracing` is set: explicitly
#   setting `trace_mode=` and/or `trace_type=` without setting `enable_tracing`
#   will NOT activate agent tracing.
#
# - See https://github.com/kata-containers/agent/blob/master/TRACING.md for
#   full details.
#
# (default: disabled)
#enable_tracing = true
#
#trace_mode = "dynamic"
#trace_type = "isolated"

# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
# The following example can be used to load two kernel modules with parameters
#  - kernel_modules=["e1000e InterruptThrottleRate=3000,3000,3000 EEE=1", "i915 enable_ppgtt=0"]
# The first word is considered as the module name and the rest as its parameters.
# Container will not be started when:
#  * A kernel module is specified and the modprobe command is not installed in the guest
#    or it fails loading the module.
#  * The module is not available in the guest or it doesn't meet the guest kernel
#    requirements, such as architecture and version.
#
kernel_modules=[]


[netmon]
# If enabled, the network monitoring process gets started when the
# sandbox is created. This allows for the detection of additional
# networks being added to the existing network namespace after the
# sandbox has been created.
# (default: disabled)
#enable_netmon = true

# Specify the path to the netmon binary.
path = "/usr/libexec/kata-containers/kata-netmon"

# If enabled, netmon messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
#
# Internetworking model
# Determines how the VM should be connected to the
# container network interface
# Options:
#
#   - bridged (Deprecated)
#     Uses a linux bridge to interconnect the container interface to
#     the VM. Works for most cases except macvlan and ipvlan.
#     ***NOTE: This feature has been deprecated with plans to remove this
#     feature in the future. Please use other network models listed below.
#
#   - macvtap
#     Used when the Container network interface can be bridged using
#     macvtap.
#
#   - none
#     Used with a customized network. Only creates a tap device. No veth pair.
#
#   - tcfilter
#     Uses tc filter rules to redirect traffic from the network interface
#     provided by plugin to a tap interface connected to the VM.
#
internetworking_model="tcfilter"

# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=true

# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true

# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
# `disable_new_netns` conflicts with `enable_netmon`
# `disable_new_netns` conflicts with `internetworking_model=bridged` and `internetworking_model=macvtap`. It works only
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# If you are using docker, `disable_new_netns` only works with `docker run --net=none`
# (default: false)
#disable_new_netns = true

# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
# The sandbox cgroup is not constrained by the runtime
# The runtime caller is free to restrict or collect cgroup stats of the overall Kata sandbox.
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# See: https://godoc.org/github.com/kata-containers/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=false

# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production;
# they may break compatibility and are prepared for a big version bump.
# Supported experimental features:
# 1. "newstore": new persist storage driver which breaks backward compatibility,
#    expected to move out of experimental in 2.0.0.
# (default: [])
experimental=[]

KSM throttler

version

Output of " --version":

/usr/bin/kata-collect-data.sh: line 178: --version: command not found.

systemd service

Image details

---
osbuilder:
  url: "https://github.com/kata-containers/osbuilder"
  version: "unknown"
rootfs-creation-time: "2019-11-06T04:45:14.191056602+0000Z"
description: "osbuilder rootfs"
file-format-version: "0.0.2"
architecture: "x86_64"
base-distro:
  name: "Clear"
  version: "31470"
  packages:
    default:
      - "chrony"
      - "iptables-bin"
      - "kmod-bin"
      - "libudev0-shim"
      - "systemd"
      - "util-linux-bin"
    extra:

agent:
  url: "https://github.com/kata-containers/agent"
  name: "kata-agent"
  version: "1.9.1-d4bbd8007fddd06616f81d1069126ab28bd8c9b5"
  agent-is-init-daemon: "no"

Initrd details

No initrd


Logfiles

Runtime logs

Recent runtime problems found in system journal:

time="2019-12-22T13:11:35.233210504+01:00" level=error msg="Unable to determine if running rootless" arch=amd64 command=create container=664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c error="Failed to parse uid map file /proc/self/uid_map" name=kata-runtime pid=5428 source=rootless
time="2019-12-22T13:11:35.773676356+01:00" level=warning msg="load sandbox devices failed" arch=amd64 command=create container=664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c error="open /run/vc/sbs/664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c/devices.json: no such file or directory" name=kata-runtime pid=5428 sandbox=664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c sandboxid=664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c source=virtcontainers subsystem=sandbox
time="2019-12-22T13:11:37.02259763+01:00" level=info msg="sanner return error: read unix @->/run/vc/vm/664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c/qmp.sock: use of closed network connection" arch=amd64 command=create container=664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c name=kata-runtime pid=5428 source=virtcontainers subsystem=qmp
time="2019-12-22T13:11:57.116639814+01:00" level=info msg="sanner return error: read unix @->/run/vc/vm/664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c/qmp.sock: read: connection reset by peer" arch=amd64 command=create container=664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c name=kata-runtime pid=5428 source=virtcontainers subsystem=qmp
time="2019-12-22T13:11:57.129197841+01:00" level=warning msg="sandox cgroups path is empty" arch=amd64 command=create container=664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c name=kata-runtime pid=5428 sandbox=664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c source=virtcontainers subsystem=sandbox
time="2019-12-22T13:11:57.129544372+01:00" level=warning msg="failed to cleanup netns" arch=amd64 command=create container=664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c error="failed to get netns /var/run/netns/cni-4e3d8e6b-5eb2-fcb1-c1e3-24428029a2c8: failed to Statfs \"/var/run/netns/cni-4e3d8e6b-5eb2-fcb1-c1e3-24428029a2c8\": no such file or directory" name=kata-runtime path=/var/run/netns/cni-4e3d8e6b-5eb2-fcb1-c1e3-24428029a2c8 pid=5428 source=katautils
time="2019-12-22T13:11:57.12958904+01:00" level=error msg="Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing" arch=amd64 command=create container=664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c name=kata-runtime pid=5428 source=runtime
time="2019-12-22T13:12:36.16251942+01:00" level=error msg="Unable to determine if running rootless" arch=amd64 command=create container=49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5 error="Failed to parse uid map file /proc/self/uid_map" name=kata-runtime pid=5688 source=rootless
time="2019-12-22T13:12:36.323698633+01:00" level=warning msg="load sandbox devices failed" arch=amd64 command=create container=49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5 error="open /run/vc/sbs/49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5/devices.json: no such file or directory" name=kata-runtime pid=5688 sandbox=49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5 sandboxid=49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5 source=virtcontainers subsystem=sandbox
time="2019-12-22T13:12:36.387680916+01:00" level=info msg="sanner return error: read unix @->/run/vc/vm/49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5/qmp.sock: use of closed network connection" arch=amd64 command=create container=49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5 name=kata-runtime pid=5688 source=virtcontainers subsystem=qmp
time="2019-12-22T13:12:56.396620592+01:00" level=info msg="sanner return error: read unix @->/run/vc/vm/49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5/qmp.sock: read: connection reset by peer" arch=amd64 command=create container=49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5 name=kata-runtime pid=5688 source=virtcontainers subsystem=qmp
time="2019-12-22T13:12:56.417214574+01:00" level=warning msg="sandox cgroups path is empty" arch=amd64 command=create container=49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5 name=kata-runtime pid=5688 sandbox=49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5 source=virtcontainers subsystem=sandbox
time="2019-12-22T13:12:56.417613861+01:00" level=warning msg="failed to cleanup netns" arch=amd64 command=create container=49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5 error="failed to get netns /var/run/netns/cni-b49eebd3-d822-3199-b992-0d845b89772e: failed to Statfs \"/var/run/netns/cni-b49eebd3-d822-3199-b992-0d845b89772e\": no such file or directory" name=kata-runtime path=/var/run/netns/cni-b49eebd3-d822-3199-b992-0d845b89772e pid=5688 source=katautils
time="2019-12-22T13:12:56.417667862+01:00" level=error msg="Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing" arch=amd64 command=create container=49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5 name=kata-runtime pid=5688 source=runtime

Proxy logs

Recent proxy problems found in system journal:

time="2019-12-22T13:11:57.106695838+01:00" level=fatal msg="channel error" error="accept unix /run/vc/sbs/664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c/proxy.sock: use of closed network connection" name=kata-proxy pid=5478 sandbox=664c785ab9b76a284e7855530b28210a4d70ad0e26ed5b2c6dc956c34135369c source=proxy
time="2019-12-22T13:12:56.389697923+01:00" level=fatal msg="failed to handle exit signal" error="close unix @->/run/vc/vm/49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5/kata.sock: use of closed network connection" name=kata-proxy pid=5724 sandbox=49e38af651913a68f58082b9906c3dfed346e7aeb2a4fc460b8a22e1eda17cd5 source=proxy

Shim logs

No recent shim problems found in system journal.

Throttler logs

No recent throttler problems found in system journal.


Container manager details

Have docker

Docker

Output of "docker version":

Client:
 Version:           19.03.2
 API version:       1.40
 Go version:        go1.13.5
 Git commit:        6a30dfca03664a0b6bf0646a7d389ee7d0318e6e
 Built:             Tue Dec 10 19:31:02 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.2
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.5
  Git commit:       6a30dfca03664a0b6bf0646a7d389ee7d0318e6e
  Built:            Tue Dec 10 19:31:31 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.0
  GitCommit:        
 docker-init:
  Version:          0.18.0
  GitCommit:        

Output of "docker info":

Client:
 Debug Mode: false

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 9
 Server Version: 19.03.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: kata-runtime runc
 Default Runtime: kata-runtime
 Init Binary: docker-init
 containerd version: 
 runc version: N/A
 init version: 
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.4.5-882.native
 Operating System: Clear Linux OS
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 15.61GiB
 Name: clr
 ID: L6OF:WUVF:RWFD:NZAM:A4FW:ZTL5:F3UY:NRF3:SW4J:BYOG:ITSY:FBE7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Output of "systemctl show docker":

Type=notify
Restart=always
NotifyAccess=main
RestartUSec=2s
TimeoutStartUSec=infinity
TimeoutStopUSec=infinity
TimeoutAbortUSec=infinity
RuntimeMaxUSec=infinity
WatchdogUSec=0
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=4376
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ReloadResult=success
CleanResult=success
UID=[not set]
GID=[not set]
NRestarts=0
OOMPolicy=continue
ExecMainStartTimestamp=Sun 2019-12-22 13:08:26 CET
ExecMainStartTimestampMonotonic=678087020
ExecMainExitTimestampMonotonic=0
ExecMainPID=4376
ExecMainCode=0
ExecMainStatus=0
ExecStart={ path=/usr/bin/dockerd ; argv[]=/usr/bin/dockerd $DOCKER_EXTRA_RUNTIMES $DOCKER_DEFAULT_RUNTIME $DOCKER_EXTRA_OPTS --storage-driver=overlay2 ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartEx={ path=/usr/bin/dockerd ; argv[]=/usr/bin/dockerd $DOCKER_EXTRA_RUNTIMES $DOCKER_DEFAULT_RUNTIME $DOCKER_EXTRA_OPTS --storage-driver=overlay2 ; flags= ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecReload={ path=/bin/kill ; argv[]=/bin/kill -s HUP $MAINPID ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecReloadEx={ path=/bin/kill ; argv[]=/bin/kill -s HUP $MAINPID ; flags= ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
Slice=system.slice
ControlGroup=/system.slice/docker.service
MemoryCurrent=486850560
CPUUsageNSec=[not set]
EffectiveCPUs=
EffectiveMemoryNodes=
TasksCurrent=12
IPIngressBytes=[no data]
IPIngressPackets=[no data]
IPEgressBytes=[no data]
IPEgressPackets=[no data]
IOReadBytes=18446744073709551615
IOReadOperations=18446744073709551615
IOWriteBytes=18446744073709551615
IOWriteOperations=18446744073709551615
Delegate=yes
DelegateControllers=cpu cpuacct cpuset io blkio memory devices pids bpf-firewall bpf-devices
CPUAccounting=no
CPUWeight=[not set]
StartupCPUWeight=[not set]
CPUShares=[not set]
StartupCPUShares=[not set]
CPUQuotaPerSecUSec=infinity
CPUQuotaPeriodUSec=infinity
AllowedCPUs=
AllowedMemoryNodes=
IOAccounting=no
IOWeight=[not set]
StartupIOWeight=[not set]
BlockIOAccounting=no
BlockIOWeight=[not set]
StartupBlockIOWeight=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
DevicePolicy=auto
TasksAccounting=yes
TasksMax=infinity
IPAccounting=no
Environment=[unprintable] [unprintable]
UMask=0022
LimitCPU=infinity
LimitCPUSoft=infinity
LimitFSIZE=infinity
LimitFSIZESoft=infinity
LimitDATA=infinity
LimitDATASoft=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORE=infinity
LimitCORESoft=infinity
LimitRSS=infinity
LimitRSSSoft=infinity
LimitNOFILE=infinity
LimitNOFILESoft=infinity
LimitAS=infinity
LimitASSoft=infinity
LimitNPROC=infinity
LimitNPROCSoft=infinity
LimitMEMLOCK=65536
LimitMEMLOCKSoft=65536
LimitLOCKS=infinity
LimitLOCKSSoft=infinity
LimitSIGPENDING=63555
LimitSIGPENDINGSoft=63555
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=infinity
LimitRTTIMESoft=infinity
OOMScoreAdjust=0
Nice=0
IOSchedulingClass=0
IOSchedulingPriority=0
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
CPUAffinity=
NUMAPolicy=n/a
NUMAMask=
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardInputData=
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SyslogLevel=6
SyslogFacility=3
LogLevelMax=-1
LogRateLimitIntervalUSec=0
LogRateLimitBurst=0
SecureBits=0
CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write cap_audit_control cap_setfcap cap_mac_override cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read
AmbientCapabilities=
DynamicUser=no
RemoveIPC=no
MountFlags=
PrivateTmp=no
PrivateDevices=no
ProtectKernelTunables=no
ProtectKernelModules=no
ProtectKernelLogs=no
ProtectControlGroups=no
PrivateNetwork=no
PrivateUsers=no
PrivateMounts=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
UtmpMode=init
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=0
LockPersonality=no
RuntimeDirectoryPreserve=no
RuntimeDirectoryMode=0755
StateDirectoryMode=0755
CacheDirectoryMode=0755
LogsDirectoryMode=0755
ConfigurationDirectoryMode=0755
TimeoutCleanUSec=infinity
MemoryDenyWriteExecute=no
RestrictRealtime=no
RestrictSUIDSGID=no
RestrictNamespaces=no
MountAPIVFS=no
KeyringMode=private
ProtectHostname=no
KillMode=process
KillSignal=15
RestartKillSignal=15
FinalKillSignal=9
SendSIGKILL=yes
SendSIGHUP=no
WatchdogSignal=6
Id=docker.service
Names=docker.service
Requires=docker.socket sysinit.target system.slice
Wants=network-online.target
BindsTo=containerd.service
WantedBy=multi-user.target
ConsistsOf=docker.socket
Conflicts=shutdown.target
Before=shutdown.target multi-user.target
After=basic.target containerd.service sysinit.target docker.socket systemd-journald.socket firewalld.service network-online.target system.slice
TriggeredBy=docker.socket
Documentation=https://docs.docker.com
Description=Docker Application Container Engine
LoadState=loaded
ActiveState=active
SubState=running
FragmentPath=/usr/lib/systemd/system/docker.service
DropInPaths=/etc/systemd/system/docker.service.d/50-runtime.conf /usr/lib/systemd/system/docker.service.d/clearlinux.conf
UnitFileState=enabled
UnitFilePreset=disabled
StateChangeTimestamp=Sun 2019-12-22 13:08:27 CET
StateChangeTimestampMonotonic=678730681
InactiveExitTimestamp=Sun 2019-12-22 13:08:26 CET
InactiveExitTimestampMonotonic=678087305
ActiveEnterTimestamp=Sun 2019-12-22 13:08:27 CET
ActiveEnterTimestampMonotonic=678730681
ActiveExitTimestampMonotonic=0
InactiveEnterTimestampMonotonic=0
CanStart=yes
CanStop=yes
CanReload=yes
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Sun 2019-12-22 13:08:26 CET
ConditionTimestampMonotonic=678086170
AssertTimestamp=Sun 2019-12-22 13:08:26 CET
AssertTimestampMonotonic=678086173
Transient=no
Perpetual=no
StartLimitIntervalUSec=1min
StartLimitBurst=3
StartLimitAction=none
FailureAction=none
SuccessAction=none
InvocationID=31fc6e2891d6436d872cbb8fc9cbc1d3
CollectMode=inactive

No kubectl
Have crio

crio

Output of "crio --version":

crio version 1.16.1

Output of "systemctl show crio":

Type=notify
Restart=on-abnormal
NotifyAccess=main
RestartUSec=10s
TimeoutStartUSec=infinity
TimeoutStopUSec=1min 30s
TimeoutAbortUSec=1min 30s
RuntimeMaxUSec=infinity
WatchdogUSec=0
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=0
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ReloadResult=success
CleanResult=success
UID=[not set]
GID=[not set]
NRestarts=0
OOMPolicy=stop
ExecMainStartTimestampMonotonic=0
ExecMainExitTimestampMonotonic=0
ExecMainPID=0
ExecMainCode=0
ExecMainStatus=0
ExecStart={ path=/usr/bin/crio ; argv[]=/usr/bin/crio $CRIO_CONFIG_OPTIONS $CRIO_RUNTIME_OPTIONS $CRIO_STORAGE_OPTIONS $CRIO_NETWORK_OPTIONS $CRIO_METRICS_OPTIONS ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartEx={ path=/usr/bin/crio ; argv[]=/usr/bin/crio $CRIO_CONFIG_OPTIONS $CRIO_RUNTIME_OPTIONS $CRIO_STORAGE_OPTIONS $CRIO_NETWORK_OPTIONS $CRIO_METRICS_OPTIONS ; flags= ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecReload={ path=/bin/kill ; argv[]=/bin/kill -s HUP $MAINPID ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecReloadEx={ path=/bin/kill ; argv[]=/bin/kill -s HUP $MAINPID ; flags= ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
Slice=system.slice
MemoryCurrent=[not set]
CPUUsageNSec=[not set]
EffectiveCPUs=
EffectiveMemoryNodes=
TasksCurrent=[not set]
IPIngressBytes=[no data]
IPIngressPackets=[no data]
IPEgressBytes=[no data]
IPEgressPackets=[no data]
IOReadBytes=18446744073709551615
IOReadOperations=18446744073709551615
IOWriteBytes=18446744073709551615
IOWriteOperations=18446744073709551615
Delegate=no
CPUAccounting=no
CPUWeight=[not set]
StartupCPUWeight=[not set]
CPUShares=[not set]
StartupCPUShares=[not set]
CPUQuotaPerSecUSec=infinity
CPUQuotaPeriodUSec=infinity
AllowedCPUs=
AllowedMemoryNodes=
IOAccounting=no
IOWeight=[not set]
StartupIOWeight=[not set]
BlockIOAccounting=no
BlockIOWeight=[not set]
StartupBlockIOWeight=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
DevicePolicy=auto
TasksAccounting=yes
TasksMax=infinity
IPAccounting=no
Environment=GOTRACEBACK=crash
EnvironmentFiles=/etc/sysconfig/crio (ignore_errors=yes)
UMask=0022
LimitCPU=infinity
LimitCPUSoft=infinity
LimitFSIZE=infinity
LimitFSIZESoft=infinity
LimitDATA=infinity
LimitDATASoft=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORE=infinity
LimitCORESoft=infinity
LimitRSS=infinity
LimitRSSSoft=infinity
LimitNOFILE=1048576
LimitNOFILESoft=1048576
LimitAS=infinity
LimitASSoft=infinity
LimitNPROC=1048576
LimitNPROCSoft=1048576
LimitMEMLOCK=65536
LimitMEMLOCKSoft=65536
LimitLOCKS=infinity
LimitLOCKSSoft=infinity
LimitSIGPENDING=63555
LimitSIGPENDINGSoft=63555
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=infinity
LimitRTTIMESoft=infinity
OOMScoreAdjust=-999
Nice=0
IOSchedulingClass=0
IOSchedulingPriority=0
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
CPUAffinity=
NUMAPolicy=n/a
NUMAMask=
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardInputData=
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SyslogLevel=6
SyslogFacility=3
LogLevelMax=-1
LogRateLimitIntervalUSec=0
LogRateLimitBurst=0
SecureBits=0
CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write cap_audit_control cap_setfcap cap_mac_override cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read
AmbientCapabilities=
DynamicUser=no
RemoveIPC=no
MountFlags=
PrivateTmp=no
PrivateDevices=no
ProtectKernelTunables=no
ProtectKernelModules=no
ProtectKernelLogs=no
ProtectControlGroups=no
PrivateNetwork=no
PrivateUsers=no
PrivateMounts=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
UtmpMode=init
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=0
LockPersonality=no
RuntimeDirectoryPreserve=no
RuntimeDirectoryMode=0755
StateDirectoryMode=0755
CacheDirectoryMode=0755
LogsDirectoryMode=0755
ConfigurationDirectoryMode=0755
TimeoutCleanUSec=infinity
MemoryDenyWriteExecute=no
RestrictRealtime=no
RestrictSUIDSGID=no
RestrictNamespaces=no
MountAPIVFS=no
KeyringMode=private
ProtectHostname=no
KillMode=control-group
KillSignal=15
RestartKillSignal=15
FinalKillSignal=9
SendSIGKILL=yes
SendSIGHUP=no
WatchdogSignal=6
Id=crio.service
Names=cri-o.service crio.service
Requires=crio-wipe.service system.slice sysinit.target
Wants=crio-set-runtime.service network-online.target
Conflicts=shutdown.target
Before=shutdown.target
After=basic.target crio-wipe.service system.slice network-online.target systemd-journald.socket crio-set-runtime.service sysinit.target
Documentation=https://github.com/cri-o/cri-o
Description=Container Runtime Interface for OCI (CRI-O)
LoadState=loaded
ActiveState=inactive
SubState=dead
FragmentPath=/usr/lib/systemd/system/crio.service
DropInPaths=/usr/lib/systemd/system/crio.service.d/crio-clearlinux.conf
UnitFileState=disabled
UnitFilePreset=disabled
StateChangeTimestampMonotonic=0
InactiveExitTimestampMonotonic=0
ActiveEnterTimestampMonotonic=0
ActiveExitTimestampMonotonic=0
InactiveEnterTimestampMonotonic=0
CanStart=yes
CanStop=yes
CanReload=yes
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=no
AssertResult=no
ConditionTimestampMonotonic=0
AssertTimestampMonotonic=0
Transient=no
Perpetual=no
StartLimitIntervalUSec=2min
StartLimitBurst=6
StartLimitAction=none
FailureAction=none
SuccessAction=none
CollectMode=inactive

Output of "cat /etc/crio/crio.conf":

cat: /etc/crio/crio.conf: No such file or directory

Have containerd

containerd

Output of "containerd --version":

containerd github.com/containerd/containerd 1.3.0 

Output of "systemctl show containerd":

Type=simple
Restart=always
NotifyAccess=none
RestartUSec=100ms
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
TimeoutAbortUSec=1min 30s
RuntimeMaxUSec=infinity
WatchdogUSec=0
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=4375
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ReloadResult=success
CleanResult=success
UID=[not set]
GID=[not set]
NRestarts=0
OOMPolicy=continue
ExecMainStartTimestamp=Sun 2019-12-22 13:08:26 CET
ExecMainStartTimestampMonotonic=678085089
ExecMainExitTimestampMonotonic=0
ExecMainPID=4375
ExecMainCode=0
ExecMainStatus=0
ExecStartPre={ path=/sbin/modprobe ; argv[]=/sbin/modprobe overlay ; ignore_errors=yes ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartPreEx={ path=/sbin/modprobe ; argv[]=/sbin/modprobe overlay ; flags=ignore-failure ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStart={ path=/usr/bin/containerd ; argv[]=/usr/bin/containerd ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStartEx={ path=/usr/bin/containerd ; argv[]=/usr/bin/containerd ; flags= ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
Slice=system.slice
ControlGroup=/system.slice/containerd.service
MemoryCurrent=128610304
CPUUsageNSec=[not set]
EffectiveCPUs=
EffectiveMemoryNodes=
TasksCurrent=11
IPIngressBytes=[no data]
IPIngressPackets=[no data]
IPEgressBytes=[no data]
IPEgressPackets=[no data]
IOReadBytes=18446744073709551615
IOReadOperations=18446744073709551615
IOWriteBytes=18446744073709551615
IOWriteOperations=18446744073709551615
Delegate=yes
DelegateControllers=cpu cpuacct cpuset io blkio memory devices pids bpf-firewall bpf-devices
CPUAccounting=no
CPUWeight=[not set]
StartupCPUWeight=[not set]
CPUShares=[not set]
StartupCPUShares=[not set]
CPUQuotaPerSecUSec=infinity
CPUQuotaPeriodUSec=infinity
AllowedCPUs=
AllowedMemoryNodes=
IOAccounting=no
IOWeight=[not set]
StartupIOWeight=[not set]
BlockIOAccounting=no
BlockIOWeight=[not set]
StartupBlockIOWeight=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
DevicePolicy=auto
TasksAccounting=yes
TasksMax=infinity
IPAccounting=no
UMask=0022
LimitCPU=infinity
LimitCPUSoft=infinity
LimitFSIZE=infinity
LimitFSIZESoft=infinity
LimitDATA=infinity
LimitDATASoft=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORE=infinity
LimitCORESoft=infinity
LimitRSS=infinity
LimitRSSSoft=infinity
LimitNOFILE=1048576
LimitNOFILESoft=1048576
LimitAS=infinity
LimitASSoft=infinity
LimitNPROC=infinity
LimitNPROCSoft=infinity
LimitMEMLOCK=65536
LimitMEMLOCKSoft=65536
LimitLOCKS=infinity
LimitLOCKSSoft=infinity
LimitSIGPENDING=63555
LimitSIGPENDINGSoft=63555
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=infinity
LimitRTTIMESoft=infinity
OOMScoreAdjust=0
Nice=0
IOSchedulingClass=0
IOSchedulingPriority=0
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
CPUAffinity=
NUMAPolicy=n/a
NUMAMask=
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardInputData=
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SyslogLevel=6
SyslogFacility=3
LogLevelMax=-1
LogRateLimitIntervalUSec=0
LogRateLimitBurst=0
SecureBits=0
CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write cap_audit_control cap_setfcap cap_mac_override cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read
AmbientCapabilities=
DynamicUser=no
RemoveIPC=no
MountFlags=
PrivateTmp=no
PrivateDevices=no
ProtectKernelTunables=no
ProtectKernelModules=no
ProtectKernelLogs=no
ProtectControlGroups=no
PrivateNetwork=no
PrivateUsers=no
PrivateMounts=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
UtmpMode=init
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=0
LockPersonality=no
RuntimeDirectoryPreserve=no
RuntimeDirectoryMode=0755
StateDirectoryMode=0755
CacheDirectoryMode=0755
LogsDirectoryMode=0755
ConfigurationDirectoryMode=0755
TimeoutCleanUSec=infinity
MemoryDenyWriteExecute=no
RestrictRealtime=no
RestrictSUIDSGID=no
RestrictNamespaces=no
MountAPIVFS=no
KeyringMode=private
ProtectHostname=no
KillMode=process
KillSignal=15
RestartKillSignal=15
FinalKillSignal=9
SendSIGKILL=yes
SendSIGHUP=no
WatchdogSignal=6
Id=containerd.service
Names=containerd.service
Requires=sysinit.target system.slice
BoundBy=docker.service
Conflicts=shutdown.target
Before=docker.service shutdown.target
After=systemd-journald.socket network.target sysinit.target system.slice basic.target
Documentation=https://containerd.io
Description=containerd container runtime
LoadState=loaded
ActiveState=active
SubState=running
FragmentPath=/usr/lib/systemd/system/containerd.service
UnitFileState=disabled
UnitFilePreset=disabled
StateChangeTimestamp=Sun 2019-12-22 13:08:26 CET
StateChangeTimestampMonotonic=678085159
InactiveExitTimestamp=Sun 2019-12-22 13:08:26 CET
InactiveExitTimestampMonotonic=678077587
ActiveEnterTimestamp=Sun 2019-12-22 13:08:26 CET
ActiveEnterTimestampMonotonic=678085159
ActiveExitTimestampMonotonic=0
InactiveEnterTimestampMonotonic=0
CanStart=yes
CanStop=yes
CanReload=no
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Sun 2019-12-22 13:08:26 CET
ConditionTimestampMonotonic=678076546
AssertTimestamp=Sun 2019-12-22 13:08:26 CET
AssertTimestampMonotonic=678076547
Transient=no
Perpetual=no
StartLimitIntervalUSec=10s
StartLimitBurst=5
StartLimitAction=none
FailureAction=none
SuccessAction=none
InvocationID=221a21293dcb4e5981db7f911c95f816
CollectMode=inactive

Output of "cat /etc/containerd/config.toml":

cat: /etc/containerd/config.toml: No such file or directory

Packages

No dpkg
Have rpm
Output of "rpm -qa|egrep "(cc-oci-runtimecc-runtimerunv|kata-proxy|kata-runtime|kata-shim|kata-ksm-throttler|kata-containers-image|linux-container|qemu-)"":



@eadamsintel

I am hitting the same issue on Clear Linux 31960, with the same error. That error message is pretty generic and does not reveal much about what the problem might be.

$ docker run -it --runtime=kata-runtime clearlinux bash
docker: Error response from daemon: OCI runtime create failed: Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing: unknown.

I have another machine with Clear Linux that I just updated to 31960, and it launches containers just fine with kata. I can't really explain why. I checked the output of kata-runtime kata-env and everything matches between the machines; I even checked the sha256 values of every binary used and they match exactly. Docker appears to be configured the same. The only difference is that one machine requires a proxy and the other does not, and the machine with the proxy is the one that works.
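
For anyone repeating that comparison, it was along these lines (a minimal sketch; the output file name is hypothetical, and the binary paths are the ones from the kata-env output above):

$ kata-runtime kata-env > kata-env-$(hostname).toml    # run on each machine, then diff the two files
$ sha256sum /usr/bin/kata-runtime \
    /usr/bin/kata-qemu-lite-system-x86_64 \
    /usr/libexec/kata-containers/kata-proxy \
    /usr/libexec/kata-containers/kata-shim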

As a workaround you can install Kata using kata-deploy. It installs Kata into /opt/kata along with all the runtimes needed to support QEMU, Firecracker, virtio-fs, etc., and it updates your /etc/docker/daemon.json to add those runtimes. Your docker command would then be docker run --runtime=kata-qemu ...
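
A sketch of that kata-deploy route for Docker hosts, going by the kata-deploy README of that era (the image name, subcommand, and mounts are recalled from that README, so double-check them there):

$ docker run -v /opt/kata:/opt/kata -v /var/run/dbus:/var/run/dbus \
    -v /run/systemd:/run/systemd -v /etc/docker:/etc/docker \
    -it katadocker/kata-deploy kata-deploy-docker install
$ docker run --runtime=kata-qemu hello-world    # runtimes now live under /opt/kata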


ahkok commented Jan 8, 2020

$ docker run -it --runtime=kata-runtime clearlinux bash
docker: Error response from daemon: OCI runtime create failed: Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing: unknown.

Reproduced? I don't know what to do about this, though, or who can take this issue on...


bryteise commented Jan 8, 2020

Hrm, I haven't been able to reproduce this on 32020 so far (I did need a little cleanup: removing an /etc/kata-containers/configuration.toml that I got from somewhere).
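
For anyone else with a stray local config: /etc/kata-containers/configuration.toml takes priority over the default under /usr/share/defaults/kata-containers (see the "Runtime default config files" list in the report above), so the cleanup amounts to:

$ sudo rm /etc/kata-containers/configuration.toml
$ docker run --runtime=kata-runtime hello-world    # retry with the default config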


mythi commented Jan 8, 2020

Hrm I haven't been able to reproduce this on 32020 so far

cannot reproduce either.

@eadamsintel

I can reproduce on 32020 on two different systems but not on a third. The only difference I can see between the systems is that the one where I use and configure a proxy works just fine. On the other two systems, where it fails (one is my personal system at home), I don't have a proxy configured and don't need one. Using runc on those failing systems works just fine. I turned on full debug; the attached file has the full debug output.
katafail.txt
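
For anyone reproducing this, "full debug" here means setting enable_debug = true in each section of /etc/kata-containers/configuration.toml, roughly (section names as in the stock Kata 1.x config):

[hypervisor.qemu]
enable_debug = true

[proxy.kata]
enable_debug = true

[shim.kata]
enable_debug = true

[agent.kata]
enable_debug = true

[runtime]
enable_debug = true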

@bryteise
Member

bryteise commented Jan 8, 2020

Jan 08 13:41:42 kata2 kata-runtime[11652]: time="2020-01-08T13:41:42.457163281-08:00" level=info msg="Stopping Sandbox" arch=amd64 command=create container=0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6 name=kata-runtime pid=11652 source=virtcontainers subsystem=qemu
Jan 08 13:41:42 kata2 kata-proxy[11692]: time="2020-01-08T13:41:42.457332931-08:00" level=fatal msg="channel error" error="accept unix /run/vc/sbs/0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6/proxy.sock: use of closed network connection" name=kata-proxy pid=11692 sandbox=0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6 source=proxy
Jan 08 13:41:42 kata2 kata-proxy[11692]: time="2020-01-08T13:41:42.457360207-08:00" level=fatal msg="failed to handle exit signal" error="close unix @->/run/vc/vm/0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6/kata.sock: use of closed network connection" name=kata-proxy pid=11692 sandbox=0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6 source=proxy
Jan 08 13:41:42 kata2 kata-runtime[11652]: time="2020-01-08T13:41:42.457473185-08:00" level=info msg="{\"QMP\": {\"version\": {\"qemu\": {\"micro\": 0, \"minor\": 11, \"major\": 2}, \"package\": \"\"}, \"capabilities\": []}}" arch=amd64 command=create container=0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6 name=kata-runtime pid=11652 source=virtcontainers subsystem=qmp
Jan 08 13:41:42 kata2 kata-runtime[11652]: time="2020-01-08T13:41:42.457573055-08:00" level=info msg="{\"execute\":\"qmp_capabilities\"}" arch=amd64 command=create container=0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6 name=kata-runtime pid=11652 source=virtcontainers subsystem=qmp
Jan 08 13:41:42 kata2 kata-runtime[11652]: time="2020-01-08T13:41:42.457985679-08:00" level=info msg="{\"return\": {}}" arch=amd64 command=create container=0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6 name=kata-runtime pid=11652 source=virtcontainers subsystem=qmp
Jan 08 13:41:42 kata2 kata-runtime[11652]: time="2020-01-08T13:41:42.458050567-08:00" level=info msg="{\"execute\":\"quit\"}" arch=amd64 command=create container=0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6 name=kata-runtime pid=11652 source=virtcontainers subsystem=qmp
Jan 08 13:41:42 kata2 kata-runtime[11652]: time="2020-01-08T13:41:42.458297019-08:00" level=info msg="{\"return\": {}}" arch=amd64 command=create container=0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6 name=kata-runtime pid=11652 source=virtcontainers subsystem=qmp
Jan 08 13:41:42 kata2 kata-runtime[11652]: time="2020-01-08T13:41:42.458327472-08:00" level=info msg="{\"timestamp\": {\"seconds\": 1578519702, \"microseconds\": 458290}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}" arch=amd64 command=create container=0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6 name=kata-runtime pid=11652 source=virtcontainers subsystem=qmp
Jan 08 13:41:42 kata2 kata-runtime[11652]: time="2020-01-08T13:41:42.458429322-08:00" level=info msg="cleanup vm path" arch=amd64 command=create container=0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6 dir=/run/vc/vm/0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6 link=/run/vc/vm/0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6 name=kata-runtime pid=11652 source=virtcontainers subsystem=qemu
Jan 08 13:41:42 kata2 kata-runtime[11652]: time="2020-01-08T13:41:42.458529288-08:00" level=info msg="Detaching endpoint" arch=amd64 command=create container=0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6 endpoint-type=virtual name=kata-runtime pid=11652 source=virtcontainers subsystem=network
Jan 08 13:41:42 kata2 kata-runtime[11652]: time="2020-01-08T13:41:42.463200553-08:00" level=info msg="sanner return error: read unix @->/run/vc/vm/0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6/qmp.sock: read: connection reset by peer" arch=amd64 command=create container=0d869446c2fac1d5ff0c357f43f0c46eaad35560840ad1e58bd706901d788cd6 name=kata-runtime pid=11652 source=virtcontainers subsystem=qmp

I'd say something looks suspicious in that region. @amshinde any advice if this looks reasonable or horribly broken?
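
For anyone wanting to extract the same window from their own journal, something like this works (-t filters on the syslog identifier and can be repeated):

sudo journalctl --no-pager -t kata-runtime -t kata-proxy --since "2020-01-08 13:41"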

@eadamsintel

eadamsintel commented Jan 8, 2020

I did some more experimentation to help isolate the issue. I used kata-deploy to install the same 1.9.1 version of kata-runtime to /opt, and it worked just fine on that failing system. If, however, I change the Kata kernel to point to /usr/share/kata-containers/vmlinuz-4.19.93-91.container, which is the Kata kernel that Clear installs, then it fails to start. If I keep the kernel from /opt/kata/share/kata-containers/vmlinuz-4.19.75-55 but change the image to the one that Clear installs, /usr/share/kata-containers/kata-containers-image_clearlinux_1.9.1_agent_d4bbd8007f.img, then it launches just fine. I believe the issue is probably due to a recent patch to the Kata kernel. The easy fix might be to revert linux-kata.spec back to 4.19.87.

Edit: I tried this on a Fedora system that is the same hardware as the failing Clear Linux system. I installed Kata 1.9.1 to /opt with kata-deploy and used the same kernel and image that failed above, vmlinuz-4.19.93-91.container and kata-containers-image_clearlinux_1.9.1_agent_d4bbd8007f.img, and it works just fine. The host kernel for Clear Linux is 5.4.8-886.native and for Fedora is 5.4.7-100.fc30.x86_64. Every combination of kernel and image that I tried on Fedora worked.
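
In configuration.toml terms, the working mix is just these two lines (paths copied from the tests above):

# /etc/kata-containers/configuration.toml
[hypervisor.qemu]
kernel = "/opt/kata/share/kata-containers/vmlinuz-4.19.75-55"
image = "/usr/share/kata-containers/kata-containers-image_clearlinux_1.9.1_agent_d4bbd8007f.img"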

@amshinde

amshinde commented Jan 9, 2020

I'd say something looks suspicious in that region. @amshinde any advice if this looks reasonable or horribly broken?

@bryteise Looks ok to me, nothing suspicious there.

@eadamsintel So is this an issue with the latest Kata kernel from Clear Linux? Could you try a few different kernels and zero in on the one that is causing the issue?
It would be good to confirm whether this is actually a kernel issue. You mentioned the proxy earlier, but I don't see any reason that could cause any issue wrt Kata.

@eadamsintel

@amshinde I am not sure where the issue is. I have two failing systems in the office and one failing personal system. I downgraded one of them multiple versions back to 31640, right before the runtime updated from 1.8.2 to 1.9.1, trying out multiple kernels, and none of them worked. I have even used swupd repair --picky --force to ensure the files were correct. I also tried multiple different Clear Linux Kata kernels on the up-to-date Clear Linux system, and they all fail to launch. I have two different kernels in /opt/kata/share/kata-containers from different kata-deploy installs, and those both work with the Clear Linux kata-containers.img.

I moved the failing system to operate behind a proxy and it continued to fail with those failing kernels, so I don't think the proxy is related. It is just coincidental that the working machine is the one behind a proxy.

Using Clear Linux 32050 with the built-in kata-runtime 1.9.1 and its /usr/share/kata-containers/kata-containers-image_clearlinux_1.9.1_agent_d4bbd8007f.img, but swapping in different kernels, here is what I have tested. I kept the image the same and went back through every Clear Linux Kata kernel until right before the switch to kata-runtime 1.9.1.


Kata kernels from Clear Linux install

vmlinuz-4.19.83-83.container  FAILS
vmlinuz-4.19.84-84.container  FAILS
vmlinuz-4.19.85-85.container  FAILS
vmlinuz-4.19.86-86.container  FAILS
vmlinuz-4.19.86-87.container  FAILS
vmlinuz-4.19.93-91.container  FAILS

Kata kernels from kata-deploy

vmlinuz-4.19.86-59 WORKS
vmlinuz-4.19.75-55 WORKS

I will reinstall Clear on the failing system I downgraded and see if it fails as well.

@eadamsintel

I reinstalled Clear on that failing system, accepting all the defaults, and then installed containers-basic and containers-virt. It still fails with the default install, so this issue is easily reproducible on my machine. I tried the vmlinuz-4.19.86-59 kernel from kata-deploy and that did work, so I can say the issue is with the Kata kernel that Clear Linux builds.
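
For completeness, the default-install reproducer is roughly this (assuming only the docker service needs enabling after the bundles are added):

sudo swupd bundle-add containers-basic containers-virt
sudo systemctl enable --now docker
docker run --runtime=kata-runtime hello-world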

@jodh-intel

Tested with clear-32060-kvm.img in QEMU and Kata works fine with the defaults:

  • vmlinuz-4.19.93-91.container
  • CL guest kernel 5.4.8-411.kvm

We may get more detail on the problem if someone with the issue can enable full debug and post the resulting logs here.

@eadamsintel

@jodh-intel Thanks for chiming in. If you look above there is a katafail.txt where I previously posted a full debug log from journalctl. I can generate another one if you want. Here is the output of kata-env.

[Meta]
Version = "1.0.23"

[Runtime]
Debug = true
Trace = false
DisableGuestSeccomp = true
DisableNewNetNs = false
SandboxCgroupOnly = false
Path = "/usr/bin/kata-runtime"
[Runtime.Version]
Semver = "1.9.1"
Commit = ""
OCI = "1.0.1-dev"
[Runtime.Config]
Path = "/etc/kata-containers/configuration.toml"

[Hypervisor]
MachineType = "pc"
Version = "QEMU emulator version 2.11.0\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers"
Path = "/usr/bin/kata-qemu-lite-system-x86_64"
BlockDeviceDriver = "virtio-scsi"
EntropySource = "/dev/urandom"
Msize9p = 8192
MemorySlots = 10
Debug = true
UseVSock = false
SharedFS = "virtio-9p"

[Image]
Path = "/usr/share/kata-containers/kata-containers-image_clearlinux_1.9.1_agent_d4bbd8007f.img"

[Kernel]
Path = "/usr/share/kata-containers/vmlinuz-4.19.93-91.container"
Parameters = "systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket agent.log=debug agent.log=debug initcall_debug"

[Initrd]
Path = ""

[Proxy]
Type = "kataProxy"
Version = "kata-proxy version 1.9.1"
Path = "/usr/libexec/kata-containers/kata-proxy"
Debug = true

[Shim]
Type = "kataShim"
Version = "kata-shim version 1.9.1"
Path = "/usr/libexec/kata-containers/kata-shim"
Debug = true

[Agent]
Type = "kata"
Debug = true
Trace = false
TraceMode = ""
TraceType = ""

[Host]
Kernel = "5.4.8-886.native"
Architecture = "amd64"
VMContainerCapable = true
SupportVSocks = true
[Host.Distro]
Name = "Clear Linux OS"
Version = "32060"
[Host.CPU]
Vendor = "GenuineIntel"
Model = "Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz"

[Netmon]
Version = "kata-netmon version 1.9.1"
Path = "/usr/libexec/kata-containers/kata-netmon"
Debug = true
Enable = false

@hongzhanchen

I have tested on both Clear 31960 and 32080 but cannot duplicate the issue.
In addition, I found that on my platform SupportVSocks is actually set to false by default. After I run modprobe -i vhost_vsock, SupportVSocks becomes true in kata-runtime kata-env. But I still cannot duplicate the issue.
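
For anyone who wants to check the same thing, a quick sketch (kata-check verifies host capabilities; module name as above):

sudo modprobe vhost_vsock
kata-runtime kata-check
kata-runtime kata-env | grep SupportVSocks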

@eadamsintel

I wiped my Skull Canyon and did a fresh install of 32080, and kata still fails to run with the default setup.

@hongzhanchen

The issue seems similar to https://gitmemory.com/issue/kata-containers/tests/1531/495270869. That report shows both the runtime error "level=error msg="Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing" arch=amd64 command=run container=h5aHcd3RjnkvqmlV6StT name=kata-runtime pid=7575 source=runtime"
and the proxy error "time="2019-04-29T09:05:08.028576456Z" level=fatal msg="channel error" error="accept unix /run/vc/sbs/h5aHcd3RjnkvqmlV6StT/proxy.sock: use of closed network connection" name=kata-proxy pid=7594 sandbox=h5aHcd3RjnkvqmlV6StT source=proxy", and I can find the same errors in katafail.txt.

@hongzhanchen

hongzhanchen commented Jan 15, 2020

@eadamsintel, there is “kata2 systemd-timesyncd[413]: Timed out waiting for reply from 184.105.182.7:123 (3.clearlinux.pool.ntp.org).” before the two errors happen, which suggests there may be a network issue on your system?

@eadamsintel

eadamsintel commented Jan 15, 2020

@hongzhanchen That occurs on a lot of my Clear Linux servers for some reason. I just reproduced this failure on my personal Clear Linux system, and the last systemd-timesyncd error was from January 13th. On the other failing system, the last timesyncd failure was 20 minutes before I ran docker and hit the failure.

I should note that on all the failing systems, if I just replace the Kata kernel with the one from kata-deploy, then the built-in kata-runtime 1.9.1 works.

/etc/kata-containers/configuration.toml
Doesn't work kernel = "/usr/share/kata-containers/vmlinuz-4.19.96-94.container"
Works kernel = "/opt/kata/share/kata-containers/vmlinuz-4.19.86-59"

@eadamsintel

The reason this might be so hard to reproduce is that it only happens on certain hardware for me. I have reproduced it on two Hades Canyons and two Skull Canyons, but I can't reproduce it on two other types of NUCs with different processors or on a server. Everything works if I use the kernel from kata-deploy for version 1.9.1 instead of the Kata kernel at /usr/share/kata-containers.

Everything tested on Clear Linux 32100
Fails
Hades Canyon Intel(R) Core(TM) i7-8809G CPU @ 3.10GHz * 2 systems
Skull Canyon Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz * 2 systems
Works
NUC Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
NUC Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
SERVER Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz

@eadamsintel

I just did a quick test on 32180 and confirmed this is still an issue.

@bryteise
Member

Hrm, I'm still not able to reproduce (on both my Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz and Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz) with 32190. I note that the kernel version from kata-deploy (4.19.87) is about a month old at this point, and updating meant we could drop a number of CVE patches. I am not sure that putting in the time to see whether one of the CVE fixes is causing the behavior you are seeing is the right move (and since I can't reproduce on any of my systems, it would be difficult to do).

@eadamsintel

I spent some time today rewinding my Clear Linux system using swupd repair. I found where it breaks on one of my systems.

30050 works
31050 works
31230 works
31240 Missing manifest for swupd repair
31250 Missing manifest for swupd repair
31260 Doesn't work
31300 Doesn't work
31350 Doesn't work
31510 Doesn't work

The 31240 release notes listed the following changes. Something in that transition breaks my Skull Canyon NUC with that Kata kernel.
kata-image 1.6.2-15 -> 1.8.2-17
kata-proxy 1.6.2-16 -> 1.8.2-17
kata-runtime 1.6.2-45 -> 1.8.2-46
kata-shim 1.6.2-18 -> 1.8.2-19
linux-kata 4.19.77-75 -> 4.19.78-76
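
For anyone repeating the rewind, the procedure per version is roughly this (the -V flag picks the target version; exact flag spellings have varied between swupd releases, so check swupd repair --help first):

sudo swupd repair --picky --force -V 31230
docker run --runtime=kata-runtime hello-world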

@eadamsintel

I did some more testing with kata-deploy at 31260, the Clear Linux version where it first quit working on this Skylake NUC. Every kata-deploy kernel I tested with the built-in Clear Linux kata-runtime worked. None of the built-in Clear Linux kernels worked, across different versions of Clear Linux.

Kernels from kata-deploy, tested with Clear's kata-runtime via /etc/kata-containers/configuration.toml. Only the kernel line was changed; everything else is the default Clear Linux setup.
vmlinuz-4.19.73-51-katadeploy-1.8.2 WORKS
vmlinuz-4.19.75-54-katadeploy-1.9.3 WORKS
vmlinuz-4.19.75-55-katadeploy-1.9.4 WORKS
vmlinuz-4.19.86-60-katadeploy-1.10 WORKS

Built in Clear Linux kernels tested on Clear Linux 31260
vmlinuz-4.19.78-76.container-31260 DOES NOT WORK
vmlinuz-4.19.98-96.container-32190 DOES NOT WORK

@eadamsintel

The new Kata containers 5.4 kernel is merged now and will soon be available. I suggest we test again when this kernel comes out.

@eadamsintel

I saw a newer Kata 4.19 kernel was out. I tested on my failing systems and suddenly it started working on one of them but not the others. After close inspection, I saw that my host kernel was different between them. I did some more testing, and between two failing systems, it only worked with the v4.19 host kernel.

Host Kernel
org.clearlinux.native.5.5.13-925 -- Fails
org.clearlinux.lts2018.4.19.113-126 -- Works
org.clearlinux.lts2019.5.4.28-22 -- Fails
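
Switching between those host kernels on Clear Linux can be done with clr-boot-manager (kernel names from the list above; confirm what list-kernels reports on your system first):

sudo clr-boot-manager list-kernels
sudo clr-boot-manager set-kernel org.clearlinux.lts2018.4.19.113-126
sudo clr-boot-manager update
sudo reboot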

@anselmolsm

I'm also getting the same error on CL 33040, using kernel org.clearlinux.native.5.6.10-947:

docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
0e03bdcc26d7: Pull complete 
Digest: sha256:8e3114318a995a1ee497790535e7b88365222a21771ae7e53687ad76563e8e76
Status: Downloaded newer image for hello-world:latest
docker: Error response from daemon: OCI runtime create failed: Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing: unknown.

@karthikprabhu17

karthikprabhu17 commented Jan 13, 2021

We have this failing consistently on Skylake-X systems. It is very reproducible with any new Clear Linux version from the past 3 months on Skylake-X systems.

@felixg3 felixg3 closed this as completed Apr 17, 2023