docker with kata-containers fail to start #1585
Comments
I am having similar issues with Clear Linux 31960 with a similar error. That error message is a pretty generic one and does not reveal a lot about what the problem might be.
I have another machine with Clear Linux I just updated to 31960 and it launched the containers just fine with kata. I can't really explain why though. I checked the output of
As a workaround you can install Kata using kata-deploy. It installs Kata into /opt/kata and installs all the runtimes to support Qemu, Firecracker, Virtio-FS, etc., and it updates your
Reproduced? I don't know what to do about this though, or who can take on this issue...
Hrm I haven't been able to reproduce this on 32020 so far (I did need a little cleanup, having to remove
Cannot reproduce either.
I can reproduce on 32020 on two different systems but not on a third. The only difference I can see between the systems is that the one where I use and configure a proxy works just fine. The other two systems where it fails (one is my personal system at home) have no proxy configured and don't need one. Using
I'd say something looks suspicious in that region. @amshinde any advice if this looks reasonable or horribly broken?
I did some more experimentation to help isolate where this might be an issue. I used
Edit: I tried this on a Fedora system that is the same hardware as the Clear Linux system that is failing. I installed kata
@bryteise Looks ok to me, nothing suspicious there. @eadamsintel So is this an issue with the latest kata kernel from clearlinux? Do you think you can try a few different kernels and zero in on the one that is causing the issue?
@amshinde I am not sure where the issue is. I have two failing systems in the office and one failing personal system. I downgraded one of them multiple versions, back to 31640 (right before the runtime updated from 1.8.2 to 1.9.1), trying out multiple kernels, and none of them worked. I have even used
I moved the failing system to operate behind a proxy and it continued to fail with those failing kernels, so I don't think that is related. It is just coincidental that it failed.
Using Clear Linux 32050 with the built-in Kata kernels from Clear Linux install
Kata kernels from kata-deploy
I will reinstall Clear on the failing system I downgraded and see if it fails as well.
I reinstalled Clear on that failing system accepting all the defaults. I then installed
Tested with
We may get more detail on the problem if someone with the issue can enable full debug and post the resulting logs here.
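For anyone following up on this request, a minimal sketch of turning on debug output follows. It assumes the configuration file carries commented-out enable_debug lines, as the stock Kata 1.x configuration.toml does, and that GNU sed is available; the function name enable_kata_debug is made up for illustration.

```shell
#!/bin/sh
# Sketch only: uncomment every "enable_debug" line in a Kata configuration file.
# The file path is supplied by the caller; enable_kata_debug is a hypothetical name.
enable_kata_debug() {
    config="$1"
    # Refuse to continue if the file is missing.
    [ -f "$config" ] || { echo "no such file: $config" >&2; return 1; }
    # "#enable_debug = true" or "# enable_debug = ..." becomes "enable_debug = true".
    sed -i -e 's/^# *\(enable_debug\).*=.*$/\1 = true/' "$config"
}
```

Since the collected data in this issue reports that /etc/kata-containers/configuration.toml was not found, one would probably copy /usr/share/defaults/kata-containers/configuration.toml there first, run the function on the copy as root, and then rerun the failing docker command before collecting the journal.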
@jodh-intel Thanks for chiming in. If you look above there is a katafail.txt where I previously posted a full debug log from journalctl. I can generate another one if you want. Here is the output of kata-env.
[Meta] [Runtime] [Hypervisor] [Image] [Kernel] [Initrd] [Proxy] [Shim] [Agent] [Host] [Netmon]
I have tested on both Clear 31960 and 32080 but cannot duplicate the issue.
I wiped my Skull Canyon and did a fresh install of 32080, and kata still fails to run with the default setup.
The issue seems similar to https://gitmemory.com/issue/kata-containers/tests/1531/495270869. There are both runtime error
level=error msg="Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing" arch=amd64 command=run container=h5aHcd3RjnkvqmlV6StT name=kata-runtime pid=7575 source=runtime
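To pull just this kind of line out of a long journal, something like the following may help. It is only a sketch: it assumes the runtime logs in the key=value style quoted above (level=error, source=runtime), and the function name kata_runtime_errors is hypothetical.

```shell
#!/bin/sh
# Sketch only: read log text on stdin and keep the error-level lines
# that were emitted by the runtime component.
kata_runtime_errors() {
    grep 'level=error' | grep 'source=runtime'
}
```

A typical use would be piping the output of journalctl -t kata-runtime into the function, assuming the runtime logs under that syslog identifier on the system in question.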
@eadamsintel, there is “kata2 systemd-timesyncd[413]: Timed out waiting for reply from 184.105.182.7:123 (3.clearlinux.pool.ntp.org).” before the two errors happen, which suggests there may be a network issue on your system?
@hongzhanchen That occurs on a lot of my Clear Linux servers for some reason. I just reproduced this failure on my personal Clear Linux system and the last systemd-timesyncd error was from January 13th. On the other failing system, the last timesyncd failure was 20 minutes before I ran docker with the failure. I should note that on all the failing systems if I just replace the kata kernel used to be the one from /etc/kata-containers/configuration.toml
The reason why this might be so hard to reproduce is that it only happens on certain hardware for me. I have reproduced this on two Hades Canyons and two Skull Canyons but I can't reproduce it on two other types of NUCs with different processors or a server. Everything works if I use the kernel from
Everything tested on Clear Linux 32100
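Swapping the guest kernel for a test like this comes down to editing the kernel entry in the Kata configuration file. A rough sketch, assuming the file has a single-line kernel = "..." entry as in the stock Kata 1.x configuration.toml and that GNU sed is available; set_kata_kernel is a made-up helper name, and the kernel path passed to it is whatever image you want to try.

```shell
#!/bin/sh
# Sketch only: point the "kernel" entry of a Kata configuration file at another image.
# set_kata_kernel is a hypothetical helper; both arguments are supplied by the caller.
# The kernel path must not contain "|" or "&", which are special in the sed expression.
set_kata_kernel() {
    config="$1"
    kernel="$2"
    [ -f "$config" ] || { echo "no such file: $config" >&2; return 1; }
    # Matches "kernel = ..." but not "kernel_params = ...".
    sed -i -e "s|^kernel *=.*\$|kernel = \"$kernel\"|" "$config"
}
```

Running it against a copy of the configuration in /etc/kata-containers keeps the packaged defaults untouched, so the change is easy to revert between test runs.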
I just did a quick test on 32180 and confirmed this is still an issue.
Hrm I'm still not able to reproduce (on both my Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz and Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz) with 32190. I note that the kernel version from kata-deploy (4.19.87) is about a month old at this point and updating involved being able to drop a number of CVE patches. I am not sure putting in the time to try and see if one of the CVE fixes is causing the behavior you are seeing is the right move (and since I can't reproduce on any of my systems, it would be difficult to do).
I spent some time today rewinding my Clear Linux system using
30050 works
The 31240 release notes had the following.
Something in that transition has issues with my Skull Canyon NUC with that kata kernel.
I did some more testing with kernels from
Built-in Clear Linux kernels tested on Clear Linux 31260
The new Kata containers 5.4 kernel is merged now and will soon be available. I suggest we test again when this kernel comes out.
I saw a newer Kata 4.19 kernel was out. I tested on my failing systems and suddenly it started working on one of them but not the others. After close inspection, I saw that my host kernel was different between them. I did some more testing, and between two failing systems, it only worked with the v4.19 host kernel.
Host Kernel
I'm also getting the same error on CL 33040, using kernel org.clearlinux.native.5.6.10-947
We have this failing consistently on skylakex systems. It is very reproducible on skylakex systems with any new Clear Linux version from the past 3 months.
Any docker container with kata does not start.
As an example, here is the output of docker run hello-world:
docker: Error response from daemon: OCI runtime create failed: Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing: unknown.
Using kata-collect-data, I gathered a significant amount of error information.
There is an issue for the same error raised at kata-containers/kata-containers#28; however, the troubleshooting steps there do not help.
Show installed bundles
$ swupd bundle-list Babel NetworkManager NetworkManager-extras R-basic R-datasets R-extras R-rstudio R-stan Remmina Solaar Solaar-gui Sphinx acpica-unix2 akonadi alsa-utils aria2 ark atom baobab bc bcc binutils bison boot-encrypted bootloader bootloader-extras bpftrace c-basic c-basic-legacy calc cheese cloc cloud-api clr-network-troubleshooter cockpit columbiad containers-basic containers-virt cpio cryptography cryptoprocessor-management curl darktable desktop desktop-apps desktop-apps-extras desktop-assets desktop-gnomelibs desktop-kde desktop-kde-apps desktop-kde-libs desktop-locales dev-utils devpkg-LVM2 devpkg-R devpkg-acl devpkg-at-spi2-atk devpkg-at-spi2-core devpkg-atk devpkg-attr devpkg-audit devpkg-base devpkg-bzip2 devpkg-cairo devpkg-cryptsetup devpkg-curl devpkg-dbus devpkg-e2fsprogs devpkg-elfutils devpkg-expat devpkg-fontconfig devpkg-freetype devpkg-fribidi devpkg-fuse devpkg-gcr devpkg-gdk-pixbuf devpkg-glib devpkg-gnu-efi devpkg-gnutls devpkg-gobject-introspection devpkg-graphite devpkg-gtk-doc devpkg-gtk3 devpkg-harfbuzz devpkg-icu4c devpkg-iptables devpkg-json-c devpkg-json-glib devpkg-kmod devpkg-libX11 devpkg-libXau devpkg-libXcursor devpkg-libXdmcp devpkg-libXft devpkg-libXtst devpkg-libcap devpkg-libcap-ng devpkg-libcgroup devpkg-libdrm devpkg-libepoxy devpkg-libevent devpkg-libffi devpkg-libgcrypt devpkg-libgpg-error devpkg-libidn devpkg-libidn2 devpkg-libjpeg-turbo devpkg-libmicrohttpd devpkg-libmnl devpkg-libnetfilter_conntrack devpkg-libnfnetlink devpkg-libnftnl devpkg-libpng devpkg-libpsl devpkg-libpthread-stubs devpkg-libseccomp devpkg-libsoup devpkg-libtasn1 devpkg-libtirpc devpkg-libunwind devpkg-libusb devpkg-libxcb devpkg-libxkbcommon devpkg-libxml2 devpkg-llvm devpkg-lz4 devpkg-mesa devpkg-ncurses devpkg-nettle devpkg-openssl devpkg-p11-kit devpkg-pango devpkg-pciutils devpkg-pcre devpkg-pcre2 devpkg-pixman devpkg-popt devpkg-readline devpkg-shared-mime-info devpkg-sqlite-autoconf devpkg-systemd devpkg-talloc devpkg-util-linux 
devpkg-util-macros devpkg-wayland devpkg-wayland-protocols devpkg-webkitgtk devpkg-xapian-core devpkg-xcb-proto devpkg-xorgproto devpkg-xtrans devpkg-xz devpkg-zlib diffutils digikam dnf docbook-utils docker-compose docutils dolphin dosfstools doxygen dpdk editors emacs emacs-x11 eog ethtool evince evolution extremetuxracer feh file file-roller findutils firefox firmware-update flatpak flex fonts-basic fuse fwupdate games gdb geany geary gedit ghostscript gimp git gjs glibc-locale gnome-base-libs gnome-boxes gnome-calculator gnome-characters gnome-clocks gnome-color-manager gnome-disk-utility gnome-font-viewer gnome-logs gnome-music gnome-photos gnome-screenshot gnome-system-monitor gnome-todo gnome-weather go-basic gpgme gphoto2 graphviz gstreamer gtk-vnc gvim gwenview gzip hardware-bluetooth hardware-printing hardware-uefi hardware-wifi hexchat htop icdiff inkscape inotify-tools intltool iotop iperf iproute2 iptables irssi iwd java-runtime joe jq jupyter kamera kate kbd kcalc kde-frameworks5 kdiff3 keepassxc kernel-install kernel-native kleopatra konqueror konsole kontact konversation krita ksysguard kvm-host less lib-imageformat lib-opengl lib-openssl lib-qt5webengine lib-samba libX11client libglib libstdcpp libva-utils libxslt linux-dev linux-firmware linux-firmware-extras linux-firmware-wifi linux-tools llvm lm-sensors mail-utils make maker-basic man-pages mariadb minetest minetestserver minicom mpv mutt nasm nautilus neomutt neovim net-tools network-basic nfs-utils nim nodejs-basic notmuch okular openblas openldap openssh-client openssh-server openssl openvswitch os-core os-core-dev os-core-legacy os-core-plus os-core-search os-core-update os-core-webproxy p11-kit package-utils pandoc parallel parted patch performance-tools perl-basic perl-basic-dev perl-extras pidgin pmdk polkit powertop procps-ng productivity pulseaudio pygobject python-data-science python-extras python2-basic python3-basic python3-tcl qbittorrent qemu-guest-additions qt-basic qt-core 
quassel redshift rsync rust-basic rxvt-unicode samba sddm seahorse shells smartmontools spectacle spice-gtk storage-utils strace sudo supertuxkart suricata syndication sysadmin-basic sysadmin-remote syslinux sysstat tcl-basic telemetrics texinfo thermal_daemon thunderbird tigervnc tmux totem tzdata unzip user-basic valgrind vim vinagre virt-manager virt-manager-gui virt-viewer vlc vnc-server webkitgtk weechat wget which wine wpa_supplicant x11-server x11-tools x11vnc xfsprogs xterm xz yakuake yasm zenity znc zsh zstd
Show kata-collect-data.sh details
Meta details
Running kata-collect-data.sh version 1.9.1 (commit ) at 2019-12-22.13:16:07.150966092+0100.
Runtime is /usr/bin/kata-runtime.
kata-env
Output of "/usr/bin/kata-runtime kata-env":
Runtime config files
Runtime default config files
Runtime config file contents
Config file /etc/kata-containers/configuration.toml not found
Output of "cat "/usr/share/defaults/kata-containers/configuration.toml"":
KSM throttler
version
Output of " --version":
systemd service
Image details
Initrd details
No initrd
Logfiles
Runtime logs
Recent runtime problems found in system journal:
Proxy logs
Recent proxy problems found in system journal:
Shim logs
No recent shim problems found in system journal.
Throttler logs
No recent throttler problems found in system journal.
Container manager details
Have docker
Docker
Output of "docker version":
Output of "docker info":
Output of "systemctl show docker":
No kubectl
Have crio
crio
Output of "crio --version":
Output of "systemctl show crio":
Output of "cat /etc/crio/crio.conf":
Have containerd
containerd
Output of "containerd --version":
Output of "systemctl show containerd":
Output of "cat /etc/containerd/config.toml":
Packages
No dpkg
Have rpm
Output of "rpm -qa|egrep "(cc-oci-runtime|cc-runtime|runv|kata-proxy|kata-runtime|kata-shim|kata-ksm-throttler|kata-containers-image|linux-container|qemu-)"":