Skip to content
This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

TestSocketProxyd is racy #2432

Open
alban opened this issue Apr 15, 2016 · 17 comments · Fixed by #2436
Open

TestSocketProxyd is racy #2432

alban opened this issue Apr 15, 2016 · 17 comments · Fixed by #2436

Comments

@alban
Copy link
Member

alban commented Apr 15, 2016

The new test TestSocketProxyd was recently implemented by #2257.

It fails on CentOS 7 and Fedora 22:

--- FAIL: TestSocketProxyd (0.25s)
    rkt_socket_proxyd_test.go:192: read tcp [::1]:60169->[::1]:39719: read: connection reset by peer

I suspect that those distributions have an old systemd version without systemd-proxyd. The test should be skipped when systemd-proxyd is not installed.

@alban
Copy link
Member Author

alban commented Apr 15, 2016

/cc @woodbor

@alban
Copy link
Member Author

alban commented Apr 15, 2016

It seems to still fail on Fedora 22. Reopening.

It is actually installed:

$ /lib/systemd/systemd-socket-proxyd --version
systemd 219
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN

I found the following in journalctl:

rkt[6897]: networking: loading networks from /tmp/localdir-448504904/net.d
rkt[6897]: stage1: failed to setup network
rkt[6897]: stage0: Writing image manifest
rkt[6897]: stage0: Writing pod manifest
rkt[6897]: stage0: Setting up stage1
rkt[6897]: stage0: Wrote filesystem to /tmp/datadir-248454989/pods/run/a3131d2a-5dc2-4897-9f15-751c1b3b3
rkt[6897]: stage0: Pivoting to filesystem /tmp/datadir-248454989/pods/run/a3131d2a-5dc2-4897-9f15-751c1b
rkt[6897]: stage0: Execing /init
audit: <audit-1325> table=filter family=2 entries=0
audit: <audit-1325> table=nat family=2 entries=0
rkt[6897]: networking: loading networks from /tmp/localdir-448504904/net.d
rkt[6897]: stage1: failed to setup network
rkt[6897]: stage0: Writing pod manifest
rkt[6897]: stage0: Setting up stage1
rkt[6897]: stage0: Wrote filesystem to /tmp/datadir-248454989/pods/run/a3131d2a-5dc2-4897-9f15-751c1b3b3
rkt[6897]: stage0: Pivoting to filesystem /tmp/datadir-248454989/pods/run/a3131d2a-5dc2-4897-9f15-751c1b
rkt[6897]: stage0: Execing /init
audit: <audit-1325> table=filter family=2 entries=0
audit: <audit-1325> table=nat family=2 entries=0
rkt[6897]: networking: loading networks from /tmp/localdir-448504904/net.d
rkt[6897]: stage1: failed to setup network
rkt[6897]: └─error loading network definitions
rkt[6897]: └─networks not found: "ptp0
systemd[1]: rkt-testing-socket-activation-4672351928322017000.service: main process exited, code=exited,
systemd[1]: Unit rkt-testing-socket-activation-4672351928322017000.service entered failed state.
systemd[1]: rkt-testing-socket-activation-4672351928322017000.service failed.
audit[1]: <audit-1131> pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='
sudo[6925]:   fedora : TTY=pts/0 ; PWD=/var/tmp/rkt/builds/build-rkt-coreos ; USER=root ; COMMAND=/bin/j
audit[6925]: <audit-1123> pid=6925 uid=1000 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:
audit[6925]: <audit-1110> pid=6925 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-
systemd-logind[923]: Existing logind session ID 1 used by new audit session, ignoring
systemd[1]: Started Session c3 of user root.
systemd-logind[923]: New session c3 of user root.
systemd[1]: Starting Session c3 of user root.
sudo[6925]: pam_unix(sudo:session): session opened for user root by fedora(uid=0)
audit[6925]: <audit-1105> pid=6925 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-
sudo[6925]: pam_unix(sudo:session): session closed for user root
audit[6925]: <audit-1106> pid=6925 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-
audit[6925]: <audit-1104> pid=6925 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-
systemd-logind[923]: Removed session c3.

@alban alban reopened this Apr 15, 2016
@borqosky
Copy link
Contributor

rkt[6897]: networking: loading networks from /tmp/localdir-448504904/net.d
rkt[6897]: stage1: failed to setup network
rkt[6897]: └─error loading network definitions
rkt[6897]: └─networks not found: "ptp0
@alban : Could it be a root error of test ?

@alban
Copy link
Member Author

alban commented Apr 15, 2016

The file net.d/ptp0.conf seems to be correct?

{"Name":"ptp0","Type":"ptp","master":"eth0","IpMasq":true,"IsGateway":false,"Ipam":{"Type":"host-local","subnet":"192.168.0.0/24","routes":[{"dst":"0.0.0.0/0"}]}}

@woodbor what do you mean with a root error of test?

@jonboulle jonboulle modified the milestones: v1.6.0, v1.5.0 Apr 26, 2016
@jonboulle
Copy link
Contributor

@woodbor are you able to take a look at this?

@jellonek
Copy link
Contributor

But name of network seems incorrect, I mean: "ptp0 seems wrong.
Maybe this is sign of regression somewhere around providing option to read network configuration from custom location?

@jellonek
Copy link
Contributor

@alban could You provide there command line with which rkt was called in this situation? Looks like there was somewhere first " escaped and so - added as a start of network name. Error message comes from networking/podenv.go:88 where format string uses %v, so my question is - how rtk was called?

@jellonek
Copy link
Contributor

FYI - removing \" in --net parameters in https://github.com/coreos/rkt/blob/master/tests/rkt_socket_proxyd_test.go#L105 resolves issue, so IMO systemd in different ways provides this parameter in different versions of fedora, i mean - in old one - it passess " as part of string; in new one - " is used on systemd side as way to provide white space separated string as one parameter, but it strips these characters from command line passed to process...

@alban
Copy link
Member Author

alban commented Apr 26, 2016

@jellonek well spotted. Removing the " seems to be the right thing to do since the command is not sent to a shell.

@alban
Copy link
Member Author

alban commented Apr 26, 2016

https://github.com/coreos/rkt/blob/master/Documentation/networking/overriding-defaults.md

The usage of double quotes is mandatory due to the ; being used as separator within the arguments for a single network.

This should be reworded: rkt does not expect any double quotes. This assumes that rkt is executed from a shell (like sh or bash) that removes the double quotes before exec()ing.

borqosky pushed a commit to intelsdi-x/rkt that referenced this issue Apr 27, 2016
Fixes rkt#2432. Command is not sent to a shell, and not remove double
quotes by systemd on some old distros (fedora22).
alban pushed a commit to kinvolk/rkt that referenced this issue Apr 27, 2016
Fixes rkt#2432. Command is not sent to a shell, and not remove double
quotes by systemd on some old distros (fedora22).
borqosky pushed a commit to intelsdi-x/rkt that referenced this issue Apr 27, 2016
The command is executed as a systemd unit and is not sent to
a shell and the double quotes were not interpreted by systemd-v219
on some old distros (like Fedora 22).

Fixes rkt#2432
@iaguis
Copy link
Member

iaguis commented May 23, 2016

This seems to happen again on rawhide.

@mzylowski
Copy link
Contributor

mzylowski commented Jun 30, 2016

Hey!
Today I experienced the same (or similar) problem like @alban wrote in April (on ubuntu 16.04).
=== RUN TestSocketProxyd
Writing ptp0 to /tmp/localdir-743924341/net.d/ptp0.conf
--- FAIL: TestSocketProxyd (127.55s)
rkt_socket_proxyd_test.go:200: read tcp 127.0.0.1:38908->127.0.0.1:44289: read: connection reset by peer
FAIL

And from journalctl on the host:
Jun 30 18:03:02 mzylowski-devel rkt[16691]: image: using image from file /home/mzylowski/Repos/rkt/build-rkt-1.9.1+git/tmp/functional/test-tmp/rkt-inspect-echo.aci
Jun 30 18:03:03 mzylowski-devel rkt[16691]: stage0: Preparing stage1
Jun 30 18:03:03 mzylowski-devel rkt[16691]: stage0: Writing image manifest
Jun 30 18:03:03 mzylowski-devel rkt[16691]: stage0: Loading image sha512-e21855cc1ef6f3430e7493a96846256f105d98a789f204c71dc5a0896a5a71d4
Jun 30 18:03:03 mzylowski-devel rkt[16691]: stage0: Writing image manifest
Jun 30 18:03:03 mzylowski-devel rkt[16691]: stage0: Writing pod manifest
Jun 30 18:03:03 mzylowski-devel rkt[16691]: run: group "rkt" not found, will use default gid when rendering images
Jun 30 18:03:03 mzylowski-devel rkt[16691]: stage0: Setting up stage1
Jun 30 18:03:03 mzylowski-devel rkt[16691]: stage0: Wrote filesystem to /tmp/datadir-644895720/pods/run/e575ab33-7043-46ad-9de2-15f69e235649
Jun 30 18:03:03 mzylowski-devel rkt[16691]: stage0: Pivoting to filesystem /tmp/datadir-644895720/pods/run/e575ab33-7043-46ad-9de2-15f69e235649
Jun 30 18:03:03 mzylowski-devel rkt[16691]: stage0: Execing /init
Jun 30 18:03:03 mzylowski-devel rkt[16691]: networking: loading networks from /tmp/localdir-799515175/net.d
Jun 30 18:03:03 mzylowski-devel systemd-udevd[16740]: Could not generate persistent MAC address for rkt-e575-tap0: No such file or directory
Jun 30 18:03:03 mzylowski-devel NetworkManager[888]: <info> [1467302583.9445] manager: (rkt-e575-tap0): new Tun device (/org/freedesktop/NetworkManager/Devices/8)
Jun 30 18:03:03 mzylowski-devel kernel: IPv6: ADDRCONF(NETDEV_UP): rkt-e575-tap0: link is not ready
Jun 30 18:03:03 mzylowski-devel rkt[16691]: stage1: failed to setup network
Jun 30 18:03:03 mzylowski-devel rkt[16691]: └─problem executing network plugin "host-local" ("rkt-e575-tap0")
Jun 30 18:03:03 mzylowski-devel rkt[16691]: └─exit status 1
Jun 30 18:03:03 mzylowski-devel systemd[1]: rkt-testing-socket-activation-8441707359698410096.service: Main process exited, code=exited, status=1/FAILURE
Jun 30 18:03:03 mzylowski-devel systemd[1]: rkt-testing-socket-activation-8441707359698410096.service: Unit entered failed state.
Jun 30 18:03:03 mzylowski-devel systemd[1]: rkt-testing-socket-activation-8441707359698410096.service: Failed with result 'exit-code'.
Jun 30 18:03:03 mzylowski-devel NetworkManager[888]: <info> [1467302583.9468] devices added (path: /sys/devices/virtual/net/rkt-e575-tap0, iface: rkt-e575-tap0)
Jun 30 18:03:03 mzylowski-devel NetworkManager[888]: <info> [1467302583.9468] device added (path: /sys/devices/virtual/net/rkt-e575-tap0, iface: rkt-e575-tap0): no ifupdown configuration found.

And ptp0.conf file:
{"Name":"ptp0","Type":"ptp","master":"eno1","IpMasq":true,"IsGateway":false,"Ipam":{"Type":"host-local","subnet":"192.168.0.0/24","routes":[{"dst":"0.0.0.0/0"}]}}
It looks like ubuntu-specific problem, but I don't have idea where is the problem with system configuration (or something else).

This problem occurs on kvm and coreos falvors.

/cc @iaguis @pskrzyns

@alban alban removed this from the v1.6.0 milestone Jul 20, 2016
@alban
Copy link
Member Author

alban commented Jul 20, 2016

It happened again on Fedora-23: #2860 (comment)
Reopening...

@iaguis
Copy link
Member

iaguis commented Jul 29, 2016

@alban forgot to reopen

@iaguis iaguis reopened this Jul 29, 2016
@iaguis iaguis changed the title TestSocketProxyd fails on old distributions TestSocketProxyd is racy Aug 10, 2016
@krnowak
Copy link
Collaborator

krnowak commented Aug 10, 2016

The failures of the TestSocketProxy test on the fedora rawhide distro are likely because of a race between the echo server and the systemd-socket-proxyd. The latter is set to require the echo server and to start after echo server is ready. But currently nspawn sends the ready notification before inspect binary in the echo server has a chance to run and set up communication channels. So proxy daemon tries to connect to inspect, but inspect didn't start listening so proxyd bails out… To fix we will likely need patches from #2826 merged and implement sending ready notifications in inspect. The test would need to be fixed to make the echo server of "notify" type then.

@iaguis
Copy link
Member

iaguis commented Aug 10, 2016

I'm disabling the test for now: #3049

@krnowak
Copy link
Collaborator

krnowak commented Aug 10, 2016

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants