Socket activation works with TCP socket but not with Unix Domain socket #10443

Closed
eriksjolund opened this issue May 24, 2021 · 8 comments · Fixed by #11316
Labels
kind/bug, locked - please file new issue/PR, stale-issue

Comments

@eriksjolund
Contributor

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

I tested socket activation with /usr/lib/systemd/systemd-socket-proxyd in four different ways.

| Running in Podman | Socket type | Systemd type | Working? |
|---|---|---|---|
| yes | TCP socket | Type=forking | yes |
| yes | Unix Domain socket | Type=forking | no |
| no | TCP socket | Type=exec | yes |
| no | Unix Domain socket | Type=exec | yes |

The tests were performed in Systemd user services (i.e. running rootless).

I also tried adding the --privileged flag but it didn't help.

Steps to reproduce the issue:

TCP socket with Podman : Success

[erikdev@laptop ~]$ mkdir systemd_container
[erikdev@laptop ~]$ cd systemd_container/
[erikdev@laptop systemd_container]$ emacs -nw Dockerfile
[erikdev@laptop systemd_container]$ cat Dockerfile
FROM registry.fedoraproject.org/fedora:34
RUN dnf -y install systemd && dnf clean all
CMD ["/usr/lib/systemd/systemd-socket-proxyd","fedoramagazine.org:443"]

[erikdev@laptop systemd_container]$ podman build -t systemd .
STEP 1: FROM registry.fedoraproject.org/fedora:34
STEP 2: RUN dnf -y install systemd && dnf clean all
--> Using cache ed085871341244571e0f8d655039e5fa1f6d47b90054e9665dfa866b51bf6253
--> ed085871341
STEP 3: CMD ["/usr/lib/systemd/systemd-socket-proxyd","fedoramagazine.org:443"]
--> Using cache 035bd01dfa88ab15f3e637ddc8a257d00b575dd490e48985304093ed9c178a21
STEP 4: COMMIT systemd
--> 035bd01dfa8
035bd01dfa88ab15f3e637ddc8a257d00b575dd490e48985304093ed9c178a21
[erikdev@laptop systemd_container]$ cd ~/.config/systemd/user
[erikdev@laptop user]$ podman create --name testing localhost/systemd
d69c504e11c880532c66bf9adb699499a880dbd481a1df7cb7f7b0fd1f3d3605
[erikdev@laptop user]$ podman generate systemd --new --name testing > testing.service
[erikdev@laptop user]$ emacs -nw testing.socket
[erikdev@laptop user]$ cat testing.socket
[Unit]
Description=Socket for %N

[Socket]
ListenStream=9999

[Install]
WantedBy=sockets.target

[erikdev@laptop user]$ systemctl --user start testing.socket
[erikdev@laptop user]$ curl -s --resolve fedoramagazine.org:localhost:9999 https://fedoramagazine.org | head -1
<!DOCTYPE html>
[erikdev@laptop user]$ 

Unix domain socket with Podman : Failure

[erikdev@laptop user]$ systemctl --user stop testing.socket
[erikdev@laptop user]$ systemctl --user stop testing.service
[erikdev@laptop user]$ emacs -nw testing.socket
[erikdev@laptop user]$ cat testing.socket
[Unit]
Description=Socket for %N

[Socket]
ListenStream=%t/%N.sock

[Install]
WantedBy=sockets.target

[erikdev@laptop user]$ systemctl --user start testing.socket
[erikdev@laptop user]$ curl --unix-socket $XDG_RUNTIME_DIR/testing.sock https://fedoramagazine.org 
curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to fedoramagazine.org:443 
[erikdev@laptop user]$ 
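The socket unit above uses the systemd specifiers `%t` and `%N`, while the curl command addresses `$XDG_RUNTIME_DIR/testing.sock`. As a rough illustration of why those two paths should match for a user service, here is a minimal sketch that expands only these two specifiers. The function name `expand_specifiers` is hypothetical, and real systemd handles many more specifiers (and `%%` escaping) than this does.

```python
def expand_specifiers(value: str, unit_name: str, runtime_dir: str) -> str:
    """Expand %t and %N only; a simplification of systemd's specifier handling."""
    # %N: the full unit name with the type suffix (for example ".socket") removed.
    name_without_suffix = unit_name.rsplit(".", 1)[0]
    # %t: the runtime directory root; for a user manager this is $XDG_RUNTIME_DIR.
    return value.replace("%t", runtime_dir).replace("%N", name_without_suffix)


# The runtime directory below is the one visible in the status output of this report.
print(expand_specifiers("%t/%N.sock", "testing.socket", "/run/user/1002"))
```

With the values from this report, the listening path would be `/run/user/1002/testing.sock`.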

I also tried adding the --privileged flag to the podman run command but it didn't help.

[erikdev@laptop user]$ emacs -nw testing.service
[erikdev@laptop user]$ systemctl --user daemon-reload
[erikdev@laptop user]$ systemctl --user stop testing.service
[erikdev@laptop user]$ systemctl --user stop testing.socket
[erikdev@laptop user]$ systemctl --user reset-failed testing.service
[erikdev@laptop user]$ systemctl --user start testing.socket
[erikdev@laptop user]$ curl --unix-socket $XDG_RUNTIME_DIR/testing.sock https://fedoramagazine.org 
curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to fedoramagazine.org:443 
[erikdev@laptop user]$ systemctl --user status --no-pager testing.service
× testing.service - Podman container-testing.service
     Loaded: loaded (/home/erikdev/.config/systemd/user/testing.service; disabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Mon 2021-05-24 18:13:16 CEST; 11s ago
TriggeredBy: × testing.socket
       Docs: man:podman-generate-systemd(1)
    Process: 43631 ExecStartPre=/bin/rm -f /run/user/1002/container-testing.pid /run/user/1002/container-testing.ctr-id (code=exited, status=0/SUCCESS)
    Process: 43632 ExecStart=/usr/bin/podman run --privileged --conmon-pidfile /run/user/1002/container-testing.pid --cidfile /run/user/1002/container-testing.ctr-id --cgroups=no-conmon -d --replace --name testing localhost/systemd (code=exited, status=0/SUCCESS)
    Process: 43686 ExecStopPost=/usr/bin/podman rm --ignore -f --cidfile /run/user/1002/container-testing.ctr-id (code=exited, status=0/SUCCESS)
   Main PID: 43666 (code=exited, status=1/FAILURE)
        CPU: 557ms

May 24 18:13:16 laptop systemd[1166]: testing.service: Scheduled restart job, restart counter is at 5.
May 24 18:13:16 laptop systemd[1166]: Stopped Podman container-testing.service.
May 24 18:13:16 laptop systemd[1166]: testing.service: Start request repeated too quickly.
May 24 18:13:16 laptop systemd[1166]: testing.service: Failed with result 'exit-code'.
May 24 18:13:16 laptop systemd[1166]: Failed to start Podman container-testing.service.
[erikdev@laptop user]$ 

TCP socket without Podman : Success

[erikdev@laptop user]$ emacs -nw testing2.service
[erikdev@laptop user]$ cat testing2.service

[Unit]
Description=Test systemd-socket-proxyd
Wants=network.target
After=network-online.target

[Service]
Restart=on-failure
TimeoutStopSec=70

ExecStart=/usr/lib/systemd/systemd-socket-proxyd fedoramagazine.org:443
Type=exec
[erikdev@laptop user]$ emacs -nw testing2.socket
[erikdev@laptop user]$ cat testing2.socket
[Unit]
Description=Socket for %N

[Socket]
ListenStream=9998

[Install]
WantedBy=sockets.target

[erikdev@laptop user]$ 
[erikdev@laptop user]$ systemctl --user start testing2.socket
[erikdev@laptop user]$ curl -s  --resolve fedoramagazine.org:localhost:9998 https://fedoramagazine.org | head -1
<!DOCTYPE html>
[erikdev@laptop user]$ 

Unix domain socket without Podman : Success

[erikdev@laptop user]$ systemctl --user stop testing2.socket
[erikdev@laptop user]$ systemctl --user stop testing2.service
[erikdev@laptop user]$ emacs -nw testing2.socket
[erikdev@laptop user]$ cat testing2.socket
[Unit]
Description=Socket for %N

[Socket]
ListenStream=%t/%N.sock

[Install]
WantedBy=sockets.target

[erikdev@laptop user]$ systemctl --user start testing2.socket
[erikdev@laptop user]$ curl -s --unix-socket $XDG_RUNTIME_DIR/testing2.sock https://fedoramagazine.org | head -1
<!DOCTYPE html>
[erikdev@laptop user]$ 

Describe the results you received:

The "Unix domain socket with Podman" test failed.

Describe the results you expected:

I expected it to work.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Version:      3.1.2
API Version:  3.1.2
Go Version:   go1.16.3
Built:        Wed May 12 21:27:59 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.20.1
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.27-2.fc34.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.27, commit: '
  cpus: 8
  distribution:
    distribution: fedora
    version: "34"
  eventLogger: journald
  hostname: laptop
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 231072
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 231072
      size: 65536
  kernel: 5.12.5-300.fc34.x86_64
  linkmode: dynamic
  memFree: 22619365376
  memTotal: 33503195136
  ociRuntime:
    name: crun
    package: crun-0.19.1-2.fc34.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.19.1
      commit: 1535fedf0b83fb898d449f9680000f729ba719f5
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1002/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    selinuxEnabled: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.9-1.fc34.x86_64
    version: |-
      slirp4netns version 1.1.8+dev
      commit: 6dc0186e020232ae1a6fcc1f7afbc3ea02fd3876
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.0
  swapFree: 13958635520
  swapTotal: 13958635520
  uptime: 18h 30m 46.84s (Approximately 0.75 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/erikdev/.config/containers/storage.conf
  containerStore:
    number: 5
    paused: 0
    running: 0
    stopped: 5
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.5.0-1.fc34.x86_64
      Version: |-
        fusermount3 version: 3.10.3
        fuse-overlayfs: version 1.5
        FUSE library version 3.10.3
        using FUSE kernel interface version 7.31
  graphRoot: /home/erikdev/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 6
  runRoot: /run/user/1002/containers
  volumePath: /home/erikdev/.local/share/containers/storage/volumes
version:
  APIVersion: 3.1.2
  Built: 1620847679
  BuiltTime: Wed May 12 21:27:59 2021
  GitCommit: ""
  GoVersion: go1.16.3
  OsArch: linux/amd64
  Version: 3.1.2

Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.1.2-3.fc34.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

I have checked the Podman Troubleshooting Guide.

Additional environment details (AWS, VirtualBox, physical, etc.):

Physical computer

openshift-ci bot added the kind/bug label May 24, 2021
@rhatdan
Member

rhatdan commented May 24, 2021

@eriksjolund Nice job designing these tests, but do you have any idea what Podman is doing incorrectly? Is there something about socket-activated Unix domain sockets that we are not passing in correctly? Perhaps the path of the socket?
@vrothberg @nalind @giuseppe @jwhonce Any ideas?

@eriksjolund
Contributor Author

Oops, I made a mistake with the curl command. It shouldn't be

 curl -s  --resolve fedoramagazine.org:localhost:9997 https://fedoramagazine.org 

It should probably be

curl -s  --resolve fedoramagazine.org:9999:localhost localhost:9999

I'll see if this changes things.
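For reference, curl's `--resolve` option takes its argument in the order `host:port:address`, and the entry only applies when the requested URL uses that same host and port. The sketch below builds such a command line; the helper name `curl_resolve_command` is hypothetical, and the values in the usage line are taken from later in this thread.

```python
from typing import List


def curl_resolve_command(host: str, port: int, address: str, url: str) -> List[str]:
    """Build a curl argument list that pins host:port to a fixed IP address."""
    # --resolve expects HOST:PORT:ADDRESS, in that order.
    return ["curl", "--silent", "--resolve", f"{host}:{port}:{address}", url]


print(" ".join(curl_resolve_command(
    "fedoramagazine.org", 443, "127.0.0.1", "https://fedoramagazine.org:443")))
```

If the entry is malformed or its port differs from the URL's port, curl presumably falls back to normal DNS resolution, which would send the request to the real server instead of the local socket.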

@eriksjolund
Contributor Author

I think I've now figured out how to use curl with the --resolve command-line option.

When I tried it once again, the TCP socket test with Podman also failed.

[erikdev@laptop user]$ systemctl --user status testing.socket
● testing.socket - Socket for testing
     Loaded: loaded (/home/erikdev/.config/systemd/user/testing.socket; disabled; vendor preset: disabled)
     Active: active (listening) since Mon 2021-05-24 20:02:53 CEST; 9s ago
   Triggers: ● testing.service
     Listen: [::]:443 (Stream)
      Tasks: 0 (limit: 38291)
     Memory: 4.0K
        CPU: 539us
     CGroup: /user.slice/user-1002.slice/user@1002.service/app.slice/testing.socket

May 24 20:02:53 laptop systemd[1166]: Listening on Socket for testing.
[erikdev@laptop user]$ curl --verbose --resolve fedoramagazine.org:443:127.0.0.1 https://fedoramagazine.org:443 
* Added fedoramagazine.org:443:127.0.0.1 to DNS cache
* Hostname fedoramagazine.org was found in DNS cache
*   Trying 127.0.0.1:443...
* Connected to fedoramagazine.org (127.0.0.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/pki/tls/certs/ca-bundle.crt
*  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: Connection reset by peer in connection to fedoramagazine.org:443 
* Closing connection 0
curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to fedoramagazine.org:443 
[erikdev@laptop user]$ 

Testing the TCP socket without Podman works.

[erikdev@laptop user]$ systemctl --user status testing2.socket
● testing2.socket - Socket for testing2
     Loaded: loaded (/home/erikdev/.config/systemd/user/testing2.socket; disabled; vendor preset: disabled)
     Active: active (listening) since Mon 2021-05-24 20:04:05 CEST; 3s ago
   Triggers: ● testing2.service
     Listen: [::]:443 (Stream)
      Tasks: 0 (limit: 38291)
     Memory: 4.0K
        CPU: 918us
     CGroup: /user.slice/user-1002.slice/user@1002.service/app.slice/testing2.socket

May 24 20:04:05 laptop systemd[1166]: Listening on Socket for testing2.
[erikdev@laptop user]$ curl --verbose --resolve fedoramagazine.org:443:127.0.0.1 https://fedoramagazine.org:443 
* Added fedoramagazine.org:443:127.0.0.1 to DNS cache
* Hostname fedoramagazine.org was found in DNS cache
*   Trying 127.0.0.1:443...
* Connected to fedoramagazine.org (127.0.0.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/pki/tls/certs/ca-bundle.crt
*  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=fedoramagazine.org
*  start date: May 17 21:34:03 2021 GMT
*  expire date: Aug 15 21:34:03 2021 GMT
*  subjectAltName: host "fedoramagazine.org" matched cert's "fedoramagazine.org"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x561e15a42bf0)
> GET / HTTP/2
> Host: fedoramagazine.org
> user-agent: curl/7.76.1
> accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 200 
< server: nginx
< date: Mon, 24 May 2021 18:04:17 GMT
< content-type: text/html; charset=UTF-8
< content-length: 48074
< vary: Accept-Encoding
< vary: Accept-Encoding
< expires: Thu, 19 Nov 1981 08:52:00 GMT
< pragma: no-cache
< link: <https://fedoramagazine.org/wp-json/>; rel="https://api.w.org/"
< link: <https://wp.me/3XX0v>; rel=shortlink
< x-powered-by: WP Engine
< x-cacheable: SHORT
< vary: Accept-Encoding,Cookie
< cache-control: max-age=600, must-revalidate
< x-cache: HIT: 2
< x-cache-group: normal
< accept-ranges: bytes
< 
<!DOCTYPE html>

(I truncated the output as it was quite long).

My goal was to create a minimal example of running a container with Podman in a systemd user service that makes use of socket activation.
I thought /usr/lib/systemd/systemd-socket-proxyd could be used for this, but now I see that it is easy to get a bit lost. I should probably try to find some other program to create a more minimal example.
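One candidate for such a smaller test program is a tiny echo server that serves whatever listening socket it inherits. The following is only a sketch under stated assumptions: the function name `echo_one_connection` is hypothetical, it assumes the listening socket arrives as file descriptor 3 (the first descriptor systemd passes to an activated service), and it skips the `LISTEN_PID`/`LISTEN_FDS` checks that a careful program would perform.

```python
import socket

SD_LISTEN_FDS_START = 3  # first file descriptor systemd hands to an activated service


def echo_one_connection(listen_fd: int = SD_LISTEN_FDS_START) -> bytes:
    """Accept one connection on an inherited listening socket and echo one message."""
    # Wrapping an existing fd lets Python detect the family (TCP or Unix) and type.
    with socket.socket(fileno=listen_fd) as listener:
        conn, _ = listener.accept()
        with conn:
            data = conn.recv(4096)
            conn.sendall(data)
            return data
```

Because the socket family is detected from the descriptor, the same program could be used for both the TCP and the Unix domain socket variants of the test by calling `echo_one_connection()` as the service's main action.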

@rhatdan
Member

rhatdan commented May 24, 2021

OK, so we are not correctly passing the socket activation down to the container within Podman.

@eriksjolund
Contributor Author

I discovered one more thing.
These command-line options must be passed:

--env LISTEN_PID --env LISTEN_FDS --env LISTEN_FDNAMES

I tried to "socket-activate" this Bash command:

ExecStart=/usr/bin/podman run --env LISTEN_PID --env LISTEN_FDS --env LISTEN_FDNAMES  --conmon-pidfile %t/container-testing4.pid --cidfile %t/container-testing4.ctr-id --cgroups=no-conmon -d --replace -v %h/data:/data:z --name testing4 registry.fedoraproject.org/fedora:34 /bin/bash -c "echo \
LISTEN_PID=$$LISTEN_PID pid=$$$$ > /data/output4"
[erikdev@laptop user]$ cat ~/data/output4
LISTEN_PID=65288 pid=1
[erikdev@laptop user]$

Hmm, should those PIDs be equal?
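On the question of whether the PIDs need to match: the socket activation protocol, as implemented by `sd_listen_fds(3)`, only accepts the passed descriptors when `LISTEN_PID` equals the PID of the receiving process. The sketch below mimics that check in Python; the function name `inherited_listen_fds` is hypothetical and this is not libsystemd's actual code.

```python
import os
from typing import List, Mapping

SD_LISTEN_FDS_START = 3  # passed descriptors are numbered from 3 upwards


def inherited_listen_fds(environ: Mapping[str, str], pid: int) -> List[int]:
    """Return the fds passed via socket activation, or [] if they are not meant for pid."""
    try:
        listen_pid = int(environ.get("LISTEN_PID", ""))
        listen_fds = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed: not socket activated
    if listen_pid != pid:
        return []  # the descriptors were addressed to a different process
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + listen_fds))


print(inherited_listen_fds(os.environ, os.getpid()))
```

With the values from the output above (`LISTEN_PID=65288`, process PID 1), a check like this returns an empty list, so the process in the container would ignore the socket.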

Regarding the many $ dollar signs: I am not quite sure why I need to provide so many. Probably there is some escaping going on.
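The escaping is most likely systemd's own: in an `ExecStart=` line, `$$` stands for a single literal `$`, so each pair is halved before Bash ever sees the command. The sketch below models only that one rule; the function name `systemd_unescape_dollars` is hypothetical, and it ignores systemd's substitution of single-`$` variables.

```python
def systemd_unescape_dollars(arg: str) -> str:
    """Collapse each "$$" pair into one literal "$", as systemd does for ExecStart=."""
    return arg.replace("$$", "$")


# The fragment from the ExecStart line above, as Bash inside the container receives it.
print(systemd_unescape_dollars("LISTEN_PID=$$LISTEN_PID pid=$$$$"))
```

Under that rule, Bash receives `$LISTEN_PID` (expanded from the container's environment) and `$$` (expanded to Bash's own PID), which is consistent with the `LISTEN_PID=65288 pid=1` output.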

I wonder if we should close this issue? I could open a new issue when I know some more. (Hopefully I can find some time for this during the weekend.)

@rhatdan
Member

rhatdan commented May 24, 2021

I would argue that Podman should handle passing these environment variables itself when running podman run or podman start. That is, it should assume that it is running as a service and should pass these on to the container's PID 1.
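As a rough illustration of that idea (not Podman's actual implementation), the sketch below derives the container's `LISTEN_*` environment from the environment of the process started by systemd. The function name `container_listen_env` is hypothetical, and rewriting `LISTEN_PID` to 1 is an assumption taken from the commit message later in this thread.

```python
from typing import Dict, Mapping

# Variables copied unchanged from the service environment into the container.
COPIED_VARS = ("LISTEN_FDS", "LISTEN_FDNAMES")


def container_listen_env(host_env: Mapping[str, str]) -> Dict[str, str]:
    """Return the LISTEN_* variables a socket-activated container should see."""
    if "LISTEN_FDS" not in host_env:
        return {}  # not socket activated: pass nothing
    env = {name: host_env[name] for name in COPIED_VARS if name in host_env}
    # The container's main process runs as PID 1 in its own PID namespace,
    # so LISTEN_PID has to name that process rather than the host-side PID.
    env["LISTEN_PID"] = "1"
    return env


print(container_listen_env(
    {"LISTEN_PID": "65288", "LISTEN_FDS": "1", "LISTEN_FDNAMES": "testing.socket"}))
```

Copying `LISTEN_PID` unchanged would not be enough, because the process inside the container compares it with its own PID.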

@github-actions

A friendly reminder that this issue had no activity for 30 days.

rhatdan added a commit to rhatdan/podman that referenced this issue Jun 30, 2021
If a container is running within a systemd service and it is socket
activated, we need to leak the LISTEN_* environment variables into the
container.

Fixes: containers#10443

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
vrothberg added a commit to vrothberg/libpod that referenced this issue Aug 31, 2021
Make sure that Podman passes the LISTEN_* environment into containers.
Similar to runc, LISTEN_PID is set to 1.

Also remove conditionally passing the LISTEN_FDS as extra files.
The condition was wrong (inverted) and introduced to fix containers#3572 which
related to running under varlink which has been dropped entirely
with Podman 3.0.  Note that the NOTIFY_SOCKET and LISTEN_* variables
are cleared when running `system service`.

Fixes: containers#10443
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023