Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long k8s-file logs are truncated on 'podman logs --follow' output of stopped container #21914

Open
UniversalSuperBox opened this issue Mar 1, 2024 · 7 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@UniversalSuperBox
Copy link

UniversalSuperBox commented Mar 1, 2024

Issue Description

When a container using the k8s-file log driver outputs a lot of logs, retrieving them with podman logs --follow is unreliable. If you're lucky, the entire log file is output successfully. If you're not lucky, the log is truncated at some unspecified place depending on the speed of your hardware.

Retrieving logs without --follow is unaffected. Retrieving logs from a running container is unaffected.

Steps to reproduce the issue

Steps to reproduce the issue

  1. Run a container that outputs a lot of logs with the k8s-file log driver. Allow it to stop completely.
  2. Run podman logs --follow on the container

I've provided a reproducer at https://gist.github.com/UniversalSuperBox/031f34ecb51c5bd04cb559542d9ba519. You can run 1test.sh and have it run podman logs until it fails, or just run podman logs --follow --latest | tail -n 1:

core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1
scelerisque semper tortor. Sed commodo libero id posuere facilisis. Cras
core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1

core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1

core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1
torquent per conubia nostra, per inceptos himenaeos. Nullam elementum
core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1
Morbi id elit ultrices, pulvinar odio sit amet, elementum lectus.
core@localhost:~/stuff$ podman logs --follow strange_shockley | tail -n 1
leo ac rutrum ullamcorper. Phasellus ut ultricies ligula. Phasellus

Describe the results you received

The length of the log differs on every run of podman logs --follow

Describe the results you expected

The entire log is output.

podman info output

host:
  arch: arm64
  buildahVersion: 1.33.5
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.fc39.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: '
  cpuUtilization:
    idlePercent: 98.99
    systemPercent: 0.45
    userPercent: 0.56
  cpus: 8
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: coreos
    version: "39"
  eventLogger: journald
  freeLocks: 2017
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
    uidmap:
    - container_id: 0
      host_id: 502
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
  kernel: 6.7.5-200.fc39.aarch64
  linkmode: dynamic
  logDriver: journald
  memFree: 2676797440
  memTotal: 4084744192
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-1.fc39.aarch64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.10.3-1.fc39.aarch64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.3-1.fc39.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.3
      commit: 1961d211ba98f532ea52d2e80f4c20359f241a98
      rundir: /run/user/502/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240220.g1e6f92b-1.fc39.aarch64
    version: |
      pasta 0^20240220.g1e6f92b-1.fc39.aarch64-pasta
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/502/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-1.fc39.aarch64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 0h 50m 30.00s
  variant: v8
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 5
    paused: 0
    running: 0
    stopped: 5
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphRootAllocated: 106769133568
  graphRootUsed: 13598543872
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 62
  runRoot: /run/user/502/containers
  transientStore: false
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 4.9.3
  Built: 1708357248
  BuiltTime: Mon Feb 19 09:40:48 2024
  GitCommit: ""
  GoVersion: go1.21.7
  Os: linux
  OsArch: linux/arm64
  Version: 4.9.3

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

No response

Additional information

The issue occurs regardless of rootfulness.

This issue occurs whenever the log size gets above a certain point, that point seems dependent on hardware. Seems to be a race condition where the stop point is based on a certain time spent in the log reader, not an amount of data output. The issue occurs over the API, too, so you can use curl with the --limit-rate command to test that. When the rate is smaller, the amount of text downloaded is also smaller.

Command for flavor, overwrites binary-output2.bin in the current directory:

curl \
    --output 'binary-output2.bin' \
    --no-buffer \
    -XGET \
    --limit-rate=1000K
    --unix-socket "/run/user/${uid}/podman/podman.sock" \
    "http://localhost/v5.0.0/libpod/containers/${container}/logs?follow=True&stderr=True&stdout=True"
@UniversalSuperBox UniversalSuperBox added the kind/bug Categorizes issue or PR as related to a bug. label Mar 1, 2024
Copy link

github-actions bot commented Apr 1, 2024

A friendly reminder that this issue had no activity for 30 days.

@pspacek
Copy link

pspacek commented Jun 13, 2024

I confirm this observation. Observed on podman 5.1.1 running on Arch Linux x86_64.

@Luap99
Copy link
Member

Luap99 commented Jun 26, 2024

@UniversalSuperBox Thanks for the script, I managed to hit it after 1000 iterations.

Looking at our code I cannot see anything in particular wrong but we use github.com/nxadm/tail for the tailing functionality and there is nxadm/tail#67 reported so I assume this is a bug over there.

@Luap99
Copy link
Member

Luap99 commented Jun 26, 2024

Actually running more tests it seem to be releated to nxadm/tail#37 I think

@Luap99
Copy link
Member

Luap99 commented Jun 26, 2024

To a lot longer than it should have been but nxadm/tail#71 should fix the issue.
Locally I am at 40000 iterations now without seeing the bug. Without the patch I was hitting it after 100-1000 iterations

@Luap99 Luap99 self-assigned this Jun 26, 2024
@rhatdan
Copy link
Member

rhatdan commented Jun 26, 2024

Hopefully nxadm/tail is still active, last commit was 8 months ago.

@pspacek
Copy link

pspacek commented Sep 16, 2024

Is it time yet to fork nxadm/tail? Unreliable tail breaks automation scripts left and right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

No branches or pull requests

4 participants