machine: Volume ops test: statfs /private/tmp/ci/ginkgoNNN: no such file or directory #22569

edsantiago · 2024-05-01T20:09:31Z

Not quite as frequent or annoying as #22551, but still causing wasted runs:

run basic podman commands
  Volume ops
....
  Trying to pull quay.io/libpod/alpine_nginx:latest...
  ...
  Writing manifest to image destination
  WARNING: image platform (linux/amd64) does not match the expected platform (linux/arm64)
  Error: statfs /private/tmp/ci/ginkgo630638030: no such file or directory

darwin : machine-mac podman darwin rootless host sqlite
- PR ExitWithError() - pod_xxx tests #22552
  - 05-01 12:16 in run basic podman commands Volume ops
- PR ExitWithError() - yet more low-hanging fruit #22489
  - 04-24 17:00 in run basic podman commands Volume ops
  - 04-24 12:06 in run basic podman commands Volume ops
- PR ExitWithError() - more low-hanging fruit #22486
  - 04-24 10:09 in run basic podman commands Volume ops
- PR fix(deps): update module github.com/docker/docker to v26.1.0+incompatible #22461
  - 04-22 18:43 in run basic podman commands Volume ops

x	x	x	x	x	x
machine-mac(5)	podman(5)	darwin(5)	rootless(5)	host(5)	sqlite(5)

edsantiago · 2024-05-02T20:24:51Z

@cevich is there any chance whatsoever that https://github.com/containers/podman/blob/c9644ebccf14309a77769cba00833cd139509e4a/contrib/cirrus/mac_cleanup.sh is getting invoked in the middle of a running CI job? I just can't understand this bug and am grasping at straws.

Luap99 · 2024-05-03T11:15:10Z

Unlikely, there is an extra level of indirection here, given that the dir is mounted in the machine VM. So maybe the machine mount failed silently?

cevich · 2024-05-03T19:21:36Z

> is there any chance whatsoever that ...

What Paul said. And the Macs are single-task/single-user. Is it possible the running VM really is x86_64 via some emulation, and/or is the pull command specifying --arch or --platform (just double-checking)?

Sorry, no hack/get_ci_vm.sh support here; that's just way too complex to do safely in this environment. But in case it helps and is supported (I never checked), the re-run in terminal may be an option (with cleanup temporarily disabled).

Otherwise, there is a way to isolate (for a few hours) one of the Macs and dedicate it to servicing a single PR. In that PR, the end-of-task cleanup could be disabled, so that a human may ssh in and check out the state of things. This is all manual, and a bit of a chore to pull off, but it's technically possible.

Luap99 · 2024-05-15T11:29:05Z

@edsantiago Not sure if you are testing machine in your no-flake-retry testing PR, but if you are, could you give this a go:

diff --git a/pkg/machine/apple/apple.go b/pkg/machine/apple/apple.go
index 93201407e..04db7638b 100644
--- a/pkg/machine/apple/apple.go
+++ b/pkg/machine/apple/apple.go
@@ -124,7 +124,7 @@ func GenerateSystemDFilesForVirtiofsMounts(mounts []machine.VirtIoFs) ([]ignitio
        mountPrep.Add("Service", "Type", "oneshot")
        mountPrep.Add("Service", "ExecStartPre", "chattr -i /")
        mountPrep.Add("Service", "ExecStart", "mkdir -p '%f'")
-       mountPrep.Add("Service", "ExecStopPost", "chattr +i /")
+       // mountPrep.Add("Service", "ExecStopPost", "chattr +i /")
 
        mountPrep.Add("Install", "WantedBy", "remote-fs.target")
        mountPrepFile, err := mountPrep.ToString()

edsantiago · 2024-05-15T11:35:15Z

Oops, no, I long ago disabled machine tests in #17831. I will look into reenabling this one.

FWIW here's the current flake list. I don't think there's any useful info in this list, i.e., I haven't seen any logs that look different or provide interesting new data, but am posting anyway.

darwin : machine-mac podman darwin rootless host sqlite
- PR libpod: wait for healthy on main thread #22658
  - 05-13 16:18 in run basic podman commands Volume ops
- PR ExitWithError() - s files #22582
  - 05-02 20:13 in run basic podman commands Volume ops
- PR ExitWithError() - pod_xxx tests #22552
  - 05-01 12:16 in run basic podman commands Volume ops
- PR ExitWithError() - yet more low-hanging fruit #22489
  - 04-24 17:00 in run basic podman commands Volume ops
  - 04-24 12:06 in run basic podman commands Volume ops
- PR ExitWithError() - more low-hanging fruit #22486
  - 04-24 10:09 in run basic podman commands Volume ops
- PR fix(deps): update module github.com/docker/docker to v26.1.0+incompatible #22461
  - 04-22 18:43 in run basic podman commands Volume ops

x	x	x	x	x	x
machine-mac(7)	podman(7)	darwin(7)	rootless(7)	host(7)	sqlite(7)

Luap99 · 2024-05-15T11:46:07Z

The alternative is that I instrument the tests to do some checks. Basically, they would have to ssh into the machine VM and run systemctl status on all the mount units. I think the race here is the most likely cause.
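For reference when checking the mount units by hand: systemd names a mount unit after its mount point, so the unit names can be derived from the volume paths. A minimal Go sketch of that escaping follows. mountUnitName is a hypothetical helper (not part of podman), and the paths in main are only illustrative. Inside the VM, `systemd-escape --path --suffix=mount <path>` should give the same names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isUnitSafe reports whether c may appear unescaped in a systemd unit name.
func isUnitSafe(c byte) bool {
	return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' ||
		c >= '0' && c <= '9' || c == ':' || c == '_' || c == '.'
}

// mountUnitName is a hypothetical helper that mimics systemd's path escaping:
// strip leading/trailing slashes, turn "/" into "-", and hex-escape the rest.
func mountUnitName(mountPoint string) string {
	trimmed := strings.Trim(path.Clean(mountPoint), "/")
	if trimmed == "" {
		return "-.mount" // the root file system
	}
	var b strings.Builder
	for i := 0; i < len(trimmed); i++ {
		c := trimmed[i]
		switch {
		case c == '/':
			b.WriteByte('-')
		case c == '.' && i == 0:
			fmt.Fprintf(&b, `\x%02x`, c) // a leading dot is always escaped
		case isUnitSafe(c):
			b.WriteByte(c)
		default:
			fmt.Fprintf(&b, `\x%02x`, c)
		}
	}
	return b.String() + ".mount"
}

func main() {
	// Illustrative mount points only; substitute the volumes of the machine under test.
	for _, p := range []string{"/Users", "/private", "/var/folders"} {
		fmt.Println(p, "->", mountUnitName(p))
	}
}
```

The resulting names are what one would pass to systemctl status inside the VM.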

One interesting data point would be the new machine init with volume test: if this never fails, then I am sure this is a race due to chattr -i and chattr +i running in parallel in different units. The reason is that this test mounts only one path, so there cannot be a race; the default volumes, however, are several paths, hence the chance for the race.
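To make the suspected race concrete, here is a small deterministic model in Go. It is only a sketch under assumptions: each per-path unit is reduced to the three steps from the diff above (chattr -i /, mkdir, chattr +i /), the unit names "a" and "b" and their paths are made up for illustration, and replay is a hypothetical helper that simply applies the steps in a given order.

```go
package main

import "fmt"

// unit models one per-path prep unit with three steps:
// 0: chattr -i /   1: mkdir -p <path>   2: chattr +i /
type unit struct {
	path string
	next int // index of the next step to run
}

// replay runs the units' steps in the given order against one shared root
// and returns the paths whose mkdir ran while / was immutable.
func replay(paths map[string]string, order []string) []string {
	immutable := true // assumption: / starts out immutable, as on FCOS
	units := map[string]*unit{}
	for name, p := range paths {
		units[name] = &unit{path: p}
	}
	var failed []string
	for _, name := range order {
		u, ok := units[name]
		if !ok || u.next > 2 {
			continue // unknown unit or unit already finished
		}
		switch u.next {
		case 0:
			immutable = false
		case 1:
			if immutable {
				failed = append(failed, u.path)
			}
		case 2:
			immutable = true
		}
		u.next++
	}
	return failed
}

func main() {
	paths := map[string]string{"a": "/Users", "b": "/private"} // illustrative only
	serial := []string{"a", "a", "a", "b", "b", "b"}
	racy := []string{"a", "b", "b", "b", "a", "a"}
	fmt.Println("serial failures:", replay(paths, serial))
	fmt.Println("racy failures:", replay(paths, racy))
}
```

In the racy order, unit b finishes and turns the immutable flag back on between a's chattr -i / and a's mkdir, so a's directory is never created. With a single path there is only one unit, so no such interleaving exists.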

github-actions · 2024-06-15T00:07:37Z

A friendly reminder that this issue had no activity for 30 days.

Luap99 · 2024-06-15T08:11:51Z

@edsantiago Any conclusions?

edsantiago · 2024-06-18T19:16:34Z

No. This isn't something I can look into, and our PR merge rate is too low these days; not many flakes to report.

edsantiago · 2024-06-18T19:30:22Z

However, I just ran my afternoon flake catchup, and here's a new one

Luap99 · 2024-06-19T12:12:07Z

I was mostly interested if you ever saw it in the no flake retry PR (edsantiago@28882ca)

However, as I have a Mac now, I can try to reproduce locally and see where I go from there.

edsantiago · 2024-06-25T11:47:19Z

Still active

darwin : machine-mac podman darwin rootless host sqlite
- PR pkg/machine/e2e: Remove unnecessary copy of machine image. #23068
  - 06-21 15:03 in run basic podman commands Volume ops
- PR cirrus.yml: implement skips based on source changes #23030
  - 06-19 09:12 in run basic podman commands Volume ops
- PR Update module github.com/checkpoint-restore/checkpointctl to v1.2.1 #23021
  - 06-17 17:01 in run basic podman commands Volume ops
- PR New Windows makefile (winmake.ps1) targets and Windows build documentation update #22913
  - 06-06 08:52 in run basic podman commands Volume ops

Luap99 · 2024-06-26T16:59:21Z

So I have been running this script all day without luck. So either my script is wrong or I was not able to reproduce.

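#!/bin/bash

set -e

while :; do
    # Create 20 host directories to mount into the VM.
    dirs=()
    for i in {1..20}; do
        dir="$TMPDIR$i"
        dirs+=("$dir")
        mkdir -p "$dir"
    done

    # One --volume argument pair per directory.
    args=()
    for dir in "${dirs[@]}"; do
        args+=("--volume" "$dir:$dir")
    done

    podman machine init --now "${args[@]}"

    # set -e aborts the script if ls fails or grep finds no matching mount.
    podman machine ssh ls "${dirs[@]}"
    podman machine ssh mount | grep "$TMPDIR"

    # Stop looping (and keep the VM for inspection) if any unit failed.
    podman machine ssh systemctl list-units --failed | grep fail && break

    podman machine rm -f
done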

I did manage to hit a quay.io flake though

Error: reading manifest sha256:a7775864b05f6402c7ca071446f8a50ce94e456e85c3dbe8d94b3a8bf2a2c81d in quay.io/podman/machine-os: authentication required

So my best bet is to try to instrument the CI tests to give out more logs when it happens.

edsantiago · 2024-06-27T11:01:56Z

I still suspect this is related to the weird Macintosh CI setup, and as such will only trigger in CI. But I have no evidence to back that up.

Luap99 · 2024-06-27T11:12:19Z

Well, my PR did trigger it on the first try, so I can tell you now that the bug is in the VM:
https://api.cirrus-ci.com/v1/artifact/task/4925116956540928/html/machine-mac-podman-darwin-rootless-host-sqlite.log.html#t--run-basic-podman-commands-Volume-ops--1

Luap99 · 2024-06-27T12:26:19Z

#23118 should contain a proper fix now

One problem on FCOS is that the root directory is immutable, so in order to mount arbitrary paths from the host we must make it mutable again and create these directories on boot to be able to mount there.

The current logic was racy, as it used one unit for each path, they all did chattr -i /; mkdir -p $path; chattr +i /, and systemd can run these units in parallel. That means it was possible for another unit to make / immutable again before the unit could do the mkdir. I pointed this out on the original PR[1] but we never followed up on it...

Now this here changes several things. First, have one unit that does the chattr -i / (immutable-root-off.service); it is hooked into remote-fs-pre.target, which means it is executed before the network mounts (virtiofs) are done. Then we have another unit that does chattr +i / (immutable-root-on.service), which turns the immutable root back on after remote-fs.target, which means all mounts are done at this point.

Additionally, the automount unit is removed because it does not add any value for us, and it was broken anyway, as it used the virtiofs tag as the path, so systemd just ignored it.

[1] containers#20612 (comment)

Fixes containers#22569

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
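The two-unit layout described in the commit message could look roughly like the following Go sketch. Only the unit names, the chattr commands, and the two targets come from the text above; the unitSpec type, the render helper, the descriptions, and the exact set of directives (Before=, After=, WantedBy=, Type=oneshot) are assumptions for illustration and may differ from what #23118 actually generates.

```go
package main

import (
	"fmt"
	"strings"
)

// unitSpec is a hypothetical, minimal description of a oneshot service unit.
type unitSpec struct {
	Name        string
	Description string
	Before      string // optional ordering: run before this target
	After       string // optional ordering: run after this target
	ExecStart   string
	WantedBy    string
}

// render turns a unitSpec into systemd unit file text.
func render(u unitSpec) string {
	var b strings.Builder
	b.WriteString("[Unit]\n")
	fmt.Fprintf(&b, "Description=%s\n", u.Description)
	if u.Before != "" {
		fmt.Fprintf(&b, "Before=%s\n", u.Before)
	}
	if u.After != "" {
		fmt.Fprintf(&b, "After=%s\n", u.After)
	}
	b.WriteString("\n[Service]\nType=oneshot\n")
	fmt.Fprintf(&b, "ExecStart=%s\n", u.ExecStart)
	b.WriteString("\n[Install]\n")
	fmt.Fprintf(&b, "WantedBy=%s\n", u.WantedBy)
	return b.String()
}

// immutableRootOff makes / mutable before any remote (virtiofs) mount is set up.
func immutableRootOff() unitSpec {
	return unitSpec{
		Name:        "immutable-root-off.service",
		Description: "Allow creating mount points under /", // assumed wording
		Before:      "remote-fs-pre.target",
		ExecStart:   "chattr -i /",
		WantedBy:    "remote-fs-pre.target",
	}
}

// immutableRootOn restores the immutable flag once all remote mounts are done.
func immutableRootOn() unitSpec {
	return unitSpec{
		Name:        "immutable-root-on.service",
		Description: "Make / immutable again", // assumed wording
		After:       "remote-fs.target",
		ExecStart:   "chattr +i /",
		WantedBy:    "remote-fs.target",
	}
}

func main() {
	for _, u := range []unitSpec{immutableRootOff(), immutableRootOn()} {
		fmt.Printf("# %s\n%s\n", u.Name, render(u))
	}
}
```

The point of the layout is that chattr -i / and chattr +i / each run exactly once, on either side of the mount phase, so no per-path unit can re-lock / while another one is still creating its directory.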

edsantiago added the flakes, macos, and machine labels May 1, 2024

github-actions bot added the stale-issue label Jun 15, 2024

github-actions bot removed the stale-issue label Jun 16, 2024

Luap99 mentioned this issue Jun 27, 2024

apple virtiofs: fix racy mount setup #23118

Merged

Luap99 self-assigned this Jun 27, 2024

openshift-merge-bot bot closed this as completed in #23118 Jun 27, 2024

openshift-merge-bot bot closed this as completed in fdb736d Jun 27, 2024

Luap99 mentioned this issue Jul 17, 2024

On MacOS with krunkit installed, the run basic podman commands [It] Volume ops test fails #23296

Closed

stale-locking-app bot added the "locked - please file new issue/PR" label Sep 26, 2024

stale-locking-app bot locked as resolved and limited conversation to collaborators Sep 26, 2024
