On MacOS with krunkit installed, the run basic podman commands [It] Volume ops test fails #23296

Closed
cevich opened this issue Jul 16, 2024 · 14 comments · Fixed by #24163
Labels
kind/bug Categorizes issue or PR as related to a bug. machine stale-issue

Comments

@cevich
Member

cevich commented Jul 16, 2024

Issue Description

On MacOS with krunkit installed, the run basic podman commands [It] Volume ops test fails

Steps to reproduce the issue

  1. On macOS, run brew tap slp/krunkit, then brew install krunkit.
  2. Remove the conflicting symlink: rm -vf /opt/homebrew/bin/vfkit (ref. PR)
  3. Run brew tap cfergeau/crc, then brew install vfkit
  4. Clone the podman repo
  5. export CONTAINERS_MACHINE_PROVIDER="libkrun"
  6. make localmachine
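
The steps above can be collected into one shell function. This is only a sketch: the `run` helper, the `reproduce` function, and the `DRY_RUN` variable are hypothetical conveniences, not part of podman. It assumes the podman repo is already cloned (step 4) and that the function is called from the root of that checkout.

```shell
# Print each command; execute it only when DRY_RUN is not "1".
# (`run` and DRY_RUN are illustrative helpers, not podman features.)
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# The body is a subshell so that `set -e` and the export stay local to it.
reproduce() (
    set -e
    run brew tap slp/krunkit
    run brew install krunkit
    # Step 2: remove the conflicting vfkit symlink.
    run rm -vf /opt/homebrew/bin/vfkit
    run brew tap cfergeau/crc
    run brew install vfkit
    # Step 4 (cloning the podman repo) is assumed to be done already.
    export CONTAINERS_MACHINE_PROVIDER="libkrun"
    run make localmachine
)
```

Calling `DRY_RUN=1 reproduce` only prints the commands, which is a cheap way to review them before running `reproduce` for real.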

Describe the results you received

Example annotated log

Describe the results you expected

Test should pass

podman info output

Ref. related build CI task: https://cirrus-ci.com/task/5136030032986112

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Mac setup PR: Not in production use at the time this issue was opened.

Additional information

Happens every run.

@cevich cevich added the kind/bug Categorizes issue or PR as related to a bug. label Jul 16, 2024
@rhatdan
Member

rhatdan commented Jul 17, 2024

@slp FYI

@Luap99 Luap99 added the machine label Jul 17, 2024
@Luap99
Member

Luap99 commented Jul 17, 2024

Testing locally, I see that mounts are supported, and the default mounts should be the same as with applehv, so I don't see a reason why this should fail with krun.

@slp
Contributor

slp commented Jul 17, 2024

/Users/cevichTesting-0-worker/ci/task-4648064202309632/bin/darwin/podman -r run -v /private/tmp/ci/ginkgo1303331317:/test:Z quay.io/libpod/alpine_nginx ls /test/attr-test-file
  Trying to pull quay.io/libpod/alpine_nginx:latest...
  Getting image source signatures
  Copying blob sha256:d2c7362ca710ad35a846a34571a7c3450ea3cce04efcbcb4d3af276eda154ade
  Copying blob sha256:df9b9388f04ad6279a7410b85cedfdcb2208c0a003da7ab5613af71079148139
  Copying blob sha256:71895e83ea49901b7b752bbf3ca19a54148a5f4ab5fdff3dca9bcd59d44c59e3
  Copying config sha256:ecea49d99daa5bd62ebaef1338f6bc4c948bf2651b139160404f9c1c48fcd85c
  Writing manifest to image destination
  WARNING: image platform (linux/amd64) does not match the expected platform (linux/arm64)
  Error: statfs /private/tmp/ci/ginkgo1303331317: no such file or directory

How is /private/tmp/ci/ginkgo1303331317 created? Is the test code publicly available somewhere?

@Luap99
Member

Luap99 commented Jul 17, 2024

@slp on podman main, this is what I am using.

$ export CONTAINERS_MACHINE_PROVIDER=libkrun
$ TMPDIR=/private/tmp make localmachine FOCUS="Volume ops"

But this works for me so there is something special with the CI setup and likely not related to krun.

Possible the flake was not fixed or somehow special for libkrun: #22569

@cevich
Member Author

cevich commented Jul 17, 2024

Possible the flake was not fixed or somehow special for libkrun

Correct, I've never seen this during my recent libkrun testing, when CONTAINERS_MACHINE_PROVIDER=. With libkrun, it's not a flake, it fails 100% of the time.

@cevich
Member Author

cevich commented Jul 17, 2024

In case it matters, in this CI environment:

  • We're running as a regular user w/o any admin permissions.
  • $TMPDIR=/private/tmp/ci and $HOME=/home/$USER/ci.
  • A local SSD volume (root) is mounted on both $TMPDIR and /home/$USER.

Some details:

cevichTesting-0:~ ec2-user$ id cevichTesting-0-worker
uid=502(cevichTesting-0-worker) gid=20(staff) groups=20(staff),12(everyone),61(localaccounts),701(com.apple.sharepoint.group.1),100(_lpoperator)
cevichTesting-0:~ ec2-user$ stat /private/tmp/ci
  File: /private/tmp/ci
  Size: 128             Blocks: 0          IO Block: 4096   directory
Device: 1,31    Inode: 2           Links: 4
Access: (1770/drwxrwx--T)  Uid: (  502/cevichTesting-0-worker)   Gid: (   20/   staff)
Access: 2024-07-17 13:05:04.288840817 +0000
Modify: 2024-07-17 13:05:04.487994354 +0000
Change: 2024-07-17 13:05:04.487994354 +0000
 Birth: 2024-07-16 19:22:41.627913185 +0000
cevichTesting-0:~ ec2-user$ stat /Users/cevichTesting-0-worker/
  File: /Users/cevichTesting-0-worker/
  Size: 224             Blocks: 0          IO Block: 4096   directory
Device: 1,32    Inode: 2           Links: 7
Access: (0750/drwxr-x---)  Uid: (  502/cevichTesting-0-worker)   Gid: (   20/   staff)
Access: 2024-07-16 19:22:43.015498650 +0000
Modify: 2024-07-16 19:40:44.975385767 +0000
Change: 2024-07-16 19:40:44.975385767 +0000
 Birth: 2024-07-16 19:22:43.015498650 +0000

@cevich
Member Author

cevich commented Jul 17, 2024

How is /private/tmp/ci/ginkgo1303331317 created? Is the test code publicly available somewhere?

I believe it's created in the test here:

https://github.com/containers/podman/blob/36bab759b25621ed459ed3c662aa70e27e2e90a6/pkg/machine/e2e/basic_test.go#L65
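
To exercise the failing step by hand, a directory and file shaped like the ones in the log above can be created under $TMPDIR. This is a sketch, not the test's actual code: both function names are hypothetical, and the `ginkgo.` prefix only imitates the directory name seen in the log.

```shell
# Create a scratch directory containing attr-test-file and print its path.
# Assumption: a "ginkgo." prefix under $TMPDIR, imitating the path in the log.
make_volume_fixture() {
    fixture_dir="$(mktemp -d "${TMPDIR:-/tmp}/ginkgo.XXXXXX")" || return 1
    : > "$fixture_dir/attr-test-file" || return 1
    printf '%s\n' "$fixture_dir"
}

# Mount the given directory at /test and list the file, mirroring the
# command in the failing log. Requires a running podman machine.
check_volume() {
    podman run --rm -v "$1:/test:Z" quay.io/libpod/alpine_nginx ls /test/attr-test-file
}
```

A manual run would then look like `d="$(make_volume_fixture)" && check_volume "$d"`, with TMPDIR=/private/tmp/ci set to match the CI environment.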

@slp
Contributor

slp commented Jul 17, 2024

@cevich I think we're going to need some debugging to dig deeper into this issue. Is it possible to connect to the CI machine to run some tests? It'd be interesting to do a manual podman machine init && podman machine start, then connect to the VM with podman machine ssh and check whether the volumes /private and /Users are exposed to the guest.
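
The guest-side check suggested here could be scripted roughly as follows. It is a sketch under assumptions: `guest_has_dirs` is a hypothetical helper, and the machine name passed to it (for example the default `podman-machine-default`) has to match the machine that was actually started.

```shell
# Report whether each given host directory is visible inside the machine VM.
# Usage: guest_has_dirs MACHINE DIR...
# (hypothetical helper; it only wraps `podman machine ssh`)
guest_has_dirs() {
    machine="$1"
    shift
    status=0
    for d in "$@"; do
        # Run `test -d DIR` inside the guest over ssh.
        if podman machine ssh "$machine" test -d "$d"; then
            echo "ok: $d is visible in the guest"
        else
            echo "MISSING: $d is not visible in the guest"
            status=1
        fi
    done
    return "$status"
}
```

After `podman machine init && podman machine start`, running `guest_has_dirs podman-machine-default /private /Users` covers the two volumes named above. Passing the exact test directory, such as /private/tmp/ci, is a stricter check.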

@cevich
Member Author

cevich commented Jul 17, 2024

Is it possible to connect to the CI machine

Yes, and in fact I'm using a Mac that's already isolated from the rest of our system. @slp I messaged you on Slack.

@slp
Contributor

slp commented Jul 19, 2024

Thanks to @cevich's help, we were able to debug this issue. containers/libkrun#209 fixes this.

@cevich
Member Author

cevich commented Jul 19, 2024

Thanks @slp for "getting your hands dirty" and figuring it out.

FYI: I'll be off on PTO next week, so I won't be enabling libkrun testing in Podman until I return. There are changes needed on the Macs used for CI and I don't want to risk breaking something while I'm away 😉

Luap99 added a commit to Luap99/libpod that referenced this issue Aug 6, 2024
Same issues as in the volume ops test, the libkrun volume is not working
properly (containers#23296).

Signed-off-by: Paul Holzinger <pholzing@redhat.com>

A friendly reminder that this issue had no activity for 30 days.

Luap99 added a commit to Luap99/libpod that referenced this issue Sep 18, 2024
Let's see if this works now.

Fixes containers#23296

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@Luap99
Member

Luap99 commented Sep 19, 2024

@slp Is there a new krunkit version with the fix we can use?

@slp
Contributor

slp commented Oct 1, 2024

@slp Is there a new krunkit version with the fix we can use?

Yes, it also includes the ability to increase the SHM window of virtio-gpu for running larger AI models. We're testing it now and we'll send a PR to podman by the end of the week.

slp added a commit to slp/podman that referenced this issue Oct 7, 2024
Remove the skips introduced to work around containers#23296

Signed-off-by: Sergio Lopez <slp@redhat.com>