Revert /tmp to tmpfs #340

Merged: edsantiago merged 1 commit into containers:main on Apr 24, 2024

@edsantiago
Member

edsantiago commented Mar 26, 2024

Podman really needs /tmp to be tmpfs, to detect and
handle reboots. Although there are (at this time) no
reboots involved in CI testing, it's still important
for CI hosts to reflect something close to a real-world
environment. And, there is work underway to check /tmp:

containers/podman#22141
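
For illustration only (this is not the check from containers/podman#22141): one way to tell whether /tmp is tmpfs on Linux is to look it up in /proc/mounts. The helper names below are hypothetical, and the sketch assumes the usual whitespace-separated /proc/mounts layout.

```python
from typing import Optional


def fstype_of(mount_point: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the last mount at mount_point, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and fields[1] == mount_point:
            found = fields[2]  # a later entry shadows an earlier one
    return found


def tmp_is_tmpfs(mounts_path: str = "/proc/mounts") -> bool:
    """True if /tmp is its own mount and that mount is tmpfs."""
    with open(mounts_path) as f:
        return fstype_of("/tmp", f.read()) == "tmpfs"
```

If /tmp has no entry of its own, it lives on the root filesystem, so `fstype_of` returns None and `tmp_is_tmpfs` reports False.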

This PR removes special-case Fedora code that was
disabling a tmpfs /tmp mount. That special case dates
back to PR #30 in 2020.

Some of the image-build code in this repo performs
reboots and relies on persistent tmp files, so you'll
note a flurry of /tmp -> /var/tmp changes.
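
The rule of thumb behind those changes can be sketched as follows. This is a hypothetical Python helper, not code from this repo: anything that must survive a reboot goes under /var/tmp, everything else can stay in /tmp. The function names and the `imgbuild-` prefix are assumptions made for the example.

```python
import os
import tempfile
from typing import Optional

PERSISTENT_TMP = "/var/tmp"  # on disk, survives a reboot
VOLATILE_TMP = "/tmp"        # tmpfs, emptied by a reboot


def scratch_base(survives_reboot: bool) -> str:
    """Pick the temp directory that matches the file's required lifetime."""
    return PERSISTENT_TMP if survives_reboot else VOLATILE_TMP


def make_scratch_file(survives_reboot: bool, base: Optional[str] = None,
                      prefix: str = "imgbuild-") -> str:
    """Create an empty scratch file and return its path.

    `base` overrides the directory choice (handy for tests).
    """
    directory = base if base is not None else scratch_base(survives_reboot)
    fd, path = tempfile.mkstemp(prefix=prefix, dir=directory)
    os.close(fd)
    return path
```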

And, as a drive-by, document the Windows Chocolatey
install command. Link to Best Practices, and explain
why we disregard some of those.

Signed-off-by: Ed Santiago <santiago@redhat.com>

@edsantiago
Member Author

Retried three times, it's not a flake:

    win-server-wsl:  - golang - golang not installed. The package was not found with the source(s) listed.

Any Windows experts know what's going on with Chocolatey and Golang?

@edsantiago
Member Author

Ha ha. Now it's failing on git:

==> win-server-wsl:     + FullyQualifiedErrorId : Could not install package git

So, yeah, must be a flake, I'll just keep hammering on rerun.

@edsantiago
Member Author

The git flake resolved itself, now it's back to golang failing. There's one difference which I think might be important, having to do with versions vs no-versions:

git flake:

win-server-wsl: Unable to resolve dependency: \
    Unable to find a version of 'git.install' that is compatible with 'git 2.44.0 constraint: \
    git.install (= 2.44.0)'.

go failure:

win-server-wsl: golang not installed. The package was not found with the source(s) listed.
(no version string at all)

Maybe the git flake was like a dnf cache delay, where packages just weren't refreshed in repos? git 2.44 is a month old, but maybe it only just got built for Windows today. And maybe the golang problem is that golang just got completely removed from Chocolatey?
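
To make the "version vs. no-version" distinction concrete, here is a small sketch that sorts log lines into the two failure kinds seen above. The match strings are copied from the messages quoted in this thread; the function name and the category labels are made up for the example, and real Chocolatey output may contain other wordings that this does not cover.

```python
import re

# Terminal color codes such as ESC[0;32m wrap the CI log lines.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def classify_choco_failure(line: str) -> str:
    """Return 'dependency-version', 'not-found', or 'unknown' for a log line."""
    text = ANSI_RE.sub("", line)
    if "Unable to resolve dependency" in text or "Unable to find a version of" in text:
        return "dependency-version"
    if "The package was not found with the source(s) listed" in text:
        return "not-found"
    return "unknown"
```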

@edsantiago
Member Author

@containers/podman-maintainers I give up. This needs a Windows expert.

@jwhonce
Member

jwhonce commented Mar 26, 2024

@l0rd Any ideas?


Cirrus CI build successful. Found built image names and IDs:

Stage Image Name IMAGE_SUFFIX
base debian do-not-use
base fedora do-not-use
base fedora-aws do-not-use
base fedora-aws-arm64 do-not-use
base image-builder do-not-use
base prior-fedora do-not-use
cache build-push c20240326t173017z-f39f38d13
cache debian c20240326t173017z-f39f38d13
cache fedora c20240326t173017z-f39f38d13
cache fedora-aws c20240326t173017z-f39f38d13
cache fedora-netavark c20240326t173017z-f39f38d13
cache fedora-netavark-aws-arm64 c20240326t173017z-f39f38d13
cache fedora-podman-aws-arm64 c20240326t173017z-f39f38d13
cache fedora-podman-py c20240326t173017z-f39f38d13
cache prior-fedora c20240326t173017z-f39f38d13
cache rawhide c20240326t173017z-f39f38d13
cache win-server-wsl c20240326t173017z-f39f38d13
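
The IMAGE_SUFFIX values in the table look like a build timestamp followed by a tag. A sketch for pulling the timestamp out, assuming the layout is `c<YYYYMMDD>t<HHMMSS>z-<tag>` and that the trailing `z` means UTC; both are inferences from the values shown here, not documented facts, and `parse_image_suffix` is a hypothetical name.

```python
import re
from datetime import datetime, timezone

SUFFIX_RE = re.compile(r"^c(\d{8}t\d{6}z)-(\w+)$")


def parse_image_suffix(suffix: str):
    """Split an IMAGE_SUFFIX into (build time, trailing tag)."""
    m = SUFFIX_RE.match(suffix)
    if m is None:
        # Placeholder values such as "do-not-use" end up here.
        raise ValueError(f"unrecognized IMAGE_SUFFIX: {suffix!r}")
    stamp = datetime.strptime(m.group(1), "%Y%m%dt%H%M%Sz")
    return stamp.replace(tzinfo=timezone.utc), m.group(2)
```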

@edsantiago
Member Author

YAY! Passed after enough retries!

debian prior-fedora fedora fedora-aws rawhide
base 13 38-1.6 39-1.5 ? 41-0
kernel 6.7.9-2 6.7.9-100 6.7.10-200 6.7.9-200 6.9.0-0.rc0.20240322git8e938e398669.14
6.9.0-0.rc0.20240318gitf6cef5f8c37f.10 ⇑
aardvark-dns 1.4.0-5 1.10.0-1 1.10.0-1 1.10.0-1 1.10.0-1
netavark 1.4.0-4 1.10.3-1 1.10.3-1 1.10.3-1 1.10.3-3
1.10.3-2 ⇑
buildah 1.33.5+ds1-4+b1 1.34.0-1 1.35.1-1 1.35.0-1 1.35.0-1
1.33.5+ds1-4 ⇑
conmon 2.1.10+ds1-1+b1 2.1.10-1 2.1.10-1 2.1.10-1 2.1.10-1
container-selinux ? 2.230.0-1 2.230.0-1 2.230.0-1 2.230.0-1
2.228.1-1 ⇑
containers-common ? 1-89 1-99 1-99 0.58.0-4
0.58.0-1 ⇑
criu 3.17.1-3 3.18-1 3.19-2 3.19-2 3.19-4
crun 1.14.4-1 1.14.4-1 1.14.4-1 1.14.4-1 1.14.4-1
golang 2:1.22~3 1.21.8-1 1.21.8-1 1.21.8-1 1.22.1-1
gvisor-tap-vsock ? 0.7.3-1 0.7.3-1 0.7.3-1 0.7.3-2
nmap-ncat 7.94+git20230807.3be01efb1+dfsg-3+b1 7.93-2 7.94-1 7.94-1 7.94-1
passt 2024-03-20 2024-03-20 2024-03-20 2024-03-20 2024-03-20
2024-02-20 ⇑
podman 4.9.3+ds1-1+b1 4.9.3-2 4.9.3-1 4.9.3-1 5.0.0~rc6-2
4.9.3+ds1-1 ⇑
runc 1.1.12+ds1-2 1.1.12-1 1.1.12-1 1.1.12-1 1.1.12-3
skopeo 1.13.3+ds1-2+b1 1.15.0-1 1.15.0-1 1.14.2-1 1.14.2-2
1.13.3+ds1-2 ⇑ 1.14.2-1 ⇑
slirp4netns 1.2.1-1+b1 1.2.2-1 1.2.2-1 1.2.2-1 1.2.2-2
systemd 255.4-1+b1 253.17-1 254.10-1 254.10-1 255.4-1
253.15-2 ⇑
tar 1.34+dfsg-1.2+deb12u1 1.34-8 1.35-2 1.35-2 1.35-3

@giuseppe
Member

giuseppe left a comment

LGTM

@l0rd
Member

l0rd commented Mar 27, 2024

> @l0rd Any ideas?

Packages golang and git are available from the default source (community.chocolatey.org), and the community services were operational yesterday. The troubleshooting guide suggests checking the source, but we use the default one, so that doesn't look like the problem. Adding --debug --verbose --noop to the choco install command may help get more information next time.

Anyway, the Chocolatey scripting best practices suggest using a slightly different command to install packages:

- choco install -y --allow-downgrade --execution-timeout=300 $pkg
+ choco upgrade $pkg -y --source="'https://community.chocolatey.org/api/v2'" --allow-downgrade --execution-timeout=300

I have opened a PR to change that, but I don't think it has anything to do with the problems above.
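
As a rough sketch of how the suggested command, the diagnostic flags, and the manual "rerun until it passes" approach from earlier in the thread could be combined: the flags below are the ones quoted in this comment, while the retry policy (three attempts, 30 seconds apart) and the function names are assumptions for the example. The actual image build is not written in Python.

```python
import subprocess
import time
from typing import Callable, List, Optional

CHOCO_SOURCE = "https://community.chocolatey.org/api/v2"


def choco_upgrade_argv(pkg: str, diagnose: bool = False) -> List[str]:
    """Build the argument list for the suggested `choco upgrade` call."""
    argv = ["choco", "upgrade", pkg, "-y", f"--source={CHOCO_SOURCE}",
            "--allow-downgrade", "--execution-timeout=300"]
    if diagnose:
        # Extra flags suggested above for gathering more information.
        argv += ["--debug", "--verbose", "--noop"]
    return argv


def install_with_retries(pkg: str, attempts: int = 3, delay: float = 30.0,
                         run: Optional[Callable[[List[str]], int]] = None) -> bool:
    """Run the command until it exits 0 or the attempts are used up."""
    if run is None:
        run = lambda argv: subprocess.run(argv).returncode
    for attempt in range(1, attempts + 1):
        if run(choco_upgrade_argv(pkg)) == 0:
            return True
        if attempt < attempts:
            time.sleep(delay)  # give a flaky package source time to recover
    return False
```

The `run` parameter exists so the retry logic can be exercised without Chocolatey installed, by passing a stand-in that returns exit codes.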

@rhatdan
Member

rhatdan commented Mar 27, 2024

/approve
/lgtm

@edsantiago
Member Author

/hold

Let's not merge until we see if this works. My two Podman PRs with these VMs have stuck in CI, twice. I don't know if it's a Cirrus problem or a problem with the VMs.

@edsantiago
Member Author

@cevich there's something broken somewhere. "Build for debian" is consistently hanging, not even getting started. So is Validate f39. AFAICS there is no indication of why they won't run. How can I debug this?

@edsantiago edsantiago force-pushed the tmp-should-be-tmpfs branch from 01cbbbd to 9e51093 Compare March 27, 2024 14:27
@cevich
Member

cevich commented Mar 27, 2024

> How can I debug this?

They're stuck in 'scheduled' mode. Offhand, these are container-based tasks running on Cirrus-CI compute. We're not out of credits 🤔 It's running now, so the only thing I can think of is some kind of Cirrus-CI hiccup.

@edsantiago
Member Author

Not sure what you mean by "it's running now". Maybe you're referring to "this PR, the /tmp-tmpfs one in automation_images, is running now"? If so, that has nothing to do with anything. The not-running thing is two podman PRs using images I built yesterday. The running-now thing is maybe this PR (the one on which I'm commenting), which I re-pushed in desperation. Maybe yesterday's VMs didn't work because of the Windows outage. I dunno. I'll see what happens in a few minutes.


Cirrus CI build successful. Found built image names and IDs:

Stage Image Name IMAGE_SUFFIX
base debian do-not-use
base fedora do-not-use
base fedora-aws do-not-use
base fedora-aws-arm64 do-not-use
base image-builder do-not-use
base prior-fedora do-not-use
cache build-push c20240327t142743z-f39f38d13
cache debian c20240327t142743z-f39f38d13
cache fedora c20240327t142743z-f39f38d13
cache fedora-aws c20240327t142743z-f39f38d13
cache fedora-netavark c20240327t142743z-f39f38d13
cache fedora-netavark-aws-arm64 c20240327t142743z-f39f38d13
cache fedora-podman-aws-arm64 c20240327t142743z-f39f38d13
cache fedora-podman-py c20240327t142743z-f39f38d13
cache prior-fedora c20240327t142743z-f39f38d13
cache rawhide c20240327t142743z-f39f38d13
cache win-server-wsl c20240327t142743z-f39f38d13

@edsantiago
Member Author

  • 20240327t142743z-f39f38d13
  • 20240102t155643z-f39f38d13 ⇑
debian prior-fedora fedora fedora-aws rawhide
base 13 38-1.6 39-1.5 ? 41-0
40-0 ⇑
kernel 6.7.9-2 6.7.10-100 6.7.10-200 6.7.10-200 6.9.0-0.rc1.17
aardvark-dns 1.4.0-5 1.10.0-1 1.10.0-1 1.10.0-1 1.10.0-1
1.9.0-1 ⇑ 1.9.0-1 ⇑ 1.9.0-1 ⇑
netavark 1.4.0-4 1.10.3-1 1.10.3-1 1.10.3-1 1.10.3-3
1.9.0-1 ⇑ 1.9.0-1 ⇑ 1.9.0-1 ⇑
buildah 1.33.7+ds1-1 1.34.0-1 1.35.1-1 1.35.0-1 1.35.0-1
1.32.2+ds1-1 ⇑ 1.33.2-1 ⇑ 1.33.2-1 ⇑ 1.33.2-1 ⇑ 1.33.2-1 ⇑
conmon 2.1.10+ds1-1+b1 2.1.10-1 2.1.10-1 2.1.10-1 2.1.10-1
2.1.6+ds1-1 ⇑ 2.1.8-2 ⇑ 2.1.8-2 ⇑ 2.1.8-2 ⇑ 2.1.8-2 ⇑
container-selinux ? 2.230.0-1 2.230.0-1 2.230.0-1 2.230.0-1
2.226.0-1 ⇑ 2.226.0-1 ⇑ 2.226.0-1 ⇑ 2.226.0-1 ⇑
containers-common ? 1-89 1-99 1-99 0.58.0-5
1-95 ⇑ 1-95 ⇑ 1-101 ⇑
criu 3.17.1-3 3.18-1 3.19-2 3.19-2 3.19-4
3.19-2 ⇑
crun 1.14.4-1 1.14.4-1 1.14.4-1 1.14.4-1 1.14.4-1
1.12-1 ⇑ 1.12-1 ⇑ 1.12-1 ⇑ 1.12-1 ⇑ 1.12-1 ⇑
golang 2:1.22~3 1.21.8-1 1.21.8-1 1.21.8-1 1.22.1-4
2:1.21~2 ⇑ 1.20.12-1 ⇑ 1.21.5-1 ⇑ 1.21.5-1 ⇑ 1.21.5-1 ⇑
gvisor-tap-vsock ? 0.7.3-1 0.7.3-1 0.7.3-1 0.7.3-2
0.7.1-1 ⇑ 0.7.1-1 ⇑ 0.7.1-1 ⇑ 0.7.1-1 ⇑
nmap-ncat 7.94+git20230807.3be01efb1+dfsg-3+b1 7.93-2 7.94-1 7.94-1 7.94-1
7.94+git20230807.3be01efb1+dfsg-2 ⇑ 7.93-3 ⇑ 7.93-3 ⇑ 7.93-4 ⇑
passt 2024-03-26 2024-03-20 2024-03-26 2024-03-20 2024-03-26
2023-12-04 ⇑ 2023-12-04 ⇑ 2023-12-04 ⇑ 2023-12-04 ⇑
podman 4.9.3+ds1-1+b1 4.9.3-2 4.9.4-1 4.9.3-1 5.0.0-1
4.7.2+ds1-2 ⇑ 4.7.2-1 ⇑ 4.8.2-1 ⇑ 4.8.1-1 ⇑ 4.8.1-1 ⇑
runc 1.1.12+ds1-2 1.1.12-1 1.1.12-1 1.1.12-1 1.1.12-3
1.1.10+ds1-1 ⇑ 1.1.8-1 ⇑ 1.1.8-1 ⇑ 1.1.8-1 ⇑ 1.1.9-1 ⇑
skopeo 1.13.3+ds1-2+b1 1.15.0-1 1.15.0-1 1.14.2-1 1.14.2-2
1.13.3+ds1-2 ⇑ 1.14.0-1 ⇑ 1.14.0-1 ⇑ 1.14.0-1 ⇑ 1.14.0-1 ⇑
slirp4netns 1.2.1-1+b1 1.2.2-1 1.2.2-1 1.2.2-1 1.2.2-2
1.2.1-1 ⇑ 1.2.2-1 ⇑
systemd 255.4-1+b1 253.17-1 254.10-1 254.10-1 255.4-1
255.2-3 ⇑ 255.1-1 ⇑
tar 1.34+dfsg-1.2+deb12u1 1.34-8 1.35-2 1.35-2 1.35-3
1.35+dfsg-2 ⇑ 1.35-2 ⇑

@cevich
Member

cevich commented Mar 27, 2024

> Not sure what you mean by "it's running now"

Oh, my bad, I assumed those links were tasks in this repo. I see they're in podman. I just saw a re-push, and now the replacement tasks are running over there. This still seems like it was some temporary Cirrus infra hiccup.

@cevich
Member

cevich commented Mar 27, 2024

BTW: I would strongly recommend testing these images beyond just podman CI. Esp. buildah and skopeo.

@edsantiago
Member Author

@cevich nope, still broken. (That link is to a podman debian-13 job; it is entirely gray, with nothing happening on it whatsoever.) Help please.

@cevich
Member

cevich commented Mar 27, 2024

> nope, still broken.

Gah! Hmmmm 🤔

@edsantiago
Member Author

Debian grub changed. Not by much, but apparently enough to cause a boot failure.

debian prior-fedora fedora fedora-aws rawhide
grub2-common 2.12-1+b1 2.06-116 2.06-118 2.06-118 2.06-119
2.12~rc1-12 ⇑ 2.06-110 ⇑
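
Spotting a change like this by eye in a large table is tedious. A minimal sketch of the comparison, assuming the package versions of two builds are available as plain dicts (the name `changed_packages` and the dict layout are hypothetical, not part of the tooling that produced the tables):

```python
from typing import Dict, Optional, Tuple


def changed_packages(old: Dict[str, str],
                     new: Dict[str, str]) -> Dict[str, Tuple[Optional[str], str]]:
    """Map each package whose version differs to (old version, new version).

    Packages missing from `old` are reported with None as the old version.
    """
    return {
        pkg: (old.get(pkg), ver)
        for pkg, ver in new.items()
        if old.get(pkg) != ver
    }
```

Fed the Debian column, with `grub2-common` going from `2.12~rc1-12` to `2.12-1+b1` and every other package unchanged, this would return only the `grub2-common` entry.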

@cevich
Member

cevich commented Mar 27, 2024

Cause (not root): Boot failure

Welcome to GRUB!

error: file `/boot/grub/x86_64-efi/bli.mod' not found.
error: file `/boot/grub/x86_64-efi/bli.mod' not found.

IIRC, we're using a custom /usr/bin/version_find_latest for debian (see base_images/debian_base-setup.sh), could that somehow be involved?

Clearly the base-image is bootable (since cache-image stage is using it and not failing). So it must be something going wrong there. Anything jump out at you in the build logs?
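
One cheap sanity check that could run at the end of an image build is to confirm that the module GRUB complained about is actually on disk. A sketch under that assumption; the function name is hypothetical, and this only tests file presence, it says nothing about why the module went missing.

```python
from pathlib import Path
from typing import Iterable, List


def missing_grub_modules(module_dir: str, modules: Iterable[str]) -> List[str]:
    """Return the module names that have no <name>.mod file in module_dir."""
    base = Path(module_dir)
    return [m for m in modules if not (base / f"{m}.mod").is_file()]
```

For the failure above, the call would be `missing_grub_modules("/boot/grub/x86_64-efi", ["bli"])`; a non-empty result means the image would hit the same error at boot.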

@edsantiago edsantiago force-pushed the tmp-should-be-tmpfs branch from 9e51093 to df0afa7 Compare March 27, 2024 16:29

Cirrus CI build successful. Found built image names and IDs:

Stage Image Name IMAGE_SUFFIX
base debian do-not-use
base fedora do-not-use
base fedora-aws do-not-use
base fedora-aws-arm64 do-not-use
base image-builder do-not-use
base prior-fedora do-not-use
cache build-push c20240327t162918z-f39f38d13
cache debian c20240327t162918z-f39f38d13
cache fedora c20240327t162918z-f39f38d13
cache fedora-aws c20240327t162918z-f39f38d13
cache fedora-netavark c20240327t162918z-f39f38d13
cache fedora-netavark-aws-arm64 c20240327t162918z-f39f38d13
cache fedora-podman-aws-arm64 c20240327t162918z-f39f38d13
cache fedora-podman-py c20240327t162918z-f39f38d13
cache prior-fedora c20240327t162918z-f39f38d13
cache rawhide c20240327t162918z-f39f38d13
cache win-server-wsl c20240327t162918z-f39f38d13

@edsantiago edsantiago marked this pull request as draft April 23, 2024 15:06
@edsantiago
Member Author

And, back to draft. This must not merge until containers/podman#22207 merges.

@Luap99
Member

Luap99 commented Apr 24, 2024

@edsantiago Did you test this out on all the other repos as well? skopeo, common, image, storage...
I think this has the risk to break on other repos as well and catching it only on the next (maybe urgent) update would be painful.

@edsantiago
Member Author

I will start opening PRs now on other repos.

edsantiago added a commit to edsantiago/skopeo that referenced this pull request Apr 24, 2024
For the last long time, Fedora CI VMs have had a disk /tmp.
Real-world setups typically have tmpfs /tmp. This switches
to CI VMs that reflect the real world.

See containers/automation_images#340

Signed-off-by: Ed Santiago <santiago@redhat.com>
edsantiago added a commit to edsantiago/storage that referenced this pull request Apr 24, 2024
For the last long time, Fedora CI VMs have had a disk /tmp.
Real-world setups typically have tmpfs /tmp. This switches
to CI VMs that reflect the real world.

See containers/automation_images#340

Signed-off-by: Ed Santiago <santiago@redhat.com>
edsantiago added a commit to edsantiago/containers-common that referenced this pull request Apr 24, 2024
For the last long time, Fedora CI VMs have had a disk /tmp.
Real-world setups typically have tmpfs /tmp. This switches
to CI VMs that reflect the real world.

See containers/automation_images#340

Signed-off-by: Ed Santiago <santiago@redhat.com>
edsantiago added a commit to edsantiago/aardvark-dns that referenced this pull request Apr 24, 2024
For the last long time, Fedora CI VMs have had a disk /tmp.
Real-world setups typically have tmpfs /tmp. This switches
to CI VMs that reflect the real world.

See containers/automation_images#340

Signed-off-by: Ed Santiago <santiago@redhat.com>
edsantiago added a commit to edsantiago/netavark that referenced this pull request Apr 24, 2024
For the last long time, Fedora CI VMs have had a disk /tmp.
Real-world setups typically have tmpfs /tmp. This switches
to CI VMs that reflect the real world.

See containers/automation_images#340

Signed-off-by: Ed Santiago <santiago@redhat.com>
@edsantiago
Member Author

repo        status
podman      pass
buildah     pass
skopeo      pass (except for blue-robot)
c-image     pass
c-storage   pass
c-common    fails lint, for reasons unlikely to be my fault
aardvark    FAIL integration - apparently a flake. Also fails all blue-robots
netavark    FAIL - some expected (nc bug), some not (integration)

@Luap99
Member

Luap99 commented Apr 24, 2024

Thanks @edsantiago, I am comfortable merging then. I assume you want the podman PR merged first?

@edsantiago edsantiago marked this pull request as ready for review April 24, 2024 13:01
@edsantiago
Member Author

I will merge this, because I'm now more confident that the podman one is likely to merge, and that there are no problems (yet) with the other containers repos. Thank you!

@edsantiago edsantiago merged commit cf72ba2 into containers:main Apr 24, 2024
39 checks passed
@edsantiago edsantiago deleted the tmp-should-be-tmpfs branch April 24, 2024 13:02
edsantiago added a commit to edsantiago/containers-common that referenced this pull request Apr 24, 2024
For the last long time, Fedora CI VMs have had a disk /tmp.
Real-world setups typically have tmpfs /tmp. This switches
to CI VMs that reflect the real world.

See containers/automation_images#340

Signed-off-by: Ed Santiago <santiago@redhat.com>
@cevich
Member

cevich commented Apr 24, 2024

Thanks for all your efforts on this @edsantiago and @Luap99 😃

mtrmac pushed a commit to containers/image that referenced this pull request Apr 24, 2024
For the last long time, Fedora CI VMs have had a disk /tmp.
Real-world setups typically have tmpfs /tmp. This switches
to CI VMs that reflect the real world.

See containers/automation_images#340

Signed-off-by: Ed Santiago <santiago@redhat.com>
@edsantiago edsantiago mentioned this pull request Apr 29, 2024
edsantiago added a commit to edsantiago/buildah that referenced this pull request Apr 30, 2024
For the last long time, Fedora CI VMs have had a disk /tmp.
Real-world setups typically have tmpfs /tmp. This switches
to CI VMs that reflect the real world.

See containers/automation_images#340

Signed-off-by: Ed Santiago <santiago@redhat.com>