
Cirrus: Bump Fedora to release 35 #11795

Merged
merged 8 commits into containers:main from update_to_f35 on Nov 18, 2021

Conversation

@cevich (Member) commented Sep 29, 2021

What this PR does / why we need it:

Update the "Fedora" VM used in testing to version 35 (beta), and the "Prior Fedora" version to 34. Stop testing with Fedora 33. Also resolve a TODO in setup - the needed packages are now bundled into the images.

NOTE: The Fedora 35 image is now based on a BTRFS root filesystem.

How to verify it

CI will pass

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

Depends on: #11955, #11979, #12110, #12121 or #12120, #12060, #12162, #12342, #12343, and containers/automation_images#93

@cevich force-pushed the update_to_f35 branch 2 times, most recently from 24d631a to 8d852c0 on September 29, 2021 20:52
@cevich marked this pull request as draft on September 30, 2021 13:46
@openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Sep 30, 2021
@cevich changed the title from "Cirrus: Bump Fedora to release 35 (beta)" to "WIP: Cirrus: Bump Fedora to release 35 (beta)" on Sep 30, 2021
@containers deleted a comment from rhatdan on Sep 30, 2021
@cevich (Member Author) commented Sep 30, 2021

Sorry, forgot to mark this as WIP/Draft. There are some major challenges that need solving: namely, the kernel shouldn't panic when compiling podman 😢

@cevich force-pushed the update_to_f35 branch 3 times, most recently from 1887c4d to 4ed90cb on October 1, 2021 14:46
@cevich (Member Author) commented Oct 1, 2021

@lsm5 I'm getting test-rpm build errors on the new F35 VM. Do you know how to fix these?

@cevich (Member Author) commented Oct 4, 2021

For int podman fedora-35 root and rootless, the tests are hanging. Both appear to be getting stuck on a checkpoint test (log).

Edit: My mistake, not "rootless" but "container" integration.

@cevich (Member Author) commented Oct 4, 2021

@adrianreber ping - any insights on the checkpoint hang?

@adrianreber (Collaborator) commented:

> @adrianreber ping - any insights on the checkpoint hang?

rootless tests should not run any checkpoint tests at all. The root test hangs while trying to establish a TCP connection to the container. I can run it locally on f35 tomorrow to see if I can reproduce it.

Not sure why it should hang with f35 right now:

                // Open a network connection to the redis server via initial port mapping
                conn, err := net.Dial("tcp", "localhost:1234")
                if err != nil {
                        os.Exit(1)
                }
                conn.Close()

Most checkpoint tests seem to be running. Just as an FYI, if all checkpoint tests start to fail: CRIU needs something like chattr +C /var/lib/containers to work with BTRFS.

@cevich (Member Author) commented Oct 4, 2021

> CRIU needs something like chattr +C /var/lib/containers to work with BTRFS.

BTRFS will be the stock/default for F35 VMs, so should we preemptively change that at the packaging level? Or is that just a caution / might happen warning?

@lsm5 (Member) commented Oct 4, 2021

> @lsm5 I'm getting test-rpm build errors on the new F35 VM. Do you know how to fix these?

Disable debuginfo in the specfile: set `%global debug_package %{nil}`. I thought this was already disabled in a prior PR from @Luap99. But maybe not?

@adrianreber (Collaborator) commented:

> CRIU needs something like chattr +C /var/lib/containers to work with BTRFS.

> BTRFS will be the stock/default for F35 VMs, so should we preemptively change that at the packaging level? Or is that just a caution / might happen warning?

Just that it might happen. If it works without the chattr, you can leave it as it is.

@Luap99 (Member) commented Oct 4, 2021

> Disable debuginfo in the specfile: set `%global debug_package %{nil}`. I thought this was already disabled in a prior PR from @Luap99. But maybe not?

My PR is not merged yet (#11565).

@cevich (Member Author) commented Oct 4, 2021

Okay great, thanks @lsm5 and @Luap99. No rush merging that; there are plenty of other issues to address in this PR in the meantime.

@cevich (Member Author) commented Oct 6, 2021

> I can run it locally on f35 tomorrow to see if I can reproduce it.

@adrianreber were you able to reproduce the hang at all?

@adrianreber (Collaborator) commented:

No, unfortunately not. It just works in my VM.

@cevich (Member Author) commented Oct 6, 2021

Darn.

@cevich (Member Author) commented Oct 11, 2021

New images: I see a problem on F34, where the / disk space is not being expanded as it should. Oddly, it appears that the F35 BTRFS / is being expanded. Both are experiencing test failures/hangs 😢

@cevich (Member Author) commented Oct 12, 2021

Second time in a row, the `podman checkpoint and restore container with different port mappings` test hangs on F35. I'm going to try to instrument the code to see where the hang is happening.

@cevich force-pushed the update_to_f35 branch 2 times, most recently from cd39642 to 9744f90 on October 12, 2021 18:09
@cevich (Member Author) commented Oct 12, 2021

@adrianreber I instrumented the test with a bunch of `fmt.Printf()` and ran it manually. It seems to be hanging on the `conn, err := net.Dial("tcp", "localhost:1234")` line. Should that be wrapped in some kind of timeout, and shouldn't it be connecting to port 6379?

@adrianreber (Collaborator) commented:

> @adrianreber I instrumented the test with a bunch of `fmt.Printf()` and ran it manually. It seems to be hanging on the `conn, err := net.Dial("tcp", "localhost:1234")` line. Should that be wrapped in some kind of timeout, and shouldn't it be connecting to port 6379?

The timeout would be a good idea. The port is correct with 1234 because that test is trying to handle different port mappings after restore. I am confused, however: I do not see the broken test being run. There is still a timeout, but I am also not sure why.

Can you also try a change like this (in addition to your timeout change):

                conn, err := net.Dial("tcp", "localhost:1234")
-               if err != nil {
-                       os.Exit(1)
-               }
+               Expect(err).To(BeNil())

The `Exit(1)` is probably a bad idea anyway. I also had a look at the audit log, but it does not seem to be an SELinux denial.
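
Putting those two suggestions together, a minimal sketch of what that spot in the test could look like (assuming the test file's existing `net`/`time` imports and the dot-imported gomega package; the 30-second timeout value is only illustrative, not part of any agreed change):

        // Sketch only: replace the bare net.Dial and os.Exit(1) with a dial
        // timeout plus a Gomega assertion, per the suggestions above.
        // Assumes "net" and "time" are imported and Expect/BeNil come from
        // the dot-imported gomega package already used by this test suite.
        conn, err := net.DialTimeout("tcp", "localhost:1234", 30*time.Second)
        Expect(err).To(BeNil())
        conn.Close()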

@edsantiago (Member) commented:

They're flakes. One of the common ones. Every so often registry.redhat.io goes into a mode where it demands a login even for search. It's usually fixed in a few hours.

@cevich (Member Author) commented Nov 18, 2021

> They're flakes. One of the common ones. Every so often registry.redhat.io goes into a mode where it demands a login even for search. It's usually fixed in a few hours.

Yeah I've seen that before too, and guessed it's what we were seeing. Thanks for confirming.

Hey, on that topic: What's your opinion on me adding a check to the Ext. services task (which runs very early) to verify this? It would save people from wasting time re-running due to a known issue we literally can't do anything about (AFAIK). I'm on the fence about it vs. getting "some" useful testing completed.

Commits

Signed-off-by: Chris Evich <cevich@redhat.com>
These tasks run earlier on, so it's useful to have more detail about the
test VM (in general) in case something goes terribly wrong.

Signed-off-by: Chris Evich <cevich@redhat.com>
During initial testing of Fedora 35beta VM images in CI, the bindings
task was timing out. In order to allow time for collection of system
details (logs), execution needs to time out earlier than the task.
Under normal conditions, the bindings test finishes in about 10 minutes.
Use the ginkgo timeout option to limit execution, so it times out after
30 minutes.

Also add the `-progress` option so the output more closely resembles how
ginkgo runs the integration tests.

Signed-off-by: Chris Evich <cevich@redhat.com>
Massive thanks to @edsantiago for tracking this down.

Ref: containers#12175

Signed-off-by: Chris Evich <cevich@redhat.com>
In F35 the hard-coded default (from
containers-common-1-32.fc35.noarch) is 'journald' despite
the upstream repository having this line commented-out.
Containerized integration tests cannot run with 'journald'
as there is no daemon/process there to receive them.

Signed-off-by: Chris Evich <cevich@redhat.com>
This reverts commit f35d7f4.

Signed-off-by: Chris Evich <cevich@redhat.com>
VM Images created as of this commit contain the new/required version.
Remove the `--force` install, but retain the hack script's ability to
support this in the future.

Signed-off-by: Chris Evich <cevich@redhat.com>
The Fedora 35 cloud images have switched to UEFI boot with a GPT
partition. Formerly, all Fedora images included support for runtime
re-partitioning. However, the requirement to test alternate storage
has since been dropped/removed. Rather than maintain a disused
feature and its supporting scripts, these Fedora VM images have
reverted to the default: automatically resize to 100% on boot.

Signed-off-by: Chris Evich <cevich@redhat.com>
@edsantiago (Member) commented:

Well, it doesn't fit into the host/port model, so some hackery would be needed to try "podman search", which gets us into chicken/egg territory.

Thinking on it some more, though, I rarely see all int tests failing because of this. It's more typical to see just one or two. I don't know what that means about the nature of the flake, but from the perspective of CI, I think an early check would not be accurate enough: it could easily false-negative as well as false-positive.

@cevich (Member Author) commented Nov 18, 2021

Y'okay! Rebased/force-pushed "once last time" (note: this may not be the real "last time") 😁

> it could easily false-negative as well as false-positive.

Yeah...that sounds anti-simple to deal with, esp. in a "Required by everything" check.

@vrothberg (Member) commented:

Fingers crossed 🤞 Thanks for your patience and your long breath pushing this forward, @cevich !

@cevich (Member Author) commented Nov 18, 2021

> Thanks for your patience and your long breath pushing this forward

I absolutely could not have done this without the support from...basically everybody. So thank YOU!

I pray we never have to upgrade any OS, ever again...or at least for another 6 months 😭

Seriously though, I've been wondering recently if all these fresh-new gray-hairs are worth it: Concentrating all the pain into one PR for the stability of all/most other PRs. Or maybe we should just run with rawhide and continuous runtime updates to share the instability "love" to every PR - but with rapid "fixes" available (in theory).

Seems to me, no matter how we do CI...everyone still always hates it 😆

@cevich (Member Author) commented Nov 18, 2021

Ugh, two Ubuntu failures. Naively assuming they're flakes and re-running.

@edsantiago (Member) commented:

/lgtm
/hold

Quick, before CI changes its mind! (Tests are green as of the moment I'm writing this. I would not be in the least bit surprised if they turn red somehow in the next few minutes, just out of orneriness).

@openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Nov 18, 2021
@openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Nov 18, 2021
@rhatdan (Member) commented Nov 18, 2021

/approve
/hold cancel

@openshift-ci bot removed the do-not-merge/hold label on Nov 18, 2021
@openshift-ci bot (Contributor) commented Nov 18, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cevich, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Nov 18, 2021
@openshift-merge-robot merged commit de39241 into containers:main on Nov 18, 2021
@cevich (Member Author) commented Nov 19, 2021

> just out of orneriness

They are imbued with the pure essential nature of "ornery" and will require every magic spell to be set right 😞

@cevich deleted the update_to_f35 branch on April 18, 2023 14:47
@github-actions bot added the "locked - please file new issue/PR" label on Aug 30, 2023
@github-actions bot locked as resolved and limited conversation to collaborators on Aug 30, 2023
Labels: approved · lgtm · locked - please file new issue/PR