Future of Power CI under P10/PowerVM #2473

ravanelli · 2021-10-01T15:14:54Z

I'm creating this issue so we have a common place to discuss the next steps for Power CI, get more insight from the multi-arch folks around, and decide the best way to move forward.

With gangplank we are improving our CI to create a more multi-arch world for FCOS/RHCOS/Cosa, and also to eliminate issues such as duplicated CIs. arm64 was successfully added, and now we are looking for Power and s390x to be part of this beautiful world.

Unfortunately, there are some struggles with Power looking ahead. As we know, P10 dropped bare metal support (PowerVM only), and RHEL 9 also dropped support for KVM on Power.

Our entire CI is based on QEMU/KVM. It would be really hard to change it to accommodate Power alone.

Recently, I was trying to enable gangplank remote on Power, using a server provided by IBM in IBM Cloud. However, this server is a P9 running PowerVM, and this is where we start to feel the issues of working with PowerVM/KVM.

I reached out to folks at IBM to better understand the options we have here, and the feedback I got so far is:

kvm_pr has not been supported for a long time, and Red Hat removed it from the RHEL tree a few months ago (should be available on RHEL 8.4). There's no upstream support either.
As for TCG, pseries+tcg works on PowerVM without problems. The problem is that it is considerably slower than pseries+kvm. It is not officially supported by IBM/Red Hat; only upstream support is provided.

I was also able to build FCOS with a couple of TCG warnings. However, --basic-qemu-scenarios kept running for more than 1 hour with no results.
kvm_hv never ran on PowerVM. There may be plans to make this happen, but it depends on the roadmap for PowerVM.

Given these scenarios, it looks like we are not really able to run KVM under PowerVM.
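
One way tooling could act on this feedback is to probe for a usable KVM device and fall back to TCG when there is none. The following is a minimal Python sketch, not cosa's actual logic: the function names and the `/dev/kvm` default are assumptions made for illustration, and the check only confirms that the device node is accessible, not that kvm_hv or kvm_pr would actually work on the host.

```python
import os


def pick_accel(kvm_device="/dev/kvm"):
    """Return "kvm" if the KVM device node is readable and writable, else "tcg".

    This is only an accessibility check on the device node (assumption:
    /dev/kvm is the right node to probe on the host).
    """
    if os.path.exists(kvm_device) and os.access(kvm_device, os.R_OK | os.W_OK):
        return "kvm"
    return "tcg"


def qemu_accel_args(kvm_device="/dev/kvm"):
    """Build the accelerator/machine part of a qemu-system-ppc64 command line."""
    return ["-accel", pick_accel(kvm_device), "-machine", "pseries"]


if __name__ == "__main__":
    # On a PowerVM LPAR without KVM this is expected to select TCG, which
    # works per the feedback above but is considerably slower.
    print(" ".join(["qemu-system-ppc64", *qemu_accel_args()]))
```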

More details:
https://bugzilla.redhat.com/show_bug.cgi?id=2008271

dustymabe · 2021-10-01T15:46:13Z

Thank you for writing this up @ravanelli.

> Given these scenarios, it looks like we are not really able to run KVM under PowerVM.

Ouch.. That really breaks our existing model and will force us to carry quite the delta just to add that architecture.

mkumatag · 2021-10-01T18:01:37Z

cc @clnperez @manojnkumar

laggarcia · 2021-10-01T21:02:21Z

Here is a summary of the discussion we had with Renata on this topic. If I got something wrong, please let me know, as I am not knowledgeable about COSA/FCOS/RHCOS.

The CI infrastructure controller you have today runs on an x86 environment. At some point in the process, this controller will contact a Power server to actually build the Power images and run basic build verification tests on them. There are two requirements on the Power server so that it can seamlessly integrate with your infrastructure:

It needs to have a public IP address.
It needs to support running KVM/QEMU guests, as the built images will be tested by launching a KVM/QEMU VM on the Power server.

To fulfill these requirements, you will have to run your build process on a POWER9 bare metal machine. You will need to find one that is available with a public IP address. Once you have one, you should have no issues running the build process on that machine and spawning VMs with the built image to do your basic verification of the build process.

The lack of a Power10 system with KVM support should not be an impediment here. The build process usually targets old processor versions for compatibility and support reasons. Just as an example, IIRC, RHEL 8 is built targeting POWER8 processors, as it needs to run on both POWER8 and POWER9 processors. So, for the foreseeable future, using a POWER9 bare metal machine to build the FCOS image and test the build process with KVM should be enough. This environment will be supported for many years to come.

Please let me know if you have any additional questions on this.

ravanelli · 2021-10-01T21:58:14Z

Thanks @laggarcia for all the discussion related to this topic.

Right now, we don't have any bare metal Power server with public IP access that would let us continue with the FCOS improvements for Power. Unless we can find one, there is no other option but to wait.

jcajka · 2021-10-04T11:37:52Z

@laggarcia my understanding has been that the FCOS CI/pipeline requires OpenStack/AWS/OCP-like cloud infra (nowadays it should be just the first two) and is not really able to work with stable VMs/hosts. @dustymabe please correct me if I'm wrong.
@ravanelli We should have KVM-based POWER9 VMs around that can be provided (if there is no issue with them being outside of the Fedora infra), hosted at Brno University of Technology. Possibly even one whole bare metal P8 box. AFAIK nested KVM should work there.

ravanelli · 2021-10-04T15:25:37Z

@jcajka How reliable is the support at Brno University? I tried to use the Minicloud at Unicamp, but lack of support is a real issue there. I had to wait more than a month to get a firmware update.

clnperez · 2021-10-04T18:35:14Z

You can also get an OpenStack environment from OSU: https://osuosl.org/services/powerdev/request_hosting/. I've only ever requested standalone VMs, but have had very good stability and support from them. I'm not suggesting it over Brno, but if we need another option, that's one to consider. I believe this project falls under the "Free and Open Source" restriction.

dustymabe · 2021-10-04T19:28:46Z

> @laggarcia my understanding has been that the FCOS CI/pipeline requires OpenStack/AWS/OCP-like cloud infra (nowadays it should be just the first two) and is not really able to work with stable VMs/hosts. @dustymabe please correct me if I'm wrong.

We can work with a single bare metal machine and talk to it over SSH. That's what we're doing currently for aarch64.
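
Before wiring a candidate machine into the pipeline, it can be useful to confirm over SSH that it exposes a usable KVM device. The following is a minimal Python sketch of such a check. The helper name, the `builder` user and the `power.example.com` host are hypothetical, and this is not how the existing aarch64 setup is implemented.

```python
import shlex


def remote_kvm_check_argv(host, user="builder", kvm_device="/dev/kvm"):
    """Build an ssh command line that succeeds only if the remote KVM
    device node is readable and writable by the login user.

    `user` and `kvm_device` defaults are illustrative assumptions.
    """
    dev = shlex.quote(kvm_device)
    remote_cmd = f"test -r {dev} && test -w {dev}"
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which is what a non-interactive CI job needs.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote_cmd]


if __name__ == "__main__":
    argv = remote_kvm_check_argv("power.example.com")
    print(" ".join(shlex.quote(a) for a in argv))
    # To actually run the check:
    #   import subprocess
    #   ok = subprocess.run(argv).returncode == 0
```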

jcajka · 2021-10-05T08:22:32Z

@dustymabe cool, good to know. I had still assumed that being in AWS was essential for various reasons, mostly redeployment, etc.
@ravanelli what are your expectations and requirements? Most issues, if a solution (e.g. new FW) is available from the HW vendor, I can probably resolve in under a week (I'm one of the admins there). But formally it is not a commercial offering, so it is best effort.

clnperez · 2021-11-15T22:00:05Z

Can we pick this conversation back up? We're getting a couple of new pings from customers about OKD.

mtarsel · 2024-09-11T21:20:59Z

So I have built the Fedora CoreOS images for ppc64le on a Power10 Rainier with firmware 1060.10, running Fedora 40 with kernel version 6.10.7-200.fc40.ppc64le. KVM has been enabled from the HMC and the kvm_hv module is loaded.

I thought this issue would be the best place to post a status update on this effort, but I am available on Slack to discuss next steps if that's easier.

I followed the instructions from the docs…

Ran build.sh
create new dir
cosa init fcos-url
cosa fetch; cosa build

This machine is using Legacy Compatibility interrupt mode which is referred to as XICS in QEMU. As such, the following warning happens when running the tests:

qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
Falling back to kernel-irqchip=off

Currently KVM on LPAR doesn't support native XIVE, so QEMU doesn't have kernel-irqchip support, which means the KVM interrupt controller is turned off. The suggested flags would be something like:

qemu-system-ppc64 -accel kvm -machine pseries,ic-mode=xics
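
A patch that picks the interrupt controller mode could be structured roughly as follows. This is a minimal Python sketch, not kola's or cosa's actual code. How to detect whether KVM offers XIVE on a given host is left open here, so `xive_in_kvm` is simply a caller-supplied assumption, and the function names are hypothetical.

```python
def pseries_machine_arg(xive_in_kvm):
    """Return the value for QEMU's -machine option on ppc64le.

    When KVM cannot provide XIVE (as reported above for KVM on LPAR),
    pin the guest to XICS via ic-mode=xics; otherwise leave the machine
    type's default interrupt controller mode alone.
    """
    return "pseries" if xive_in_kvm else "pseries,ic-mode=xics"


def qemu_command(xive_in_kvm, accel="kvm"):
    """Assemble the leading part of the qemu-system-ppc64 command line."""
    return [
        "qemu-system-ppc64",
        "-accel", accel,
        "-machine", pseries_machine_arg(xive_in_kvm),
    ]


if __name__ == "__main__":
    # Reproduces the flags suggested above for a KVM-on-LPAR host.
    print(" ".join(qemu_command(xive_in_kvm=False)))
```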

I ran the tests like this:

cosa kola run --parallel 4

However, for the past couple of weeks I have not been able to get a complete test run. The tests stall and I'm not sure how to debug this further.

In my build dir, the ./tmp/kola/reports dir is empty and in test.tap I see:


[root@f40-de kola]# cat test.tap 
1..89
ok - ext.config.networking.ifname-karg.udev-rule-firstboot-propagation
ok - ext.config.networking.nameserver
ok - fcos.users.shells
ok - coreos.unique.boot.failure
ok - ext.config.gshadow
ok - ext.config.boot.grub2-install
ok - ext.config.var-mount.luks

Is there another output dir where the tests would be stored?
Is there an existing deny-list for ppc64le?
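
To quantify how far a stalled run got, the partial test.tap can be compared against its plan line. The following is a minimal Python sketch. It assumes the simple `ok - name` / `not ok - name` layout seen in the output above (no test numbers or directives), and `summarize_tap` is a hypothetical helper, not part of kola.

```python
import re


def summarize_tap(text):
    """Summarize a TAP stream: planned count, passed and failed test
    names, and how many planned tests never reported a result."""
    planned = None
    passed, failed = [], []
    for raw in text.splitlines():
        line = raw.strip()
        plan = re.fullmatch(r"1\.\.(\d+)", line)
        if plan:
            planned = int(plan.group(1))
        elif line.startswith("not ok"):
            # Check "not ok" before "ok" so failures are not miscounted.
            failed.append(line[len("not ok"):].lstrip(" -"))
        elif line.startswith("ok"):
            passed.append(line[len("ok"):].lstrip(" -"))
    reported = len(passed) + len(failed)
    unreported = None if planned is None else planned - reported
    return {"planned": planned, "passed": passed,
            "failed": failed, "unreported": unreported}


SAMPLE = """1..89
ok - ext.config.networking.ifname-karg.udev-rule-firstboot-propagation
ok - ext.config.networking.nameserver
ok - fcos.users.shells
ok - coreos.unique.boot.failure
ok - ext.config.gshadow
ok - ext.config.boot.grub2-install
ok - ext.config.var-mount.luks
"""

if __name__ == "__main__":
    s = summarize_tap(SAMPLE)
    print(f"{len(s['passed'])} passed, {len(s['failed'])} failed, "
          f"{s['unreported']} of {s['planned']} not reported")
```

For the output pasted above, this reports 7 passed, 0 failed, and 82 of 89 not reported, which narrows the stall down to tests that never wrote a result line.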

Additionally, Oregon State University Open Source Lab (OSU OSL) does have Power10 machines available that will have KVM enabled. I'm hoping to replicate this setup at OSU on an LPAR, which could give us a P10 KVM setup that doesn't need a VPN for running tests long term. More info

jlebon · 2024-09-13T01:22:23Z

Thank you for working on this!

> This machine is using Legacy Compatibility interrupt mode which is referred to as XICS in QEMU. As such, the following warning happens when running the tests:
>
> qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
> Falling back to kernel-irqchip=off
>
> Currently KVM on LPAR doesn't support native XIVE, so QEMU doesn't have kernel-irqchip support, which means the KVM interrupt controller is turned off. The suggested flags would be something like:
>
> qemu-system-ppc64 -accel kvm -machine pseries,ic-mode=xics

Yeah, we've seen that warning for a while now and haven't dug into it. Feel free to submit a patch to choose the right set of arguments based on $factors.

> I ran the tests like this:
>
> cosa kola run --parallel 4
>
> However, for the past couple of weeks I have not been able to get a complete test run. The tests stall and I'm not sure how to debug this further.
>
> In my build dir, the ./tmp/kola/reports dir is empty and in test.tap I see:
>
> [root@f40-de kola]# cat test.tap
> 1..89
> ok - ext.config.networking.ifname-karg.udev-rule-firstboot-propagation
> ok - ext.config.networking.nameserver
> ok - fcos.users.shells
> ok - coreos.unique.boot.failure
> ok - ext.config.gshadow
> ok - ext.config.boot.grub2-install
> ok - ext.config.var-mount.luks
>
> Is there another output dir where the tests would be stored? Is there an existing deny-list for ppc64le?

Which tests stall? You should see log files under e.g. tmp/kola/qemu-latest/. You can upload that directory.
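
One low-tech way to narrow down which tests stall is to list the most recently written files under that directory, since a hung test's logs are usually among the last ones touched. The following is a minimal Python sketch. It assumes nothing about the layout under tmp/kola/qemu-latest/ beyond it being a directory tree, and `newest_files` is a hypothetical helper.

```python
from pathlib import Path


def newest_files(root, limit=10):
    """Return up to `limit` regular files under `root`, newest first
    by modification time."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    # Path taken from the comment above; adjust to the actual build dir.
    for path in newest_files("tmp/kola/qemu-latest"):
        print(path)
```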

dustymabe mentioned this issue on Oct 1, 2021: Produce official ppc64le architecture artifacts coreos/fedora-coreos-tracker#987 (Closed)

clnperez mentioned this issue on Nov 15, 2021: Add "Build OKD for ppc64le" proposal openshift/enhancements#722 (Closed)

cgwalters mentioned this issue on Sep 12, 2022: RHCOS 4.12 ppc64le - Fatal glibc error: CPU lacks ISA 3.00 support (POWER9 or later required) openshift/os#1000 (Closed)

cgwalters mentioned this issue on Nov 13, 2023: Relationship with https://github.com/osbuild/osbuild/pull/1402 osbuild/bootc-image-builder#4 (Closed)

travier added the enhancement (New feature or request) and jira (for syncing to jira) labels on Jan 25, 2024

mtarsel mentioned this issue on Sep 19, 2024: [ppc64le] Remove P8 and P9 hack so tests pass for Power10 #3887 (Open)
