Rook + Ceph clusters do not work on OKD releases 4.12.0-0.okd-2023-02-04-212953 and 4.11.0-0.okd-2023-01-14-152430 #1505
Comments
That's the result of https://bugzilla.redhat.com/show_bug.cgi?id=2159066. The fix is already in okd-machine-os; we just need a new OKD build that includes it (4.12.0-0.okd-2023-02-11-023427 or newer) to pass CI and make it to stable for a long-term fix. I'm on 4.12.0-0.okd-2023-01-21-055900 with kernel 6.0.15-300.fc37.x86_64 in the meantime.
Thanks for this issue and the root cause! It would be really nice to have a 4.12 release with a DIRECT upgrade path from the latest Ceph-working 4.11, which is the one before last if I am not mistaken.
I believe there is a direct upgrade path to 4.12.0-0.okd-2023-01-21-055900 - or at least there was when I took it, lol. Has that edge since been blocked? I imagine that will be ironed out for the next release that should land this weekend.
This should be resolved in https://github.com/okd-project/okd/releases/tag/4.12.0-0.okd-2023-02-18-033438. Since we now use layering, you could have built a custom OS image with an updated kernel; see https://github.com/vrutkovs/custom-okd-os/blob/main/drbd/Dockerfile for instance.
https://docs.okd.io/4.12/post_installation_configuration/coreos-layering.html - I wasn't aware of this. I'm going to give it a shot tonight before updating to the new release, just to see how it works!
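For readers unfamiliar with CoreOS layering, a rough sketch of such a custom-kernel image, modeled on the linked drbd Dockerfile: the base image reference and kernel RPM names below are placeholders, not real values, and you would substitute your cluster's actual machine-os-content image and a real kernel build.

```shell
# Write a minimal Containerfile for CoreOS layering.
# NOTE: the FROM reference and RPM file names are hypothetical placeholders;
# use your cluster's actual OS content image and real kernel RPMs.
cat > Containerfile <<'EOF'
FROM quay.io/example/okd-machine-os-content:placeholder
# Swap in a fixed kernel build (RPMs downloaded beforehand into the build context)
RUN rpm-ostree override replace ./kernel-*.rpm && \
    ostree container commit
EOF

# Not run here (needs a registry): build, push, then point a MachineConfig's
# osImageURL at the pushed tag, e.g.:
#   podman build -t <registry>/custom-okd-os:kernel-fix . && podman push <registry>/custom-okd-os:kernel-fix
grep -q 'override replace' Containerfile && echo "Containerfile written"
```

The build/push and MachineConfig steps are only sketched in comments, since they depend on your registry and pool configuration; see the okd.io layering docs linked above for the supported procedure.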
@vrutkovs Are you sure it's included in https://github.com/okd-project/okd/releases/tag/4.12.0-0.okd-2023-02-18-033438? This release still appears to use the older FCOS. Am I missing something?
In fedora-coreos-config, the submodule's HEAD commit suggests we should be on 6.1.10 here. However...
Oh, sorry, we're still using the FCOS from January (a bad commit sneaked in - openshift/okd-machine-os@e83e32a). openshift/okd-machine-os#532 would fix it.
+1 for priority on this, as it has serious impact on us and breaks Ceph completely. The community should also consider either:
Unless it is somehow possible for Rook/Ceph to make a change on their end? Otherwise everybody using Rook + Ceph will be stuck, unable to upgrade, as there will be no stable upgrade path. I am sure you all know this, but it still feels right to highlight it. Thanks all for working on this.
The new OKD 4.12 nightly should be based on FCOS 37.20230205.3.0 and have kernel 6.1.9-200.fc37.x86_64 with the fix. I'll add upgrade edges from 4.11 to the next 4.12 stable. As for a workaround that could be applied before upgrading - I don't know if it's possible; this is a kernel issue, so it's not easy to work around.
@vrutkovs Thanks for letting us know. I can confirm that upgrading to version 4.12.0-0.okd-2023-03-03-055825 fixed all the issues with the Rook Ceph cluster, and volumes are mounting again. I used the following command to upgrade directly from 4.11.0-0.okd-2023-01-14-152430:
$ oc adm upgrade --allow-explicit-upgrade --allow-upgrade-with-warnings --to-image registry.ci.openshift.org/origin/release@sha256:a2e94c433f8747ad9899bd4c9c6654982934595a42f5af2c9257b351786fd379
Perfect, thank you. We'll release a new stable over the weekend then.
I could successfully upgrade the affected cluster to the released version (with some machine-config-daemon hand-holding) and have a stable Ceph cluster. I guess this can be closed.
Hello! I just tried installing rook-ceph version 1.11.0 on OKD 4.12.0-0.okd-2023-03-05-022504 with Fedora CoreOS 37.20230205.3.0, and I still see the errors. The kernel version on the storage nodes is 6.0.18-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 7 17:10:00 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
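Whether a node is still on a pre-fix kernel can be read from the KERNEL-VERSION column of `oc get nodes -o wide`; a simple `sort -V` comparison against the first fixed build then flags affected nodes. A minimal sketch, using the version strings reported in this thread (in practice `current` would come from the cluster, not be hard-coded):

```shell
# Check whether a node's kernel predates the first fixed build (6.1.9).
required="6.1.9"
current="6.0.18"   # still-affected kernel reported above; normally read from: oc get nodes -o wide
# sort -V orders version strings; if the oldest of the pair is $required,
# then $current is at least as new as the fix.
oldest=$(printf '%s\n' "$required" "$current" | sort -V | head -n1)
if [ "$oldest" = "$required" ]; then
    echo "kernel $current is new enough"
else
    echo "kernel $current predates the fix; expect Ceph breakage"
fi
# → kernel 6.0.18 predates the fix; expect Ceph breakage
```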
My bad, I will try with 37.20230303.2.0 instead.
Is there any news on this? Can I use Rook on OpenShift/OKD 4.12?
I have no problems with the current OKD 4.12 version.
Hi everybody,
Describe the bug
Rook + Ceph clusters stop functioning, or their performance greatly degrades, on OKD releases 4.12.0-0.okd-2023-02-04-212953 and 4.11.0-0.okd-2023-01-14-152430. I'm opening this ticket to serve as a tracking issue for OKD specifically, as others seem to have opened several tickets and discussions elsewhere and I couldn't find one here.
Version
4.12.0-0.okd-2023-02-04-212953 and 4.11.0-0.okd-2023-01-14-152430
How reproducible
Pretty much 100% of the time. Symptoms include many or all PGs going inactive, slow I/O on a cluster that was previously performing fine, and components like RGW and CSI mounts ceasing to function; probably other issues too.
Current workaround
As of 2023-02-11, it seems the only workaround is to downgrade the cluster to a previous version, which seems to fix things.
Related issues and discussions