Tolerance of pre-existing software-RAID #38
base: master
Conversation
…for user Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
Adding disk vendor and model would be useful, but can take too much horizontal space, especially for good old 80x25 resolution, so we use a shorter string with just the disk size for disambiguation. Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
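For illustration, a label built this way could look as follows (a hypothetical sketch, not code from this PR; the helper name and format are assumptions):

# Hypothetical sketch: a short label using only the device name and size,
# so it fits narrow 80-column screens better than vendor/model strings.
def disk_label(device, size_bytes):
    gib = size_bytes // (1024 ** 3)
    return "%s - %d GiB" % (device, gib)

print(disk_label("sda", 1000204886016))  # -> 'sda - 931 GiB'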
Reasons to have this here include:
* don't require changes to the srpm when the set of installed files changes
* make installer development faster by deploying directly to a filesystem
This installation process still depends by default on the sm package being installed, but through the SM_ROOT variable it can now be run on a dev machine where this is not the case; it just needs an unpacked version of the package. Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
mdadm output is like "MD_DEVNAME=127", which results in getHumanDiskName giving a name of "RAID: 127(sda,sdb)". This fixes the problem and gets us "RAID: md127(sda,sdb)" instead. Also slightly improve readability with tuple unpacking. Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
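A minimal sketch of the kind of fix described (function and data shapes are assumptions; only the name getHumanDiskName and the MD_DEVNAME value come from the commit message):

# Sketch only: prefix a bare numeric MD_DEVNAME with "md" so the label reads
# "RAID: md127(sda,sdb)" instead of "RAID: 127(sda,sdb)".
def human_raid_name(md_devname, members):
    name = md_devname if md_devname.startswith("md") else "md" + md_devname
    return "RAID: %s(%s)" % (name, ",".join(members))

devname, members = "127", ["sda", "sdb"]   # tuple unpacking, as in the commit
print(human_raid_name(devname, members))   # -> 'RAID: md127(sda,sdb)'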
This prevents 65-md-incremental.rules from assembling every RAID it identifies: we want to have control over that, so we can reuse a former RAID member for other purposes. Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
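The commit message does not show how the rule is neutralised; one common approach (an assumption here, not necessarily what the PR does) is to mask it with an empty override, since files in /etc/udev/rules.d shadow same-named rules shipped in /lib/udev/rules.d:

import os

def mask_md_incremental_rule(rules_dir="/etc/udev/rules.d"):
    # An empty file with the same name disables the stock rule.
    os.makedirs(rules_dir, exist_ok=True)
    override = os.path.join(rules_dir, "65-md-incremental.rules")
    open(override, "w").close()
    return override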
This causes the "Back" button in primary-disk selection to trigger a rescan for products. This provides support for letting the user enable layers of block-device composition (e.g. "md" software RAID, not part of this PR) and then scan those RAID volumes for products. This could even be used for arbitrary manual block-device setup from a shell on the alternate console. Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
It is valuable for a user who expects an existing product to be notified that none was found. Moreover, this will allow a user to get to the "assemble RAID" choice even when no product was detected. Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
… RAID This adds an option to enable *all* RAID devices found. This is less flexible than a more selective mechanism would be, but it is really simple and covers all known use-cases. It filters out from the choices presented to the user both "update of" and "restore to" entries referring to a disk that is part of an md RAID (though in practice only the "update" case is likely to appear). This avoids the case where a user gets those entries before explicitly assembling RAIDs, and thus avoids breaking a RAID by mistake. Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
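As a rough illustration of both parts (assembling everything, then filtering the choices), with helper names that are assumptions and not taken from the PR:

import subprocess

def assemble_all_raids():
    # 'mdadm --assemble --scan' attempts to assemble every array it can find.
    subprocess.call(["mdadm", "--assemble", "--scan"])

def filter_choices(choices, raid_members):
    # choices: list of (kind, disk) tuples such as ("update", "sda")
    return [(kind, disk) for kind, disk in choices if disk not in raid_members]

# e.g. filter_choices([("update", "sda"), ("install", "sdc")], {"sda", "sdb"})
# keeps only the entry for sdc.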
…er to assemble RAID This commit is kept separate until it is decided that this is the way to go, but it would be better squashed. Introduce installer feature flags, and hide the RAID-assembling feature behind the `raid-assemble` flag.
…ound This will avoid issues where users attempt to upgrade from a very old version and a dom0 kernel driver evolution has changed the /dev/disk/by-id/ identifier for the disk, or where the backup became invalid after a hardware change. Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
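A minimal sketch of that filtering, assuming a hypothetical backup object carrying the device path it refers to:

import os

def accessible_backups(backups):
    # Skip restore candidates whose device node (e.g. under /dev/disk/by-id/)
    # no longer exists after a driver or hardware change.
    return [b for b in backups if os.path.exists(b.partition)]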
Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
@@ -229,7 +229,7 @@ def get_installation_type(answers):
     # RAID members, for filtering out from upgradable-products and
     # backups, and to decide whether to propose to activate existing RAID.
     raid_members = []
-    if "assemble-raid" not in answers:
+    if constants.HAS_RAID_ASSEMBLE and "assemble-raid" not in answers:
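The hunk only shows the flag being consulted; a minimal sketch of how it could be wired in constants.py (the FEATURES list and its source are assumptions, only the name HAS_RAID_ASSEMBLE appears above):

FEATURES = []  # e.g. populated from feature-flag files shipped with the installer
HAS_RAID_ASSEMBLE = "raid-assemble" in FEATURES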
The feature flag meant to keep the "assemble RAID" option from being presented also disables the safety measure of filtering out upgrade/restore entries that the user should not select, as they indeed live on a RAID volume. I think it is a bad idea to allow the user to do this, but hiding those options without letting the user assemble RAID volumes looks like a bad idea too. I would rather just make the behavior added by this PR the default, and not add this feature flag.
As XenServer doesn't support installing on software RAID at all, it is better to keep the feature flag to hide the "assemble RAID" option. The button would be too confusing for users in a XenServer context.
As it stands, there is a problem in the chaining of screens: we rely on LEFT_BACKWARDS from get_installation_type() to bring us back to the (new) scan_existing() step, which
- prevents getting back to the EULA / hardware warnings screens
- means that, in the case where the overwrite_warning() step would trigger (likely rare, but still), the rescan after RAID assembly would not even trigger
This is summarized in this sketch:
Our proposal to address this is to bring all of existing-product scanning, overwrite warning, and RAID activation inside the get_installation_type() step, so that we can have REPEAT_STEP do the rescan before re-presenting choices, and LEFT_BACKWARDS behave as expected.
We identified one (small) downside: getting back from the primary-disk selection screen will trigger a rescan before bringing us back to the installation-type selection.
This is summarized in this sketch:
Since this change is a bit disruptive, we're looking for feedback before implementing it.
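A rough sketch of the proposed control flow (helper names and constant values are placeholders; LEFT_BACKWARDS, RIGHT_FORWARDS and REPEAT_STEP are the uicontroller return values mentioned above, not defined here):

LEFT_BACKWARDS, RIGHT_FORWARDS, REPEAT_STEP = -1, 1, 0   # placeholder values

def get_installation_type(answers, ui, scan_existing, assemble_all_raids):
    # Scan (again) on every entry into this step, so returning REPEAT_STEP
    # after RAID assembly re-presents up-to-date choices.
    products, backups, raid_members = scan_existing()
    # products/backups as (kind, disk) tuples, as in the earlier sketch
    choices = [c for c in products + backups if c[1] not in raid_members]

    selection = ui.choose_installation_type(choices)   # hypothetical UI hook
    if selection == "assemble-raid":
        assemble_all_raids()
        return REPEAT_STEP        # rescan and show the choices again
    if selection is None:
        return LEFT_BACKWARDS     # back to the EULA / hardware warning screens
    answers["install-type"] = selection
    return RIGHT_FORWARDS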
This feature is one we would like in XCP-ng 8.3. As usual, we favour a common design and implementation over XCP-ng-specific patches. Could someone at XenServer review our above proposition (last comment) so that we may quickly decide how to move forward?
The biggest problem is likely to be that XenServer doesn't support software RAID at all and explicitly refuses to install on software RAID. This is also quite a sizeable chunk of code which would not be getting tested within a XenServer environment, so the potential for XenServer making a change elsewhere which leads to a regression in this code is sufficiently far from 0 to be of concern to both parties.
You raise a crucial question. Do only XenServer's needs count, or can XCP-ng contributions also, in a measured way, introduce features which XenServer doesn't need? We don't mean to overflow you with PRs you don't care about for XenServer. But in a healthy collaboration, we also need some of our PRs to be accepted, when they don't cause a big risk of regression or bitrot for XenServer.
XCP-ng has supported software RAID for dom0 almost since the beginning. We have experience with it and can maintain and test the related code. Indeed, there's a risk with it not being tested first at XenServer, but we already carry that risk anyway because the soft-RAID support in XCP-ng currently lives as out-of-branch patches, which is worse.
Also, the primary objective of this PR is not supporting software RAID: it's making the installer able to install on a disk even if that disk was previously used for software RAID. That situation is unlikely when installing on a brand new server, but more likely when you re-use hardware, as often happens with XCP-ng users. We see it as an improvement XenServer benefits from, be it in a small way. The installer should not assemble software RAID by default as it does currently. The only parts which are specific to XCP-ng are the support for detecting existing installations and backups on software RAID, and the "assemble RAID" button, which won't show up unless you enable the feature.
I understand this pull request is not very interesting for XenServer, and our proposition in the comments even causes a workflow change which may affect existing test suites. I just hope it is acceptable. It falls under "trying to merge our legacy patches into common code and maintain them there, so that we can focus on more important common matters". It's too late for XCP-ng to remove soft-RAID support for dom0; it is a popular feature. So either we can maintain it in the upstream code, or we have to keep maintaining our own branches, with potential conflicts and breakage each time upstream code changes.
I didn't mean to imply that only XenServer's or only XCP-ng's requirements count. But where we have quite divergent requirements, as here, there is a problem for both parties: ensuring that code not used by one party does not inadvertently get broken by unrelated changes. A robust set of unit tests would go a long way to ensure that regressions would not be introduced by future changes.
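For instance, a unit test for the filtering sketched earlier might look like this (pytest-style; filter_choices is the hypothetical helper from above, not an actual host-installer function):

def filter_choices(choices, raid_members):
    return [(kind, disk) for kind, disk in choices if disk not in raid_members]

def test_raid_members_are_filtered_out():
    choices = [("update", "sda"), ("install", "sdc")]
    assert filter_choices(choices, {"sda", "sdb"}) == [("install", "sdc")]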
I definitely agree about testing the installer more thoroughly, and ideally as GitHub Actions on PRs. Do you have any plans already in this regard? Maybe as part of the python3 port?
The primary goal of this PR is to prevent a pre-existing software-RAID configuration from blocking usage of the disks that were used for that RAID. Such disks were at best not shown as available, or caused installation errors when selected (on disks with partitions that were part of a RAID), or even led to successful-looking installations that did not boot (when attempting to install on a RAID whose members were partitions).
The basic idea is to prevent automatic assembly of pre-existing RAID volumes, and to add an explicit option to trigger such assembly, allowing the user to upgrade/restore a system installed on RAID. It also avoids, while RAIDs are not assembled, giving the user the choice to upgrade/restore a system installed on RAID.
This filtering-out of restoration choices is extended to any backup partition referring to a non-accessible device, which as a side effect makes it impossible to start a "restore" operation of a too-old version, which would be doomed to fail because of a dom0 device-id change (as seen e.g. in https://xcp-ng.org/forum/topic/6573/upgrade-xenserver-7-to-xcp-ng-8-2-1).
Note that this PR is based on #20, to help the user understand which exact choices are really presented. The display can be limited to the commits relevant to this PR.
Note that some cases are explicitly not addressed: