diff --git a/adoc/SAP-convergent-mediation-ha-setup-sle15-docinfo.xml b/adoc/SAP-convergent-mediation-ha-setup-sle15-docinfo.xml
index 8ba32678..262d6003 100644
--- a/adoc/SAP-convergent-mediation-ha-setup-sle15-docinfo.xml
+++ b/adoc/SAP-convergent-mediation-ha-setup-sle15-docinfo.xml
@@ -61,7 +61,7 @@
 optimized in various ways for SAP* applications.
 This document explains how to deploy an SAP Convergent Mediation ControlZone
 High Availability Cluster solution.
-It is based on SUSE Linux Enterprise Server for SAP Applications 15. The concept however can also be used with
+It is based on SUSE Linux Enterprise Server for SAP Applications 15 SP5. The concept, however, can also be used with
 newer service packs of SUSE Linux Enterprise Server for SAP Applications.
diff --git a/adoc/SAP-convergent-mediation-ha-setup-sle15.adoc b/adoc/SAP-convergent-mediation-ha-setup-sle15.adoc
index f19b1e12..cc7fb44e 100644
--- a/adoc/SAP-convergent-mediation-ha-setup-sle15.adoc
+++ b/adoc/SAP-convergent-mediation-ha-setup-sle15.adoc
@@ -14,7 +14,7 @@
 // :toc:
 
-include::Variables_s4_2021.adoc[]
+include::Var_SAP-convergent-mediation.adoc[]
 //
 ////
@@ -23,35 +23,39 @@ TODO PRIOx: example
 == About this guide
 
-The following sections focus on background information and the purpose of the document at hand.
+The following sections focus on background information and the purpose of the
+document at hand.
 
 === Introduction
 
-{sles4sapReg} is the optimal platform to
-run {sapReg} applications with high availability. Together with a redundant layout
-of the technical infrastructure, single points of failure can be eliminated.
+{sles4sapReg} is the optimal platform to run {sapReg} applications with high
+availability. Together with a redundant layout of the technical infrastructure,
+single points of failure can be eliminated.
 
 TODO
 
 === Abstract
 
-This guide describes planning, setup, and basic testing of {sles4sap} 15 (TODO variable)
-as an high availability cluster for an {sap} Convergent Mediation ControlZone platform.
+This guide describes planning, setup, and basic testing of {sles4sap} {prodNr}
+{prodSP} as a high availability cluster for an {sap} {ConMed} ControlZone
+platform.
 
 TODO
 
 From the application perspective the following variants are covered:
 
-- Convergent Mediation platform service running alone
+- {ConMed} platform service running alone
 
-- Convergent Mediation platform and UI services running together
+- {ConMed} platform and UI services running together
 
-- Convergent Meditation binaries stored and started on central NFS (not recommended)
+- {ConMed} binaries stored and started on central NFS (not recommended)
 
-- Convergent Meditation binaries copied to and started from local disks
+- {ConMed} binaries copied to and started from local disks
 
-- TODO
+- Java VM stored and started on central NFS (not recommended)
+
+- Java VM started from local disks
 
 From the infrastructure perspective the following variants are covered:
 
@@ -65,27 +69,25 @@
 - Public cloud deployment (usually needs additional documentation on cloud
 specific details)
 
-Deployment automation simplifies roll-out. There are several options available, particularly on public cloud platfoms. Ask your public cloud provider or your SUSE contact for details.
+Deployment automation simplifies roll-out. There are several options available,
+particularly on public cloud platforms. Ask your public cloud provider or your
+SUSE contact for details.
[id="sec.resources"] === Additional documentation and resources -Several chapters in this document contain links to additional documentation resources that -are either available on the system or on the Internet. +Several chapters in this document contain links to additional documentation resources +that are either available on the system or on the Internet. For the latest product documentation updates, see https://documentation.suse.com/. -More whitepapers, guides and best practices documents referring to SUSE Linux Enterprise Server and SAP can be -found and downloaded at the SUSE Best Practices Web page: +More whitepapers, guides and best practices documents referring to {SLES} and {SAP} +can be found and downloaded at the SUSE Best Practices Web page: https://documentation.suse.com/sbp/sap/ -Here you can access guides for {SAPHANA} system replication -automation and High Availability (HA) scenarios for {SAPNw} and {s4hana}. - -Additional resources, such as customer references, brochures or flyers, can be found at -the {sles4sap} resource library: -https://www.suse.com/products/sles-for-sap/resource-library/. +Here you can access guides for {SAPHANA} system replication automation and High Availability +(HA) scenarios for {SAPNw} and {s4hana}. Supported high availability solutions by {sles4sap} overview: https://documentation.suse.com/sles-sap/sap-ha-support/html/sap-ha-support/article-sap-ha-support.html @@ -102,7 +104,7 @@ include::common_intro_feedback.adoc[] == Overview TODO -Convergent Mediation (CM) +{ConMed} (CM) The CM ControlZone platform is responsible for providing services to other instances. Several platform containers may exist in a CM system, for high availability, @@ -113,13 +115,13 @@ NFS shares with work directories can be mounted statically on all nodes. The HA does not need to control that filesystems. TODO -=== High availabilty for the Convergent Mediation ControlZone platform +=== High availabilty for the {ConMed} ControlZone platform TODO The ControlZone services platform, and optinally UI, are handled as active/passive resources. The related virtual IP adress is managed by the HA cluster as well. The HA cluster does not control filesystems used by the ControlZone services. However, -optionally this filesystem could be monitored. +this filesystem might be monitored. TODO picture @@ -131,7 +133,7 @@ application itself. Client-side write caching has to be disabled. A Filesystem resource is configured for a bind-mount of the real NFS share. This resource is grouped with the ControlZone platform and IP address. In case of filesystem failures, -the whole group gets restarted. No mount or umount on the real NFS share is done. +the cluster takes action. No mount or umount on the real NFS share is done. TODO this filesyetm resource is optional @@ -145,26 +147,46 @@ TODO [id="sec.prerequisites"] === Prerequisites -TODO Requirements of Convergent Mediation ControlZone +TODO Requirements of {ConMed} ControlZone TODO Requirements of the SUSE high availability solution for CM ControlZone are: -- Convergent Mediation ControlZone version 9.0.0.0 or higher is installed and -configured on both cluster nodes. If the software is installed into a shared NFS -filesystem, the binaries are copied into both cluster nodes´ local filesystems. +- {ConMed} ControlZone version 9.0.1.1 or higher is installed and configured on +both cluster nodes. If the software is installed into a shared NFS filesystem, the +binaries are copied into both cluster nodes´ local filesystems. 
+configuration has to be adjusted. Please refer to the {ConMed} documentation for details.
+
+- CM ControlZone is configured identically on both cluster nodes. User, path
+names, and environment settings are the same.
+
+- Only one ControlZone instance per Linux cluster. Thus one platform service and
+one UI service per cluster.
 
-- Only one ControlZone instance per Linux cluster.
+- The platform and UI are installed into the same MZ_HOME.
+
+- Linux shell of the mzadmin user is /bin/bash.
+
+- The mzadmin user's ~/.bashrc inherits MZ_HOME, JAVA_HOME and MZ_PLATFORM from
+the SAPCMControlZone RA. These variables need to be set as described in the RA's
+documentation, that is, manual page ocf_suse_SAPCMControlZone(7). A sketch
+follows below this list.
+
+- When called by the resource agent, mzsh connects to CM ControlZone services
+via the network. The service's virtual hostname or virtual IP address managed by
+the cluster should not be used for RA monitor actions.
 
 - Technical users and groups are defined locally in the Linux system. If users are
-resolved by remote service, local caching is neccessary. Substitute user (su) to
-the mz-user (e.g. "mzadmin") needs to work reliable and without customized actions or
-messages.
+resolved by a remote service, local caching is necessary. Substitute user (su) to
+the mzadmin needs to work reliably, without customized actions or messages.
+
+- Name resolution for hostnames and virtual hostnames is crucial. Hostnames of
+cluster nodes and services are resolved locally in the Linux system.
 
-- Strict time synchronization between the cluster nodes, e.g. NTP. All nodes of a
+- Strict time synchronization between the cluster nodes is required, e.g. NTP. All nodes of a
 cluster have configured the same timezone.
 
-- Needed NFS shares (e.g. /mnt/platform/) mounted statically or by automounter.
-No client-side write caching.
+- Needed NFS shares (e.g. /usr/sap/) are mounted statically or by automounter.
+No client-side write caching. File locking might be configured for application
+needs.
 
 - The RA monitoring operations have to be active.
 
@@ -174,18 +196,22 @@ Linux cluster. The infrastructure needs to allow these call-outs to return in ti
 
 - The ControlZone application is not started/stopped by OS. Thus there is no
 SystemV, systemd or cron job.
 
-- As long as the ControlZone application is managed by the Linux cluster, the application
-is not started/stopped/moved from outside. Thus no manual actions are done.
+- As long as the ControlZone application is managed by the Linux cluster, the
+application is not started/stopped/moved from outside. Thus no manual actions are
+done. The Linux cluster does not prevent administrative mistakes.
+However, if the Linux cluster detects the application running on both nodes in
+parallel, it will stop both and restart one.
 
-- Interface for the RA to the ControlZone platform is the command mzsh.
-The mzsh is accessed on the cluster nodes´ local filesystems. The mzsh is called
-with the arguments startup, shutdown, status and kill. Its output is parsed by the RA.
-Thus the command and its output needs to be stable.
+- The interface between the RA and the ControlZone services is the command mzsh.
+Ideally, the mzsh should be accessed on the cluster nodes' local filesystems.
+The mzsh is called with the arguments startup, shutdown and status. Its return
+code and output are interpreted by the RA. Thus the command and its output need
+to be stable. The mzsh shall not be customized. In particular, environment
+variables set through ~/.bashrc must not be changed.
 
 - The mzsh is called on the active node with a defined interval for regular resource
-monitor operations. It also is called on the active or passive node in certain situations. Those calls might run in parallel.
-
-- TODO
+monitor operations. It is also called on the active or passive node in certain situations.
+Those calls might run in parallel.
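+
+The following sketch shows how the mzadmin user's ~/.bashrc, as mentioned in the
+list above, could look. It sets MZ_HOME, JAVA_HOME and MZ_PLATFORM only if they
+have not already been inherited from the RA. The paths are assumptions taken
+from the examples in this guide and need to match the real installation; see
+manual page ocf_suse_SAPCMControlZone(7) for the authoritative description.
+
+[subs="specialchars,attributes"]
+----
+# ~/.bashrc of the mzadmin user (sketch, paths are examples).
+# Keep values already exported by the SAPCMControlZone RA, otherwise set defaults.
+export MZ_HOME=${MZ_HOME:-"/opt/cm/{mySid}/"}
+export JAVA_HOME=${JAVA_HOME:-"/opt/cm/{mySid}/sapmachine17"}
+export MZ_PLATFORM=${MZ_PLATFORM:-"http://localhost:9000"}
+----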
 
 === The setup procedure at a glance
 
@@ -275,7 +301,7 @@
 TODO
 
 TODO on one node
 
-== Integrating Convergent Mediation ControlZone with the Linux cluster
+== Integrating {ConMed} ControlZone with the Linux cluster
 
 TODO
 
@@ -309,7 +335,8 @@
 rsc_defaults rsc-options: \
     migration-threshold=3 \
     failure-timeout=86400
 op_defaults op-options: \
-    timeout=120
+    timeout=120 \
+    record-pending=true
 ----
 
 ==== Adapting SBD STONITH resource
 
@@ -342,33 +369,57 @@
 See manual page ocf_heartbeat_IPAddr2(7) for more details.
 
 ==== Filesystem resource (only monitoring)
 
-TODO
+A shared filesystem might be statically mounted by the OS on both cluster nodes.
+This filesystem holds work directories. It must not be confused with the
+ControlZone application itself. Client-side write caching has to be disabled.
+
+A Filesystem resource is configured for a bind-mount of the real NFS share.
+This resource is grouped with the ControlZone platform and IP address. In case
+of filesystem failures, the node gets fenced.
+No mount or umount on the real NFS share is done.
+An example for the real NFS share is /usr/sap/{mySid}/.check/, an example for
+the bind-mount is /mnt/check/{mySid}/. Both mount points have to be created
+before the cluster resource is activated (see the preparation sketch after the
+listing).
 
 [subs="specialchars,attributes"]
 ----
 primitive rsc_fs_{mySid} ocf:heartbeat:Filesystem \
-    params device=/mnt/platform/check/ directory=/mnt/check/ \
+    params device=/usr/sap/{mySid}/.check/ directory=/mnt/check/{mySid}/ \
     fstype=nfs4 options=bind,rw,noac,sync,defaults \
-    op monitor interval=120 timeout=120 on-fail=restart \
+    op monitor interval=90 timeout=120 on-fail=restart \
     op_params OCF_CHECK_LEVEL=20 \
     op start timeout=120 \
    op stop timeout=120 \
     meta target-role=stopped
 ----
 
-See manual page ocf_heartbeat_Filesystem(7) for more details.
+See also manual pages SAPCMControlZone_basic_cluster(7), ocf_heartbeat_Filesystem(7)
+and nfs(5).
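+
+The sketch below shows one way to prepare the static NFS mount and both mount
+points on each node. The NFS server name nfs1 and the export path /export/{mySid}
+are assumptions and need to be adapted to the real environment.
+
+[subs="specialchars,attributes"]
+----
+# Static NFS mount in /etc/fstab; sync and noac disable client-side caching.
+# "nfs1:/export/{mySid}" is an assumed server and export path.
+echo "nfs1:/export/{mySid} /usr/sap/{mySid} nfs4 rw,noac,sync,defaults 0 0" >>/etc/fstab
+mkdir -p /usr/sap/{mySid}
+mount /usr/sap/{mySid}
+
+# Create the check directory on the NFS share and the local bind-mount point.
+mkdir -p /usr/sap/{mySid}/.check/ /mnt/check/{mySid}/
+----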
[subs="specialchars,attributes"] ---- primitive rsc_cz_{mySid} ocf:suse:SAPCMControlZone \ - params SERVICE=platform MZSHELL=/opt/mz/bin/mzsh \ + params SERVICE=platform USER={mySapAdm} \ + MZSHELL=/opt/cm/{mySid}/bin/mzsh;/usr/sap/{mySid}/bin/mzsh \ + MZHOME=/opt/cm/{mySid}/;/usr/sap/{mySid}/ \ + MZPLATFORM=http://localhost:9000 \ + JAVAHOME=/opt/cm/{mySid}/sapmachine17 \ op monitor interval=60 timeout=120 on-fail=restart \ - op start timeout=120 interval=0 \ - op stop timeout=120 interval=0 \ + op start timeout=300 interval=0 \ + op stop timeout=300 interval=0 \ meta priority=100 maintenance=true ---- @@ -380,40 +431,67 @@ primitive rsc_cz_{mySid} ocf:suse:SAPCMControlZone \ |Description |USER -|OS user who calls mzsh, owner of $MZ_HOME. +|OS user who calls mzsh, owner of $MZ_HOME (might be different from $HOME). Optional. Unique, string. Default value: "mzadmin". |SERVICE -|The ControlZone service to be managed by the resoure agent. +|The ControlZone service to be managed by the resource agent. Optional. Unique, [ platform \| ui ]. Default value: "platform". |MZSHELL -|Path to mzsh. -Optional. Unique, string. Default value: "/usr/bin/mzsh". - -|CALL_TIMEOUT -|Define timeout how long calls to the ControlZone platform for checking the -status can take. If the timeout is reached, the return code will be 124. If you -increase this timeout for ControlZone calls, you should also adjust the monitor -operation timeout of your Linux cluster resources. (Not yet implemented.) -Optional. Unique, integer. Default value: 60. - -|SHUTDOWN_RETRIES -|Number of retries to check for process shutdown. Passed to mzsh. -If you increase the number of shutdown retries, you should also adjust the stop -operation timeout of your Linux cluster resources. (Not yet implemented.) -Optional. Unique, integer. Default: mzsh builtin value. +|Path to mzsh. Could be one or two full paths. If one path is given, that path +is used for all actions. In case two paths are given, the first one is used for +monitor actions, the second one is used for start/stop actions. If two paths are +given, the first needs to be on local disk, the second needs to be on the central +NFS share with the original CM ControlZone installation. Two paths are separated +by a semi-colon (;). The mzsh contains settings that need to be consistent with +MZ_PLATFORM, MZ_HOME, JAVA_HOME. Please refer to Convergent Mediation product +documentation for details. +Optional. Unique, string. Default value: "/opt/cm/bin/mzsh". + +|MZHOME +|Path to CM ControlZone installation directory, owned by the mzadmin user. +Could be one or two full paths. If one path is given, that path is used for all +actions. In case two paths are given, the first one is used for monitor actions, +the second one is used for start/stop actions. If two paths are given, the +first needs to be on local disk, the second needs to be on the central NFS share +with the original CM ControlZone installation. See also JAVAHOME. Two paths are +separated by semi-colon (;). +Optional. Unique, string. Default value: "/opt/cm/". + +|MZPLATFORM +|URL used by mzsh for connecting to CM ControlZone services. +Could be one or two URLs. If one URL is given, that URL is used for all actions. +In case two URLs are given, the first one is used for monitor and stop actions, +the second one is used for start actions. Two URLs are separated by semi-colon +(;). Should usually not be changed. The service´s virtual hostname or virtual IP +address managed by the cluster must never be used for RA monitor actions. 
 
-==== ControlZone resource group
+==== CM ControlZone resource group
 
-TODO
+ControlZone platform and UI resources rsc_cz_{mySid} and rsc_ui_{mySid} are
+grouped with the filesystem resource rsc_fs_{mySid} and the IP address resource
+rsc_ip_{mySid} into the group grp_cz_{mySid}. The filesystem starts first, then
+the platform; the IP address starts before the UI. The resource group might run
+on either node, but never on both nodes in parallel.
 
 [subs="specialchars,attributes"]
 ----
-group grp_cz_{mySid} rsc_fs_{mySid} rsc_ip_{mySid} rsc_cz_{mySid} \
+group grp_cz_{mySid} rsc_fs_{mySid} rsc_cz_{mySid} rsc_ip_{mySid} rsc_ui_{mySid} \
     meta maintenance=true
 ----
 
diff --git a/adoc/Var_SAP-convergent-mediation.adoc b/adoc/Var_SAP-convergent-mediation.adoc
index bd002349..b1d7cb3f 100644
--- a/adoc/Var_SAP-convergent-mediation.adoc
+++ b/adoc/Var_SAP-convergent-mediation.adoc
@@ -29,7 +29,7 @@
 :myIPNode1: 192.168.1.100
 :myIPNode2: 192.168.1.101
 
-:myVipAAscs: 192.168.1.112 
+:myVipAAscs: 192.168.1.112
 :myVipNM: /24
 
 :myHaNetIf: eth0
 
@@ -65,3 +65,6 @@
 :DigRoute: Digital Route
 :ConMed: Convergent Mediation
 
+:prodNr: 15
+:prodSP: SP5
+