Merge pull request #2405 from sbasu96/CP_to_OIM_doc_changes
CP to OIM doc changes
priti-parate authored Jan 20, 2025
2 parents eccd8f0 + bed049a commit 0871600
Showing 125 changed files with 504 additions and 504 deletions.
8 changes: 4 additions & 4 deletions docs/source/Logging/ControlPlaneLogs.rst
@@ -1,5 +1,5 @@
Control plane logs
-------------------
OIM logs
----------

.. caution:: It is not recommended to delete the below log files or the directories they reside in.

@@ -31,7 +31,7 @@ Logs of individual containers
Provisioning logs
--------------------

Logs pertaining to actions taken during ``discovery_provision.yml`` can be viewed in ``/var/log/xcat/cluster.log`` and ``/var/log/xcat/computes.log`` on the control plane.
Logs pertaining to actions taken during ``discovery_provision.yml`` can be viewed in ``/var/log/xcat/cluster.log`` and ``/var/log/xcat/computes.log`` on the OIM.

.. note:: As long as a node has been added to a cluster by Omnia, deployment events taking place on the node will be updated in ``/var/log/xcat/cluster.log``.
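
To follow provisioning events as they occur, the same logs can be tailed directly on the OIM (a minimal sketch using standard ``tail``): ::

    tail -f /var/log/xcat/cluster.log /var/log/xcat/computes.log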

@@ -47,7 +47,7 @@ Logs pertaining to actions taken by Omnia or iDRAC telemetry can be viewed in ``
Grafana Loki
--------------

After `telemetry.yml <../Telemetry/index.html>`_ is run, Grafana services are installed on the control plane.
After `telemetry.yml <../Telemetry/index.html>`_ is run, Grafana services are installed on the OIM.
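
One quick way to confirm that the Grafana services came up after the playbook run (a sketch; assumes ``kubectl`` access on the OIM): ::

    kubectl get pods -n grafana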

i. Get the Grafana IP using ``kubectl get svc -n grafana``.

2 changes: 1 addition & 1 deletion docs/source/Logging/LogManagement.rst
@@ -25,4 +25,4 @@ With the above settings:

* Data up to 4 weeks old is backed up. Any log backup older than four weeks will be deleted.

.. caution:: Since these logs take up ``/var`` space, sufficient space must be allocated to ``/var`` partition if it's created. If ``/var`` partition space fills up, control plane might crash.
.. caution:: Since these logs take up ``/var`` space, sufficient space must be allocated to the ``/var`` partition if it is created. If the ``/var`` partition fills up, the OIM might crash.
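
To keep an eye on the space these logs consume, the ``/var`` partition and log directory can be checked with standard tools (a minimal sketch): ::

    df -h /var
    du -sh /var/log
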
16 changes: 8 additions & 8 deletions docs/source/OmniaInstallGuide/Maintenance/cleanup.rst
@@ -1,27 +1,27 @@
Uninstalling the control plane tools
-------------------------------------------
Uninstalling the OIM tools
------------------------------

Run this script to roll back all modifications made to the control plane, such as configured local repositories, provisioning tools, and telemetry configurations.
Run this script to roll back all modifications made to the OIM, such as configured local repositories, provisioning tools, and telemetry configurations.

To run the script: ::

cd utils
ansible-playbook control_plane_cleanup.yml
ansible-playbook oim_cleanup.yml

To skip the deletion of the configured local repositories (stored in ``repo_store_path`` and xCAT repositories), run: ::

ansible-playbook control_plane_cleanup.yml –-skip-tags downloads
ansible-playbook oim_cleanup.yml --skip-tags downloads

To delete the changes made by ``local_repo.yml`` while retaining the ``repo_store_path`` folder, run: ::

ansible-playbook control_plane_cleanup.yml -–tags local_repo --skip-tags downloads
ansible-playbook oim_cleanup.yml --tags local_repo --skip-tags downloads

To delete the changes made by ``local_repo.yml`` including the ``repo_store_path`` folder, run: ::

ansible-playbook control_plane_cleanup.yml –-tags local_repo
ansible-playbook oim_cleanup.yml --tags local_repo


.. note:: After you run the ``control_plane_cleanup.yml`` playbook, ensure to reboot the control plane node.
.. note:: After you run the ``oim_cleanup.yml`` playbook, ensure that you reboot the OIM node.

.. caution::
* When re-provisioning your cluster (that is, re-running the ``discovery_provision.yml`` playbook) after a clean-up, ensure to use a different ``admin_nic_subnet`` in ``input/provision_config.yml`` to avoid a conflict with newly assigned servers. Alternatively, disable any OS available in the ``Boot Option Enable/Disable`` section of your BIOS settings (``BIOS Settings`` > ``Boot Settings`` > ``UEFI Boot Settings``) on all target nodes.
@@ -1,7 +1,7 @@
Configuring custom repositories
-------------------------------

Use the local repository feature to create a customized set of local repositories on the control plane for the cluster nodes to access.
Use the local repository feature to create a customized set of local repositories on the OIM for the cluster nodes to access.

1. Ensure the ``custom`` entry is included in the ``software_config.json`` file. ::

@@ -3,7 +3,7 @@ Granting Kubernetes access

Omnia grants Kubernetes node access to users defined on the ``kube_control_plane`` using the ``k8s_access.yml`` playbook.

**Prerequisites**
**Prerequisite**

* Ensure that the Kubernetes cluster is up and running.

@@ -93,15 +93,15 @@ Prerequisites
Installation Process
---------------------

1. Once ``secret.yaml`` and ``values.yaml`` is filled up with the necessary details, copy both files to any directory on the control plane. For example, ``/tmp/secret.yaml`` and ``/tmp/values.yaml``.
1. Once ``secret.yaml`` and ``values.yaml`` are filled in with the necessary details, copy both files to any directory on the OIM. For example, ``/tmp/secret.yaml`` and ``/tmp/values.yaml``.

2. Add the ``csi_driver_powerscale`` entry along with the driver version to the ``omnia/input/software_config.json`` file: ::

{"name": "csi_driver_powerscale", "version":"v2.11.0"}

.. note:: By default, the ``csi_driver_powerscale`` entry is not present in the ``input/software_config.json``.

3. Execute the ``local_repo.yml`` playbook to download the required artifacts to the control plane: ::
3. Execute the ``local_repo.yml`` playbook to download the required artifacts to the OIM: ::

cd local_repo
ansible-playbook local_repo.yml
@@ -225,9 +225,9 @@ Once the storage class is created, the same can be used to create PVC.
mountPath: /data
env:
- name: http_proxy
value: "http://<control plane IP>:3128"
value: "http://<OIM IP>:3128"
- name: https_proxy
value: "http://<control plane IP>:3128"
value: "http://<OIM IP>:3128"
volumes:
- name: data
persistentVolumeClaim:
@@ -3,26 +3,30 @@ Alternate method to install the AMD ROCm platform

The accelerator role allows users to set up the `AMD ROCm <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/>`_ platform. These tools allow users to unlock the potential of installed AMD GPUs.

Ensure that the ROCm local repositories are configured using the `local_repo.yml <../CreateLocalRepo/index.html>`_ script.
**Prerequisites**

Ensure that the ``input/software_config.json`` contains valid amdgpu and rocm version. See `input parameters <../CreateLocalRepo/InputParameters.html>`_ for more information.
* Ensure that the ROCm local repositories are configured using the `local_repo.yml <../CreateLocalRepo/index.html>`_ script.
* Ensure that the ``input/software_config.json`` contains valid ``amdgpu`` and ``rocm`` versions. See `input parameters <../CreateLocalRepo/InputParameters.html>`_ for more information.

.. note::
* Nodes provisioned using the Omnia provision tool do not require a RedHat subscription to run ``accelerator.yml`` on RHEL target nodes.
* For RHEL target nodes not provisioned by Omnia, ensure that RedHat subscription is enabled on all target nodes. Every target node will require a RedHat subscription.
* AMD ROCm driver installation is not supported by Omnia on Rocky Linux cluster nodes.

To install all the latest GPU drivers and toolkits, run: ::

cd accelerator
ansible-playbook accelerator.yml -i inventory
* AMD ROCm driver installation is not supported by Omnia on Rocky Linux cluster nodes.

**Playbook configurations**

The following configurations take place when running ``accelerator.yml``
The following configurations take place while running the ``accelerator.yml`` playbook:

i. Servers with AMD GPUs are identified and the latest GPU drivers and ROCm platforms are downloaded and installed.
ii. Servers with no GPU are skipped.

**Executing the playbook**

To install all the latest GPU drivers and toolkits, run: ::

cd accelerator
ansible-playbook accelerator.yml -i inventory
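
Once the playbook completes, one way to spot-check a GPU node is to query the driver and runtime (a sketch; assumes the ROCm utilities are available on the node's ``PATH``): ::

    rocm-smi
    rocminfo | head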

User permissions for ROCm platforms
------------------------------------

@@ -33,7 +37,7 @@ User permissions for ROCm platforms
.. note::
* <user> is the system name of the end user.
* This command must be run with ``root`` permissions.
* If the root user wants to provide access to other users and their individual GPU nodes, the previous command needs to be run on all of them separately. ::
* If the root user wants to provide access to other users and their individual GPU nodes, the previous command needs to be run on all of them separately.

* To enable users to use the ROCm tools, use the following command as shown in the sample file added below: ::

@@ -1,7 +1,7 @@
Prerequisites
===============

1. Set the hostname of the control plane in the "hostname.domain name" format.
1. Set the hostname of the OIM in the "hostname.domain name" format.

.. include:: ../../../Appendices/hostnamereqs.rst
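
As an illustration, the hostname could be set as follows (a sketch; ``oim.omnia.test`` is a hypothetical value): ::

    hostnamectl set-hostname oim.omnia.test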

@@ -14,12 +14,12 @@ For example, ``controlplane.omnia.test`` is acceptable. ::
.. note::

* The ``user_registry`` in ``input/local_repo_config.yml`` supports only nerdctl and docker registries.
* If you define the ``cert_path`` variable, ensure that it points to the absolute path of the user registry certificate present on the Omnia control plane.
* If you define the ``cert_path`` variable, ensure that it points to the absolute path of the user registry certificate present on the Omnia OIM.
* To avoid docker pull limits, provide docker credentials (``docker_username``, ``docker_password``) in ``input/provision_config_credentials.yml``.

.. caution:: In order to download the software images from an user registry, the user needs to ensure that the ``user_registry`` address provided in ``input/local_repo_config.yml`` is accessible from the Omnia control plane. If the ``user_registry`` is not accessible from the control plane, Omnia will download all the software images listed in ``input/software_config.json`` to the Omnia-registry. Use the ``curl -k <user_registry>`` to check.
.. caution:: In order to download the software images from a user registry, the user needs to ensure that the ``user_registry`` address provided in ``input/local_repo_config.yml`` is accessible from the Omnia OIM. If the ``user_registry`` is not accessible from the OIM, Omnia will download all the software images listed in ``input/software_config.json`` to the Omnia-registry. Use ``curl -k <user_registry>`` to check.
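
For example, reachability of the registry can be verified from the OIM before running the playbook (a sketch; the registry address is a placeholder): ::

    curl -k https://myregistry.example.com:5000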

Images listed in ``user_registry`` in ``input/local_repo_config.yml`` are accessed from user defined registries. To ensure that the control plane can correctly access the registry, ensure that the following naming convention is used to save the image: ::
Images listed in ``user_registry`` in ``input/local_repo_config.yml`` are accessed from user-defined registries. To ensure that the OIM can correctly access the registry, use the following naming convention when saving the image: ::

<host>/<image name>:v<version number>
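
As an illustration, an image could be tagged and pushed to the user registry following that convention (a sketch with hypothetical registry and image names; ``nerdctl`` is assumed as the container CLI): ::

    nerdctl tag docker.io/library/alpine:3.20 myregistry.example.com/alpine:v3.20
    nerdctl push myregistry.example.com/alpine:v3.20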

@@ -1,30 +1,30 @@
Running local repo
------------------

The local repository feature will help create offline repositories on the control plane which all the cluster nodes will access.
The local repository feature will help create offline repositories on the OIM which all the cluster nodes will access.

**Configurations made by the playbook**

* A registry is created on the control plane at <Control Plane hostname>:5001.
* A registry is created on the OIM at <OIM hostname>:5001.

* If ``repo_config`` in ``local_repo_config.yml`` is set to ``always`` or ``partial``, all images present in the ``input/config/<cluster_os_type>/<cluster_os_version>`` folder will be downloaded to the control plane.
* If ``repo_config`` in ``local_repo_config.yml`` is set to ``always`` or ``partial``, all images present in the ``input/config/<cluster_os_type>/<cluster_os_version>`` folder will be downloaded to the OIM.


* If the image is defined using a tag, the image will be tagged using <control plane hostname>:5001/<image_name>:<version> and pushed to the Omnia local registry.
* If the image is defined using a tag, the image will be tagged using <OIM hostname>:5001/<image_name>:<version> and pushed to the Omnia local registry.

* If the image is defined using a digest, the image will be tagged using <control plane hostname>:5001/<image_name>:omnia and pushed to the Omnia local registry.repositories
* If the image is defined using a digest, the image will be tagged using <OIM hostname>:5001/<image_name>:omnia and pushed to the Omnia local registry.


* When ``repo_config`` in ``local_repo_config.yml`` is set to ``always``, the control plane is set as the default registry mirror.
* When ``repo_config`` in ``local_repo_config.yml`` is set to ``always``, the OIM is set as the default registry mirror.

* When ``repo_config`` in ``local_repo_config`` is set to ``partial``, the ``user_registry`` (if defined) and the control plane are set as default registry mirrors.
* When ``repo_config`` in ``local_repo_config`` is set to ``partial``, the ``user_registry`` (if defined) and the OIM are set as default registry mirrors.

To create local repositories, run the following commands: ::

cd local_repo
ansible-playbook local_repo.yml

.. caution:: During the execution of ``local_repo.yml``, Omnia 1.7 will remove packages such as ``podman``, ``containers-common``, and ``buildah`` (if they are already installed), as they conflict with the installation of ``containerd.io`` on RHEL/Rocky Linux OS control plane.
.. caution:: During the execution of ``local_repo.yml``, Omnia 1.7 will remove packages such as ``podman``, ``containers-common``, and ``buildah`` (if they are already installed), as they conflict with the installation of ``containerd.io`` on RHEL/Rocky Linux OS OIM.

Verify changes made by the playbook by running ``cat /etc/containerd/certs.d/_default/hosts.toml`` on compute nodes.
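
The mirror entry written to ``hosts.toml`` might look roughly like the following (a sketch only; the OIM hostname and flags shown here are illustrative, not the exact generated output): ::

    server = "https://registry-1.docker.io"

    [host."http://oim.omnia.test:5001"]
      capabilities = ["pull", "resolve"]
      skip_verify = true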

@@ -42,7 +42,7 @@ To fetch images from the ``user_registry`` or the Omnia local registry, run the
.. note::


* After ``local_repo.yml`` has run, the value of ``repo_config`` in ``input/software_config.json`` cannot be updated without running the `control_plane_cleanup.yml <../../Maintenance/cleanup.html>`_ script first.
* After ``local_repo.yml`` has run, the value of ``repo_config`` in ``input/software_config.json`` cannot be updated without running the `oim_cleanup.yml <../../Maintenance/cleanup.html>`_ playbook first.

* To configure additional local repositories after running ``local_repo.yml``, update ``software_config.json`` and re-run ``local_repo.yml``.

4 changes: 2 additions & 2 deletions docs/source/OmniaInstallGuide/RHEL/CreateLocalRepo/index.rst
@@ -1,9 +1,9 @@
Step 2: Create Local repositories for the cluster
==================================================

The ``local_repo.yml`` playbook creates offline repositories on the control plane server, which all the cluster nodes will access. This playbook execution requires inputs from ``input/software_config.json`` and ``input/local_repo_config.yml``.
The ``local_repo.yml`` playbook creates offline repositories on the OIM server, which all the cluster nodes will access. This playbook execution requires inputs from ``input/software_config.json`` and ``input/local_repo_config.yml``.

.. caution:: If you have a proxy server set up for your control plane, you must configure the proxy environment variables on the control plane before running any Omnia playbooks. For more information, `click here <../Setup_CP_proxy.html>`_.
.. caution:: If you have a proxy server set up for your OIM, you must configure the proxy environment variables on the OIM before running any Omnia playbooks. For more information, `click here <../Setup_CP_proxy.html>`_.
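
Configuring those variables typically amounts to exporting them in the OIM shell session before invoking any playbook (a sketch with placeholder values): ::

    export http_proxy=http://<proxy server IP>:<port>
    export https_proxy=http://<proxy server IP>:<port>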

.. toctree::
Prerequisite
@@ -52,7 +52,7 @@ Configuring specific local repositories


.. note::
* If the package version is customized, ensure that the ``version`` value is updated in ``software_config.json```.
* If the package version is customized, ensure that the ``version`` value is updated in ``software_config.json``.
* If the target cluster runs on RHEL or Rocky Linux, ensure the "dkms" package is included in ``input/config/<cluster_os_type>/8.x/cuda.json`` as illustrated above.


@@ -1,7 +1,7 @@
Setup Jupyterhub
-----------------

Using Jupyterhub helm chart (version 3.2.0), Omnia installs Jupyterhub (version 4.0.2) on Kubernetes clusters. Once Jupyterhub is deployed, log into the GUI to create your own Jupyter notebook. For more information, `click here <https://z2jh.jupyter.org/en/stable/jupyterhub/customization.html>`_.
Omnia installs Jupyterhub (version 3.2.0) on Kubernetes clusters. Once Jupyterhub is deployed, log into the GUI to create your own Jupyter notebook. For more information, `click here <https://z2jh.jupyter.org/en/stable/jupyterhub/customization.html>`_.

**Prerequisites**

@@ -39,7 +39,7 @@ Using Jupyterhub helm chart (version 3.2.0), Omnia installs Jupyterhub (version

**Accessing the Jupyterhub GUI**

1. Login to kube control plane and verify that the Jupyterhub service is running.
1. Log in to the ``kube_control_plane`` and verify that the Jupyterhub service is running.
2. Find the IP address of the Jupyterhub service using:

::
@@ -105,7 +105,7 @@ Kserve is an open-source serving platform that simplifies the deployment, scalin
istiod ClusterIP 10.233.18.185 <none> 15010/TCP,15012/TCP,443/TCP,15014/TCP 44h
knative-local-gateway ClusterIP 10.233.37.248 <none> 80/TCP 44h

3. To access inferencing from the ingressgateway with HOST header, run the below command from the kube_control_plane or kube_node: ::
3. To access inferencing from the ingressgateway with HOST header, run the below command from the ``kube_control_plane`` or ``kube_node``: ::

curl -v -H "Host: <service url>" -H "Content-Type: application/json" "http://<istio-ingress external IP>:<istio-ingress port>/v1/models/<model name>:predict" -d @./iris-input.json
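
The ``iris-input.json`` payload referenced above is expected to follow the standard KServe v1 inference format; a minimal sketch (feature values are illustrative): ::

    {
      "instances": [
        [6.8, 2.8, 4.8, 1.4],
        [6.0, 3.4, 4.5, 1.6]
      ]
    }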
