Add preflight OS, CPU, RAM, Swap, and Filesystem checks #326

Open: wants to merge 1 commit into devel

@Kushal-deb commented Feb 3, 2025

  • Implemented OS, NIC, and other preflight checks to validate system requirements before Ceph cluster creation.

    • Checks include:
      • OS version (RHEL 9+ required)
      • SELinux in enforcing mode
      • Firewalld installation and status
      • Availability of required packages (rpcbind, podman, firewalld)
      • Podman version (>= 3.3)
      • RHEL software profile validation
      • Tuned profile
      • CPU, RAM, swap, and filesystem (part of the other checks)
      • Whether jumbo frames are enabled
      • Whether the NIC uses DHCP or a static IP
      • Whether the NIC bandwidth is sufficient
      • Current NIC options (e.g. bonding; not bridged or virtual), collected and reported
      • Network latency (ping) to every host in the inventory file
      • Separate NICs for front-end and back-end networks
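The Podman (>= 3.3) and RHEL (9+) checks both reduce to a numeric version comparison, where a plain string comparison would get "3.10" vs "3.3" wrong. A minimal Python sketch, assuming the version string comes from something like `podman --version` output; the helpers `parse_version` and `meets_minimum` are hypothetical and not part of this PR. Inside the playbook itself, Ansible's built-in `version` test (e.g. `podman_version is version('3.3', '>=')`) covers the same case.

```python
from __future__ import annotations

import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted numeric version, e.g. 'podman version 4.9.4-dev' -> (4, 9, 4)."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found: str, minimum: str) -> bool:
    """Compare versions numerically, padding the shorter one with zeros (3.3 == 3.3.0)."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For example, `meets_minimum("podman version 4.9.4-dev", "3.3")` and `meets_minimum("3.10", "3.3")` are both true, while `meets_minimum("3.2.1", "3.3")` is false.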

Enhancements:

❯ ansible-playbook -i ~/ansible-inventory/inventory.ini cephadm-preflight.yml

PLAY [insecure_registries] *******************************************************************************************************************************************************************************************************************

TASK [fail if insecure_registry is undefined] ************************************************************************************************************************************************************************************************
skipping: [rhel-ceph-admin]

PLAY [preflight] *****************************************************************************************************************************************************************************************************************************

TASK [fail when ceph_origin is custom with no repository defined] ****************************************************************************************************************************************************************************
skipping: [rhel-ceph-admin]

TASK [fail if baseurl is not defined for ceph_custom_repositories] ***************************************************************************************************************************************************************************
skipping: [rhel-ceph-admin]

PLAY [Preflight Checks for Ceph Deployment] **************************************************************************************************************************************************************************************************

TASK [Initialize preflight results list] *****************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Collect installed package facts] *******************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Check if OS is RHEL 9+] ****************************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Ensure SELinux is set to Enforcing mode] ***********************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Determine SELinux Check Result] ********************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Determine SELinux Failure Reason] ******************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Determine Package Installation Check Result] *******************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Determine Package Installation Failure Reason] *****************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Fetch Firewalld status] ****************************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Extract Podman version if installed] ***************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Determine if Podman meets version requirement (>=3.3)] *********************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Validate RHEL software profile] ********************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Define RHEL Profile Check Result] ******************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Define RHEL Profile Check Reason] ******************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Get current tuned profile] *************************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Define Tuned Profile Check Result] *****************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Define Tuned Profile Check Reason] *****************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Check CPU x86-64-v2 support] ***********************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Define CPU, RAM, Swap, and Filesystem Check Variables] *********************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Ping all hosts in inventory to measure latency] ****************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin] => (item=rhel-ceph-admin)

TASK [Define networking facts] ***************************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Store all preflight check results] *****************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Generate preflight check report file] **************************************************************************************************************************************************************************************************
changed: [rhel-ceph-admin -> localhost]

TASK [Load the preflight check report] *******************************************************************************************************************************************************************************************************
ok: [rhel-ceph-admin]

TASK [Final Check - Fail if any critical checks failed] **************************************************************************************************************************************************************************************
fatal: [rhel-ceph-admin]: FAILED! => changed=false 
  msg: 'Preflight checks failed for the following: Tuned Profile, RHEL Profile, Minimum RAM, Swap Space, /var Partition, Root Filesystem, Jumbo Frames Enabled, NIC Static IP Configuration, NIC Bandwidth. Please resolve these issues before proceeding.'

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
rhel-ceph-admin            : ok=25   changed=1    unreachable=0    failed=1    skipped=3    rescued=0    ignored=0  
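The `Ping all hosts in inventory to measure latency` task feeds the report's `Network Latency` row (`Average latency (ms): ['0.111']`). A minimal sketch of extracting the average round-trip time, assuming Linux iputils `ping`, whose summary line has the form `rtt min/avg/max/mdev = a/b/c/d ms`; the function names are hypothetical and the PR may obtain the value differently.

```python
from __future__ import annotations

import re
import subprocess

# Matches the iputils summary line, e.g. "rtt min/avg/max/mdev = 0.089/0.111/0.134/0.020 ms".
RTT_RE = re.compile(r"min/avg/max/\w+ = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_avg_latency_ms(ping_output: str) -> float | None:
    """Return the average RTT in ms from ping output, or None if no summary line is found."""
    match = RTT_RE.search(ping_output)
    return float(match.group(2)) if match else None


def ping_avg_ms(host: str, count: int = 3) -> float | None:
    """Ping `host` and return its average RTT; None if ping is missing, times out, or fails."""
    try:
        proc = subprocess.run(
            ["ping", "-c", str(count), host],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if proc.returncode != 0:
        return None
    return parse_avg_latency_ms(proc.stdout)
```

Returning `None` for an unreachable host lets it appear in the report instead of aborting the whole run.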

=================================================================================================

❯ cat preflight_report.txt
==================================================
               **  Preflight Check Report **
==================================================

 System Checks
--------------------------------------------------
- OS Version: ✅ Passed

- Tuned Profile: ❌ Failed
    - Reason: Incorrect tuned profile. Expected: throughput-performance

- RHEL Profile: ❌ Failed
    - Reason: Incorrect RHEL software profile. Expected: Server with File and Storage Server.

- Firewalld Running: ✅ Passed

- Podman Installed: ✅ Passed

- SELinux: ✅ Passed

- Required Packages Installed: ✅ Passed

- Minimum RAM (8GB): ❌ Failed
    - Reason: System has only 7684 MB RAM, required: 8192MB

- Swap Space (1.5x RAM): ❌ Failed
    - Reason: System has only 5119 MB Swap, required: 11526 MB

- CPU x86-64-v2: ✅ Passed

- CPU Cores >= 4: ✅ Passed

- /var is a separate partition: ❌ Failed
    - Reason: /var is not a separate partition

- Root Filesystem >= 100GB: ❌ Failed
    - Reason: Root FS is only 43GB, required: 100GB

- NIC Configuration: ℹ️ INFO
    - Reason: Available network interfaces: ens3 | Speeds (Mbps): -1

- Jumbo Frames Enabled: ❌ Failed
    - Reason: MTU is 1500, recommended > 1500

- NIC Static IP Configuration: ❌ Failed
    - Reason: NIC is using DHCP, static IP is recommended

- NIC Bandwidth (10GbE Recommended): ❌ Failed
    - Reason: NIC speed is -1 Mbps, recommended is 10GbE

- Network Latency: ℹ️ INFO
    - Reason: Average latency (ms): ['0.111']

==================================================
** Summary **
--------------------------------------------------
❌ Critical Failures Detected:
   - Tuned Profile, RHEL Profile, Minimum RAM, Swap Space, /var Partition, Root Filesystem, Jumbo Frames Enabled, NIC Static IP Configuration, NIC Bandwidth

** Action Required: Please resolve these issues before proceeding.
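The RAM, swap, and filesystem rows above apply fixed thresholds. A minimal Python sketch of that arithmetic, reproducing the report's reason strings; the function names are hypothetical, the "PASS"/"FAIL" labels are assumed, and the inputs are assumed to come from Ansible facts such as `ansible_memtotal_mb`, `ansible_swaptotal_mb`, and the mount list in `ansible_mounts` (the PR's actual sources may differ).

```python
from __future__ import annotations

import math

# Thresholds taken from the report labels.
RAM_MIN_MB = 8192       # "Minimum RAM (8GB)"
SWAP_RATIO = 1.5        # "Swap Space (1.5x RAM)"
ROOT_FS_MIN_GB = 100    # "Root Filesystem >= 100GB"


def check_ram(ram_mb: int) -> tuple[str, str]:
    if ram_mb >= RAM_MIN_MB:
        return "PASS", ""
    return "FAIL", f"System has only {ram_mb} MB RAM, required: {RAM_MIN_MB}MB"


def check_swap(ram_mb: int, swap_mb: int) -> tuple[str, str]:
    """Swap must be at least SWAP_RATIO x RAM, rounded up to a whole MB."""
    required_mb = math.ceil(ram_mb * SWAP_RATIO)
    if swap_mb >= required_mb:
        return "PASS", ""
    return "FAIL", f"System has only {swap_mb} MB Swap, required: {required_mb} MB"


def check_root_fs(size_gb: int) -> tuple[str, str]:
    if size_gb >= ROOT_FS_MIN_GB:
        return "PASS", ""
    return "FAIL", f"Root FS is only {size_gb}GB, required: {ROOT_FS_MIN_GB}GB"


def check_var_partition(mount_points: list[str]) -> tuple[str, str]:
    """/var counts as separate only if it is its own mount point."""
    if "/var" in mount_points:
        return "PASS", ""
    return "FAIL", "/var is not a separate partition"
```

With the values from the report, `check_swap(7684, 5119)` yields the required 11526 MB shown above (7684 × 1.5).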

❯ pwd
/home/kushaldeb/Github/cephadm-ansible/reports

❯ ls -l
total 4
-rw-r--r--. 1 kushaldeb kushaldeb 1872 Feb 24 22:04 rhel-ceph-admin_preflight_report.txt
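The NIC rows in the report above (MTU 1500 and a speed of -1 Mbps on `ens3`) correspond to per-interface values that Linux exposes under `/sys/class/net/<iface>/`. A minimal sketch of reading them directly; the helper names and the `UNKNOWN` status are assumptions of this sketch, not part of the PR, which reports a speed of -1 as a bandwidth failure.

```python
from __future__ import annotations

from pathlib import Path

SYSFS_NET = Path("/sys/class/net")
JUMBO_MTU_MIN = 1501        # report: "MTU is 1500, recommended > 1500"
MIN_SPEED_MBPS = 10_000     # report: "10GbE Recommended"


def read_int(path: Path) -> int | None:
    """Read an integer sysfs attribute; None if it is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def nic_info(iface: str, sysfs: Path = SYSFS_NET) -> dict:
    base = sysfs / iface
    speed = read_int(base / "speed")
    return {
        "iface": iface,
        "mtu": read_int(base / "mtu"),
        # -1 (or a failed read) means the kernel does not know the link speed;
        # this is common for virtual NICs, so it is mapped to None here.
        "speed_mbps": speed if speed is not None and speed > 0 else None,
    }


def check_nic(info: dict) -> dict[str, str]:
    mtu = info["mtu"]
    speed = info["speed_mbps"]
    return {
        "jumbo_frames": "PASS" if mtu is not None and mtu >= JUMBO_MTU_MIN else "FAIL",
        "bandwidth": "UNKNOWN" if speed is None
        else ("PASS" if speed >= MIN_SPEED_MBPS else "FAIL"),
    }
```

The `sysfs` parameter exists so the logic can be exercised against a fake directory tree instead of the real `/sys`.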


@guits (Collaborator) left a comment

Please avoid using ignore_errors: true as much as possible.

@Kushal-deb force-pushed the implement_os_preflight_checks branch from 5d002a4 to 280a9cf February 5, 2025 07:53
@Kushal-deb requested a review from guits February 5, 2025 07:57
@Kushal-deb force-pushed the implement_os_preflight_checks branch from 280a9cf to 2a6cb0f February 5, 2025 14:30
@Kushal-deb requested a review from guits February 5, 2025 14:41
@Kushal-deb changed the title from "Add preflight OS checks" to "Add preflight OS and other checks" Feb 6, 2025
@Kushal-deb force-pushed the implement_os_preflight_checks branch from 2a6cb0f to e837ef9 February 6, 2025 11:38
@Kushal-deb requested a review from guits February 6, 2025 11:39
@Kushal-deb changed the title from "Add preflight OS and other checks" to "Add preflight OS , CPU, RAM, Swap, and Filesystem checks" Feb 6, 2025
@Kushal-deb changed the title from "Add preflight OS , CPU, RAM, Swap, and Filesystem checks" to "Add preflight OS, CPU, RAM, Swap, and Filesystem checks" Feb 6, 2025
@Kushal-deb closed this Feb 6, 2025
@Kushal-deb deleted the implement_os_preflight_checks branch February 6, 2025 11:43
@Kushal-deb restored the implement_os_preflight_checks branch February 6, 2025 11:43
@Kushal-deb reopened this Feb 6, 2025
@Kushal-deb force-pushed the implement_os_preflight_checks branch from e837ef9 to 6e47331 February 6, 2025 17:19
@Kushal-deb force-pushed the implement_os_preflight_checks branch from 6e47331 to 9546e44 February 11, 2025 08:25
@Kushal-deb requested a review from guits February 11, 2025 09:49
@Kushal-deb force-pushed the implement_os_preflight_checks branch from 9546e44 to 39a250e February 11, 2025 15:06
@Kushal-deb requested a review from guits February 11, 2025 15:10
@asm0deuz (Collaborator):

jenkins test el9-functional

@Kushal-deb (Author):

jenkins test el9-functional

@asm0deuz (Collaborator):

jenkins test el9-functional

@guits (Collaborator) commented Feb 18, 2025

@Kushal-deb Please consider using a single set_fact task, like the following, instead of a separate set_fact task for each check:

    - name: Store all check results
      set_fact:
        preflight_results: >-
          {{ preflight_results + [
            {'Check': 'OS Version', 'Result': os_check, 'Reason': os_reason},
            {'Check': 'Tuned Profile', 'Result': tuned_profile_check, 'Reason': tuned_profile_reason},
            {'Check': 'RHEL Profile', 'Result': rhel_profile_check, 'Reason': rhel_profile_reason},
            {'Check': 'Firewalld Running', 'Result': firewalld_check, 'Reason': firewalld_reason},
            {'Check': 'Podman Installed', 'Result': podman_check, 'Reason': podman_reason},
            {'Check': 'SELinux', 'Result': selinux_check, 'Reason': selinux_reason},
            {'Check': 'Minimum RAM (8GB)', 'Result': memory_checks['ram']['result'], 'Reason': memory_checks['ram']['reason']},
            {'Check': 'Swap Space (1.5x RAM)', 'Result': memory_checks['swap']['result'], 'Reason': memory_checks['swap']['reason']},
            {'Check': 'CPU x86-64-v2', 'Result': cpu_checks['x86_64_v2']['result'], 'Reason': cpu_checks['x86_64_v2']['reason']},
            {'Check': 'CPU Cores >= 4', 'Result': cpu_checks['cores']['result'], 'Reason': cpu_checks['cores']['reason']},
            {'Check': '/var is a separate partition', 'Result': filesystem_checks['var_partition']['result'], 'Reason': filesystem_checks['var_partition']['reason']},
            {'Check': 'Root Filesystem >= 100GB', 'Result': filesystem_checks['root_fs']['result'], 'Reason': filesystem_checks['root_fs']['reason']},
            {'Check': 'Jumbo Frames Enabled', 'Result': jumbo_frames_check, 'Reason': jumbo_frames_reason},
            {'Check': 'Network Latency', 'Result': 'INFO', 'Reason': 'Latency results: ' ~ ping_results.results | map(attribute='ping') | list},
            {'Check': 'NIC Static IP Configuration', 'Result': nic_config_check, 'Reason': nic_config_reason},
            {'Check': 'NIC Bandwidth (10GbE Recommended)', 'Result': nic_speed_check, 'Reason': nic_speed_reason}
          ] }}
        preflight_failures: >-
          {{ preflight_failures
             + (['OS Version'] if os_check == 'FAIL' else [])
             + (['Tuned Profile'] if tuned_profile_check == 'FAIL' else [])
             + (['RHEL Profile'] if rhel_profile_check == 'FAIL' else [])
             + (['SELinux'] if selinux_check == 'FAIL' else [])
             + (['Firewalld Running'] if firewalld_check == 'FAIL' else [])
             + (['Podman Installed'] if not podman_installed else [])
             + (['Minimum RAM'] if memory_checks['ram']['result'] == 'FAIL' else [])
             + (['Swap Space'] if memory_checks['swap']['result'] == 'FAIL' else [])
             + (['CPU x86-64-v2'] if cpu_checks['x86_64_v2']['result'] == 'FAIL' else [])
             + (['CPU Cores'] if cpu_checks['cores']['result'] == 'FAIL' else [])
             + (['/var Partition'] if filesystem_checks['var_partition']['result'] == 'FAIL' else [])
             + (['Root Filesystem'] if filesystem_checks['root_fs']['result'] == 'FAIL' else [])
             + (['Jumbo Frames Enabled'] if jumbo_frames_check == 'FAIL' else [])
             + (['NIC Static IP Configuration'] if nic_config_check == 'FAIL' else [])
             + (['NIC Bandwidth'] if nic_speed_check == 'FAIL' else []) }}
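Keeping two parallel lists (the report rows and the failure names) in sync by hand is error-prone: a check can easily be listed twice or left out of one of them. A minimal Python sketch of the alternative, where each check is declared once and both outputs are derived from it; the `Check` class, its field names, and the "PASS"/"FAIL"/"INFO" labels are assumptions for illustration. In Jinja, a comparable derivation would be `preflight_results | selectattr('Result', 'equalto', 'FAIL') | map(attribute='Check') | list`, though that yields the full report labels rather than the short names used in the failure message.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    label: str        # row label in the report, e.g. "Minimum RAM (8GB)"
    short_name: str   # name in the failure summary, e.g. "Minimum RAM"
    result: str       # "PASS", "FAIL" or "INFO"
    reason: str = ""


def summarize(checks: list[Check]) -> tuple[list[dict], list[str]]:
    """Derive report rows and failure names from one list, skipping repeated labels."""
    seen: set[str] = set()
    results: list[dict] = []
    failures: list[str] = []
    for check in checks:
        if check.label in seen:
            continue
        seen.add(check.label)
        results.append({"Check": check.label, "Result": check.result, "Reason": check.reason})
        if check.result == "FAIL":  # INFO rows are reported but never fail the run
            failures.append(check.short_name)
    return results, failures


def failure_message(failures: list[str]) -> str | None:
    """Build the final-check message in the same format as the playbook run's output."""
    if not failures:
        return None
    return ("Preflight checks failed for the following: "
            + ", ".join(failures)
            + ". Please resolve these issues before proceeding.")
```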

@Kushal-deb Kushal-deb force-pushed the implement_os_preflight_checks branch from 39a250e to 3157d41 Compare February 19, 2025 09:37
@Kushal-deb Kushal-deb requested a review from guits February 19, 2025 09:50
@Kushal-deb Kushal-deb force-pushed the implement_os_preflight_checks branch 2 times, most recently from 4feee23 to 11764b3 Compare February 19, 2025 16:04
@Kushal-deb Kushal-deb requested a review from guits February 19, 2025 16:05
@guits (Collaborator) left a comment

@Kushal-deb What happens if you have more than one node in your inventory? At the end of the playbook, the report seems to be generated only once, for the first node only.

@Kushal-deb force-pushed the implement_os_preflight_checks branch from 11764b3 to 3fef793 February 24, 2025 16:37

@Kushal-deb (Author) commented Feb 24, 2025

@Kushal-deb what happens if you have more than 1 node in your inventory host ? At the end of the playbook, the report seems to be generated only once, for the first node only.

Hi, I have updated the implementation to generate the result files on the controller node, under the reports directory.

Example:

❯ pwd
/home/kushaldeb/Github/cephadm-ansible/reports

❯ ls -l
total 4
-rw-r--r--. 1 kushaldeb kushaldeb 1872 Feb 24 22:04 rhel-ceph-admin_preflight_report.txt
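The file name in the listing suggests one report per host, prefixed with the inventory hostname. A minimal Python sketch of that layout; the `write_reports` helper and the host-to-report dict are hypothetical. In the playbook, the same effect comes from the report task being delegated to localhost (the `rhel-ceph-admin -> localhost` line in the run output): without `run_once`, Ansible runs a delegated task once per host in the play, so each host gets its own file.

```python
from __future__ import annotations

from pathlib import Path


def write_reports(report_dir: Path, reports: dict[str, str]) -> list[Path]:
    """Write one '<host>_preflight_report.txt' file per host into report_dir."""
    report_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for host, text in sorted(reports.items()):
        path = report_dir / f"{host}_preflight_report.txt"
        path.write_text(text)
        written.append(path)
    return written
```

For example, `write_reports(Path("reports"), {"rhel-ceph-admin": report_text})` produces the single file shown in the listing above.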

@Kushal-deb requested a review from guits February 24, 2025 16:42
- Implemented OS preflight checks to validate system requirements before Ceph cluster creation.
- Checks include:
  - OS version (RHEL 9+ required)
  - SELinux in enforcing mode
  - Firewalld installation and status
  - Availability of required packages (rpcbind, podman, firewalld)
  - Podman version (>= 3.3)
  - RHEL software profile validation
  - Tuned profile
  - CPU, RAM, swap, and filesystem (part of the other checks)
  - Whether jumbo frames are enabled
  - Whether the NIC uses DHCP or a static IP
  - Whether the NIC bandwidth is sufficient
  - Current NIC options (e.g. bonding; not bridged or virtual), collected and reported
  - Network latency (ping) to every host in the inventory file
  - A listing of all NICs

Signed-off-by: Kushal Deb <Kushal.Deb@ibm.com>
@Kushal-deb force-pushed the implement_os_preflight_checks branch from 3fef793 to 1fa20b2 February 24, 2025 16:51