[show] Add 'show' CLI for system-health feature (#971)
* Add 'show' CLI for system-health feature

Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>

* Add unit test for 'system-health' feature, add support for testing in 'show' script, Fix comments

Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>

* Fix additional comments

* Fix comments

* Update Command-Reference.md

Add a CLI reference for system-health feature.

* Fix LGTM alerts

* Fix comment

Change 'Ignore' to 'Ignored'

* Update Command-Reference.md

Fix example output

* Update Command-Reference.md

* Change 'summary' output and adapt test and reference to the new change

* Update main.py

* Fix multiline output for expected output

* keep output aligned

* Fix import for unit testing after community change

* Add clicommon for @cli.group after community change

* Align changes in the feature to the CLI on commit 8ea2ab5

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>

* Update main.py

* Move new group CLI into a separate file

* Organize imports per PEP8 standards

* Organize imports per PEP8 standards

* Reformat docstring for readability
shlomibitton committed Oct 12, 2020
1 parent 561d133 commit a71c72b
Showing 4 changed files with 733 additions and 1 deletion.
185 changes: 185 additions & 0 deletions doc/Command-Reference.md
@@ -108,6 +108,7 @@
* [System State](#system-state)
* [Processes](#processes)
* [Services & Memory](#services--memory)
* [System-Health](#System-Health)
* [VLAN & FDB](#vlan--fdb)
* [VLAN](#vlan)
* [VLAN show commands](#vlan-show-commands)
@@ -5940,6 +5941,190 @@ NOTE: This command is not working. It crashes as follows. A bug ticket is opened
Go Back To [Beginning of the document](#) or [Beginning of this section](#System-State)
Go Back To [Beginning of the document](#) or [Beginning of this section](#System-Health)
### System-Health
These commands monitor the state of the system's currently running services and its hardware.
**show system-health summary**
This command displays the current status of 'Services' and 'Hardware' under monitoring.
If any element in either section is 'Not OK', a corresponding message appears under that section.
- Usage:
```
show system-health summary
```
- Example:
```
admin@sonic:~$ show system-health summary
System status summary
System status LED red
Services:
Status: Not OK
Not Running: 'telemetry', 'sflowmgrd'
Hardware:
Status: OK
```
```
admin@sonic:~$ show system-health summary
System status summary
System status LED green
Services:
Status: OK
Hardware:
Status: OK
```
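As an illustration only, the summary above can be thought of as a reduction over per-element status records. The sketch below is not the feature's implementation: the `summarize` function, the `SERVICE_TYPES` grouping, and the rule that any 'Not OK' element turns the LED red are assumptions inferred from the two example outputs.

```python
from typing import List, Tuple

# Each entry is (name, status, type), mirroring the monitor-list columns.
Entry = Tuple[str, str, str]

# Assumption: these types are reported under 'Services'; all others under 'Hardware'.
SERVICE_TYPES = {"Process", "Filesystem", "System"}


def summarize(entries: List[Entry]) -> str:
    """Build text in the spirit of 'show system-health summary' (hypothetical helper)."""
    services_bad = [n for n, s, t in entries if t in SERVICE_TYPES and s != "OK"]
    hardware_bad = [n for n, s, t in entries if t not in SERVICE_TYPES and s != "OK"]

    # Assumption: the LED is green only when nothing is 'Not OK'.
    led = "red" if services_bad or hardware_bad else "green"

    lines = ["System status summary", f"System status LED {led}", "Services:"]
    if services_bad:
        lines.append("Status: Not OK")
        lines.append("Not Running: " + ", ".join(f"'{n}'" for n in services_bad))
    else:
        lines.append("Status: OK")
    lines.append("Hardware:")
    lines.append("Status: Not OK" if hardware_bad else "Status: OK")
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize([
        ("telemetry", "Not OK", "Process"),
        ("sflowmgrd", "Not OK", "Process"),
        ("fan1", "OK", "Fan"),
    ]))
```

Keeping the records as plain tuples means the same data could feed both the summary and the tabular views, although how the real command stores its data is not shown here.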
**show system-health monitor-list**
This command lists every 'Services' and 'Hardware' element currently being monitored, along with its status and type.
- Usage:
```
show system-health monitor-list
```
- Example:
```
admin@sonic:~$ show system-health monitor-list
System services and devices monitor list
Name            Status    Type
--------------  --------  ----------
telemetry       Not OK    Process
orchagent       Not OK    Process
neighsyncd      OK        Process
vrfmgrd         OK        Process
dialout_client  OK        Process
zebra           OK        Process
rsyslog         OK        Process
snmpd           OK        Process
redis_server    OK        Process
intfmgrd        OK        Process
vxlanmgrd       OK        Process
lldpd_monitor   OK        Process
portsyncd       OK        Process
var-log         OK        Filesystem
lldpmgrd        OK        Process
syncd           OK        Process
sonic           OK        System
buffermgrd      OK        Process
portmgrd        OK        Process
staticd         OK        Process
bgpd            OK        Process
lldp_syncd      OK        Process
bgpcfgd         OK        Process
snmp_subagent   OK        Process
root-overlay    OK        Filesystem
fpmsyncd        OK        Process
sflowmgrd       OK        Process
vlanmgrd        OK        Process
nbrmgrd         OK        Process
PSU 2           OK        PSU
psu_1_fan_1     OK        Fan
psu_2_fan_1     OK        Fan
fan11           OK        Fan
fan10           OK        Fan
fan12           OK        Fan
ASIC            OK        ASIC
fan1            OK        Fan
PSU 1           OK        PSU
fan3            OK        Fan
fan2            OK        Fan
fan5            OK        Fan
fan4            OK        Fan
fan7            OK        Fan
fan6            OK        Fan
fan9            OK        Fan
fan8            OK        Fan
```
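The three-column layout above (header, dashed rule, left-aligned cells) can be reproduced with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: `render_table` is a hypothetical helper, not part of the CLI, and `show/main.py` itself imports `tabulate` for its tables rather than formatting them by hand.

```python
from typing import List, Sequence


def render_table(rows: List[Sequence[str]], headers: Sequence[str]) -> str:
    """Render rows as left-aligned columns separated by two spaces (hypothetical helper)."""
    # Each column is as wide as its longest cell, header included.
    widths = [len(h) for h in headers]
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    def fmt(cells: Sequence[str]) -> str:
        # Pad every cell to its column width, then drop trailing padding.
        return "  ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    lines = [fmt(headers), "  ".join("-" * w for w in widths)]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


if __name__ == "__main__":
    print("System services and devices monitor list")
    print(render_table(
        [("telemetry", "Not OK", "Process"), ("var-log", "OK", "Filesystem")],
        ("Name", "Status", "Type"),
    ))
```

Column widths here are computed from the data passed in, so a short sample will produce narrower columns than the full listing shown in the example.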
**show system-health detail**
This command displays the current status of 'Services' and 'Hardware' under monitoring.
If any element in either section is 'Not OK', a corresponding message appears under that section.
In addition, it displays a list of all 'Services' and 'Hardware' elements currently being monitored and a list of ignored elements.
- Usage:
```
show system-health detail
```
- Example:
```
admin@sonic:~$ show system-health detail
System status summary
System status LED red
Services:
Status: Not OK
Not Running: 'telemetry', 'orchagent'
Hardware:
Status: OK
System services and devices monitor list
Name            Status    Type
--------------  --------  ----------
telemetry       Not OK    Process
orchagent       Not OK    Process
neighsyncd      OK        Process
vrfmgrd         OK        Process
dialout_client  OK        Process
zebra           OK        Process
rsyslog         OK        Process
snmpd           OK        Process
redis_server    OK        Process
intfmgrd        OK        Process
vxlanmgrd       OK        Process
lldpd_monitor   OK        Process
portsyncd       OK        Process
var-log         OK        Filesystem
lldpmgrd        OK        Process
syncd           OK        Process
sonic           OK        System
buffermgrd      OK        Process
portmgrd        OK        Process
staticd         OK        Process
bgpd            OK        Process
lldp_syncd      OK        Process
bgpcfgd         OK        Process
snmp_subagent   OK        Process
root-overlay    OK        Filesystem
fpmsyncd        OK        Process
sflowmgrd       OK        Process
vlanmgrd        OK        Process
nbrmgrd         OK        Process
PSU 2           OK        PSU
psu_1_fan_1     OK        Fan
psu_2_fan_1     OK        Fan
fan11           OK        Fan
fan10           OK        Fan
fan12           OK        Fan
ASIC            OK        ASIC
fan1            OK        Fan
PSU 1           OK        PSU
fan3            OK        Fan
fan2            OK        Fan
fan5            OK        Fan
fan4            OK        Fan
fan7            OK        Fan
fan6            OK        Fan
fan9            OK        Fan
fan8            OK        Fan
System services and devices ignore list
Name         Status    Type
-----------  --------  ------
psu.voltage  Ignored   Device
```
Go Back To [Beginning of the document](#) or [Beginning of this section](#System-Health)
## VLAN & FDB
5 changes: 4 additions & 1 deletion show/main.py
@@ -17,12 +17,15 @@
import mlnx
import utilities_common.cli as clicommon
import vlan
import system_health

from sonic_py_common import device_info
from swsssdk import ConfigDBConnector, SonicV2Connector
from tabulate import tabulate
from utilities_common.db import Db
import utilities_common.multi_asic as multi_asic_util


# Global Variables
PLATFORM_JSON = 'platform.json'
HWSKU_JSON = 'hwsku.json'
@@ -126,6 +129,7 @@ def cli(ctx):
cli.add_command(interfaces.interfaces)
cli.add_command(kube.kubernetes)
cli.add_command(vlan.vlan)
cli.add_command(system_health.system_health)

#
# 'vrf' command ("show vrf")
@@ -2431,6 +2435,5 @@ def tunnel():

    click.echo(tabulate(table, header))


if __name__ == '__main__':
    cli()