[show] Add 'show' CLI for system-health feature #971

Merged
merged 25 commits into from
Oct 12, 2020
Changes from 7 commits
d58195e
Add 'show' CLI for system-health feature
Jun 15, 2020
61d90ac
Add unit test for 'system-health' feature, add support for testing in…
Jun 30, 2020
0fc2d48
Merge branch 'master' of https://github.com/Azure/sonic-utilities int…
Jul 8, 2020
04d0672
Fix additional comments
Jul 13, 2020
c04dfed
Fix comments
Jul 14, 2020
d33454c
Update Command-Reference.md
shlomibitton Jul 15, 2020
28f9625
Fix LGTM alerts
Jul 15, 2020
393c3de
Fix comment
shlomibitton Jul 16, 2020
99aa1d5
Update Command-Reference.md
shlomibitton Jul 19, 2020
bfe5b2e
Update Command-Reference.md
shlomibitton Jul 26, 2020
9138ab4
Change 'summary' output and adapt test and reference to the new change
Jul 28, 2020
bd7c529
Update main.py
shlomibitton Jul 28, 2020
2d90fc3
Fix multiline output for expected output
Aug 2, 2020
112147f
keep output aligned
Aug 5, 2020
0e59ae4
Fix import for unit testing after community change
Aug 12, 2020
713051e
Merge branch 'master' of https://github.com/Azure/sonic-utilities int…
Aug 12, 2020
f65320e
Add clicommon for @cli.group after community change
Aug 12, 2020
d9be51f
Merge branch 'master' into shlomi_system_health_cli
shlomibitton Sep 7, 2020
eb6409f
Align changes in the feature to the CLI on commit
Sep 9, 2020
e0794ae
Update main.py
shlomibitton Sep 9, 2020
872030f
Move new group CLI into a separate file
Sep 10, 2020
bd57109
Merge branch 'master' into shlomi_system_health_cli
shlomibitton Sep 10, 2020
ee85d43
Organize imports per PEP8 standards
shlomibitton Sep 13, 2020
5be5fd6
Organize imports per PEP8 standards
shlomibitton Sep 14, 2020
5b1c981
Reformat docstring for readability
Sep 16, 2020
189 changes: 189 additions & 0 deletions doc/Command-Reference.md
@@ -108,6 +108,7 @@
* [System State](#system-state)
* [Processes](#processes)
* [Services & Memory](#services--memory)
* [System-Health](#system-health)
* [VLAN & FDB](#vlan--fdb)
* [VLAN](#vlan)
* [VLAN show commands](#vlan-show-commands)
@@ -5907,6 +5908,194 @@ NOTE: This command is not working. It crashes as follows. A bug ticket is opened

Go Back To [Beginning of the document](#) or [Beginning of this section](#System-State)

Go Back To [Beginning of the document](#) or [Beginning of this section](#System-Health)

### System-Health

These commands monitor the system's running services and hardware state.

**show system-health summary**

This command displays the current status of the monitored 'Services' and 'Hardware'.
If any element in either section is 'Not OK', a message describing the fault appears under that section.

- Usage:
```
show system-health summary
```

- Example:
```
admin@sonic:~$ show system-health summary
System status summary
---------------------
System status LED red

Services        Not OK
                telemetry is not Running
Hardware        OK

```
```
admin@sonic:~$ show system-health summary
System status summary
---------------------
System status LED green

Services        OK
Hardware        OK

```
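
The summary layout can be read as a single pass over a nested status dictionary. The following is a minimal sketch rather than the shipped implementation: the `stat` shape (category, then element, then `status` and `message`) is inferred from the handler in `show/main.py`, while `render_summary` and `sample_stat` are hypothetical names used only for illustration.

```python
from collections import OrderedDict


def render_summary(stat):
    """Return the per-category summary lines for a nested status dict.

    `stat` is assumed to map category -> element -> {'status', 'message'}.
    """
    lines = []
    for category, elements in stat.items():
        # Collect the message of every element that is not 'OK'.
        faults = [info['message'] for info in elements.values()
                  if info['status'] != 'OK']
        if faults:
            lines.append(category + "\tNot OK")
            # One indented line per faulty element.
            lines.extend('\t\t' + message for message in faults)
        else:
            lines.append(category + "\tOK")
    return lines


# Hypothetical sample data shaped like the failing example above.
sample_stat = OrderedDict([
    ('Services', OrderedDict([
        ('telemetry', {'status': 'Not OK',
                       'message': 'telemetry is not Running'}),
        ('snmpd', {'status': 'OK', 'message': ''}),
    ])),
    ('Hardware', OrderedDict([
        ('ASIC', {'status': 'OK', 'message': ''}),
    ])),
])

print("\n".join(render_summary(sample_stat)))
```

A category is reported as 'Not OK' as soon as one of its elements fails, and each failing element contributes one message line beneath it.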

**show system-health monitor-list**

This command lists every 'Services' and 'Hardware' element currently being monitored, with its status and type.

- Usage:
```
show system-health monitor-list
```

- Example:
```
admin@sonic:~$ show system-health monitor-list
System services and devices monitor list
----------------------------------------

Name            Status    Type
--------------  --------  ----------
telemetry       Not OK    Process
neighsyncd      OK        Process
vrfmgrd         OK        Process
dialout_client  OK        Process
zebra           OK        Process
rsyslog         OK        Process
snmpd           OK        Process
redis_server    OK        Process
intfmgrd        OK        Process
orchagent       OK        Process
vxlanmgrd       OK        Process
lldpd_monitor   OK        Process
portsyncd       OK        Process
var-log         OK        Filesystem
lldpmgrd        OK        Process
syncd           OK        Process
sonic           OK        System
buffermgrd      OK        Process
portmgrd        OK        Process
staticd         OK        Process
bgpd            OK        Process
lldp_syncd      OK        Process
bgpcfgd         OK        Process
snmp_subagent   OK        Process
root-overlay    OK        Filesystem
fpmsyncd        OK        Process
sflowmgrd       OK        Process
vlanmgrd        OK        Process
nbrmgrd         OK        Process
PSU 2           OK        PSU
psu_1_fan_1     OK        Fan
psu_2_fan_1     OK        Fan
fan11           OK        Fan
fan10           OK        Fan
fan12           OK        Fan
ASIC            OK        ASIC
fan1            OK        Fan
PSU 1           OK        PSU
fan3            OK        Fan
fan2            OK        Fan
fan5            OK        Fan
fan4            OK        Fan
fan7            OK        Fan
fan6            OK        Fan
fan9            OK        Fan
fan8            OK        Fan

```
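
The ordering in the list above, with the failing entry first, follows from sorting each category by its status string. The sketch below illustrates that idea with the standard library only. It assumes the same nested `stat` shape as the handler in `show/main.py`; `monitor_rows`, `format_table`, and the sample data are hypothetical, and `format_table` is a simple stand-in for a table library, so its column widths may differ from the real output.

```python
from collections import OrderedDict


def monitor_rows(stat):
    """Build [name, status, type] rows, sorted by status within each category."""
    rows = []
    for category, elements in stat.items():
        # 'Not OK' compares lower than 'OK' as a plain string,
        # so failing elements are listed first.
        ordered = sorted(elements.items(), key=lambda item: item[1]['status'])
        for name, info in ordered:
            rows.append([name, info['status'], info['type']])
    return rows


def format_table(rows, header):
    """Left-align columns, separated by two spaces, under a dashed rule."""
    table = [header] + rows
    widths = [max(len(str(row[i])) for row in table)
              for i in range(len(header))]

    def fmt(row):
        cells = (str(cell).ljust(width) for cell, width in zip(row, widths))
        return "  ".join(cells).rstrip()

    rule = "  ".join("-" * width for width in widths)
    return "\n".join([fmt(header), rule] + [fmt(row) for row in rows])


# Hypothetical sample data; the names are borrowed from the example above.
sample_stat = OrderedDict([
    ('Services', OrderedDict([
        ('neighsyncd', {'status': 'OK', 'type': 'Process'}),
        ('telemetry', {'status': 'Not OK', 'type': 'Process'}),
    ])),
    ('Hardware', OrderedDict([
        ('PSU 1', {'status': 'OK', 'type': 'PSU'}),
    ])),
])

print(format_table(monitor_rows(sample_stat), ['Name', 'Status', 'Type']))
```

Because the sort is applied per category, a failing hardware element would appear at the top of the hardware rows, not at the top of the whole table.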

**show system-health detail**

This command displays the current status of the monitored 'Services' and 'Hardware'.
If any element in either section is 'Not OK', a message describing the fault appears under that section.
In addition, it lists every monitored 'Services' and 'Hardware' element, followed by the elements that are ignored.

- Usage:
```
show system-health detail
```

- Example:
```
admin@sonic:~$ show system-health detail
System status summary
---------------------
System status LED red

Services        Not OK
                telemetry is not Running
Hardware        OK

System services and devices monitor list
----------------------------------------

Name            Status    Type
--------------  --------  ----------
telemetry       Not OK    Process
neighsyncd      OK        Process
vrfmgrd         OK        Process
dialout_client  OK        Process
zebra           OK        Process
rsyslog         OK        Process
snmpd           OK        Process
redis_server    OK        Process
intfmgrd        OK        Process
orchagent       OK        Process
vxlanmgrd       OK        Process
lldpd_monitor   OK        Process
portsyncd       OK        Process
var-log         OK        Filesystem
lldpmgrd        OK        Process
syncd           OK        Process
sonic           OK        System
buffermgrd      OK        Process
portmgrd        OK        Process
staticd         OK        Process
bgpd            OK        Process
lldp_syncd      OK        Process
bgpcfgd         OK        Process
snmp_subagent   OK        Process
root-overlay    OK        Filesystem
fpmsyncd        OK        Process
sflowmgrd       OK        Process
vlanmgrd        OK        Process
nbrmgrd         OK        Process
PSU 2           OK        PSU
psu_1_fan_1     OK        Fan
psu_2_fan_1     OK        Fan
fan11           OK        Fan
fan10           OK        Fan
fan12           OK        Fan
ASIC            OK        ASIC
fan1            OK        Fan
PSU 1           OK        PSU
fan3            OK        Fan
fan2            OK        Fan
fan5            OK        Fan
fan4            OK        Fan
fan7            OK        Fan
fan6            OK        Fan
fan9            OK        Fan
fan8            OK        Fan

System services and devices ignore list
---------------------------------------

Name         Status    Type
-----------  --------  ------
psu.voltage  Ignored   Device

```
Go Back To [Beginning of the document](#) or [Beginning of this section](#System-Health)
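
The ignore list shown in the detail output is built from two configured collections, one for services and one for devices. The sketch below illustrates that step in isolation. `ignore_rows` is a hypothetical helper, the two arguments stand in for the `ignore_services` and `ignore_devices` values read in `show/main.py`, and the status label is passed as a parameter instead of being hard-coded.

```python
def ignore_rows(ignore_services, ignore_devices, status_label="Ignored"):
    """Build [name, status, type] rows for ignored services and devices.

    Either argument may be None or empty, in which case it adds no rows.
    """
    rows = []
    for name in ignore_services or []:
        rows.append([name, status_label, "Service"])
    for name in ignore_devices or []:
        rows.append([name, status_label, "Device"])
    return rows


# Hypothetical configuration with a single ignored device.
for row in ignore_rows(None, ["psu.voltage"]):
    print(row)
```

Using one label for both loops keeps the 'Status' column uniform regardless of whether an entry is a service or a device.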

## VLAN & FDB

161 changes: 161 additions & 0 deletions show/main.py
@@ -3419,5 +3419,166 @@ def tunnel():

    click.echo(tabulate(table, header))

#
# 'system-health' command ("show system-health")
#
@cli.group(name='system-health', cls=AliasedGroup)
def system_health():
    """SONiC command line - 'show system-health' command"""
    return

@system_health.command()
def summary():
    """Show system-health summary information"""
    # Mock the redis for unit test purposes #
    # Use .get() so an unset or non-"1" value falls through to the normal run.
    if os.environ.get("UTILITIES_UNIT_TESTING") == "1":
        modules_path = os.path.join(os.path.dirname(__file__), "..")
        sys.path.insert(0, modules_path)
        from system_health_test import MockerManager
        from system_health_test import MockerChassis
        HealthCheckerManager = MockerManager
        Chassis = MockerChassis
    else:
        # Normal run... #
        if os.geteuid():
            click.echo("Root privileges are required for this operation")
            return
        from health_checker.manager import HealthCheckerManager
        from sonic_platform.chassis import Chassis

    manager = HealthCheckerManager()
    chassis = Chassis()
    state, stat = manager.check(chassis)
    if state == HealthCheckerManager.STATE_BOOTING:
        click.echo("System is currently booting...")
        return
    if state == HealthCheckerManager.STATE_RUNNING:
        chassis.initizalize_system_led()
        led = chassis.get_status_led()
        fault_counter = 0
        click.echo("System status summary\n---------------------\nSystem status LED " + led + '\n')
        for category, elements in stat.items():
            for element in elements:
                if elements[element]['status'] != "OK":
                    fault_counter += 1
                    # Print the category header once, on the first fault.
                    if fault_counter == 1:
                        click.echo(category + "\tNot OK")
                    click.echo('\t\t' + elements[element]['message'])
            if not fault_counter:
                click.echo(category + "\tOK")
            fault_counter = 0

@system_health.command()
def detail():
    """Show system-health detail information"""
    # Mock the redis for unit test purposes #
    # Use .get() so an unset or non-"1" value falls through to the normal run.
    if os.environ.get("UTILITIES_UNIT_TESTING") == "1":
        modules_path = os.path.join(os.path.dirname(__file__), "..")
        sys.path.insert(0, modules_path)
        from system_health_test import MockerManager
        from system_health_test import MockerChassis
        HealthCheckerManager = MockerManager
        Chassis = MockerChassis
    else:
        # Normal run... #
        if os.geteuid():
            click.echo("Root privileges are required for this operation")
            return
        from health_checker.manager import HealthCheckerManager
        from sonic_platform.chassis import Chassis

    manager = HealthCheckerManager()
    chassis = Chassis()
    state, stat = manager.check(chassis)
    if state == HealthCheckerManager.STATE_BOOTING:
        click.echo("System is currently booting...")
        return
    if state == HealthCheckerManager.STATE_RUNNING:
        # Summary output
        chassis.initizalize_system_led()
        led = chassis.get_status_led()
        fault_counter = 0
        click.echo("System status summary\n---------------------\nSystem status LED " + led + '\n')
        for category, elements in stat.items():
            for element in elements:
                if elements[element]['status'] != "OK":
                    fault_counter += 1
                    # Print the category header once, on the first fault.
                    if fault_counter == 1:
                        click.echo(category + "\tNot OK")
                    click.echo('\t\t' + elements[element]['message'])
            if not fault_counter:
                click.echo(category + "\tOK")
            fault_counter = 0

        # Monitor list output
        click.echo('\nSystem services and devices monitor list\n----------------------------------------\n')
        header = ['Name', 'Status', 'Type']
        table = []
        for category, elements in stat.items():
            # Sort by status; 'Not OK' compares lower than 'OK', so faults come first.
            for element in sorted(elements.items(), key=lambda item: item[1]['status']):
                entry = []
                entry.append(element[0])
                entry.append(element[1]['status'])
                entry.append(element[1]['type'])
                table.append(entry)
        click.echo(tabulate(table, header))

        # Ignore list output
        click.echo('\nSystem services and devices ignore list\n---------------------------------------\n')
        table = []
        if manager.config.ignore_services:
            for element in manager.config.ignore_services:
                entry = []
                entry.append(element)
                entry.append("Ignored")
                entry.append("Service")
                table.append(entry)
        if manager.config.ignore_devices:
            for element in manager.config.ignore_devices:
                entry = []
                entry.append(element)
                entry.append("Ignored")
                entry.append("Device")
                table.append(entry)
        click.echo(tabulate(table, header))

@system_health.command()
def monitor_list():
    """Show system-health monitored services and devices name list"""
    # Mock the redis for unit test purposes #
    # Use .get() so an unset or non-"1" value falls through to the normal run.
    if os.environ.get("UTILITIES_UNIT_TESTING") == "1":
        modules_path = os.path.join(os.path.dirname(__file__), "..")
        sys.path.insert(0, modules_path)
        from system_health_test import MockerManager
        from system_health_test import MockerChassis
        HealthCheckerManager = MockerManager
        Chassis = MockerChassis
    else:
        # Normal run... #
        if os.geteuid():
            click.echo("Root privileges are required for this operation")
            return
        from health_checker.manager import HealthCheckerManager
        from sonic_platform.chassis import Chassis

    manager = HealthCheckerManager()
    chassis = Chassis()
    state, stat = manager.check(chassis)
    if state == HealthCheckerManager.STATE_BOOTING:
        click.echo("System is currently booting...")
        return
    if state == HealthCheckerManager.STATE_RUNNING:
        click.echo('\nSystem services and devices monitor list\n----------------------------------------\n')
        header = ['Name', 'Status', 'Type']
        table = []
        for category, elements in stat.items():
            # Sort by status; 'Not OK' compares lower than 'OK', so faults come first.
            for element in sorted(elements.items(), key=lambda item: item[1]['status']):
                entry = []
                entry.append(element[0])
                entry.append(element[1]['status'])
                entry.append(element[1]['type'])
                table.append(entry)
        click.echo(tabulate(table, header))

if __name__ == '__main__':
    cli()
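
All three handlers in this diff repeat the same choice between mock classes (under unit test) and the real platform classes. One possible way to factor that out is sketched below. This is an illustration only, not part of the PR: `unit_testing_enabled` and `select_backends` are hypothetical names, and the string tuples in the usage lines stand in for the manager and chassis classes.

```python
import os


def unit_testing_enabled(environ=None):
    """Return True only when UTILITIES_UNIT_TESTING is exactly "1"."""
    environ = os.environ if environ is None else environ
    return environ.get("UTILITIES_UNIT_TESTING") == "1"


def select_backends(environ, mock_backends, load_real_backends):
    """Pick a (manager_cls, chassis_cls) pair.

    `load_real_backends` is a callable so that platform-specific imports
    only happen on a normal run, never under unit test.
    """
    if unit_testing_enabled(environ):
        return mock_backends
    return load_real_backends()


# Usage with placeholder values standing in for the real classes.
mocks = ("MockerManager", "MockerChassis")
real = lambda: ("HealthCheckerManager", "Chassis")
print(select_backends({"UTILITIES_UNIT_TESTING": "1"}, mocks, real))
print(select_backends({}, mocks, real))
```

Passing the environment as an argument keeps the selection logic testable without modifying `os.environ`.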