Add support health monitor system #15

@bratashX bratashX commented Dec 13, 2021

Signed-off-by: Petro Bratash <petrox.bratash@intel.com>

How I did it

  1. Add system_health_monitoring_config.json file for:
    - Montara
    - Newport
    - Mavericks

  2. Add stub for system LED API
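
For step 1, a minimal sketch of what such a config file might contain and how it could be sanity-checked before deploying. The key names (`services_to_ignore`, `devices_to_ignore`, `user_defined_checkers`, `polling_interval`, `led_color`) are an assumption based on the usual SONiC system-health config layout; the values and the helper `load_health_config` are hypothetical and not taken from this PR.

```python
import json

# Hypothetical example content; the per-platform files added by this PR may differ.
EXAMPLE_CONFIG = """
{
    "services_to_ignore": [],
    "devices_to_ignore": [],
    "user_defined_checkers": [],
    "polling_interval": 60,
    "led_color": {
        "fault": "amber",
        "normal": "green",
        "booting": "orange_blink"
    }
}
"""

# Keys assumed to be read by the health monitor (assumption, not confirmed by this PR).
EXPECTED_KEYS = {
    "services_to_ignore",
    "devices_to_ignore",
    "user_defined_checkers",
    "polling_interval",
    "led_color",
}


def load_health_config(text):
    """Parse the JSON text and return (config, sorted list of missing keys)."""
    config = json.loads(text)
    missing = sorted(EXPECTED_KEYS - set(config))
    return config, missing


config, missing = load_health_config(EXAMPLE_CONFIG)
print("missing keys:", missing)
print("polling interval:", config["polling_interval"])
```

A check like this only confirms that the file is valid JSON with the assumed top-level keys; it says nothing about whether the values suit a given platform.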

How to verify it

Run:

show system-health detail 
show system-health monitor-list
show system-health summary
redis-cli -n 6 hgetall SYSTEM_HEALTH_INFO
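
To check the `SYSTEM_HEALTH_INFO` table from a script rather than by eye, the sketch below parses `hgetall` output into a dictionary and lists entries that are not `OK`. It assumes the output arrives as alternating field and value lines, which is how `redis-cli` prints when its output is piped. The field names in `SAMPLE` and both helper functions are hypothetical; the real table contents are not shown in this PR.

```python
def parse_hgetall(raw):
    """Turn alternating field/value lines into a dict.

    Blank lines are dropped, so empty values are not supported by this sketch.
    """
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if len(lines) % 2:
        raise ValueError("expected an even number of lines (field/value pairs)")
    return dict(zip(lines[0::2], lines[1::2]))


def failing_entries(info):
    """Return fields whose value is not 'OK', skipping the overall summary."""
    return {k: v for k, v in info.items() if k != "summary" and v != "OK"}


# Hypothetical sample; field names are assumptions for illustration only.
SAMPLE = """\
summary
Not OK
PSU 1
PSU 1 is out of power
rsyslog
OK
"""

info = parse_hgetall(SAMPLE)
print("summary:", info.get("summary"))
for name, reason in failing_entries(info).items():
    print(f"{name}: {reason}")
```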

Description

Command output on Montara:

admin@sonic:~$ show ver

SONiC Software Version: SONiC.202012.57796-dirty-20211208.192459
Distribution: Debian 10.11
Kernel: 4.19.0-12-2-amd64
Build commit: 6a6512246
Build date: Wed Dec  8 19:33:41 UTC 2021
Built by: AzDevOps@sonic-build-workers-000YSK

Platform: x86_64-accton_wedge100bf_32x-r0
HwSKU: montara
ASIC: barefoot
ASIC Count: 1
Serial Number: AH43050469
Uptime: 15:23:30 up 17 min,  1 user,  load average: 1.20, 0.86, 0.52

Docker images:
REPOSITORY                    TAG                                  IMAGE ID            SIZE
docker-syncd-bfn              202012.57796-dirty-20211208.192459   fede4c01f951        1.11GB
docker-syncd-bfn              latest                               fede4c01f951        1.11GB
docker-snmp                   202012.57796-dirty-20211208.192459   d06705cbe325        414MB
docker-snmp                   latest                               d06705cbe325        414MB
docker-teamd                  202012.57796-dirty-20211208.192459   81e641ff44cb        383MB
docker-teamd                  latest                               81e641ff44cb        383MB
docker-router-advertiser      202012.57796-dirty-20211208.192459   e26007336971        372MB
docker-router-advertiser      latest                               e26007336971        372MB
docker-lldp                   202012.57796-dirty-20211208.192459   afbedd64feb8        412MB
docker-lldp                   latest                               afbedd64feb8        412MB
docker-database               202012.57796-dirty-20211208.192459   352e191ff8f8        372MB
docker-database               latest                               352e191ff8f8        372MB
docker-sonic-mgmt-framework   202012.57796-dirty-20211208.192459   3186d11b2ad9        785MB
docker-sonic-mgmt-framework   latest                               3186d11b2ad9        785MB
docker-orchagent              202012.57796-dirty-20211208.192459   00d81af85844        401MB
docker-orchagent              latest                               00d81af85844        401MB
docker-nat                    202012.57796-dirty-20211208.192459   8dc59876ede7        386MB
docker-nat                    latest                               8dc59876ede7        386MB
docker-dhcp-relay             202012.57796-dirty-20211208.192459   7d4219356683        386MB
docker-dhcp-relay             latest                               7d4219356683        386MB
docker-sonic-telemetry        202012.57796-dirty-20211208.192459   fc2e7371831c        462MB
docker-sonic-telemetry        latest                               fc2e7371831c        462MB
docker-mux                    202012.57796-dirty-20211208.192459   b24c77a6a5a3        425MB
docker-mux                    latest                               b24c77a6a5a3        425MB
docker-fpm-frr                202012.57796-dirty-20211208.192459   89aec92b0fd2        401MB
docker-fpm-frr                latest                               89aec92b0fd2        401MB
docker-sflow                  202012.57796-dirty-20211208.192459   a6fed1651c36        384MB
docker-sflow                  latest                               a6fed1651c36        384MB
docker-platform-monitor       202012.57796-dirty-20211208.192459   65fe04e13439        554MB
docker-platform-monitor       latest                               65fe04e13439        554MB

admin@sonic:~$ sudo show system-health detail 
chassis.set_status_led is not implemented
System status summary

  System status LED  UNknown
  Services:
    Status: OK
  Hardware:
    Status: Not OK
    Reasons: Invalid temperature data for PSU 2, temperature=N/A, threshold=N/A
             PSU 1 is out of power
             FAN-5R speed is out of range, speed=32.0, range=[0.0,0.0]
             FAN-5F speed is out of range, speed=32.0, range=[0.0,0.0]
             FAN-4R speed is out of range, speed=32.0, range=[0.0,0.0]
             FAN-4F speed is out of range, speed=32.0, range=[0.0,0.0]
             FAN-3R speed is out of range, speed=32.0, range=[0.0,0.0]
             FAN-3F speed is out of range, speed=32.0, range=[0.0,0.0]
             FAN-2R speed is out of range, speed=32.0, range=[0.0,0.0]
             FAN-2F speed is out of range, speed=32.0, range=[0.0,0.0]
             FAN-1R speed is out of range, speed=32.0, range=[0.0,0.0]
             FAN-1F speed is out of range, speed=32.0, range=[0.0,0.0]
             Failed to get ASIC temperature

System services and devices monitor list

Name                        Status    Type
--------------------------  --------  ----------
sonic                       OK        System
rsyslog                     OK        Process
root-overlay                OK        Filesystem
var-log                     OK        Filesystem
routeCheck                  OK        Program
diskCheck                   OK        Program
container_checker           OK        Program
vnetRouteCheck              OK        Program
container_memory_telemetry  OK        Program
snmp:snmpd                  OK        Process
snmp:snmp-subagent          OK        Process
telemetry:telemetry         OK        Process
telemetry:dialout           OK        Process
lldp:lldpd                  OK        Process
lldp:lldp-syncd             OK        Process
lldp:lldpmgrd               OK        Process
syncd:syncd                 OK        Process
teamd:teammgrd              OK        Process
teamd:teamsyncd             OK        Process
teamd:tlm_teamd             OK        Process
swss:orchagent              OK        Process
swss:portsyncd              OK        Process
swss:neighsyncd             OK        Process
swss:fdbsyncd               OK        Process
swss:vlanmgrd               OK        Process
swss:intfmgrd               OK        Process
swss:portmgrd               OK        Process
swss:buffermgrd             OK        Process
swss:vrfmgrd                OK        Process
swss:nbrmgrd                OK        Process
swss:vxlanmgrd              OK        Process
swss:coppmgrd               OK        Process
swss:tunnelmgrd             OK        Process
bgp:zebra                   OK        Process
bgp:staticd                 OK        Process
bgp:bgpd                    OK        Process
bgp:fpmsyncd                OK        Process
bgp:bgpcfgd                 OK        Process
database:redis              OK        Process
ASIC                        Not OK    ASIC
FAN-1F                      Not OK    Fan
FAN-1R                      Not OK    Fan
FAN-2F                      Not OK    Fan
FAN-2R                      Not OK    Fan
FAN-3F                      Not OK    Fan
FAN-3R                      Not OK    Fan
FAN-4F                      Not OK    Fan
FAN-4R                      Not OK    Fan
FAN-5F                      Not OK    Fan
FAN-5R                      Not OK    Fan
PSU 1                       Not OK    PSU
PSU 2                       Not OK    PSU

System services and devices ignore list

Name    Status    Type
------  --------  ------
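
Every fan reason in the Montara output reports `range=[0.0,0.0]`, which would flag any non-zero speed. The sketch below extracts the values from a `Reasons` line so they can be compared before and after a fix. The regular expression and helper names are hypothetical and are based only on the message format shown above, not on the health checker's source.

```python
import re

# Pattern derived from the message format in the output above (assumption).
REASON_RE = re.compile(
    r"(?P<name>\S+) speed is out of range, "
    r"speed=(?P<speed>[\d.]+), range=\[(?P<low>[\d.]+),(?P<high>[\d.]+)\]"
)


def parse_fan_reason(line):
    """Return the fan name, speed and range bounds, or None if the line does not match."""
    match = REASON_RE.search(line)
    if match is None:
        return None
    return {
        "name": match["name"],
        "speed": float(match["speed"]),
        "low": float(match["low"]),
        "high": float(match["high"]),
    }


def is_out_of_range(entry):
    """True when the speed lies outside the closed interval [low, high]."""
    return not (entry["low"] <= entry["speed"] <= entry["high"])


line = "FAN-5R speed is out of range, speed=32.0, range=[0.0,0.0]"
entry = parse_fan_reason(line)
print(entry["name"], is_out_of_range(entry))
```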

Command output on Newport:

admin@igk-7-dut:~$ show version

SONiC Software Version: SONiC.master.50092-dirty-20211110.180007
Distribution: Debian 10.11
Kernel: 4.19.0-12-2-amd64
Build commit: e2bffdf9e
Build date: Wed Nov 10 18:10:02 UTC 2021
Built by: AzDevOps@sonic-build-workers-000VMH

Platform: x86_64-accton_as9516_32d-r0
HwSKU: newport
ASIC: barefoot
ASIC Count: 1
Serial Number: 9516D2042028
Model Number: NP5ZZ8632007A
Hardware Revision: N/A
Uptime: 16:11:58 up 25 min,  3 users,  load average: 1.51, 1.77, 1.59

Docker images:
REPOSITORY                    TAG                                  IMAGE ID            SIZE
docker-syncd-bfn              latest                               207914a8634b        1.59GB
docker-syncd-bfn              master.50092-dirty-20211110.180007   207914a8634b        1.59GB
docker-dhcp-relay             latest                               9841365a8c4b        435MB
docker-sflow                  latest                               fc0ce6260063        435MB
docker-sflow                  master.50092-dirty-20211110.180007   fc0ce6260063        435MB
docker-teamd                  latest                               244873742b22        434MB
docker-teamd                  master.50092-dirty-20211110.180007   244873742b22        434MB
docker-nat                    latest                               0373b4c7d6a0        437MB
docker-nat                    master.50092-dirty-20211110.180007   0373b4c7d6a0        437MB
docker-platform-monitor       latest                               3efa4c39eb51        686MB
docker-platform-monitor       master.50092-dirty-20211110.180007   3efa4c39eb51        686MB
docker-lldp                   latest                               0e1299587bf8        462MB
docker-lldp                   master.50092-dirty-20211110.180007   0e1299587bf8        462MB
docker-snmp                   latest                               53cd0eba2d56        464MB
docker-snmp                   master.50092-dirty-20211110.180007   53cd0eba2d56        464MB
docker-database               latest                               19c44c3a9701        422MB
docker-database               master.50092-dirty-20211110.180007   19c44c3a9701        422MB
docker-sonic-mgmt-framework   latest                               d82636774f6d        577MB
docker-sonic-mgmt-framework   master.50092-dirty-20211110.180007   d82636774f6d        577MB
docker-router-advertiser      latest                               708cc6ea6670        422MB
docker-router-advertiser      master.50092-dirty-20211110.180007   708cc6ea6670        422MB
docker-orchagent              latest                               65a33f3a7d48        453MB
docker-orchagent              master.50092-dirty-20211110.180007   65a33f3a7d48        453MB
docker-macsec                 latest                               9df759f3b99c        437MB
docker-macsec                 master.50092-dirty-20211110.180007   9df759f3b99c        437MB
docker-sonic-telemetry        latest                               e7de80788aed        510MB
docker-sonic-telemetry        master.50092-dirty-20211110.180007   e7de80788aed        510MB
docker-fpm-frr                latest                               da7ca20dc042        452MB
docker-fpm-frr                master.50092-dirty-20211110.180007   da7ca20dc042        452MB
docker-mux                    latest                               13fff8a2d68e        474MB
docker-mux                    master.50092-dirty-20211110.180007   13fff8a2d68e        474MB

admin@igk-7-dut:~$ sudo show system-health detail
chassis.set_status_led is not implemented
System status summary

  System status LED  UNknown
  Services:
    Status: OK
  Hardware:
    Status: Not OK
    Reasons: PSU 2 is out of power
             Invalid temperature data for PSU 1, temperature=N/A, threshold=N/A
             Failed to get fan information

System services and devices monitor list

Name                        Status    Type
--------------------------  --------  ----------
igk-7-dut                   OK        System
rsyslog                     OK        Process
root-overlay                OK        Filesystem
var-log                     OK        Filesystem
routeCheck                  OK        Program
diskCheck                   OK        Program
container_checker           OK        Program
vnetRouteCheck              OK        Program
container_memory_telemetry  OK        Program
Fan                         Not OK    Fan
PSU 1                       Not OK    PSU
PSU 2                       Not OK    PSU

System services and devices ignore list

Name    Status    Type
------  --------  ------

Signed-off-by: Petro Bratash <petrox.bratash@intel.com>
@akokhan akokhan removed the request for review from lguohan December 13, 2021 12:58
@akokhan akokhan merged commit 4b7dba8 into akokhan:newport_platform_api Dec 13, 2021
akokhan pushed a commit that referenced this pull request Dec 17, 2021
Signed-off-by: Petro Bratash <petrox.bratash@intel.com>
akokhan pushed a commit that referenced this pull request Dec 21, 2021
Signed-off-by: Petro Bratash <petrox.bratash@intel.com>
akokhan pushed a commit that referenced this pull request Dec 22, 2021
Signed-off-by: Petro Bratash <petrox.bratash@intel.com>
akokhan pushed a commit that referenced this pull request Jan 20, 2022
* [BFN] Updated platform APIs impl

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* Extended BFN platform SFP APIs implementation

* Update sfp.py

* [BFN] Extended SFP platform plugin implementation

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* [BFN] Extended Fans platform plugin implementation

* [BFN] Divided classes Fan and FanDrawer into 2 files

* Signed-off-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>

What I did
	Add get_model() function
	Add get_low_critical_threshold() function
	Change __get(...) function.
How I did it
	Difference from the previous implementation: the __get(...) function now returns the real value, or -9999.9 if the value is not provided by the thrift API

* Add get_presence() function and revised __get() function

Signed-off-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>

* [BFN] Updated PSU platform APIs impl

Signed-off-by: Dmytro Lytvynenko <dmytrox.lytvynenko@intel.com>

* Added BFN PSU cache (#9)

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* [BFN]  Fans and Fantray platform APIs update (#7)

* [BFN] Updated SFP platform APIs (#10)

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>

* [BFN] Updated platform API for thermal (#8)

* Signed-off-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>

* Revert "[BFN]  Fans and Fantray platform APIs update (#7)" (#11)

This reverts commit c62a733.

* Add support health monitor system (#15)

Signed-off-by: Petro Bratash <petrox.bratash@intel.com>

* Update chassis.py

* [BFN] Updated FANs and FAN Tray platform API (#14)

* Fix fix_alignment (#17)

Signed-off-by: Petro Bratash <petrox.bratash@intel.com>

* [BFN] Improvement show environment (#16)

* Added PSU temperature skip into platform.json (#18)

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* Do not skip psud on Newport

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* [BFN] fix fan status from Not OK to Ok (#19)

* [BFN] Updated SFP platform plugin (#13)

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>

* [DPB] Fix typo for Ethernet0 2x200G[100G,40G] breakout mode (#21)

Signed-off-by: Mykola Gerasymenko <mykolax.gerasymenko@intel.com>

* [barefoot] Tmp fix vendor_rev (#22)

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>

* Fixed python issues in sonic_platform/fan_drawer.py

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* Updated fan_drawer.py

* Fixing trailing white spaces in fan_drawer.py

* [BFN] Fix thrift for SFPs API

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>

* In platform.json, replaced 'false' with '0' to workaround ast.literal_eval() issue

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* [Newport] Thermal manager  (#23)

* Signed-off-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>

* Revert "In platform.json, replaced 'false' with '0' to workaround ast.literal_eval() issue"

This reverts commit 1e73127.

* Removed 'controllable' options from platform.json to fix factory default config generation

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

* Update thermal_manager.py

* Migrated SFP plugin to sonic_xcvr API (#30)

Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

Co-authored-by: KostiantynYarovyiBf <kostiantynx.yarovyi@intel.com>
Co-authored-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>
Co-authored-by: Dmytro Lytvynenko <dmytrox.lytvynenko@intel.com>
Co-authored-by: Volodymyr Boiko <volodymyrx.boiko@intel.com>
Co-authored-by: Petro Bratash <petrox.bratash@intel.com>
Co-authored-by: Mykola Gerasymenko <mykolax.gerasymenko@intel.com>
akokhan pushed a commit that referenced this pull request Jan 27, 2022
[sonic-linkmgrd][master] submodule update

Commits added:
0c23756 Jing Zhang      2022-01-19      Linkmgrd subscribing State DB route event  (#13)
12b9951 Longxiang Lyu   2021-12-13      Add TLV support to ICMP payload (#11)
3eedda3 Longxiang Lyu   2022-01-06      Add missing intermediate states (#16)
8da4982 Ying Xie        2022-01-04      [linkmgrd] update README, set coding style guidance (#15)
a897cf8 Longxiang Lyu   2021-12-13      Improve PR template (#16)
6fec701 Jing Zhang      2021-12-06      Add pull request template for linkmgrd repo (#9)


signed-off-by: Jing Zhang <zhangjing@microsoft.com>