Critical Resource Monitoring High Level Design
Rev | Date | Author | Change Description |
---|---|---|---|
0.1 | | Volodymyr Samotiy | Initial version |
This document provides general information about the Critical Resource Monitoring feature implementation in SONiC.
This document describes the high level design of the Critical Resource Monitoring feature.
Definitions/Abbreviation | Description |
---|---|
CRM | Critical Resource Monitoring |
API | Application Programming Interface |
SAI | Switch Abstraction Interface |
A detailed description of the Critical Resource Monitoring feature requirements is available here: CRM Requirements.
This section describes the SONiC requirements for the Critical Resource Monitoring (CRM) feature. CRM should monitor utilization of ASIC resources by polling SAI attributes.
At a high level the following should be supported:
- CRM should log a message if any resource exceeds its defined threshold value.
- CLI commands to check current usage and availability of monitored resources.
- IPv4 routes: query currently used and available number of entries
- IPv6 routes: query currently used and available number of entries
- IPv4 Nexthops: query currently used and available number of entries
- IPv6 Nexthops: query currently used and available number of entries
- IPv4 Neighbors: query currently used and available number of entries
- IPv6 Neighbors: query currently used and available number of entries
- Next-hop group member: query currently used and available number of entries
- Next-hop group objects: query currently used and available number of entries
- ACL: query currently used and available number of entries
- ACL Table
- ACL Group
- ACL Entries
- ACL Counters/Statistics
- FDB entries: query currently used and available entries
The monitoring process should periodically poll the SAI counters for all required resources, check whether the retrieved values exceed the defined thresholds, and log the appropriate SYSLOG message.
- User should be able to configure LOW and HIGH thresholds.
- User should be able to configure thresholds in the following formats:
- percentage
- actual used count
- actual free count
- CRM feature should log a "SYSLOG" message if any resource crosses its LOW or HIGH threshold.
- CRM should support two types of "SYSLOG" messages:
- EXCEEDED for high threshold.
- CLEAR for low threshold.
- "SYSLOG" messages should be in the following format:
"<Date/Time> WARNING <Process name>: THRESHOLD_EXCEEDED for <TH_TYPE> <%> Used count <value> free count <value>"
"<Date/Time> NOTICE <Process name>: THRESHOLD_CLEAR for <TH_TYPE> <%> Used count <value> free count <value>"
<TH_TYPE> = <TH_PERCENTAGE, TH_USED, TH_FREE>
- Default polling interval should be set to 5 minutes.
- Default HIGH threshold should be set to 85%.
- Default LOW threshold should be set to 70%.
- CRM feature should suppress SYSLOG messages after they have been logged 10 times.
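The threshold formats and message layout above can be illustrated with a short sketch. It is not the CRM implementation: the function names `monitored_value` and `format_message` are hypothetical, the percentage is assumed to be `used / (used + free)`, and the `<%>` placeholder is assumed to render as an integer percentage. The date/time, severity, and process name are left to the syslog daemon.

```python
# Hypothetical helpers; names are illustrative and not part of SONiC.
TH_NAMES = {"percentage": "TH_PERCENTAGE", "used": "TH_USED", "free": "TH_FREE"}


def monitored_value(th_type: str, used: int, free: int) -> int:
    """Return the number that is compared against the LOW/HIGH thresholds."""
    if th_type == "percentage":
        total = used + free  # assumption: capacity = used + free
        # Avoid division by zero when the resource has no capacity at all.
        return 0 if total == 0 else used * 100 // total
    if th_type == "used":
        return used
    if th_type == "free":
        return free
    raise ValueError(f"unknown threshold type: {th_type}")


def format_message(event: str, th_type: str, used: int, free: int) -> str:
    """Build the message body for event "EXCEEDED" or "CLEAR"."""
    percent = monitored_value("percentage", used, free)
    return (f"THRESHOLD_{event} for {TH_NAMES[th_type]} {percent}% "
            f"Used count {used} free count {free}")


print(format_message("EXCEEDED", "percentage", 850, 150))
```

With 850 used and 150 free entries the percentage is 85, which matches the default HIGH threshold.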
- User should be able to query usage and availability of monitored resources.
- User should be able to configure threshold values.
A new "CRM" table should be added to ConfigDB to store CRM-related configuration: polling interval and LOW/HIGH threshold values.
; Defines schema for CRM configuration attributes
key = CRM ; CRM configuration
; field = value
CRM_POLLING_INTERVAL = 1*4DIGIT ; CRM polling interval
CRM_IPV4_ROUTE_THRESHOLD_TYPE = "percentage" / "used" / "free" ; CRM threshold type for 'ipv4 route' resource
CRM_IPV6_ROUTE_THRESHOLD_TYPE = "percentage" / "used" / "free" ; CRM threshold type for 'ipv6 route' resource
CRM_IPV4_NEXTHOP_THRESHOLD_TYPE = "percentage" / "used" / "free" ; CRM threshold type for 'ipv4 next-hop' resource
CRM_IPV6_NEXTHOP_THRESHOLD_TYPE = "percentage" / "used" / "free" ; CRM threshold type for 'ipv6 next-hop' resource
CRM_IPV4_NEIGHBOR_THRESHOLD_TYPE = "percentage" / "used" / "free" ; CRM threshold type for 'ipv4 neighbor' resource
CRM_IPV6_NEIGHBOR_THRESHOLD_TYPE = "percentage" / "used" / "free" ; CRM threshold type for 'ipv6 neighbor' resource
CRM_NEXTHOP_GROUP_MEMBER_THRESHOLD_TYPE = "percentage" / "used" / "free" ; CRM threshold type for 'next-hop group member' resource
CRM_NEXTHOP_GROUP_OBJECT_THRESHOLD_TYPE = "percentage" / "used" / "free" ; CRM threshold type for 'next-hop group object' resource
CRM_ACL_TABLE_THRESHOLD_TYPE = "percentage" / "used" / "free" ; CRM threshold type for 'acl table' resource
CRM_ACL_GROUP_THRESHOLD_TYPE = "percentage" / "used" / "free" ; CRM threshold type for 'acl group' resource
CRM_ACL_ENTRY_THRESHOLD_TYPE = "percentage" / "used" / "free" ; CRM threshold type for 'acl entry' resource
CRM_ACL_COUNTER_THRESHOLD_TYPE = "percentage" / "used" / "free" ; CRM threshold type for 'acl counter' resource
CRM_FDB_ENTRY_THRESHOLD_TYPE = "percentage" / "used" / "free" ; CRM threshold type for 'fdb entry' resource
CRM_IPV4_ROUTE_LOW_THRESHOLD = 1*4DIGIT ; CRM low threshold for 'ipv4 route' resource
CRM_IPV6_ROUTE_LOW_THRESHOLD = 1*4DIGIT ; CRM low threshold for 'ipv6 route' resource
CRM_IPV4_NEXTHOP_LOW_THRESHOLD = 1*4DIGIT ; CRM low threshold for 'ipv4 next-hop' resource
CRM_IPV6_NEXTHOP_LOW_THRESHOLD = 1*4DIGIT ; CRM low threshold for 'ipv6 next-hop' resource
CRM_IPV4_NEIGHBOR_LOW_THRESHOLD = 1*4DIGIT ; CRM low threshold for 'ipv4 neighbor' resource
CRM_IPV6_NEIGHBOR_LOW_THRESHOLD = 1*4DIGIT ; CRM low threshold for 'ipv6 neighbor' resource
CRM_NEXTHOP_GROUP_MEMBER_LOW_THRESHOLD = 1*4DIGIT ; CRM low threshold for 'next-hop group member' resource
CRM_NEXTHOP_GROUP_OBJECT_LOW_THRESHOLD = 1*4DIGIT ; CRM low threshold for 'next-hop group object' resource
CRM_ACL_TABLE_LOW_THRESHOLD = 1*4DIGIT ; CRM low threshold for 'acl table' resource
CRM_ACL_GROUP_LOW_THRESHOLD = 1*4DIGIT ; CRM low threshold for 'acl group' resource
CRM_ACL_ENTRY_LOW_THRESHOLD = 1*4DIGIT ; CRM low threshold for 'acl entry' resource
CRM_ACL_COUNTER_LOW_THRESHOLD = 1*4DIGIT ; CRM low threshold for 'acl counter' resource
CRM_FDB_ENTRY_LOW_THRESHOLD = 1*4DIGIT ; CRM low threshold for 'fdb entry' resource
CRM_IPV4_ROUTE_HIGH_THRESHOLD = 1*4DIGIT ; CRM high threshold for 'ipv4 route' resource
CRM_IPV6_ROUTE_HIGH_THRESHOLD = 1*4DIGIT ; CRM high threshold for 'ipv6 route' resource
CRM_IPV4_NEXTHOP_HIGH_THRESHOLD = 1*4DIGIT ; CRM high threshold for 'ipv4 next-hop' resource
CRM_IPV6_NEXTHOP_HIGH_THRESHOLD = 1*4DIGIT ; CRM high threshold for 'ipv6 next-hop' resource
CRM_IPV4_NEIGHBOR_HIGH_THRESHOLD = 1*4DIGIT ; CRM high threshold for 'ipv4 neighbor' resource
CRM_IPV6_NEIGHBOR_HIGH_THRESHOLD = 1*4DIGIT ; CRM high threshold for 'ipv6 neighbor' resource
CRM_NEXTHOP_GROUP_MEMBER_HIGH_THRESHOLD = 1*4DIGIT ; CRM high threshold for 'next-hop group member' resource
CRM_NEXTHOP_GROUP_OBJECT_HIGH_THRESHOLD = 1*4DIGIT ; CRM high threshold for 'next-hop group object' resource
CRM_ACL_TABLE_HIGH_THRESHOLD = 1*4DIGIT ; CRM high threshold for 'acl table' resource
CRM_ACL_GROUP_HIGH_THRESHOLD = 1*4DIGIT ; CRM high threshold for 'acl group' resource
CRM_ACL_ENTRY_HIGH_THRESHOLD = 1*4DIGIT ; CRM high threshold for 'acl entry' resource
CRM_ACL_COUNTER_HIGH_THRESHOLD = 1*4DIGIT ; CRM high threshold for 'acl counter' resource
CRM_FDB_ENTRY_HIGH_THRESHOLD = 1*4DIGIT ; CRM high threshold for 'fdb entry' resource
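As an illustration of the schema above, the following sketch checks a set of CRM fields against the `1*4DIGIT` and threshold-type rules. The function `validate_crm_config` is hypothetical and is not part of SONiC; field values are treated as strings, and the sample polling interval of `300` assumes the unit is seconds, which the schema does not state.

```python
import re

# Resource names as they appear in the schema field names.
RESOURCES = [
    "IPV4_ROUTE", "IPV6_ROUTE", "IPV4_NEXTHOP", "IPV6_NEXTHOP",
    "IPV4_NEIGHBOR", "IPV6_NEIGHBOR", "NEXTHOP_GROUP_MEMBER",
    "NEXTHOP_GROUP_OBJECT", "ACL_TABLE", "ACL_GROUP", "ACL_ENTRY",
    "ACL_COUNTER", "FDB_ENTRY",
]
THRESHOLD_TYPES = {"percentage", "used", "free"}
DIGITS_1_TO_4 = re.compile(r"[0-9]{1,4}")  # ABNF 1*4DIGIT

NUMERIC_FIELDS = {"CRM_POLLING_INTERVAL"}
TYPE_FIELDS = set()
for res in RESOURCES:
    TYPE_FIELDS.add(f"CRM_{res}_THRESHOLD_TYPE")
    NUMERIC_FIELDS.add(f"CRM_{res}_LOW_THRESHOLD")
    NUMERIC_FIELDS.add(f"CRM_{res}_HIGH_THRESHOLD")


def validate_crm_config(fields: dict) -> list:
    """Return a list of error strings; an empty list means the fields are valid."""
    errors = []
    for name, value in fields.items():
        if name in NUMERIC_FIELDS:
            if not DIGITS_1_TO_4.fullmatch(str(value)):
                errors.append(f"{name}: expected 1 to 4 digits, got {value!r}")
        elif name in TYPE_FIELDS:
            if value not in THRESHOLD_TYPES:
                errors.append(f"{name}: unknown threshold type {value!r}")
        else:
            errors.append(f"{name}: not a CRM configuration field")
    return errors


sample = {
    "CRM_POLLING_INTERVAL": "300",  # assumption: seconds
    "CRM_IPV4_ROUTE_THRESHOLD_TYPE": "percentage",
    "CRM_IPV4_ROUTE_LOW_THRESHOLD": "70",
    "CRM_IPV4_ROUTE_HIGH_THRESHOLD": "85",
}
print(validate_crm_config(sample))
```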
Two new tables should be added to the CountersDB in order to represent currently used and available entries for the CRM resources.
This table should store all global CRM stats.
; Defines schema for CRM counters attributes
key = CRM_STATS ; CRM stats entry
; field = value
CRM_STATS_IPV4_ROUTE_AVAILABLE = 1*20DIGIT ; number of available entries for 'ipv4 route' resource
CRM_STATS_IPV6_ROUTE_AVAILABLE = 1*20DIGIT ; number of available entries for 'ipv6 route' resource
CRM_STATS_IPV4_NEXTHOP_AVAILABLE = 1*20DIGIT ; number of available entries for 'ipv4 next-hop' resource
CRM_STATS_IPV6_NEXTHOP_AVAILABLE = 1*20DIGIT ; number of available entries for 'ipv6 next-hop' resource
CRM_STATS_IPV4_NEIGHBOR_AVAILABLE = 1*20DIGIT ; number of available entries for 'ipv4 neighbor' resource
CRM_STATS_IPV6_NEIGHBOR_AVAILABLE = 1*20DIGIT ; number of available entries for 'ipv6 neighbor' resource
CRM_STATS_NEXTHOP_GROUP_MEMBER_AVAILABLE = 1*20DIGIT ; number of available entries for 'next-hop group member' resource
CRM_STATS_NEXTHOP_GROUP_OBJECT_AVAILABLE = 1*20DIGIT ; number of available entries for 'next-hop group object' resource
CRM_STATS_ACL_TABLE_AVAILABLE = 1*20DIGIT ; number of available entries for 'acl table' resource
CRM_STATS_ACL_GROUP_AVAILABLE = 1*20DIGIT ; number of available entries for 'acl group' resource
CRM_STATS_FDB_ENTRY_AVAILABLE = 1*20DIGIT ; number of available entries for 'fdb entry' resource
CRM_STATS_IPV4_ROUTE_USED = 1*20DIGIT ; number of used entries for 'ipv4 route' resource
CRM_STATS_IPV6_ROUTE_USED = 1*20DIGIT ; number of used entries for 'ipv6 route' resource
CRM_STATS_IPV4_NEXTHOP_USED = 1*20DIGIT ; number of used entries for 'ipv4 next-hop' resource
CRM_STATS_IPV6_NEXTHOP_USED = 1*20DIGIT ; number of used entries for 'ipv6 next-hop' resource
CRM_STATS_IPV4_NEIGHBOR_USED = 1*20DIGIT ; number of used entries for 'ipv4 neighbor' resource
CRM_STATS_IPV6_NEIGHBOR_USED = 1*20DIGIT ; number of used entries for 'ipv6 neighbor' resource
CRM_STATS_NEXTHOP_GROUP_MEMBER_USED = 1*20DIGIT ; number of used entries for 'next-hop group member' resource
CRM_STATS_NEXTHOP_GROUP_OBJECT_USED = 1*20DIGIT ; number of used entries for 'next-hop group object' resource
CRM_STATS_ACL_TABLE_USED = 1*20DIGIT ; number of used entries for 'acl table' resource
CRM_STATS_ACL_GROUP_USED = 1*20DIGIT ; number of used entries for 'acl group' resource
CRM_STATS_FDB_ENTRY_USED = 1*20DIGIT ; number of used entries for 'fdb entry' resource
This table should store all "per ACL group" CRM stats.
; Defines schema for CRM counters attributes
key = CRM_ACL_GROUP_STATS:OID ; CRM ACL group stats entry
; field = value
CRM_STATS_ACL_ENTRY_AVAILABLE = 1*20DIGIT ; number of available entries for 'acl entry' resource
CRM_STATS_ACL_COUNTER_AVAILABLE = 1*20DIGIT ; number of available entries for 'acl counter' resource
CRM_STATS_ACL_ENTRY_USED = 1*20DIGIT ; number of used entries for 'acl entry' resource
CRM_STATS_ACL_COUNTER_USED = 1*20DIGIT ; number of used entries for 'acl counter' resource
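A consumer of these tables, such as a show command, has to pair each `_USED` field with its `_AVAILABLE` counterpart. The sketch below shows one way to do that on a plain dictionary of fields; the function `pair_stats` is hypothetical, and reading the fields from CountersDB is left out.

```python
def pair_stats(fields: dict) -> dict:
    """Group CRM_STATS_<RES>_USED / _AVAILABLE fields into {res: (used, available)}."""
    prefix = "CRM_STATS_"
    result = {}
    for name, value in fields.items():
        if not name.startswith(prefix):
            continue  # not a CRM stats field
        body = name[len(prefix):]
        for suffix, index in (("_USED", 0), ("_AVAILABLE", 1)):
            if body.endswith(suffix):
                resource = body[:-len(suffix)].lower()
                # A missing counterpart is reported as 0.
                pair = result.setdefault(resource, [0, 0])
                pair[index] = int(value)
    return {resource: tuple(pair) for resource, pair in result.items()}


stats = {
    "CRM_STATS_IPV4_ROUTE_USED": "850",
    "CRM_STATS_IPV4_ROUTE_AVAILABLE": "150",
    "CRM_STATS_FDB_ENTRY_USED": "3",
}
for resource, (used, available) in pair_stats(stats).items():
    print(f"{resource:<12} used={used} available={available}")
```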
A new "CrmOrch" class should be implemented; it should run a new CRM thread for all monitoring logic.
The CRM thread should check whether any threshold is exceeded and log the appropriate (CLEAR/EXCEEDED) SYSLOG message. The number of EXCEEDED messages already logged should also be tracked, and once it reaches the pre-defined value all further CRM SYSLOG messages should be suppressed. When a CLEAR message is logged, the counter of logged messages should be reset.
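The logging and suppression behavior described above can be sketched as a small state machine. The class `ThresholdTracker` is hypothetical. Two points are assumptions because the text does not settle them: EXCEEDED fires when the value is at or above the HIGH threshold and CLEAR when it is at or below the LOW threshold, and CLEAR is only logged after at least one EXCEEDED message. Python's `logging` module has no NOTICE level, so `info` stands in for it.

```python
import logging

MAX_EXCEEDED_MESSAGES = 10  # suppression limit from the requirements


class ThresholdTracker:
    """Tracks one resource and decides which message, if any, to log."""

    def __init__(self, low: int, high: int, log=None):
        self.low = low
        self.high = high
        self.exceeded_logged = 0  # EXCEEDED messages logged since last CLEAR
        self.log = log or logging.getLogger("crm")

    def check(self, value: int, resource: str):
        """Return "EXCEEDED", "CLEAR", or None when nothing is logged."""
        if value >= self.high:
            if self.exceeded_logged < MAX_EXCEEDED_MESSAGES:
                self.exceeded_logged += 1
                self.log.warning("THRESHOLD_EXCEEDED for %s", resource)
                return "EXCEEDED"
            return None  # limit reached: message suppressed
        if value <= self.low and self.exceeded_logged > 0:
            self.exceeded_logged = 0  # CLEAR resets the suppression counter
            self.log.info("THRESHOLD_CLEAR for %s", resource)
            return "CLEAR"
        return None


tracker = ThresholdTracker(low=70, high=85)
results = [tracker.check(90, "ipv4_route") for _ in range(12)]
print(results.count("EXCEEDED"))
```

A value between the two thresholds produces no message in either direction, so a resource that hovers around one threshold does not flood the log.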
The CLI show command should be able to display the currently USED and AVAILABLE number of entries, but SAI provides an API to query only the currently AVAILABLE entries. Therefore OrchAgent (the appropriate agent for each resource) should track the entries it programs and update the corresponding counter in the "CrmOrch" cache. "CrmOrch" should also provide a public API that allows other agents to update the local cache, and the CRM thread should periodically update CountersDB from the cache.
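The cache and its public API could look roughly like the sketch below. It is written in Python for brevity and is only an illustration: the class `CrmCache` and its method names are hypothetical, the lock is an assumption made because agents and the CRM thread touch the cache concurrently, and the write to CountersDB is represented by a returned dictionary.

```python
import threading


class CrmCache:
    """Hypothetical stand-in for the local cache kept by "CrmOrch"."""

    def __init__(self):
        self._lock = threading.Lock()
        self._used = {}
        self._available = {}

    def inc_used(self, resource: str) -> None:
        """Called by an agent after it programs an entry."""
        with self._lock:
            self._used[resource] = self._used.get(resource, 0) + 1

    def dec_used(self, resource: str) -> None:
        """Called by an agent after it removes an entry; never goes below 0."""
        with self._lock:
            self._used[resource] = max(0, self._used.get(resource, 0) - 1)

    def set_available(self, resource: str, count: int) -> None:
        """Called by the CRM thread with the value polled from SAI."""
        with self._lock:
            self._available[resource] = count

    def snapshot(self) -> dict:
        """Return the CRM_STATS fields the CRM thread would write out."""
        with self._lock:
            fields = {}
            for resource, count in self._used.items():
                fields[f"CRM_STATS_{resource}_USED"] = str(count)
            for resource, count in self._available.items():
                fields[f"CRM_STATS_{resource}_AVAILABLE"] = str(count)
            return fields


cache = CrmCache()
cache.inc_used("IPV4_ROUTE")
cache.inc_used("IPV4_ROUTE")
cache.dec_used("IPV4_ROUTE")
cache.set_available("IPV4_ROUTE", 1000)
print(cache.snapshot())
```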
The table below lists all the SAI attributes which should be used to get the required CRM counters.
CRM resource | SAI attribute |
---|---|
IPv4 routes | SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY |
IPv6 routes | SAI_SWITCH_ATTR_AVAILABLE_IPV6_ROUTE_ENTRY |
IPv4 next-hops | SAI_SWITCH_ATTR_AVAILABLE_IPV4_NEXTHOP_ENTRY |
IPv6 next-hops | SAI_SWITCH_ATTR_AVAILABLE_IPV6_NEXTHOP_ENTRY |
IPv4 neighbors | SAI_SWITCH_ATTR_AVAILABLE_IPV4_NEIGHBOR_ENTRY |
IPv6 neighbors | SAI_SWITCH_ATTR_AVAILABLE_IPV6_NEIGHBOR_ENTRY |
Next-hop group members | SAI_SWITCH_ATTR_AVAILABLE_NEXT_HOP_GROUP_MEMBER_ENTRY |
Next-hop group objects | SAI_SWITCH_ATTR_AVAILABLE_NEXT_HOP_GROUP_ENTRY |
ACL tables | SAI_SWITCH_ATTR_AVAILABLE_ACL_TABLE |
ACL groups | SAI_SWITCH_ATTR_AVAILABLE_ACL_TABLE_GROUP |
ACL entries | SAI_ACL_TABLE_GROUP_MEMBER_ATTR_AVAILABLE_ACL_ENTRY |
ACL counters | SAI_ACL_TABLE_GROUP_MEMBER_ATTR_AVAILABLE_ACL_COUNTER |
FDB entries | SAI_SWITCH_ATTR_AVAILABLE_FDB_ENTRY |
A new CRM utility script should be implemented in "sonic-utilities" in order to configure and display all CRM related information.
crm
Usage: crm [OPTIONS] COMMAND [ARGS]...
Utility to operate with CRM configuration and resources.
Options:
--help Show this message and exit.
Commands:
config Set CRM configuration.
show Show CRM information.
polling interval <value>
thresholds all type [percentage|used|free]
thresholds all [low|high] <value>
thresholds [ipv4|ipv6] route type [percentage|used|free]
thresholds [ipv4|ipv6] route [low|high] <value>
thresholds [ipv4|ipv6] neighbor type [percentage|used|free]
thresholds [ipv4|ipv6] neighbor [low|high] <value>
thresholds [ipv4|ipv6] nexthop type [percentage|used|free]
thresholds [ipv4|ipv6] nexthop [low|high] <value>
thresholds nexthop group [member|object] type [percentage|used|free]
thresholds nexthop group [member|object] [low|high] <value>
thresholds acl [table|group] type [percentage|used|free]
thresholds acl [table|group] [low|high] <value>
thresholds acl group [entry|counter] type [percentage|used|free]
thresholds acl group [entry|counter] [low|high] <value>
thresholds fdb type [percentage|used|free]
thresholds fdb [low|high] <value>
summary
[resources|thresholds] all
[resources|thresholds] [ipv4|ipv6] [route|neighbor|nexthop]
[resources|thresholds] nexthop group [member|object]
[resources|thresholds] acl [table|group]
[resources|thresholds] acl group [entry|counter]
[resources|thresholds] fdb
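The `thresholds` subcommands listed above identify each resource by one to three words, while ConfigDB uses a single field name per setting. The sketch below shows one possible mapping between the two. The table `CLI_TO_RESOURCE` and the function `config_field` are hypothetical and only follow the names used in this document.

```python
# Hypothetical mapping from CLI resource words to the schema resource name.
CLI_TO_RESOURCE = {
    ("ipv4", "route"): "IPV4_ROUTE",
    ("ipv6", "route"): "IPV6_ROUTE",
    ("ipv4", "nexthop"): "IPV4_NEXTHOP",
    ("ipv6", "nexthop"): "IPV6_NEXTHOP",
    ("ipv4", "neighbor"): "IPV4_NEIGHBOR",
    ("ipv6", "neighbor"): "IPV6_NEIGHBOR",
    ("nexthop", "group", "member"): "NEXTHOP_GROUP_MEMBER",
    ("nexthop", "group", "object"): "NEXTHOP_GROUP_OBJECT",
    ("acl", "table"): "ACL_TABLE",
    ("acl", "group"): "ACL_GROUP",
    ("acl", "group", "entry"): "ACL_ENTRY",
    ("acl", "group", "counter"): "ACL_COUNTER",
    ("fdb",): "FDB_ENTRY",
}

SETTING_SUFFIX = {
    "type": "THRESHOLD_TYPE",
    "low": "LOW_THRESHOLD",
    "high": "HIGH_THRESHOLD",
}


def config_field(words, setting: str) -> str:
    """Translate CLI words plus "type", "low" or "high" into a ConfigDB field."""
    resource = CLI_TO_RESOURCE[tuple(words)]  # KeyError for unknown resources
    return f"CRM_{resource}_{SETTING_SUFFIX[setting]}"


print(config_field(["acl", "group", "entry"], "high"))
```

Keeping the mapping in one table means the config and show paths can share it.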
The config command should be extended with a "crm" alias for the CRM utility.
Usage: config [OPTIONS] COMMAND [ARGS]...
SONiC command line - 'config' command
Options:
--help Show this message and exit.
Commands:
...
crm CRM related configuration.
The show command should be extended with a "crm" alias for the CRM utility.
show
Usage: show [OPTIONS] COMMAND [ARGS]...
SONiC command line - 'show' command
Options:
-?, -h, --help Show this message and exit.
Commands:
...
crm Show CRM related information