From 8f88dee1f5458c0dc8cafe7b1a4554715061e84c Mon Sep 17 00:00:00 2001 From: Prasanth Kunjum Veettil Date: Mon, 24 May 2021 10:00:58 -0700 Subject: [PATCH 1/4] Link-flap error-disable: CLI and RESTCONF section update Signed-off-by: Prasanth Kunjum Veettil --- system/intf-dampening-HLD.md | 94 +++++++++++++++++++++++++++++++++++- 1 file changed, 92 insertions(+), 2 deletions(-) diff --git a/system/intf-dampening-HLD.md b/system/intf-dampening-HLD.md index 0252f4195384..36db6b42e528 100644 --- a/system/intf-dampening-HLD.md +++ b/system/intf-dampening-HLD.md @@ -19,6 +19,7 @@ Port Link Flap Error Disable | 0.1 | 04/14/2021 | Steven Lu | Initial version for requirements | | 0.2 | 04/20/2021 | Steven Lu | Change feature name to Port Link Flap Error Disable | | 0.3 | 05/11/2021 | Steven Lu | Add design details | +| 0.4 | 05/24/2021 | Prasanth Kunjum Veettil | Add CLI and RESTCONF details | # About this Manual This document provides general information about the Port Link Flap Error Disable feature implementation in SONiC. @@ -96,8 +97,56 @@ The Interface Error Disable feature exist in below modules and containers: # 2 Functionality ## 2.1 CLI - - +### 2.1.1 Configuration Commands +- *link-error-disable flap-threshold sampling-interval recovery-interval * +Example: +sonic(conf-if-Ethernet0)# link-error-disable flap-threshold 10 sampling-time 3 recovery-timeout 10 + +In this example, the values for the parameters are as follows: + +The flap-threshold is set at 10 times. This interval is the number of times that the port's link state goes from up to down and down to up before the recovery-timeout is activated. Enter a valid value range from 1-50. Default is 3. + + +The sampling-time is set to 3 seconds. This time period is the amount of time during which the specified flap-threshold can be crossed. If the flap-threshold is crossed during this sampling-time, port will be error-disabled. Enter a value between 1 and 65565 seconds. Default is 10. 
+ + +The recovery-timeout is set to 10 seconds. This period of time is the amount of time the port remains disabled (down) before it becomes enabled. Entering 0 indicates that the port will stay down until an administrative override occurs. Enter a value between 0 and 65565 seconds. Default is 30. + + +This config command can be executed on a range of interfaces as well. Example: +``` +sonic(conf-if-range-eth**)# link-error-disable flap-threshold 10 sampling-time 3 recovery-timeout 10 +``` +### 2.1.2 Show Commands + +"show errdisable recovery" is an existing CLI command. This output will be updated to list the ports in recovery period. +- *show errdisable recovery* +This command displays the err-disable recovery features. Link-flap is one among the features where in err-disable recovery option can be enabled. +Example: +``` +sonic#show errdisable recovery +Err-Disable Reason Timer Status +----------------------------------- +udld Disabled +bpduguard Disabled +xcvrd Disabled +link-flap Enabled + +Interfaces that will be enabled at the next timeout: +Interface Errdisable reason Time left(sec) +----------------------------------------------------- +Ethernet0 link-flap 24 +``` +- *show errdisable link-flap* +Status and configuration details of link-flap error-disable is shown with this command. +Example: +``` +sonic#show errdisable link-flap +Interface Flap-threshold Sampling-time Recovery-timeout Status +--------------------------------------------------------------------------- +Ethernet0 10 3 30 Errdisabled +Ethernet4 10 3 60 Not-errdisabled +``` # 2.2 Functional Description # 3 Design @@ -172,6 +221,47 @@ Can be reference to YANG if applicable. Also cover gNMI here. 
Refer to Functionality ### 3.6.3 REST API Support +POST "/restconf/data/openconfig-errdisable-ext:errdisable-port/port=/link-flap" +Request body: +{ + "openconfig-errdisable-ext:link-flap": { + "config": { + "error-disable": , + "flap-threshold": , + "sampling-interval": , + "recovery-interval": + } + } +} + +Example: +``` +POST "/restconf/data/openconfig-errdisable-ext:errdisable-port/port=Ethernet24/link-flap" +{ + "openconfig-errdisable-ext:link-flap": { + "config": { + "error-disable": "on", + "flap-threshold": 10, + "sampling-interval": 20, + "recovery-interval": 300 + } + } +} +``` + +GET "/restconf/data/openconfig-errdisable-ext:errdisable-port/port=/link-flap/state" + +Example: +``` +GET "/restconf/data/openconfig-errdisable-ext:errdisable-port/port=Ethernet24/link-flap/state" +Response data: +{ + "openconfig-errdisable-ext:state": { + "time-left": 14 + } +} +``` + ### 3.6.4 Service and Docker Management No new service ot docker introduced From becc15d801bda4126dca31ca9aa6f4323573ee22 Mon Sep 17 00:00:00 2001 From: Steven LU <45245946+stevenlu99@users.noreply.github.com> Date: Wed, 26 May 2021 11:10:41 -0700 Subject: [PATCH 2/4] Update intf-dampening-HLD.md --- system/intf-dampening-HLD.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/system/intf-dampening-HLD.md b/system/intf-dampening-HLD.md index 36db6b42e528..b217bfa4c7fb 100644 --- a/system/intf-dampening-HLD.md +++ b/system/intf-dampening-HLD.md @@ -40,7 +40,7 @@ When Port Link Flap Error Disable is enabled, the system monitors the number of The sampling time or window (the time during which the specified toggle threshold can occur before the wait period is activated) is triggered when the first "up to down" transition occurs. -If the port link state toggles from up to down for a specified number of times within a specified period, the interface is physically disabled for the specified wait period. Once the wait period expires, the port link state is re-enabled. 
However, if the wait period is set to zero (0) seconds, the port link state will remain disabled until it is manually re-enabled. +If the port link state toggles from up to down for a specified number of times within a specified period, the interface is physically disabled for the specified wait period. Once the wait period expires, the port link state is re-enabled. However, if the wait period is set to zero (0) seconds, the port link state will remain disabled until it is manually disabled and re-enabled or Port Link Flap Error Disable is disabled on this port. ## 1.1 Requirements From e90b219cc75f8931cafaecdac48d6fa738b2efe3 Mon Sep 17 00:00:00 2001 From: Prasanth Kunjum Veettil Date: Mon, 31 May 2021 04:54:50 -0700 Subject: [PATCH 3/4] CLI section review comments updates. Signed-off-by: Prasanth Kunjum Veettil --- system/intf-dampening-HLD.md | 43 +++++++++++++++++++----------------- 1 file changed, 23 insertions(+), 20 deletions(-) diff --git a/system/intf-dampening-HLD.md b/system/intf-dampening-HLD.md index b217bfa4c7fb..7a7fb4e3aa57 100644 --- a/system/intf-dampening-HLD.md +++ b/system/intf-dampening-HLD.md @@ -18,7 +18,7 @@ Port Link Flap Error Disable |:---:|:-----------:|:------------------:|------------------------------------------------------| | 0.1 | 04/14/2021 | Steven Lu | Initial version for requirements | | 0.2 | 04/20/2021 | Steven Lu | Change feature name to Port Link Flap Error Disable | -| 0.3 | 05/11/2021 | Steven Lu | Add design details | +| 0.3 | 05/11/2021 | Steven Lu | Add design details | | 0.4 | 05/24/2021 | Prasanth Kunjum Veettil | Add CLI and RESTCONF details | # About this Manual @@ -31,7 +31,7 @@ This document describes the high level design of Port Link Flap Error Disable fe ### Table 1: Abbreviations | **Term** | **Meaning** | |--------------------------|-------------------------------------| -| XYZ | Term description | +| xcvrd | Transceiver Daemon | # 1 Feature Overview The Port Link Flap Error Disable feature uses an 
exponential decay mechanism to prevent excessive interface flapping events from adversely affecting routing protocols and routing tables in the network. Suppressing port state change events to protect the system resources. @@ -48,27 +48,27 @@ System shall be able to suppress interfaces state change events to protect syste User shall be able to enable or disable the feature on individual interfaces and globally. The feature must be disabled on all interfaces by default. The feature shall be supported on physical interfaces. -There must be two sets of configuration parameters (sample-interval, waiting-period, and toggling-frequency) a per-interface set and a global set. If both global and per-interface are configured, the per-interface values are used only for given interfaces. Global values are used for all other physical interfaces. +There must be two sets of configuration parameters (sample-interval, recovery-interval, and flap-threshold): a per-interface set and a global set. If both global and per-interface values are configured, the per-interface values are used only on the interfaces where they are set. Global values are used for all other physical interfaces. If no values are specified by user, a default set of parameters are applied to all interfaces. User shall be able to save configuration parameters (both global and per-interface). The configuration parameters (both global and per-interface) must be preserved across device reboot. ### 1.1.1 Functional Requirements Port Link Flap Error Disable shall use below parameters to supress and protect system. -- toggle-frequency +- flap-threshold Specifies the number of times a port link state goes from up to down before the wait period is activated. The value ranges from 1 through 50. - sample-interval -Specifies the amount of time, in seconds, during which the specified toggle threshold can occur before the wait period is activated. The default value is 0 and indicates that the time is forever. The value ranges from 0 through 65535. 
-- waiting-period -Specifies the amount of time in seconds, for which the port remains disabled (down) before it becomes enabled. The value ranges from 0 through 65535. A value of 0 indicates that the port will stay down until an administrative override occurs. +Specifies the amount of time, in seconds, during which the flap-threshold can be reached before the recovery-interval is activated. The value ranges from 1 through 65535. +- recovery-interval +Specifies the amount of time, in seconds, for which the port remains disabled (down) before it is re-enabled. The value ranges from 0 through 65534. A value of 0 indicates that the port will stay down until an administrative override occurs. ### 1.1.2 Configuration and Management Requirements - Port Link Flap Error Disable feature default is OFF on all physical interfaces and port-channels - When Port Link Flap Error Disable is enabled, use below default values: + flap-threshold: 3 sample-interval: 10 - toggle-frequency: 3 - waiting-period: 30 -- User shall be able to specify different sample-interval, toggle-frequency and waiting-period on a physical interface + recovery-interval: 300 +- User shall be able to specify different sample-interval, flap-threshold and recovery-interval on a physical interface - User shall be able to display current Port Link Flap Error Disable confiuration values. - User shall be able to display current interface status if it was surpresed by Port Link Flap Error Disable - User shall be able to display Link-Down-Reason if a port is disabled by Port Link Flap Error Disable feature @@ -107,10 +107,10 @@ In this example, the values for the parameters are as follows: The flap-threshold is set at 10 times. This interval is the number of times that the port's link state goes from up to down and down to up before the recovery-timeout is activated. Enter a valid value range from 1-50. Default is 3. -The sampling-time is set to 3 seconds. 
This time period is the amount of time during which the specified flap-threshold can be crossed. If the flap-threshold is crossed during this sampling-time, port will be error-disabled. Enter a value between 1 and 65565 seconds. Default is 10. +The sampling-time is set to 3 seconds. This is the time window within which the specified flap-threshold can be crossed. If the flap-threshold is crossed during this sampling-time, the port is error-disabled. Enter a value between 1 and 65535 seconds. Default is 10. -The recovery-timeout is set to 10 seconds. This period of time is the amount of time the port remains disabled (down) before it becomes enabled. Entering 0 indicates that the port will stay down until an administrative override occurs. Enter a value between 0 and 65565 seconds. Default is 30. +The recovery-timeout is set to 10 seconds. This is the amount of time the port remains disabled (down) before it is re-enabled. Entering 0 indicates that the port stays down until an administrative override occurs. Enter a value between 0 and 65534 seconds. Default is 300. This config command can be executed on a range of interfaces as well. Example: @@ -139,13 +139,16 @@ Ethernet0 link-flap 24 ``` - *show errdisable link-flap* Status and configuration details of link-flap error-disable is shown with this command. +Ports that have only the default error-disable configuration are not displayed in the output. 
+ Example: ``` sonic#show errdisable link-flap Interface Flap-threshold Sampling-time Recovery-timeout Status --------------------------------------------------------------------------- -Ethernet0 10 3 30 Errdisabled -Ethernet4 10 3 60 Not-errdisabled +Ethernet0 10 3 30 Errdisabled +Ethernet4 10 3 60 Not-errdisabled +Ethernet8 5 10 300 Off ``` # 2.2 Functional Description @@ -157,9 +160,9 @@ For individual physcial interface "PORT|Ethernet124": { "error-disable": "on|off", - "toggle-frequency": "3", + "flap-threshold": "3", "sampling-interval": "5", - "wait-time-period": "10" + "recovery-interval": "10" }, ### 3.2.2 APP DB @@ -167,9 +170,9 @@ For individual physcial interface "PORT_TABLE|Ethernet124": { "error-disable": "on|off", - "toggle-frequency": "3", + "flap-threshold": "3", "sampling-interval": "5", - "wait-time-period": "10" + "recovery-interval": "10" }, To surpress interface: @@ -188,9 +191,9 @@ Record number of link flaps within sampling-interval: "supress-time": time, "error-disable": "on|off", - "toggle-frequency": "3", + "flap-threshold": "3", "sampling-interval": "5", - "wait-time-period": "10" + "recovery-interval": "10" } ### 3.2.4 ASIC DB From d8e7bce1da0a4b4d14285d39ae9d14da116c0726 Mon Sep 17 00:00:00 2001 From: Prasanth Kunjum Veettil Date: Fri, 4 Jun 2021 08:22:12 -0700 Subject: [PATCH 4/4] Added the global command to enable link-flap error-disable feature. 
Signed-off-by: Prasanth Kunjum Veettil --- system/intf-dampening-HLD.md | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/system/intf-dampening-HLD.md b/system/intf-dampening-HLD.md index 7a7fb4e3aa57..62cc9c901ee9 100644 --- a/system/intf-dampening-HLD.md +++ b/system/intf-dampening-HLD.md @@ -100,8 +100,9 @@ The Interface Error Disable feature exist in below modules and containers: ### 2.1.1 Configuration Commands - *link-error-disable flap-threshold sampling-interval recovery-interval * Example: +``` sonic(conf-if-Ethernet0)# link-error-disable flap-threshold 10 sampling-time 3 recovery-timeout 10 - +``` In this example, the values for the parameters are as follows: The flap-threshold is set at 10 times. This interval is the number of times that the port's link state goes from up to down and down to up before the recovery-timeout is activated. Enter a valid value range from 1-50. Default is 3. @@ -117,6 +118,26 @@ This config command can be executed on a range of interfaces as well. Example: ``` sonic(conf-if-range-eth**)# link-error-disable flap-threshold 10 sampling-time 3 recovery-timeout 10 ``` +Example for disabling link-flap error-disable on a port: +``` +sonic(conf-if-Ethernet0)# no link-error-disable +``` +This command is also supported on an interface range. Example: +``` +sonic(conf-if-range-eth**)# no link-error-disable +``` + +- *[no] errdisable recovery cause link-flap* +This is a global command that enables the link-flap error-disable feature. It extends an existing command tree with a new link-flap CLI node. + +The link-flap feature must be enabled globally for it to take effect, even when port-level configuration is already present. +When the user executes 'no errdisable recovery cause link-flap' to disable the feature at the system level, link-flap monitoring stops on all ports. 
+ +``` +sonic(config)# errdisable recovery cause link-flap +sonic(config)# no errdisable recovery cause link-flap +``` + ### 2.1.2 Show Commands "show errdisable recovery" is an existing CLI command. This output will be updated to list the ports in recovery period.
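
As a rough illustration of the behavior this patch series documents, the Python sketch below models how `flap-threshold`, `sampling-interval` and `recovery-interval` might interact on a single port. It is not the SONiC implementation: the class `LinkFlapMonitor` and its methods are hypothetical names introduced here. It also assumes that the sampling window opens at the first up-to-down transition, and that the port is error-disabled once the count reaches the threshold (the HLD text says only "crossed").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkFlapMonitor:
    """Hypothetical per-port model; names and structure are illustrative only."""
    flap_threshold: int = 3       # up->down transitions that trigger error-disable
    sampling_interval: int = 10   # seconds; window opens at the first up->down
    recovery_interval: int = 300  # seconds; 0 means no automatic recovery

    window_start: Optional[float] = None
    flap_count: int = 0
    disabled_at: Optional[float] = None

    def on_link_down(self, now: float) -> bool:
        """Record an up->down transition; return True if the port is error-disabled."""
        if self.disabled_at is not None:
            return True  # already error-disabled, nothing more to count
        # Open a new window on the first flap or after the old window expired.
        if self.window_start is None or now - self.window_start > self.sampling_interval:
            self.window_start = now
            self.flap_count = 0
        self.flap_count += 1
        # Assumption: reaching the threshold (>=) disables the port.
        if self.flap_count >= self.flap_threshold:
            self.disabled_at = now
        return self.disabled_at is not None

    def time_left(self, now: float) -> Optional[float]:
        """Seconds until automatic recovery, or None if no recovery is pending."""
        if self.disabled_at is None or self.recovery_interval == 0:
            return None
        return max(0.0, self.disabled_at + self.recovery_interval - now)

    def poll(self, now: float) -> bool:
        """Re-enable the port once the recovery interval has elapsed.

        Returns True if the port is still error-disabled after the check.
        """
        expired = (
            self.disabled_at is not None
            and self.recovery_interval > 0
            and now - self.disabled_at >= self.recovery_interval
        )
        if expired:
            self.disabled_at = None
            self.window_start = None
            self.flap_count = 0
        return self.disabled_at is not None
```

With `recovery_interval=0`, `poll` never clears the state, which mirrors the documented rule that the port stays down until an administrative override. A real implementation would also need a method for that manual re-enable path, which is omitted here.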