PFC Watchdog Design
Rev | Date | Author | Change Description |
---|---|---|---|
0.1 | | Marian Pritsak | Initial version |
0.2 | | Sihui Han | Update design with FLEX_COUNTER |
This document provides general information about the PFC Watchdog feature implementation in SONiC. PFC watchdog is designed to detect and mitigate PFC storms received on each port. PFC pause frames are used in lossless Ethernet to stop the link partner from sending packets. This back-pressure can propagate through the whole network and cause it to stop forwarding traffic. PFC watchdog detects abnormal back-pressure caused by receiving excessive PFC pause frames, and mitigates it by temporarily disabling PFC-induced pausing. PFC watchdog has three functional blocks: detection, mitigation and restoration.
PFC Watchdog only works for lossless queues. It requires QoS (PFC) configuration on the device.
This document describes the high level design of the PFC WD feature.
Definitions/Abbreviations | Description |
---|---|
PFC | Priority Flow Control |
WD | Watchdog |
ACL | Access Control List |
The three functional blocks are:
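- Storm Detection: Detect a PFC pause storm that persists for a certain time on a non-drop (lossless) class.
- Storm Mitigation: Mitigate the storm, by default by dropping all packets of the non-drop class.
- Storm Restoration: Restore normal PFC operation on the class once the storm is over.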
PFC storm detection lets a switch detect that a lossless queue is receiving a PFC storm from its link partner and has stayed in the paused state for longer than detection_time. The watchdog must detect such a storm even when the queue is empty, as soon as the queue has been paused for longer than detection_time. detection_time is a port-level parameter, and detection can be enabled or disabled per port. Detection is only available for lossless queues and is disabled by default. detection_time should be on the scale of hundreds of milliseconds.
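The detection rule above can be sketched as a per-queue timer. This is a minimal Python illustration, not SONiC code: the class `QueueDetector` and its `sample()` method are hypothetical, and the sketch assumes that each poll reports whether the queue is currently paused.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueDetector:
    """Hypothetical per-queue storm detector (all times in milliseconds)."""
    detection_time_ms: int
    enabled: bool = False               # detection is disabled by default
    paused_since_ms: Optional[int] = None
    stormed: bool = False

    def sample(self, now_ms: int, paused: bool) -> bool:
        """Feed one poll result; return True only when the queue enters storm state."""
        if not self.enabled or self.stormed:
            return False
        if not paused:
            self.paused_since_ms = None     # pause interrupted: restart the timer
            return False
        if self.paused_since_ms is None:
            self.paused_since_ms = now_ms   # first poll that saw the queue paused
        # Queue occupancy is not consulted: an empty queue that stays paused
        # for longer than detection_time is still reported as stormed.
        if now_ms - self.paused_since_ms > self.detection_time_ms:
            self.stormed = True
            return True
        return False


# Usage: detection_time = 200 ms, polled every 100 ms while the queue stays paused.
det = QueueDetector(detection_time_ms=200, enabled=True)
events = [det.sample(t, paused=True) for t in (0, 100, 200, 300)]
# events == [False, False, False, True]
```

The storm is reported at t = 300 ms, the first poll more than 200 ms after the pause began; any unpaused poll restarts the timer.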
Once a PFC storm is detected on a queue, the watchdog takes one of two actions, drop or forward, at the queue level. When the drop action is selected, the following must be implemented:
- All existing packets in the output queue are discarded
- All subsequent packets destined to the output queue are discarded
- All subsequent packets received by the corresponding priority group of this queue are discarded, including received pause frames. As a result, the switch should not generate any pause frame to its neighbor due to congestion of this output queue.
When the forward action is selected, the following must be implemented:
- The queue no longer honors received PFC frames. All packets destined to the queue are forwarded, as well as the packets already in the queue.
The default action is drop.
The watchdog continues to count the PFC frames received on the queue. If no PFC frame is received during restoration_time, the watchdog re-enables PFC on the queue and, if the mitigation action was drop, stops dropping packets. restoration_time is a port-level parameter and should be on the scale of hundreds of milliseconds.
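A minimal sketch of the restoration rule, assuming the watchdog samples a cumulative per-queue PFC RX frame counter at each poll. `QueueRestorer` and `restore_steps` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueueRestorer:
    """Hypothetical restoration timer for one stormed queue (times in ms)."""
    restoration_time_ms: int
    last_pfc_rx: Optional[int] = None
    quiet_since_ms: Optional[int] = None

    def sample(self, now_ms: int, pfc_rx_count: int) -> bool:
        """Feed the cumulative PFC RX counter; return True once the queue may be restored."""
        if self.last_pfc_rx is None or pfc_rx_count != self.last_pfc_rx:
            # First sample, or new pause frames arrived: restart the quiet period.
            self.last_pfc_rx = pfc_rx_count
            self.quiet_since_ms = now_ms
            return False
        return now_ms - self.quiet_since_ms >= self.restoration_time_ms


def restore_steps(action: str) -> List[str]:
    """Steps taken on restoration; dropping only needs undoing after a drop action."""
    steps = ["re-enable PFC on the queue"]
    if action == "drop":
        steps.append("stop dropping packets")
    return steps


# Usage: restoration_time = 300 ms; pause frames stop arriving after t = 100 ms.
r = QueueRestorer(restoration_time_ms=300)
results = [r.sample(0, 10), r.sample(100, 15), r.sample(300, 15), r.sample(400, 15)]
# results == [False, False, False, True]
```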
; Defines PFC WD configuration on a port
key = PFC_WD_TABLE:ifname ; configuration for watchdog on port
; field = value
detection_time = 1*3DIGIT ; time in msecs a pause storm must last
 ; before the queue is considered to be in PFC
 ; storm and the watchdog action is triggered.
restoration_time = 1*3DIGIT ; restoration_time in msecs after which
; queue in PFC storm state is
; restored if no pause frames were received.
action = "drop"/"forward" ; action taken by watchdog in case
; of PFC storm detection.
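As an illustration of the schema, the following hypothetical validator checks one PFC_WD_TABLE entry given as a key string and a field dict. It is not part of SONiC; it assumes both timers are required (the CLI asks for them) and that a missing action means the default, drop.

```python
import re
from typing import Dict, List

DIGITS_1_3 = re.compile(r"[0-9]{1,3}")  # ABNF 1*3DIGIT


def validate_pfc_wd_entry(key: str, fields: Dict[str, str]) -> List[str]:
    """Return a list of schema problems for a PFC_WD_TABLE entry (empty if valid)."""
    errors = []
    table, sep, ifname = key.partition(":")
    if table != "PFC_WD_TABLE" or not sep or not ifname:
        errors.append("key must be PFC_WD_TABLE:<ifname>")
    for name in ("detection_time", "restoration_time"):
        value = fields.get(name)
        if value is None or not DIGITS_1_3.fullmatch(value):
            errors.append(f"{name} must be 1-3 digits (msecs)")
    if fields.get("action", "drop") not in ("drop", "forward"):
        errors.append('action must be "drop" or "forward"')
    return errors


# Usage: the values from the CLI example later in this document.
ok = validate_pfc_wd_entry(
    "PFC_WD_TABLE:Ethernet116",
    {"detection_time": "300", "restoration_time": "300", "action": "drop"},
)
# ok == []
```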
Orchagent can use different strategies or criteria for storm detection, which may require different counters, so it needs to tell syncd which counters to poll. To achieve this, PFC WD uses the FLEX_COUNTER DB (FLEX_COUNTER_TABLE and FLEX_COUNTER_GROUP_TABLE).
; Defines schema for PFC WD flex counter group
key = "FLEX_COUNTER_GROUP_TABLE:PFC_WD" ; WD group entry
; field = value
QUEUE_PLUGIN_LIST = 1*64VCHAR ; list of ',' separated redis lua scripts plugins
POLL_INTERVAL = 1*3DIGIT ; counter polling interval
FLEX_COUNTER_STATUS = 1*8VCHAR ; 'enable'/'disable'
; to indicate whether counter poll is enabled
; Defines schema for PFC WD flex counter port entry
key = "FLEX_COUNTER_TABLE:PFC_WD:portID" ; WD port entry
; field = value
PORT_COUNTER_ID_LIST = 1*64VCHAR ; list of ',' separated port counter IDs
; Defines schema for PFC WD flex counter queue entry
key = "FLEX_COUNTER_TABLE:PFC_WD:queueID" ; WD queue entry
; field = value
QUEUE_COUNTER_ID_LIST = 1*64VCHAR ; list of ',' separated queue counter IDs
QUEUE_ATTR_ID_LIST = 1*64VCHAR ; list of ',' separated queue attribute IDs
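The following sketch assembles the group entry and the per-port and per-queue entries described by the schema as plain Python dicts, without writing them to Redis. `flex_counter_entries` is a hypothetical helper, and the object IDs and plugin script names in the usage lines are placeholders, not values SONiC uses.

```python
from typing import Dict, List


def flex_counter_entries(port_oid: str, queue_oids: List[str],
                         port_counters: List[str], queue_counters: List[str],
                         queue_attrs: List[str], poll_interval: int,
                         plugins: List[str]) -> Dict[str, Dict[str, str]]:
    """Build PFC_WD flex counter entries (key -> field/value map) per the schema."""
    entries = {
        "FLEX_COUNTER_GROUP_TABLE:PFC_WD": {
            "QUEUE_PLUGIN_LIST": ",".join(plugins),     # ','-separated Lua plugins
            "POLL_INTERVAL": str(poll_interval),
            "FLEX_COUNTER_STATUS": "enable",
        },
        f"FLEX_COUNTER_TABLE:PFC_WD:{port_oid}": {
            "PORT_COUNTER_ID_LIST": ",".join(port_counters),
        },
    }
    for qid in queue_oids:
        entries[f"FLEX_COUNTER_TABLE:PFC_WD:{qid}"] = {
            "QUEUE_COUNTER_ID_LIST": ",".join(queue_counters),
            "QUEUE_ATTR_ID_LIST": ",".join(queue_attrs),
        }
    return entries


# Usage with placeholder object IDs and hypothetical plugin names.
e = flex_counter_entries(
    "oid:0x1", ["oid:0x10", "oid:0x11"],
    ["SAI_PORT_STAT_PFC_3_RX_PKTS"],
    ["SAI_QUEUE_STAT_PACKETS", "SAI_QUEUE_STAT_CURR_OCCUPANCY_BYTES"],
    ["SAI_QUEUE_ATTR_PAUSE_STATUS"],
    100, ["storm_detect.lua", "storm_restore.lua"],
)
```

Keeping the counter lists per object lets each vendor's detection script request only the counters its criterion needs.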
; Defines schema for queue counters that are updated by PFC WD
key = "COUNTERS:queueId" ; WD queue entry
; field = value
PFC_WD_QUEUE_STATS_STORM_DETECTED = 1*4DIGIT ; number of storms detected (deadlock counter)
PFC_WD_QUEUE_STATS_STORM_RESTORED = 1*4DIGIT ; number of storms restored
PFC_WD_QUEUE_STATS_TX_PACKETS = 1*20DIGIT ; total packets transmitted during storm
PFC_WD_QUEUE_STATS_TX_DROPPED_PACKETS = 1*20DIGIT ; total Tx packets dropped due to storm
PFC_WD_QUEUE_STATS_RX_PACKETS = 1*20DIGIT ; total packets received during storm
PFC_WD_QUEUE_STATS_RX_DROPPED_PACKETS = 1*20DIGIT ; total Rx packets dropped due to storm
PFC_WD_QUEUE_STATS_TX_PACKETS_LAST = 1*20DIGIT ; packets transmitted during last storm
PFC_WD_QUEUE_STATS_TX_DROPPED_PACKETS_LAST = 1*20DIGIT ; Tx packets dropped due to last storm
PFC_WD_QUEUE_STATS_RX_PACKETS_LAST = 1*20DIGIT ; packets received during last storm
PFC_WD_QUEUE_STATS_RX_DROPPED_PACKETS_LAST = 1*20DIGIT ; Rx packets dropped due to last storm
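One way to read the counter schema is that each new storm resets the `_LAST` fields while the totals keep accumulating. The sketch below encodes that interpretation, which is an assumption; `new_stats`, `on_storm_detected`, `on_storm_traffic` and `on_storm_restored` are hypothetical helpers operating on a plain dict keyed by the field names above.

```python
from typing import Dict

PREFIX = "PFC_WD_QUEUE_STATS_"
TRAFFIC = ("TX_PACKETS", "TX_DROPPED_PACKETS", "RX_PACKETS", "RX_DROPPED_PACKETS")


def new_stats() -> Dict[str, int]:
    """All counters of one COUNTERS:queueId entry, initialised to zero."""
    stats = {PREFIX + "STORM_DETECTED": 0, PREFIX + "STORM_RESTORED": 0}
    for name in TRAFFIC:
        stats[PREFIX + name] = 0
        stats[PREFIX + name + "_LAST"] = 0
    return stats


def on_storm_detected(stats: Dict[str, int]) -> None:
    stats[PREFIX + "STORM_DETECTED"] += 1
    for name in TRAFFIC:
        stats[PREFIX + name + "_LAST"] = 0   # start a fresh "last storm" window


def on_storm_traffic(stats: Dict[str, int], **deltas: int) -> None:
    """Add per-poll deltas, keyed by short names such as TX_DROPPED_PACKETS."""
    for name, n in deltas.items():
        stats[PREFIX + name] += n            # running total over all storms
        stats[PREFIX + name + "_LAST"] += n  # current (last) storm only


def on_storm_restored(stats: Dict[str, int]) -> None:
    stats[PREFIX + "STORM_RESTORED"] += 1


# Usage: two storms; the totals span both, the _LAST fields only the second.
s = new_stats()
on_storm_detected(s)
on_storm_traffic(s, TX_DROPPED_PACKETS=10, RX_DROPPED_PACKETS=4)
on_storm_restored(s)
on_storm_detected(s)
on_storm_traffic(s, TX_DROPPED_PACKETS=3)
```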
Because different vendors support different counters, each ASIC vendor must be able to decide how to tell whether a queue is stormed. For instance, one possible criterion combines the queue packet counter with the PFC RX packet and pause duration counters:
(SAI_QUEUE_STAT_CURR_OCCUPANCY_BYTES > 0 && SAI_QUEUE_STAT_PACKETS.current - SAI_QUEUE_STAT_PACKETS.last == 0 && SAI_PORT_STAT_PFC_[queue]_RX_PKT.current - SAI_PORT_STAT_PFC_[queue]_RX_PKT.last > 0)
||
(SAI_QUEUE_STAT_CURR_OCCUPANCY_BYTES == 0 && SAI_QUEUE_STAT_PACKETS.current - SAI_QUEUE_STAT_PACKETS.last == 0 && SAI_PORT_STAT_PFC_[queue]_RX_PAUSE_DURATION.current - SAI_PORT_STAT_PFC_[queue]_RX_PAUSE_DURATION.last >= t0 * delta)
// t0 is the polling interval; delta is the minimum fraction of t0 during which the queue had to be paused (e.g. 0.9)
These criteria are implemented as Lua scripts for the Redis DB and run periodically, based on the timers configured by orchagent.
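For readability, the example criterion can be restated in Python. In SONiC it runs as a Lua script against the Counters DB; this version only mirrors the boolean logic. It assumes `t0` is the time between the `.last` and `.current` samples, expressed in the same unit as the pause duration counter, and reads the second condition as "paused for at least `t0 * delta`". `Snapshot` and `is_stormed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One poll of the counters used by the example criterion."""
    occupancy_bytes: int   # SAI_QUEUE_STAT_CURR_OCCUPANCY_BYTES
    queue_packets: int     # SAI_QUEUE_STAT_PACKETS
    pfc_rx_pkts: int       # SAI_PORT_STAT_PFC_[queue]_RX_PKT
    pause_duration: int    # SAI_PORT_STAT_PFC_[queue]_RX_PAUSE_DURATION


def is_stormed(cur: Snapshot, last: Snapshot, t0: float, delta: float = 0.9) -> bool:
    no_tx = cur.queue_packets - last.queue_packets == 0
    # Non-empty queue that sent nothing while pause frames kept arriving.
    busy_and_paused = (cur.occupancy_bytes > 0 and no_tx
                       and cur.pfc_rx_pkts - last.pfc_rx_pkts > 0)
    # Empty queue that was paused for at least delta of the interval t0.
    empty_and_paused = (cur.occupancy_bytes == 0 and no_tx
                        and cur.pause_duration - last.pause_duration >= t0 * delta)
    return busy_and_paused or empty_and_paused


# Usage: an empty queue paused for 95 of the last 100 time units.
last = Snapshot(occupancy_bytes=0, queue_packets=100, pfc_rx_pkts=5, pause_duration=0)
cur = Snapshot(occupancy_bytes=0, queue_packets=100, pfc_rx_pkts=5, pause_duration=95)
stormed = is_stormed(cur, last, t0=100)
# stormed is True (95 >= 100 * 0.9)
```

The second branch is what lets the watchdog catch a storm on an empty queue, where no packets are stuck to reveal the pause.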
The PfcWdActionHandler class provides a common interface for handling a PFC storm. Different platforms can inherit from it to implement their own handlers.
A set of external events can fully or partially invalidate the current state of the watchdog.
If counters are reset, PFC WD is in an undefined state and should skip one polling interval. SONiC does not use the API to reset counters on the ASIC, so this event is ignored.
If a port's state changes to DOWN, all of its queues marked as stormed are restored.
Queue configuration must go through the PFC WD proxy so that the watchdog can update its internal state. If PFC is disabled on a queue that was marked as stormed, the queue is restored.
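The event handling above can be sketched as bookkeeping over stormed queues. `StormTracker` and its methods are hypothetical, and restoring a queue is reduced to recording it in a list.

```python
from typing import Dict, List, Set, Tuple


class StormTracker:
    """Hypothetical tracker of stormed queues reacting to external events."""

    def __init__(self) -> None:
        self.stormed: Dict[str, Set[int]] = {}      # port -> stormed queue indices
        self.restored: List[Tuple[str, int]] = []   # restored (port, queue), in order

    def mark_stormed(self, port: str, queue: int) -> None:
        self.stormed.setdefault(port, set()).add(queue)

    def _restore(self, port: str, queue: int) -> None:
        self.stormed[port].discard(queue)
        self.restored.append((port, queue))

    def on_counters_reset(self) -> None:
        # Ignored: SONiC does not reset ASIC counters through the API.
        pass

    def on_port_down(self, port: str) -> None:
        # sorted() copies the set, so removing entries while looping is safe.
        for queue in sorted(self.stormed.get(port, ())):
            self._restore(port, queue)

    def on_pfc_disabled(self, port: str, queue: int) -> None:
        if queue in self.stormed.get(port, ()):
            self._restore(port, queue)


# Usage
t = StormTracker()
t.mark_stormed("Ethernet0", 3)
t.mark_stormed("Ethernet0", 4)
t.mark_stormed("Ethernet4", 3)
t.on_port_down("Ethernet0")
# t.restored == [("Ethernet0", 3), ("Ethernet0", 4)]
```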
To let users set and view PFC WD configuration and statistics, the pfcwd CLI tool provides the following functionality:
- Show watchdog configuration (per port).
pfcwd show config
- Show watchdog statistics (per port/queue).
pfcwd show stats
- Enable watchdog on a specified port(s).
pfcwd start --action drop ports Ethernet116 detection-time 300 --restoration-time 300
- Disable watchdog on a specified port(s).
pfcwd stop <interfaceName>
The PFC Watchdog workflow is shown in the figure above.
(1) Users configure PFC Watchdog via the CLI, specifying detection_time, restoration_time and the port name; the command is described in Section 2.7. The configuration is written into the Config DB PFC_WD table.
(2) The PFC Watchdog orchagent subscribes to the Config DB PFC_WD table. Once the configuration is written to Config DB, the orchagent picks up the change and starts the functionality on the intended ports.
(3) PFC storm events are detected via counters, and syncd is the module that polls counters from the ASIC. To start PFC Watchdog on a port, orchagent writes the port/queue counters needed to detect storm start/stop to the Flex Counter DB, then waits for a notification that a storm start/stop event has happened.
(4)(5) Syncd subscribes to the Flex Counter DB, periodically queries the counters from the ASIC, and writes them to the Counters DB.
(6) Lua scripts are loaded into the Counters DB. They run periodically and use the available counters to check whether a storm has started or stopped; when it has, they send a notification to orchagent.
When orchagent receives a storm start/stop notification from a Lua script, it starts or stops the PfcWdActionHandler accordingly.
All port counters are stored in the Counters DB. To obtain all required information from hardware, the watchdog extends this by making syncd read additional values from SAI: it subscribes queues for polling in the WD database. After every counter update, syncd calls the appropriate Lua script to check whether any queue changed its state, and notifies orchagent of any change.
Orchagent subscribes to syncd notifications of queue deadlock. It applies the configured action (drop/forward) when notified that a queue has become locked, and restores the queue when the opposite notification is received.
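On the orchagent side, this amounts to dispatching start/stop notifications to per-queue handlers. In this hypothetical sketch a notification is an `(op, queue_id)` pair with op `"storm"` or `"restore"`, and `Handler` and `WatchdogOrch` are illustrative names; the real notification format and classes are not specified here.

```python
from typing import Dict


class Handler:
    """Stand-in for an action handler: active from creation until stop()."""

    def __init__(self, queue_id: str, action: str) -> None:
        self.queue_id = queue_id
        self.action = action
        self.active = True

    def stop(self) -> None:
        self.active = False


class WatchdogOrch:
    """Hypothetical dispatcher of storm start/stop notifications."""

    def __init__(self, actions: Dict[str, str]) -> None:
        self.actions = actions                  # queue_id -> configured action
        self.handlers: Dict[str, Handler] = {}  # queue_id -> active handler

    def on_notification(self, op: str, queue_id: str) -> None:
        if op == "storm" and queue_id not in self.handlers:
            # Default action is drop when none is configured.
            self.handlers[queue_id] = Handler(queue_id, self.actions.get(queue_id, "drop"))
        elif op == "restore" and queue_id in self.handlers:
            self.handlers.pop(queue_id).stop()
        # Duplicate or unknown notifications are ignored.


# Usage
orch = WatchdogOrch({"q1": "forward"})
orch.on_notification("storm", "q1")
orch.on_notification("storm", "q2")
h1 = orch.handlers["q1"]
orch.on_notification("restore", "q1")
```

Ignoring duplicates keeps the dispatcher idempotent if the detection script reports the same state twice.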
The drop mitigation handler disables PFC on the marked queue and sets its reserved buffer to 0.
Alternatively, the mitigation handler disables PFC on the marked queue and binds the port to ACL tables whose ingress/egress ACL rules drop packets of that particular priority group (PG).
The corresponding restoration handler returns the reserved buffer to its initial value and re-enables PFC on the unmarked queue.
With the ACL option, the restoration handler instead unbinds the port from the ACL tables and re-enables PFC on the unmarked queue.
The forward mitigation handler disables PFC on the stormed queue, so the queue no longer honors pause frames from the link partner and forwards all packets.
The forward restoration handler re-enables PFC on the queue so that it continues to work in lossless mode as configured by the user.
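The drop and forward handlers described above can be sketched against a common base class. The Python class below only borrows the name `PfcWdActionHandler` from the text; `Queue`, `ZeroBufferDropHandler` and `ForwardHandler` are hypothetical, queue state is a plain object instead of SAI attributes, and the ACL variant is not shown.

```python
from abc import ABC, abstractmethod


class Queue:
    """Minimal stand-in for ASIC queue state (hypothetical)."""

    def __init__(self, reserved_buffer: int) -> None:
        self.pfc_enabled = True
        self.reserved_buffer = reserved_buffer


class PfcWdActionHandler(ABC):
    """Common interface: mitigate when created, restore when the storm ends."""

    def __init__(self, queue: Queue) -> None:
        self.queue = queue
        self.mitigate()

    @abstractmethod
    def mitigate(self) -> None: ...

    @abstractmethod
    def restore(self) -> None: ...


class ZeroBufferDropHandler(PfcWdActionHandler):
    """Drop action: disable PFC and zero the reserved buffer."""

    def mitigate(self) -> None:
        self.saved_buffer = self.queue.reserved_buffer  # remember for restore()
        self.queue.pfc_enabled = False
        self.queue.reserved_buffer = 0

    def restore(self) -> None:
        self.queue.reserved_buffer = self.saved_buffer
        self.queue.pfc_enabled = True


class ForwardHandler(PfcWdActionHandler):
    """Forward action: ignore pause frames, buffers untouched."""

    def mitigate(self) -> None:
        self.queue.pfc_enabled = False

    def restore(self) -> None:
        self.queue.pfc_enabled = True


# Usage
q = Queue(reserved_buffer=1024)
h = ZeroBufferDropHandler(q)   # q.pfc_enabled is False, q.reserved_buffer == 0
h.restore()                    # q.pfc_enabled is True, q.reserved_buffer == 1024
```

Tying mitigation to construction mirrors the workflow in which a handler is started on a storm notification and stopped on the restore notification.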